On Monday, 24 February 2014 at 02:05:31 UTC, Walter Bright wrote:
1. It provides information to the compiler about runtime
frequency that it cannot obtain otherwise. This is very useful
information for generating better code.
2. Making it a hard requirement then means the user will have
to put versioning in it. It becomes inherently non-portable.
There is no way to predict what some other version of some
other compiler on some other system will do.
I'm not sure why it would be impossible to inline in some cases;
I've never hit that limitation with ICC.
Like others I would like unconditional and explicit optimization
from the compiler.
3. In the end, the compiler should make the decision. Inlining
does not always result in faster code, as I pointed out in
another post.
Also, when I use "force inline" it's very often to force
"not-inline", to reuse the same bit of code where the compiler
would otherwise have inlined it.
Each optimization here goes through a repeatable, automated A-B
test with 95% statistical significance on various inputs, and
forcing inline/not-inline has been an effective tool to reduce
the I-cache stress that plagues some very particular program
areas that the compiler doesn't differentiate. This can be
checked by looking at assembly or binary size afterwards.
I'm perfectly OK with the compiler doing what it wants when I
don't tell it to inline or not. AFAIK the C/C++ inline keyword is
mostly ignored by optimizing compilers; it's precisely a keyword
that is both overused and meaningless.
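The forced inline / forced not-inline distinction described above can be sketched in C++. This is a minimal illustration, not the code from the project mentioned: it assumes the GCC/Clang attribute spellings (MSVC and ICC on Windows use __declspec(noinline) and __forceinline instead), and the macro and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// GCC/Clang spellings, wrapped in hypothetical macros.
#define NEVER_INLINE  __attribute__((noinline))
#define ALWAYS_INLINE inline __attribute__((always_inline))

// Bulky helper shared by several call sites. Keeping it out of line
// means one copy of the loop in the binary instead of one per caller,
// which is the "force not-inline to reuse the same bit of code" case.
NEVER_INLINE std::uint32_t checksum_block(const std::uint8_t* data,
                                          std::size_t n) {
    std::uint32_t h = 2166136261u;   // FNV-1a 32-bit offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 16777619u;              // FNV-1a 32-bit prime
    }
    return h;
}

// Tiny helper: forcing it inline removes the call overhead and adds
// almost nothing to the instruction working set.
ALWAYS_INLINE std::uint32_t mix(std::uint32_t a, std::uint32_t b) {
    return (a ^ b) * 16777619u;
}

// Two call sites of checksum_block share a single out-of-line body.
std::uint32_t checksum_pair(const std::uint8_t* a, const std::uint8_t* b,
                            std::size_t n) {
    return mix(checksum_block(a, n), checksum_block(b, n));
}
```

Whether either attribute helps is workload-dependent; the effect can be checked the same way as described above, by comparing assembly or binary size and re-running the A-B measurement.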
Perhaps the lesson is the word 'inline' carries certain
expectations with it, and the feature would be better
positioned as something like:
pragma(usage, often);
pragma(usage, rare);
To me it's not so much about usage frequency as about I-cache
misses. Some inlining can be nearly free (small I-cache working
set), or very costly (I-cache actively being the bottleneck
through repeated misses due to a large working set).
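For comparison, GCC and Clang already expose frequency hints that are close in spirit to the proposed pragma(usage, ...) and that also bear on the I-cache concern: the cold function attribute asks the compiler to optimize a function for size and, on many targets, to place it away from hot code. The sketch below assumes those compilers; the function names are hypothetical.

```cpp
#include <cstddef>

// Rarely taken path: marked cold and kept out of line so its
// instructions do not sit inside the hot loop's working set.
__attribute__((cold, noinline))
int handle_negative(int /*value*/) {
    return 0;  // clamp negative inputs to zero
}

// Hot loop: __builtin_expect tells the compiler the branch is rare,
// so the common path can be laid out as straight-line code.
int sum_clamped(const int* v, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (__builtin_expect(v[i] < 0, 0)) {
            total += handle_negative(v[i]);
        } else {
            total += v[i];
        }
    }
    return total;
}
```

These are hints about frequency and placement rather than inlining commands, so the compiler still makes the final decision; pairing cold with noinline, as here, is one way to state both intents explicitly.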