http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

--- Comment #7 from Jan Hubicka <hubicka at ucw dot cz> 2011-05-27 18:47:56 UTC ---
> 
> We used to play with inlining limits (gcc had some really bad decisions), but
> the meaning of the numbers kept changing from one gcc version to another, and
> the heuristics gcc used kept changing too. Which made it practically impossible
> to use sanely - you could tweak it for one particular architecture, and one
> particular version of gcc, but it would then be worse for others.

Well, the --param limits are really meant more for GCC development than for
being adjusted to random values by random codebases, so this won't really
help, indeed. The heuristics need to keep evolving since the expectations
on them keep changing, too (i.e. 10 years ago people didn't really care about
sane boost/pooma performance, and just a year ago no one really loaded 200000
functions into the compiler at once as we do now with LTO on Mozilla).

However, concerning the stack growth limit, this was implemented at the request
of Andi Kleen for the kernel, since its stack size constraints are special.
There is the -fconserve-stack option to avoid the need for magic --param values
(which are not that magic for stack usage after all, given that they count bytes).
> 
> So I'd much rather have gcc have good heuristics by default, possibly helped by
> the kinds of obvious hints we can give ("unlikely()" in particular is something
> we can add for things like this).

Agreed.  In cases like this you might find the cold attribute handy: it has
the effect of making the function optimized for size as well as telling the
branch predictor to predict branches leading to a call of the cold function
as unlikely.  It would save the need for some of the unlikely() calls.
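For illustration, a minimal sketch of what I mean (the function names here are
made up, not taken from the kernel code in this PR):

  #include <stdlib.h>

  /* Hypothetical example: the cold attribute makes GCC optimize the
     callee for size and predict any branch leading to its call as
     unlikely, so the explicit hint becomes redundant on that path.  */
  #define unlikely(x) __builtin_expect(!!(x), 0)

  __attribute__((cold)) static void report_oom(void)
  {
      /* rarely executed error handling */
      abort();
  }

  void *grab_buffer(size_t size)
  {
      void *p = malloc(size);
      if (!p)            /* no unlikely() needed; report_oom is cold */
          report_oom();
      return p;
  }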
> 
> Yes, early stack estimation might not work all that well.

Yep, one would have to do something like computing the number of live
values at a given place in the function, which is doable but moderately
expensive for such a special-purpose situation.
> 
> That said, this is something where the call-site really can make a big
> difference. Not just the fact that the call site might be marked "unlikely()"
> (again, that's just the kernel making __builtin_expect() readable), but things
> like "none of the arguments are constants" could easily be a good heuristic to
> use as a basis for whether to inline or not.
> 
> IOW, start out with whatever 'large-stack-frame-growth' and
> 'large-function-growth' values, but if the call-site is in an unlikely region,
> cut those values in half (or whatever). And if none of the arguments are
> constants, cut it in half again.
> 
> This is an example of why giving these limits as compiler options really
> doesn't work: the choice should probably be much more dynamic than just a
> single number.
> 
> I dunno. As mentioned, we can fix this problem by just marking things noinline
> by hand. But I do think that there are fairly obvious cases where inlining
> really isn't worth it, and gcc might as well just get those cases right.

Well, this model won't work as suggested. The problem is that from the GCC
inliner's simplistic POV, inlining is always a win unless it hits the large
function/stack/unit bounds. The only tradeoff it understands is code size
growth.

Here inlining causes regalloc to produce a bigger stack frame because it is
stupid and doesn't know how to do shrink wrapping.  (Bernd has recently posted
patches for this; I dunno about their status and whether they would help here.)
This is important because the hot path through the outer function is extremely
short and the outer function is simple, so it wouldn't need the registers
otherwise.
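To make the scenario concrete, here is a stripped-down sketch of the pattern
(not the actual code from this PR):

  /* Sketch only.  The hot path through do_thing() is nearly empty,
     but once slow_path() is inlined, its 512-byte buffer and extra
     register pressure end up in do_thing()'s frame and prologue,
     which are paid even when the cold branch is never taken,
     because the prologue is not shrink wrapped.  */
  static void slow_path(int arg)
  {
      char scratch[512];
      /* ... rarely executed, heavy work using scratch ... */
      (void) scratch;
      (void) arg;
  }

  int do_thing(int arg)
  {
      if (__builtin_expect(arg != 0, 0))   /* cold branch */
          slow_path(arg);
      return 0;                            /* hot, trivial path */
  }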

The large function/unit limits don't really worry about actual code quality,
just about the fact that we don't want non-linear algorithms in the compiler
to become too prominent.  So the starting values are high for this purpose.
Large function is currently set to the amusing constant of 2700 insns;
dividing it by 4 won't help much. The real problem is that we are mixing two
concepts together (compiler nonlinearity and code quality considerations),
which is not a good idea.

Some years ago I introduced the notion of hot and cold basic blocks to the
GCC inliner and told it not to inline functions into cold basic blocks unless
the caller size (or overall program size) is expected to shrink. This also
introduced a number of regressions I had to work through.  (Think of not
inlining the destructor of an object in EH handling code, which prevents the
object from being scalar replaced and optimized away.)

At the moment I can come up with the following suggestions:

1) inline functions called once from cold basic blocks only when they are small,
so the caller size will shrink (like inlining of small functions does)
2) introduce a new function body size limit used only for cold functions called
once
3) try to somehow get a very rough stack frame estimate into our current stack
frame growth limits.

I guess I will try 1) and see how it affects other code; if it is not too bad
we can stay with it.
I think 3) won't work in practice, as the stack growth limits are too large by
default and we really worry about the cost of the prologue rather than the cost
of the stack frame. 2) might be an alternative if everything else fails; its
negative side is the need for another parameter, and we already have too many.

In any case, thanks for the analysis and the PR. I worried about this scenario
at the time inlining of functions called once was introduced (around GCC 3.4,
I think), but since I did not find any benchmark that would regress because of
it, I decided to worry about something else.  Actually I think this is the
first PR related to this topic of stack growth, which is rather surprising.
(We already solved the problem of outrageous stack frame growth that hit glibc,
and the fact that inlining sometimes makes us mispredict the hot part of a
program as cold because there is a large loop nest somewhere else, which hit
some Fortran benchmarks.)

Honza
