Re: Crucial C++ inlining broken under -Os

Jan Hubicka Fri, 02 Jul 2010 10:37:54 -0700

> Quoting Richard Guenther <richard.guent...@gmail.com>:
>
>> That is, we no longer optimistically assume that comdat functions
>> can be eliminated if there are no callers in the local TU in 4.5
>> (but we did in previous releases).
>
> But if the function is very simple, the only reason to keep it would be
> if its address was taken somewhere, or if we tailcall it.


Since there seems to be a bit of confusion, it may help to summarize how the
whole process works.

The inliner estimates the whole-program size change caused by inlining all
invocations of each function (overall_growth) and inlines every function for
which this is expected to shrink the program.
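Written as a formula, under the simplifying assumption that every call site grows its caller by the same amount (real call sites differ, for instance in the arguments they pass): size(f) is the estimated body size, benefit(f) is the size expected to disappear when one call is inlined (the "Likely eliminated" statements plus the call overhead), n_f is the number of calls, and r_f is 1 if the offline copy can be removed and 0 otherwise. These symbols are illustrative, not names from the GCC sources; f is inlined everywhere when the result is negative.

```latex
\mathrm{overall\_growth}(f) = n_f \,\bigl(\mathrm{size}(f) - \mathrm{benefit}(f)\bigr) - r_f \,\mathrm{size}(f)
```

With the Container() numbers from the dumps in this message (size 4, benefit 3, one call), this gives +1 when the copy must be kept (r_f = 0) and 1 - 4 = -3 when it can be removed (r_f = 1).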

The process is as follows.  In the inline_param3 dump we get time and size
estimates for every statement:

  Analyzing function body size: Container::Container()
    freq:  1000 size:  3 time: 12 D.2145_1 = operator new (4);
    freq:  1000 size:  1 time:  1 MEM[(struct Container *)this_3(D)].member = D.2145_1;
      Likely eliminated
    freq:  1000 size:  0 time:  0 return;
      Likely eliminated
  Overall function body time: 13-1 size: 4-1

So in our simplified model, the Container() function occupies 4 units of size
and executes for 13 units of time (these are not directly related to real bytes
or cycles, since our IL is too high-level at this point).

Some statements are assumed to go away after inlining.  This is the case for
the memory store, which we expect will somehow get combined with the caller's
code after inlining.  This is just a guess that attempts to convince the
inliner to remove more of the C++ abstraction penalty and allow more scalar
replacement.  So we believe that by inlining the function we save the store to
the .member field.
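The "Overall function body" line can be reproduced by summing the per-statement estimates and, separately, the ones marked "Likely eliminated". The sketch below is a hypothetical model of that bookkeeping (the type and function names are made up, not GCC's), fed with the numbers from the dump; it assumes the frequency weighting can be ignored because every statement has freq 1000, i.e. runs once per call.

```cpp
#include <cassert>
#include <vector>

// One statement from the inline_param3 dump.  All three statements have
// freq 1000 (executed once per call), so times are used unweighted here.
struct StmtEstimate {
    int size;
    int time;
    bool likely_eliminated;  // printed as "Likely eliminated" in the dump
};

// Totals in the "time: T-Te size: S-Se" form used by the dump.
struct BodyEstimate {
    int time = 0, time_eliminated = 0;
    int size = 0, size_eliminated = 0;
};

BodyEstimate estimate_body(const std::vector<StmtEstimate>& stmts) {
    BodyEstimate total;
    for (const StmtEstimate& s : stmts) {
        total.time += s.time;
        total.size += s.size;
        if (s.likely_eliminated) {
            total.time_eliminated += s.time;
            total.size_eliminated += s.size;
        }
    }
    return total;
}

// The three statements of Container::Container():
//   operator new (4)   size 3, time 12
//   store to .member   size 1, time 1, likely eliminated
//   return             size 0, time 0, likely eliminated
const std::vector<StmtEstimate> kContainerCtor = {
    {3, 12, false},
    {1, 1, true},
    {0, 0, true},
};
```

Calling estimate_body(kContainerCtor) yields time 13 with 1 eliminated and size 4 with 1 eliminated, matching "time: 13-1 size: 4-1".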

Next, the call overhead is accounted for (since inlining removes one call),
and we get:

  With function call overhead time: 13-12 size: 4-3

So the inliner thinks that by inlining we save 12 units of execution time and
increase code size by 1 unit (4-3).  The overall time (13) is not really used.

The one extra unit is for passing the value 4 to the operator new() call.
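The arithmetic behind the two dump lines can be spelled out as below. This is a hypothetical sketch with made-up names; the split of the call overhead into 2 size units and 11 time units is only inferred from the difference between "Overall function body time: 13-1 size: 4-1" and "With function call overhead time: 13-12 size: 4-3", not read from GCC.

```cpp
#include <cassert>

// The four numbers of a dump line "time: T-Tb size: S-Sb".
struct Summary {
    int time, time_benefit;
    int size, size_benefit;
};

// "Overall function body time: 13-1 size: 4-1"
constexpr Summary kBodyOnly{13, 1, 4, 1};
// "With function call overhead time: 13-12 size: 4-3"
constexpr Summary kWithCallOverhead{13, 12, 4, 3};

// Call overhead inferred from the difference between the two lines.
constexpr int kCallOverheadTime =
    kWithCallOverhead.time_benefit - kBodyOnly.time_benefit;
constexpr int kCallOverheadSize =
    kWithCallOverhead.size_benefit - kBodyOnly.size_benefit;

// Growth of a caller when one call is inlined: the body is copied in, and
// the benefit (eliminated statements plus the call itself) goes away.
constexpr int growth_per_call(const Summary& s) {
    return s.size - s.size_benefit;
}
```

Here growth_per_call(kWithCallOverhead) is 1, the one extra unit, while time_benefit is the 12 units of saved execution time.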

When inlining for size (which is the case for every call at -Os), the
heuristic actually computes the estimated program size change and inlines a
function into a specific caller only when doing so reduces code size.  This
never happens for Container() because the caller's code grows (it needs to
pass the extra value 4).

Next we check whether inlining into all callers would reduce program size by
eliminating the offline copy.  This would succeed for Container() if it were
static, because it is called just once and the 1-unit growth in the caller is
smaller than the overall size of Container().  Because Container() is COMDAT,
we don't do that, so we never inline it.  This can be seen later in the
.inline dump:

Considering Container::Container() with 4 size
 to be inlined into int gimme() in t.C:26
 Estimated growth after inlined into all callees is +1 insns.
 Estimated badness is 2, frequency 1.00.
 inline_failed:call is unlikely and code size would grow.
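The two checks can be modelled roughly as follows. This is a hypothetical, simplified sketch, not the GCC 4.5 source: the names are invented, and it assumes "reduces code size" means strictly negative growth.

```cpp
#include <cassert>

// Hypothetical, simplified model of the two -Os checks described above.
struct Callee {
    int size;             // body size ("Container::Container() with 4 size")
    int growth_per_call;  // caller growth for one inlined call (+1 here)
    int num_calls;        // direct calls seen in this unit
    bool is_static;       // static: offline copy goes away once all calls are inlined
};

// Check 1: inline into one specific caller only if that caller shrinks.
bool shrinks_single_caller(const Callee& f) {
    return f.growth_per_call < 0;
}

// Check 2: growth after inlining into all callers.  Only a static function's
// offline copy is credited; a COMDAT copy is not assumed to disappear.
int growth_after_inlining_all(const Callee& f) {
    int growth = f.growth_per_call * f.num_calls;
    if (f.is_static)
        growth -= f.size;
    return growth;
}

bool inline_at_Os(const Callee& f) {
    return shrinks_single_caller(f) || growth_after_inlining_all(f) < 0;
}
```

For Container() as a COMDAT function ({4, 1, 1, false}) the second check gives +1, as in the dump, so it is not inlined; the same function marked static would give 1 - 4 = -3 and be inlined.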

The behaviour change concerns COMDAT functions that are larger than the call
overhead but are either called just once, or small enough that the code growth
caused by inlining is smaller than the function body itself.  In these cases,
previous GCC releases assumed that the overall program size would shrink and
inlined them.

This assumption is not correct: it holds for static functions and for the size
of the .o file, but not for the whole binary, because the linker keeps only one
copy of a COMDAT function however many units emit it.  The problem can be
demonstrated by a very large COMDAT function that is used once in very many
units.  Thus I've changed the behaviour in GCC 4.5, since it is safer.
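The gap between the .o view and the whole-binary view can be made concrete with a toy model. Everything here is hypothetical: a COMDAT function of size `body` is called once in each of `units` translation units, and inlining one call grows its caller by `growth`; the numbers in the usage note are made up for illustration.

```cpp
#include <cassert>

// Hypothetical illustration, not GCC code.

// Change in size of ONE object file: the call is inlined and the local
// offline copy is dropped.  This is the view in which the old assumption holds.
int object_file_change(int body, int growth) {
    return growth - body;
}

// Change in size of the linked binary: before inlining the linker kept a
// single merged copy of the body; after inlining every unit pays `growth`.
int binary_change(int body, int growth, int units) {
    return growth * units - body;
}
```

With body 100, growth 95 and 50 units, each object file shrinks by 5 units, but the binary grows by 95 * 50 - 100 = 4650 units.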

So to work around this, one needs either -fwhole-program or the always_inline
attribute.  Alternatively, if the actual size of the .o file shrinks after
inlining because of other optimizations, we can see whether the heuristics can
be extended to forecast this and take it into account in inlining decisions.
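A minimal sketch of the always_inline route, assuming a class shaped like the one in the dumps (a constructor that allocates 4 bytes with operator new, taken here to be `new int` on a target where int is 4 bytes, and stores the pointer in `member`). The original t.C is not shown, so this Container and gimme() are hypothetical reconstructions. The -fwhole-program route needs no source change; it means building with something like `g++ -Os -fwhole-program t.C`, and it is only valid when the compiled sources really are the whole program.

```cpp
#include <cassert>

// Hypothetical reconstruction of the testcase's class.
struct Container {
    int *member;
    // always_inline makes GCC inline this constructor even at -Os,
    // bypassing the size heuristics discussed above.
    __attribute__((always_inline)) inline Container() : member(new int) {}
};

// Hypothetical caller standing in for the testcase's gimme().
int gimme() {
    Container c;
    *c.member = 42;
    int value = *c.member;
    delete c.member;
    return value;
}
```

The attribute overrides the size-driven decision for every caller, so it is best reserved for bodies known to simplify after inlining.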

The last alternative is what I would be happy to look into, but in this
testcase we don't get any simplification, so the local behaviour of the inliner
is correct.

I guess we might experiment with allowing some very limited code size growth
when inlining COMDAT functions, if this turns out to be a real problem.  We
might also add some bias to the logic that accounts for removing the offline
copy: an offline copy is obviously a little bigger than its instructions, since
it has a prologue/epilogue and alignment.  This would help static functions,
but accounting for it realistically is tricky because the costs are
architecture dependent.
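The two experiments could look roughly like this. It is a hypothetical sketch only: `comdat_growth_budget` and `offline_overhead` are invented knobs, not existing GCC parameters, and a realistic `offline_overhead` would have to come from the target.

```cpp
#include <cassert>

// Hypothetical knobs for the two experiments; neither is a GCC parameter.
struct Tweaks {
    int comdat_growth_budget;  // COMDAT callees: growth must stay below this (0 = current rule)
    int offline_overhead;      // prologue/epilogue/alignment of an offline copy (target dependent)
};

// Growth after inlining into all callers, crediting a removable (static)
// offline copy with its body plus the estimated overhead.
int overall_growth(int body, int growth_per_call, int calls, bool is_static,
                   const Tweaks& t) {
    int growth = growth_per_call * calls;
    if (is_static)
        growth -= body + t.offline_overhead;
    return growth;
}

bool inline_into_all_callers(int body, int growth_per_call, int calls,
                             bool is_static, const Tweaks& t) {
    // Static functions must still shrink the program; COMDAT functions may
    // grow it by less than the budget.
    int budget = is_static ? 0 : t.comdat_growth_budget;
    return overall_growth(body, growth_per_call, calls, is_static, t) < budget;
}
```

With a budget of 2, Container() (COMDAT, size 4, +1 per call, one call) would be inlined, and an offline overhead of 3 lets a static function of size 5 called twice with +3 per call go from +1 to -2.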

I might also make a patch for you that reverts this behaviour.  However, it
would be interesting to have the -finline-limit testcase.  It is a bit
surprising that this changes behaviour for you: -finline-limit is now an
obsolete way of controlling the max-inline-insns-single and
max-inline-insns-auto parameters.  Setting it to 50 has the effect of reducing
both of them to 25 (from 400 and 50 respectively).
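In other words, -finline-limit=50 should behave like passing --param max-inline-insns-single=25 --param max-inline-insns-auto=25. The tiny model below assumes the halving applies to any value, as in the 50 to 25 example; the struct and function names are hypothetical.

```cpp
#include <cassert>

// The two parameters -finline-limit controls.
struct InlineLimits {
    int max_inline_insns_single;  // --param max-inline-insns-single
    int max_inline_insns_auto;    // --param max-inline-insns-auto
};

// -finline-limit=n sets both parameters to n / 2.
InlineLimits from_finline_limit(int n) {
    return InlineLimits{n / 2, n / 2};
}
```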

Those parameters limit the size of functions considered as inlining
candidates.  At -Os, inlining should always be throttled by the logic above
that computes the effect on overall code size, so the only effect they could
have is to prevent relatively huge functions from being inline candidates.

In theory this should have no effect, since such functions are inlined at -Os
only if they are called once (which is independent of those parameters) or if
the "Likely eliminated" estimate goes completely wrong and predicts that the
function basically disappears.
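A hypothetical sketch of where the candidate limits can matter at -Os, following the reasoning above. The names are invented, the "called once" path is assumed to apply only to static functions (whose offline copy can be removed), and the choice between the two limits by whether the function is declared inline reflects how the two parameters are documented, not GCC's actual code.

```cpp
#include <cassert>

// Hypothetical sketch, not GCC code.
struct Fn {
    int size;              // estimated body size
    bool declared_inline;  // picks max-inline-insns-single vs. max-inline-insns-auto
    bool is_static;
    int num_calls;
    int growth_per_call;   // size minus the benefit of removing one call
};

bool is_candidate(const Fn& f, int insns_single, int insns_auto) {
    return f.size <= (f.declared_inline ? insns_single : insns_auto);
}

bool inline_at_Os(const Fn& f, int insns_single, int insns_auto) {
    // Static function called once: the offline copy disappears, and the
    // limits are not consulted.
    if (f.is_static && f.num_calls == 1)
        return f.growth_per_call - f.size < 0;
    // Otherwise the limits filter candidates, and a candidate is inlined only
    // if a single call shrinks the caller (nearly the whole body is
    // "Likely eliminated").
    return is_candidate(f, insns_single, insns_auto) && f.growth_per_call < 0;
}
```

A huge static function called once is inlined under any limits, while lowering the limits only rejects functions whose per-call growth was already negative.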

So perhaps we are actually seeing too much inlining of functions that are not
really worthwhile?

Thanks for the effort!
Honza
