> Quoting Richard Guenther <richard.guent...@gmail.com>:
>
>> That is, we no longer optimistically assume that comdat functions
>> can be eliminated if there are no callers in the local TU in 4.5
>> (but we did in previous releases).
>
> But if the function is very simple, the only reason to keep it would be
> if its address was taken somewhere, or if we tailcall it.

Since there seems to be a bit of confusion, perhaps it makes sense to summarize how the whole process works.

The inliner estimates the change in whole-program size that would result from inlining all invocations of each function (overall_growth), and inlines every function for which the program is expected to shrink. The process is as follows.

In the inline_param3 dump we get size and time estimates for every statement:

  Analyzing function body size: Container::Container()
    freq: 1000 size: 3 time: 12 D.2145_1 = operator new (4);
    freq: 1000 size: 1 time: 1 MEM[(struct Container *)this_3(D)].member = D.2145_1;
      Likely eliminated
    freq: 1000 size: 0 time: 0 return;
      Likely eliminated
  Overall function body time: 13-1 size: 4-1

So in our simplified view, Container() occupies 4 units of size and executes for 13 units of time (these are not closely related to real bytes or cycles, since our IL is too high-level at this point).

Some statements are assumed to go away after inlining. This is the case for the memory store, which we expect will somehow get combined after inlining. This is just a guess that attempts to convince the inliner to get rid of more C++ abstraction penalty and to allow more scalar replacement. So we believe that by inlining the function we save the store to the .member field.

Next the call overhead is accounted for (since inlining removes one call) and we get:

  With function call overhead time: 13-12 size: 4-3

So the inliner thinks that by inlining we save 12 units of execution time and increase code size by 1 unit (4-3). The overall time (13) is not really used. The one extra unit is for passing the value 4 to the new() call.

When inlining for size (which happens for all calls not considered to be hot, and at -Os that is simply all calls), the heuristic actually computes the estimated program size change and inlines a function when inlining it into a specific caller reduces code size. This never happens for Container(), because the code of the caller grows (it needs to pass the extra value 4).

Next we try to see whether inlining into all callers would reduce program size by eliminating the offline copy. This would hit for Container() if it were static, because it is called just once and the growth in the caller by 1 unit is smaller than the overall size of Container(). Because Container() is COMDAT, we don't do that, so we never inline it. This is seen later in the .inline dump:

  Considering Container::Container() with 4 size
   to be inlined into int gimme() in t.C:26
   Estimated growth after inlined into all callees is +1 insns.
   Estimated badness is 2, frequency 1.00.
   inline_failed:call is unlikely and code size would grow.

The behaviour change concerns COMDAT functions that are larger than the call overhead but are either called just once or small enough that the code growth caused by inlining is smaller than the function body itself. In these cases previous GCC releases assumed that the overall program size would shrink, and inlined. This assumption is not correct (it is correct for static functions, and also for the size of the .o file, but not for the whole binary), and the problem can be demonstrated by making a very large COMDAT function that is used once in very many units. Thus I've changed the behaviour in GCC 4.5, since it is safer.

So to get around this one needs either -fwhole-program or the always_inline attribute; or, if the actual size of the .o file shrinks after inlining because of other optimizations, we can see whether we can extend the heuristics to forecast this and account for it in inlining decisions. The last alternative is what I would be happy to look into, but in this testcase we don't get any simplification, so the local behaviour of the inliner is correct.

I guess we might experiment with allowing some very limited code size growth for inlining COMDAT functions if this turns out to be a real problem. Also, we might add some bias into the logic accounting for removal of the offline copy: obviously the offline copy is a little bit bigger than the instructions themselves, having a prologue/epilogue and alignment.
This would help static functions, but accounting for this realistically is tricky because the costs are architecture-dependent.

I might also make a patch for you to revert this behaviour. However, it would be interesting to have the -finline-limit testcase. It is a bit surprising that this changes behaviour for you: -finline-limit is now an obsolete way of controlling the max-inline-insns-single and max-inline-insns-auto parameters. Setting it to 50 has the effect of reducing both to 25 (from 400 and 50 respectively). Those parameters limit the size of functions that are considered as inlining candidates. At -Os inlining should always be throttled by the above logic computing the effect on overall code size, so the only effect the limit could have is to prevent relatively huge functions from being inline candidates. In theory this should have no effect, since those functions are going to be inlined at -Os only if they are called once (and this is independent of those arguments) or if the "Likely eliminated" estimate gets completely crazy and predicts that the function will basically disappear.

So perhaps we are actually seeing too much inlining of functions that are not really worthwhile?

Thanks for the effort!
Honza