On Tue, Oct 18, 2011 at 1:45 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote:
> On 13/10/2011, at 12:58 AM, Richard Guenther wrote:
>
>> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov <ma...@codesourcery.com> 
>> wrote:
>>> The following patch adds a new knob to make GCC perform several iterations 
>>> of early optimizations and inlining.
>>>
>>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios.  
>>> Performing several iterations of optimizations does significantly improve 
>>> code speed on a certain proprietary source base.  Some hand-tuning of the 
>>> parameter value is required to get optimum performance.  Another good use 
>>> for this option is searching for, and doing ad-hoc analysis of, cases where 
>>> GCC misses optimization opportunities.
>>>
>>> With the default setting of '1', nothing is changed from the current status 
>>> quo.
>>>
>>> The patch was bootstrapped and regtested with 3 iterations set by default 
>>> on i686-linux-gnu.  The only failures in the regression testsuite were due 
>>> to latent bugs in the handling of EH information, which are being discussed 
>>> in a different thread.
>>>
>>> The performance impact on the standard benchmarks is inconclusive: there 
>>> are improvements in SPEC2000 of up to 4% and regressions down to -2%, see 
>>> [*].  The SPEC2006 benchmarks will take another day or two to complete, 
>>> and I will update the spreadsheet then.  The benchmarks were run on a Core2 
>>> system for all combinations of {-m32/-m64}{-O2/-O3}.
>>>
>>> The effect on compilation time is fairly predictable: about a 10% 
>>> compile-time increase with 3 iterations.
>>>
>>> OK for trunk?
>>
>> I don't think this is a good idea, especially in the form in which you
>> implemented it.
>>
>> If we wanted to iterate early optimizations, we'd want to do it by iterating
>> an IPA pass so that we benefit from more precise size estimates
>> when trying to inline a function the second time.
>
> Could you elaborate on this a bit?  Early optimizations are gimple passes, so 
> I'm missing your point here.

pass_early_local_passes is an IPA pass; you want to iterate
fn1, fn2, fn1, fn2, ..., not fn1, fn1, ..., fn2, fn2, ..., precisely for
better inlining.  Thus you need to split pass_early_local_passes into
pieces so that you can iterate one of the IPA pieces.
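
To make the ordering concrete, here is a toy sketch (plain C++, not actual
GCC pass-manager code; the names are made up) of the two schedules.
Interleaving the functions within each iteration means that when the early
inliner looks at fn1 again it already sees fn2 as optimized by the previous
iteration, so the size estimates are up to date:

#include <cstdio>

// Stand-in for running the early local passes once on one function.
static void
early_local_passes (int fn, int iter)
{
  std::printf ("iteration %d: early local passes on fn%d\n", iter, fn);
}

int
main ()
{
  const int n_funcs = 2, n_iters = 3;

  // Statically scheduled: each function runs through all iterations
  // before the next function is touched (fn1, fn1, fn1, fn2, fn2, fn2).
  for (int fn = 1; fn <= n_funcs; fn++)
    for (int iter = 1; iter <= n_iters; iter++)
      early_local_passes (fn, iter);

  // Iterated IPA pass: each iteration walks all functions
  // (fn1, fn2, fn1, fn2, fn1, fn2).
  for (int iter = 1; iter <= n_iters; iter++)
    for (int fn = 1; fn <= n_funcs; fn++)
      early_local_passes (fn, iter);

  return 0;
}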

>> Also, statically
>> scheduling the passes will mess up the dump files, and you have no
>> chance of, say, noticing that nothing changed for function f and its
>> callees in iteration N so that you could skip processing them in
>> iteration N + 1.
>
> Yes, these are the shortcomings.  The dump file naming issues can be fixed, 
> e.g., by adding a suffix to the passes on iterations after the first one.  
> The analysis needed to avoid unnecessary iterations is a more complex problem.

Sure.  I analyzed the early passes by manually duplicating them and
testing that they do nothing for tramp3d, which at some point they
pretty much all did.

>>
>> So, at the least you should split the pass_early_local_passes IPA pass
>> into three: you'd iterate over the 2nd (definitely not over
>> pass_split_functions, though), and the third would be pass_profile and
>> pass_split_functions only.  And you'd iterate from the place where the
>> 2nd IPA pass is executed, not by scheduling the passes N times.
>
> OK, I will look into this.
>
>>
>> Then you'd have to analyze the compile-time impact of the IPA
>> splitting on its own, when not iterating.  Then you should look
>> at which optimizations were actually performed
>> that led to the improvement (I can see some indirect inlining
>> happening, but anything else would be a bug in the present
>> optimizers in the early pipeline - they are all designed to be
>> roughly independent of each other and _not_ to expose new
>> opportunities by iteration).  Thus - testcases?
>
> The initial motivation for the patch was to enable more indirect inlining and 
> devirtualization opportunities.

Hm.
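
For concreteness, a minimal made-up testcase of the sort of case presumably
meant here (the names are hypothetical, and whether a second early-optimization
iteration is really needed for it depends on the pass ordering): once make_base
is inlined into use, the call through b goes through a pointer of known dynamic
type and can be devirtualized, and then inlined, by a later pass over the
function.

struct Base
{
  virtual int get () { return 0; }
  virtual ~Base () {}
};

struct Counter : Base
{
  int n;
  Counter () : n (42) {}
  virtual int get () { return n; }
};

/* Until this is inlined into its caller, the dynamic type behind the
   returned pointer is not visible at the call site.  */
static Base *
make_base (Counter *c)
{
  return c;
}

int
use ()
{
  Counter c;
  Base *b = make_base (&c);
  /* Devirtualizable once make_base has been inlined and b is known
     to point to a Counter.  */
  return b->get ();
}

int
main ()
{
  return use () == 42 ? 0 : 1;
}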

> Since then I have found the patch to be helpful in searching for optimization 
> opportunities and bugs.  E.g., SPEC2006's 471.omnetpp drops 20% with 2 
> additional iterations of early optimizations [*].  Given that applying more 
> optimizations should, in theory, not decrease performance, there is likely a 
> very real bug or deficiency behind that.

It is likely early SRA that messes things up, or maybe switch conversion.
The early passes should really be restricted to always-profitable cleanups.

Your experiment looks useful for tracking down these bugs, but in general
I don't think we want to expose iterating the early passes.

Richard.

> Thank you,
>
> [*] 
> https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
