On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li <davi...@google.com> wrote:
>> On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li <davi...@google.com> wrote:
>>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>> The difference in instrumentation runtime is huge -- as topn profiler
>>>>> is pretty expensive to run.
>>>>>
>>>>> With FDO, it is probably better to make early inlining more aggressive
>>>>> in order to get more context sensitive profiling.
>>>>
>>>> I agree with that; I would just like to understand where increasing the
>>>> iterations helps and whether we can handle it without iterating (because
>>>> Richi originally requested dropping the iteration for correctness issues).
>
> Well, I requested that any iteration be done with an IPA view in mind.  That is,
> iterate for cgraph cycles for example where currently we face the situation
> that at least one function is inlined unoptimized.  For this we'd like to
> first optimize without inlining (well, maybe inlining doesn't hurt)

Yes -- an inlining decision made without callee cleanup is more
conservative and should not hurt.

>and then
> inline (and re-optimize if we inlined).
>
> Indirect edges are more interesting, but basically you'd want to re-inline
> once you discover new direct calls during early opts (but then make
> sure to do that only after the direct callee has been early-optimized).
>

It would be interesting to inline the newly introduced direct calls if
the callsites also have function pointer arguments that are known in
the call context.
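A minimal C illustration of the situation described above, assuming hypothetical functions `square`, `apply` and `caller` that do not appear in the thread: inside `apply` the call `f(x)` is indirect, but at the call site in `caller` the function-pointer argument is the known constant `square`. Once `apply` is inlined into `caller`, the call becomes the direct call `square(x)`, which only a further early-inlining pass over `caller` could then inline as well.

```c
/* Hypothetical example; names are illustrative only. */
static int square(int v) { return v * v; }

/* Inside apply(), f(x) is an indirect call through a function pointer. */
static int apply(int (*f)(int), int x) { return f(x); }

/* The function-pointer argument is known in this call context.  After
   apply() is inlined here, f(x) turns into the direct call square(x),
   a new inlining candidate that a single early-inliner iteration has
   already passed by. */
int caller(int x) { return apply(square, x); }
```

Semantically `caller(7)` evaluates to 49 whether or not any inlining happens; the example only shows where a new direct call appears.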

> Thus it would be nice if somebody could improve on the currently very
> simple function ordering in which we apply early opts, integrating "iteration"
> in a better way (not iterating over all functions but only where it
> might make a difference, focused on inlining).
>
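A rough sketch of the callee-first ordering idea discussed above, assuming a hypothetical fixed-size call graph stored as an adjacency matrix (`calls[i][j]` is true when function `i` calls `j`). This is a plain depth-first post-order, not GCC's actual early-opts ordering code. It places each function after its callees, and it also shows why cycles are the problem case: in a cycle one function is necessarily ordered before a callee it may inline.

```c
#include <stdbool.h>

/* Hypothetical call graph size; calls[i][j] means function i calls j. */
#define N_FUNCS 4

static bool visited[N_FUNCS];
static int order[N_FUNCS];   /* functions in the order early opts would run */
static int n_ordered;

/* Depth-first post-order: a function is appended only after all of its
   not-yet-visited callees, so callees are early-optimized before callers.
   In a call-graph cycle the back edge leads to an already-visited function
   and is skipped, so one function in the cycle is ordered before one of
   its callees and would inline that callee unoptimized. */
static void visit(bool calls[N_FUNCS][N_FUNCS], int f)
{
  visited[f] = true;
  for (int c = 0; c < N_FUNCS; c++)
    if (calls[f][c] && !visited[c])
      visit(calls, c);
  order[n_ordered++] = f;
}

void compute_early_opt_order(bool calls[N_FUNCS][N_FUNCS])
{
  n_ordered = 0;
  for (int f = 0; f < N_FUNCS; f++)
    visited[f] = false;
  for (int f = 0; f < N_FUNCS; f++)
    if (!visited[f])
      visit(calls, f);
}
```

For the chain 3 -> 0 -> 1 -> 2 this yields the order 2, 1, 0, 3. For the two-function cycle 0 <-> 1, function 1 comes first even though it calls 0, which is exactly the "inlined unoptimized" case.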
>>>> Do you have some examples?
>>>
>>> We can do an FDO experiment by disabling einline. (Note that
>>> increasing the iterations to 2 did not actually improve performance with
>>> our benchmarks.)
>>
>> Early inlining itself has a large performance impact for FDO (the
>> runtime of the profile-use build). With it disabled, the FDO
>> performance drops by >2% on average. The degradation is seen across
>> all benchmarks except for one.
>
> Only 2%?  You are lucky ;)

A 2% average is considered pretty significant for optimized build
runtime performance.


> For tramp3d, introducing early inlining
> made a difference of 100000% ;)  (yes, statistically for tramp3d
> we have 100 calls in the initial code for each generated assembler
> instruction ... wheee C++ template metaprogramming!)

Is this 100000% difference from instrumentation build or optimized
build runtime?

>
> So indeed early inlining was absolutely required to make FDO usable at all.

thanks,

David
>
> Richard.
>
>> David
>>
>>
>>>
>>> David
>>>
>>>> Honza
>>>>>
>>>>> David
>>>>>
>>>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>> >> Increasing the number of early inliner iterations from 1 to 2 enables more
>>>>> >> indirect calls to be promoted/inlined before instrumentation. This in turn
>>>>> >> reduces the instrumentation overhead, particularly for the more expensive
>>>>> >> indirect call topn profiling.
>>>>> >
>>>>> > How much difference do you get here? Another possibility would be to run
>>>>> > a specialized ipa-cp before profile instrumentation.
>>>>> >
>>>>> > Honza
>>>>> >>
>>>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>>>> >>
>>>>> >> 2014-10-18  Teresa Johnson  <tejohn...@google.com>
>>>>> >>
>>>>> >>         Google ref b/17934523
>>>>> >>         * opts.c (finish_options): Increase max-early-inliner-iterations to 2
>>>>> >>         for profile-gen and profile-use builds.
>>>>> >>
>>>>> >> Index: opts.c
>>>>> >> ===================================================================
>>>>> >> --- opts.c      (revision 216286)
>>>>> >> +++ opts.c      (working copy)
>>>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>>>> >>          opts->x_param_values, opts_set->x_param_values);
>>>>> >>      }
>>>>> >>
>>>>> >> +  if (opts->x_profile_arc_flag
>>>>> >> +      || opts->x_flag_branch_probabilities)
>>>>> >> +    {
>>>>> >> +      maybe_set_param_value
>>>>> >> +       (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>>>> >> +        opts->x_param_values, opts_set->x_param_values);
>>>>> >> +    }
>>>>> >> +
>>>>> >>    if (!(opts->x_flag_auto_profile
>>>>> >>          || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>>>>> >>      {
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

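A simplified model of what the quoted patch does in `finish_options`, written against a hypothetical `struct param_table` and helper names (`maybe_set_param`, `apply_fdo_defaults`) that stand in for `opts->x_param_values` / `opts_set->x_param_values`. The key point it tries to capture is that, like GCC's `maybe_set_param_value`, the new default of 2 applies only when the user did not set the parameter explicitly, so an explicit `--param max-early-inliner-iterations=N` still wins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GCC's parameter arrays. */
enum { EARLY_INLINER_MAX_ITERATIONS, NUM_PARAMS };

struct param_table {
  int values[NUM_PARAMS];        /* current parameter values */
  bool set_by_user[NUM_PARAMS];  /* true if given via --param */
};

/* Models the intent of maybe_set_param_value: only override the value
   when the user has not set this parameter explicitly. */
static void maybe_set_param (struct param_table *t, int num, int value)
{
  if (!t->set_by_user[num])
    t->values[num] = value;
}

/* Sketch of the patched logic: both profile-gen (-fprofile-arcs) and
   profile-use (-fbranch-probabilities) builds get 2 iterations. */
void apply_fdo_defaults (struct param_table *t,
                         bool profile_arc, bool branch_probabilities)
{
  if (profile_arc || branch_probabilities)
    maybe_set_param (t, EARLY_INLINER_MAX_ITERATIONS, 2);
}
```

Applying the same default to both the instrumented and the profile-use build keeps the early-inlined shape of the code consistent between the two, which the profile-use build relies on when matching profile data to the code.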