On Mon, Oct 20, 2014 at 5:53 PM, Xinliang David Li <davi...@google.com> wrote:
> On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li <davi...@google.com> wrote:
>>> On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li <davi...@google.com> wrote:
>>>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> The difference in instrumentation runtime is huge -- as topn profiler
>>>>>> is pretty expensive to run.
>>>>>>
>>>>>> With FDO, it is probably better to make early inlining more aggressive
>>>>>> in order to get more context sensitive profiling.
>>>>>
>>>>> I agree with that, I just would like to understand where increasing the
>>>>> iterations helps and if we can handle it without iterating (because Richi
>>>>> originally requested to drop the iteration for correctness issues)
>>
>> Well, I requested to do any iteration with an IPA view in mind.  That is,
>> iterate for cgraph cycles for example where currently we face the situation
>> that at least one function is inlined unoptimized.  For this we'd like to
>> first optimize without inlining (well, maybe inlining doesn't hurt)
>
> yes -- inlining decision made without callee cleanup is more
> conservative and should not hurt.
>
>> and then
>> inline (and re-optimize if we inlined).
>>
>> Indirect edges are more interesting, but basically you'd want to re-inline
>> once you discover new direct calls during early opts (but then make
>> sure to do that only after the direct callee was early-optimized first).
>>
>
> It would be interesting to inline the newly introduced direct calls if
> the callsites also have function pointer arguments that are known in
> the call context.
>
>> Thus it would be nice if somebody could improve on the currently very
>> simple function ordering we apply early opts, integrating "iteration"
>> in a better way (not iterating over all functions but only where it
>> might make a difference, focused on inlining).
>>
>>>>> Do you have some examples?
>>>>
>>>> We can do FDO experiment by shutting down einline. (Note that
>>>> increasing iteration to 2 did not actually improve performance with
>>>> our benchmarks).
>>>
>>> Early inlining itself has large performance impact for FDO (the
>>> runtime of the profile-use build). With it disabled, the FDO
>>> performance drops by >2% on average. The degradation is seen across
>>> all benchmarks except for one.
>>
>> Only 2%?  You are lucky ;)
>
> 2% average is considered pretty significant for optimized build
> runtime performance.
>
>
>> For tramp3d introducing early inlining
>> made a difference of 100000% ;)  (yes, statistically for tramp3d
>> we have for each assembler instruction generated 100 calls in the
>> initial code ... wheee C++ template metaprogramming!)
>
> Is this 100000% difference from instrumentation build or optimized
> build runtime?

It's from instrumentation build.  I don't remember any numbers for the
improvement on optimized build with FDO vs. non-FDO.

Richard.

>>
>> So indeed early inlining was absolutely required to make FDO usable at all.
>
> thanks,
>
> David
>>
>> Richard.
>>
>>> David
>>>
>>>
>>>>
>>>> David
>>>>
>>>>> Honza
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> >> Increasing the number of early inliner iterations from 1 to 2 enables
>>>>>> >> more indirect calls to be promoted/inlined before instrumentation. This
>>>>>> >> in turn reduces the instrumentation overhead, particularly for more
>>>>>> >> expensive indirect call topn profiling.
>>>>>> >
>>>>>> > How much difference do you get here? One possibility would also be to
>>>>>> > run a specialized ipa-cp before profile instrumentation.
>>>>>> >
>>>>>> > Honza
>>>>>> >>
>>>>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>>>>> >>
>>>>>> >> 2014-10-18  Teresa Johnson  <tejohn...@google.com>
>>>>>> >>
>>>>>> >>         Google ref b/17934523
>>>>>> >>         * opts.c (finish_options): Increase
>>>>>> >>         max-early-inliner-iterations to 2 for profile-gen and
>>>>>> >>         profile-use builds.
>>>>>> >>
>>>>>> >> Index: opts.c
>>>>>> >> ===================================================================
>>>>>> >> --- opts.c      (revision 216286)
>>>>>> >> +++ opts.c      (working copy)
>>>>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>>>>> >>          opts->x_param_values, opts_set->x_param_values);
>>>>>> >>      }
>>>>>> >>
>>>>>> >> +  if (opts->x_profile_arc_flag
>>>>>> >> +      || opts->x_flag_branch_probabilities)
>>>>>> >> +    {
>>>>>> >> +      maybe_set_param_value
>>>>>> >> +       (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>>>>> >> +        opts->x_param_values, opts_set->x_param_values);
>>>>>> >> +    }
>>>>>> >> +
>>>>>> >>    if (!(opts->x_flag_auto_profile
>>>>>> >>          || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>>>>>> >>      {
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
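
For readers following the thread, here is a minimal sketch of the kind of
code where a second early-inliner iteration could matter. It is an
illustration only, not taken from the patch or the benchmarks discussed
above; the names apply, add_one and compute are hypothetical. The idea
is that inlining apply() into compute() turns the indirect call through
fn into a direct call to add_one(), and only a later inlining round gets
a chance to inline that new direct call before instrumentation. Whether
GCC actually proceeds this way for a given input depends on the version
and on the inliner heuristics.

```c
/* Hypothetical example: an indirect call that becomes direct once its
   caller is inlined into a context where the pointer is known. */

typedef int (*op_fn) (int);

/* Small callee, reached only through a function pointer below. */
int
add_one (int x)
{
  return x + 1;
}

/* Indirect call through fn.  Compiled on its own, this is the kind of
   call site an indirect-call profiler would have to instrument. */
int
apply (op_fn fn, int x)
{
  return fn (x);
}

/* Here fn is known to be add_one.  After apply() is inlined into
   compute(), the call through fn can become a direct call to add_one(),
   which a further inlining iteration could then inline as well. */
int
compute (int x)
{
  return apply (add_one, x);
}
```

One way to experiment with this (again only a suggestion) is to build with
-fprofile-generate, once with --param max-early-inliner-iterations=1 and
once with =2, and compare the early inliner dumps.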
