On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li <davi...@google.com> wrote:
>> On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li <davi...@google.com> wrote:
>>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>> The difference in instrumentation runtime is huge -- as topn profiler
>>>>> is pretty expensive to run.
>>>>>
>>>>> With FDO, it is probably better to make early inlining more aggressive
>>>>> in order to get more context sensitive profiling.
>>>>
>>>> I agree with that, I just would like to understand where increasing the
>>>> iterations helps and if we can handle it without iterating (because Richi
>>>> originally requested to drop the iteration for correctness issues)
>
> Well, I requested to do any iteration with an IPA view in mind.  That is,
> iterate for cgraph cycles for example where currently we face the situation
> that at least one function is inlined unoptimized.  For this we'd like to
> first optimize without inlining (well, maybe inlining doesn't hurt)

yes -- inlining decision made without callee cleanup is more
conservative and should not hurt.

> and then
> inline (and re-optimize if we inlined).
>
> Indirect edges are more interesting, but basically you'd want to re-inline
> once you discover new direct calls during early opts (but then make
> sure to do that only after the direct callee was early-optimized first).

It would be interesting to inline the newly introduced direct calls if
the callsites also have function pointer arguments that are known in
the call context.

> Thus it would be nice if somebody could improve on the currently very
> simple function ordering we apply early opts, integrating "iteration"
> in a better way (not iterating over all functions but only where it
> might make a difference, focused on inlining).
>
>>>> Do you have some examples?
>>>
>>> We can do FDO experiment by shutting down einline. (Note that
>>> increasing iteration to 2 did not actually improve performance with
>>> our benchmarks).
>>
>> Early inlining itself has large performance impact for FDO (the
>> runtime of the profile-use build). With it disabled, the FDO
>> performance drops by >2% on average. The degradation is seen across
>> all benchmarks except for one.
>
> Only 2%?  You are lucky ;)

2% average is considered pretty significant for optimized build
runtime performance.

> For tramp3d introducing early inlining
> made a difference of 100000% ;)  (yes, statistically for tramp3d
> we have for each assembler instruction generated 100 calls in the
> initial code ... wheee C++ template metaprogramming!)

Is this 100000% difference from instrumentation build or optimized
build runtime?

>
> So indeed early inlining was absolutely required to make FDO usable at all.

thanks,

David

>
> Richard.
>
>> David
>>
>>
>>>
>>> David
>>>
>>>> Honza
>>>>>
>>>>> David
>>>>>
>>>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>> >> Increasing the number of early inliner iterations from 1 to 2 enables
>>>>> >> more indirect calls to be promoted/inlined before instrumentation.
>>>>> >> This in turn reduces the instrumentation overhead, particularly for
>>>>> >> more expensive indirect call topn profiling.
>>>>> >
>>>>> > How much difference you get here? One possibility would be also to run
>>>>> > specialized ipa-cp before profile instrumentation.
>>>>> >
>>>>> > Honza
>>>>> >>
>>>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>>>> >>
>>>>> >> 2014-10-18  Teresa Johnson  <tejohn...@google.com>
>>>>> >>
>>>>> >>     Google ref b/17934523
>>>>> >>     * opts.c (finish_options): Increase max-early-inliner-iterations to 2
>>>>> >>     for profile-gen and profile-use builds.
>>>>> >>
>>>>> >> Index: opts.c
>>>>> >> ===================================================================
>>>>> >> --- opts.c      (revision 216286)
>>>>> >> +++ opts.c      (working copy)
>>>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>>>> >>                          opts->x_param_values, opts_set->x_param_values);
>>>>> >>      }
>>>>> >>
>>>>> >> +  if (opts->x_profile_arc_flag
>>>>> >> +      || opts->x_flag_branch_probabilities)
>>>>> >> +    {
>>>>> >> +      maybe_set_param_value
>>>>> >> +        (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>>>> >> +         opts->x_param_values, opts_set->x_param_values);
>>>>> >> +    }
>>>>> >> +
>>>>> >>    if (!(opts->x_flag_auto_profile
>>>>> >>          || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>>>>> >>      {
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
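
For anyone reading the quoted patch without the GCC tree at hand, here is a rough standalone sketch of the pattern it appears to rely on: the iteration count is raised to 2 for profile-gen and profile-use builds, but only if the user has not already set the parameter. All `toy_*` names and the struct layout are hypothetical stand-ins, not GCC's real types, and the sketch assumes that `maybe_set_param_value` leaves an explicitly set `--param` untouched.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not GCC's real types.  */
enum toy_param { TOY_EARLY_INLINER_MAX_ITERATIONS, TOY_NUM_PARAMS };

struct toy_options
{
  bool profile_arc_flag;           /* stands in for a profile-gen build */
  bool flag_branch_probabilities;  /* stands in for a profile-use build */
  int param_values[TOY_NUM_PARAMS];
  bool param_set[TOY_NUM_PARAMS];  /* true if the user set it explicitly */
};

/* Set PARAM to VALUE unless the user already chose a value
   (assumed behaviour of the real helper).  */
static void
toy_maybe_set_param_value (struct toy_options *opts,
                           enum toy_param param, int value)
{
  if (!opts->param_set[param])
    opts->param_values[param] = value;
}

/* Mirror of the patch's condition: raise the iteration count to 2 for
   profile-gen and profile-use builds, leave other builds alone.  */
static void
toy_finish_options (struct toy_options *opts)
{
  if (opts->profile_arc_flag || opts->flag_branch_probabilities)
    toy_maybe_set_param_value (opts, TOY_EARLY_INLINER_MAX_ITERATIONS, 2);
}
```

Under that assumption, a build with an explicit `--param max-early-inliner-iterations=1` would keep 1 even with profiling enabled, which may matter when comparing against the 1-iteration baseline discussed above.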