On Mon, Oct 20, 2014 at 5:53 PM, Xinliang David Li <davi...@google.com> wrote:
> On Mon, Oct 20, 2014 at 1:32 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li <davi...@google.com> wrote:
>>> On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li <davi...@google.com> wrote:
>>>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> The difference in instrumentation runtime is huge -- as topn profiler
>>>>>> is pretty expensive to run.
>>>>>>
>>>>>> With FDO, it is probably better to make early inlining more aggressive
>>>>>> in order to get more context sensitive profiling.
>>>>>
>>>>> I agree with that, I just would like to understand where increasing the
>>>>> iterations helps and if we can handle it without iterating (because Richi
>>>>> originally requested to drop the iteration for correctness issues)
>>
>> Well, I requested to do any iteration with an IPA view in mind. That is,
>> iterate for cgraph cycles for example where currently we face the situation
>> that at least one function is inlined unoptimized. For this we'd like to
>> first optimize without inlining (well, maybe inlining doesn't hurt)
>
> yes -- inlining decision made without callee cleanup is more
> conservative and should not hurt.
>
>> and then
>> inline (and re-optimize if we inlined).
>>
>> Indirect edges are more interesting, but basically you'd want to re-inline
>> once you discover new direct calls during early opts (but then make
>> sure to do that only after the direct callee was early-optimized first).
>>
>
> It would be interesting to inline the newly introduced direct calls if
> the callsites also have function pointer arguments that are known in
> the call context.
>
>> Thus it would be nice if somebody could improve on the currently very
>> simple function ordering we apply early opts, integrating "iteration"
>> in a better way (not iterating over all functions but only where it
>> might make a difference, focused on inlining).
>>
>>>>> Do you have some examples?
>>>>
>>>> We can do FDO experiment by shutting down einline. (Note that
>>>> increasing iteration to 2 did not actually improve performance with
>>>> our benchmarks).
>>>
>>> Early inlining itself has large performance impact for FDO (the
>>> runtime of the profile-use build). With it disabled, the FDO
>>> performance drops by >2% on average. The degradation is seen across
>>> all benchmarks except for one.
>>
>> Only 2%? You are lucky ;)
>
> 2% average is considered pretty significant for optimized build
> runtime performance.
>
>
>> For tramp3d introducing early inlining
>> made a difference of 100000% ;) (yes, statistically for tramp3d
>> we have for each assembler instruction generated 100 calls in the
>> initial code ... wheee C++ template metaprogramming!)
>
> Is this 100000% difference from instrumentation build or optimized
> build runtime?

It's from instrumentation build. I don't remember any numbers for the
improvement on optimized build with FDO vs. non-FDO.

Richard.

>>
>> So indeed early inlining was absolutely required to make FDO usable at all.
>
> thanks,
>
> David
>>
>> Richard.
>>
>>> David
>>>
>>>
>>>>
>>>> David
>>>>
>>>>> Honza
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> >> Increasing the number of early inliner iterations from 1 to 2 enables
>>>>>> >> more indirect calls to be promoted/inlined before instrumentation. This
>>>>>> >> in turn reduces the instrumentation overhead, particularly for more
>>>>>> >> expensive indirect call topn profiling.
>>>>>> >
>>>>>> > How much difference you get here? One possibility would be also to run
>>>>>> > specialized ipa-cp before profile instrumentation.
>>>>>> >
>>>>>> > Honza
>>>>>> >>
>>>>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>>>>> >>
>>>>>> >> 2014-10-18  Teresa Johnson  <tejohn...@google.com>
>>>>>> >>
>>>>>> >>     Google ref b/17934523
>>>>>> >>     * opts.c (finish_options): Increase max-early-inliner-iterations to 2
>>>>>> >>     for profile-gen and profile-use builds.
>>>>>> >>
>>>>>> >> Index: opts.c
>>>>>> >> ===================================================================
>>>>>> >> --- opts.c (revision 216286)
>>>>>> >> +++ opts.c (working copy)
>>>>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>>>>> >>                              opts->x_param_values, opts_set->x_param_values);
>>>>>> >>      }
>>>>>> >>
>>>>>> >> +  if (opts->x_profile_arc_flag
>>>>>> >> +      || opts->x_flag_branch_probabilities)
>>>>>> >> +    {
>>>>>> >> +      maybe_set_param_value
>>>>>> >> +        (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>>>>> >> +         opts->x_param_values, opts_set->x_param_values);
>>>>>> >> +    }
>>>>>> >> +
>>>>>> >>    if (!(opts->x_flag_auto_profile
>>>>>> >>          || (opts->x_profile_arc_flag || opts->x_flag_branch_probabilities)))
>>>>>> >>      {
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com |
>>>>>> >> 408-460-2413
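
For readers following the thread, the patch description's claim that a second early-inliner iteration lets more indirect calls be promoted before instrumentation can be pictured with a small C sketch. The names add_one, apply, and caller are hypothetical and come neither from GCC nor from any benchmark mentioned here. Whether a given compiler actually inlines these functions depends on its heuristics, so this only shows the shape of the code involved.

```c
/* Hypothetical illustration; these names are not from GCC or the thread.  */

/* Small callee that is only ever reached through a function pointer.  */
static int add_one(int x)
{
    return x + 1;
}

/* Wrapper containing an indirect call: the target of fn is unknown here.  */
static int apply(int (*fn)(int), int x)
{
    return fn(x);
}

/* A first inliner iteration may inline apply() here, which turns fn(x)
   into a direct call to add_one(x).  A second iteration could then inline
   add_one() too, leaving no indirect call for the profiler to instrument.  */
int caller(int x)
{
    return apply(add_one, x);
}
```

Whatever the inliner decides, the observable behaviour is the same: caller(3) returns 4. Only the number of indirect call sites left for the topn profiler changes.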