> On 24 Jun 2025, at 7:43 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
>
> Hi,
> this pass removes early-inlining from afdo pass since all inlining
> should now happen from early inliner.  I tested this on spec and there
> are 3 inlines happening here which are blocked at early-inline time by
> hitting large function growth limit.  We probably want to bypass that
> limit, I will look into that incrementally.

Thanks for doing this. Is the inlining difference here due to the annotation that happened in the auto-profile pass in the earlier implementation?

One unrelated question about scaling profiles. We seem to scale up the AFDO profile with afdo_count_scale, and in some other cases scale down local_profile. Should we instead scale the AFDO profile up to the local_profile scale? A lot of the inlining and other parameters seem to work well with that.

Thanks,
Kugan

>
> This should make the non-inlined function profile merging hopefully
> easier.
>
> It may still make sense to separate afdo inliner from early inliner to
> solve the non-transitivity issues which is not that hard to do with
> current code organization.  However this should be separate IPA pass
> rather than another part of afdo pass, since it can be conceptually
> separate.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
> Honza
>
> gcc/ChangeLog:
>
>     * auto-profile.cc: Update toplevel comment.
>     (early_inline): Remove.
>     (auto_profile): Don't do early inlining.
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 8a1d9f878c6..3f8310e6324 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -76,21 +76,30 @@ along with GCC; see the file COPYING3.  If not see
>       standalone symbol, or a clone of a function that is inlined into another
>       function.
>
> -   Phase 2: Early inline + value profile transformation.
> -     Early inline uses autofdo_source_profile to find if a callsite is:
> +   Phase 2: AFDO inline + value profile transformation.
> +     This happens during early optimization.
> +     During early inlning AFDO inliner is executed which
> +     uses autofdo_source_profile to find if a callsite is:
>         * inlined in the profiled binary.
>         * callee body is hot in the profiling run.
>       If both condition satisfies, early inline will inline the callsite
>       regardless of the code growth.
> -     Phase 2 is an iterative process.  During each iteration, we also check
> -     if an indirect callsite is promoted and inlined in the profiling run.
> -     If yes, vpt will happen to force promote it and in the next iteration,
> -     einline will inline the promoted callsite in the next iteration.
> +
> +     Performing this early has benefit of doing early optimizations
> +     before read IPA passe and getting more "context sensitivity" of
> +     the profile read.  Profile of inlined functions may differ
> +     significantly form one inline instance to another and from the
> +     offline version.
> +
> +     This is controlled by -fauto-profile-inlinig and is independent
> +     of -fearly-inlining.
>
>     Phase 3: Annotate control flow graph.
>       AutoFDO uses a separate pass to:
>         * Annotate basic block count
>         * Estimate branch probability
> +       * Use earlier static profile to fill in the gaps
> +         if AFDO profile is ambigous
>
>     After the above 3 phases, all profile is readily annotated on the GCC IR.
>     AutoFDO tries to reuse all FDO infrastructure as much as possible to make
> @@ -2217,18 +2226,6 @@ afdo_annotate_cfg (void)
>    free_dominance_info (CDI_POST_DOMINATORS);
>  }
>
> -/* Wrapper function to invoke early inliner.  */
> -
> -static unsigned int
> -early_inline ()
> -{
> -  compute_fn_summary (cgraph_node::get (current_function_decl), true);
> -  unsigned int todo = early_inliner (cfun);
> -  if (todo & TODO_update_ssa_any)
> -    update_ssa (TODO_update_ssa);
> -  return todo;
> -}
> -
>  /* Use AutoFDO profile to annoate the control flow graph.
>     Return the todo flag.  */
>
> @@ -2254,15 +2251,9 @@ auto_profile (void)
>
>        push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>
> -      unsigned int todo = early_inline ();
>        autofdo::afdo_annotate_cfg ();
>        compute_function_frequency ();
>
> -      /* Local pure-const may imply need to fixup the cfg.  */
> -      todo |= execute_fixup_cfg ();
> -      if (todo & TODO_cleanup_cfg)
> -        cleanup_tree_cfg ();
> -
>        free_dominance_info (CDI_DOMINATORS);
>        free_dominance_info (CDI_POST_DOMINATORS);
>        cgraph_edge::rebuild_edges ();
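
For readers following the thread, the Phase 2 rule described in the updated toplevel comment (inline a callsite when it was inlined in the profiled binary and the callee body is hot, regardless of code growth) might be sketched roughly as below. This is only an illustration under assumptions, not GCC code: CallsiteProfile, afdo_wants_inline and kHotThreshold are hypothetical names, and the hotness cutoff is an arbitrary placeholder rather than the threshold GCC actually computes.

```cpp
#include <cstdint>

// Hypothetical summary of what the AFDO profile records for one callsite.
struct CallsiteProfile {
  bool inlined_in_profiled_binary;   // was the call inlined in the profiled binary?
  std::uint64_t callee_body_count;   // sample count attributed to the callee body
};

// Placeholder hotness cutoff, chosen only for this illustration.
constexpr std::uint64_t kHotThreshold = 1000;

// Both conditions from the comment must hold; code growth is deliberately
// not consulted here, mirroring "regardless of the code growth".
bool afdo_wants_inline(const CallsiteProfile &p) {
  return p.inlined_in_profiled_binary && p.callee_body_count >= kHotThreshold;
}
```

With these placeholder values, a callsite that was inlined in the profiled binary but whose callee is cold is rejected, as is a hot callee that was not inlined in the profiled binary.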