On Wed, Feb 12, 2014 at 8:02 AM, Teresa Johnson <tejohn...@google.com> wrote:
> On Wed, Feb 12, 2014 at 6:45 AM, Teresa Johnson <tejohn...@google.com> wrote:
>> On Tue, Feb 11, 2014 at 6:13 PM, Xinliang David Li <davi...@google.com>
>> wrote:
>>> On Tue, Feb 11, 2014 at 5:36 PM, Teresa Johnson <tejohn...@google.com>
>>> wrote:
>>>> On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li <davi...@google.com>
>>>> wrote:
>>>>> Why is call graph needed to determine whether to drop the profile?
>>>>
>>>> Because we detect this situation by looking for cases where the call
>>>> edge counts greatly exceed the callee node count.
>>>>
>>>>>
>>>>> If that is needed, it might be possible to leverage the ipa_profile
>>>>> pass as it will walk through all function nodes to do profile
>>>>> annotation. With this you can make decision to drop callee profile in
>>>>> caller's context.
>>>>
>>>> There are 2 ipa profiling passes, which are somewhat confusingly named
>>>> (to me at least. =). This is being done during the first.
>>>>
>>>> The first is pass_ipa_tree_profile in tree-profile.c, but is a
>>>> SIMPLE_IPA_PASS and has the name "profile" in the dump. The second is
>>>> pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the
>>>> name "profile_estimate" in the dump. I assume you are suggesting to
>>>> move this into the latter? But I'm not clear on what benefit that
>>>> gives - the functions are not being traversed in order, so there is
>>>> still the issue of needing to rebuild the cgraph after dropping
>>>> profiles, which might be best done earlier as I have in the patch.
>>>
>>>
>>> I meant the tree-profile one. I think this might work: after all the
>>> function's profile counts are annotated, add another walk of the
>>> call graph nodes to drop bad profiles before the call graph is
>>> rebuilt (Call graph does exist at that point).
>>
>> Ok, so it is already done in tree-profile.
>> But it sounds like you are
>> suggesting reordering it to just above where we update the calls and
>> rebuild the cgraph the first time? As you noted in a follow-on email
>> to me, the cgraph edges don't have the profile counts at that point
>> (and neither do the nodes), so I would need to compare the count on
>> the call's bb to the entry bb count of the callee. That should be
>> doable, let me take a stab at it.
>
> This works well. Tested on omnetpp which has some dropped profiles and
> ensured that the behavior and output of the ipa tree profile phase is
> the same. Re-running bootstrap and regression tests.
>
> Here's the new patch. The only changes from the earlier patch are in
> handle_missing_profiles, where we now get the counts off of the entry
> and call stmt bbs, and in tree_profiling, where we call
> handle_missing_profiles earlier and I have removed the outlined cgraph
> rebuilding code since it doesn't need to be reinvoked.
>
> Honza, does this look ok for trunk when stage 1 reopens? David, I can
> send a similar patch for review to google-4_8 if it looks good.
>
> Thanks,
> Teresa
>
> 2014-02-12  Teresa Johnson  <tejohn...@google.com>
>
>         * graphite.c (graphite_finalize): Pass new parameter.
>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>         * predict.c (tree_estimate_probability): New parameter.
>         (tree_estimate_probability_worker): Renamed from
>         tree_estimate_probability_driver.
>         (tree_reestimate_probability): New function.
>         (tree_estimate_probability_driver): Invoke
>         tree_estimate_probability_worker.
>         (freqs_to_counts): Move here from tree-inline.c.
>         (drop_profile): Re-estimate profiles when dropping counts.
>         (handle_missing_profiles): Drop for some non-zero functions as well,
>         get counts from bbs to support invocation before cgraph rebuild.
>         (counts_to_freqs): Remove code obviated by reestimation.
>         * predict.h (tree_estimate_probability): Update declaration.
>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>         (copy_cfg_body): Remove code obviated by reestimation.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
>         before cgraph rebuild.
>
> Index: graphite.c
> ===================================================================
> --- graphite.c (revision 207436)
> +++ graphite.c (working copy)
> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>        cleanup_tree_cfg ();
>        profile_status_for_fn (cfun) = PROFILE_ABSENT;
>        release_recorded_exits ();
> -      tree_estimate_probability ();
> +      tree_estimate_probability (false);
>      }
>
>    cloog_state_free (cloog_state);
> Index: params.def
> ===================================================================
> --- params.def (revision 207436)
> +++ params.def (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>           "Maximal estimated outcome of branch considered predictable",
>           2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> +          "min-caller-reestimate-ratio",
> +          "Minimum caller-to-callee node count ratio to force reestimated branch "
> +          "probabilities in callee (where 0 means only when callee count is 0)",
> +          10, 0, 0)
> +
>  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>           "inline-min-speedup",
>           "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto",
> Index: predict.c
> ===================================================================
> --- predict.c (revision 207436)
> +++ predict.c (working copy)
> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>
>  /* Predict branch probabilities and estimate profile of the tree CFG.
>     This function can be called from the loop optimizers to recompute
> -   the profile information.  */
> +   the profile information.  When REDO is true then we are forcing
> +   re-estimation of the probabilities because the profile was deemed
> +   insufficient.
*/
>
>  void
> -tree_estimate_probability (void)
> +tree_estimate_probability (bool redo)
>  {
>    basic_block bb;
>
> @@ -2390,7 +2392,8 @@ void
>    connect_infinite_loops_to_exit ();
>    /* We use loop_niter_by_eval, which requires that the loops have
>       preheaders.  */
> -  create_preheaders (CP_SIMPLE_PREHEADERS);
> +  if (!redo)
> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>    calculate_dominance_info (CDI_POST_DOMINATORS);
>
>    bb_predictions = pointer_map_create ();
> @@ -2412,16 +2415,16 @@ void
>    pointer_map_destroy (bb_predictions);
>    bb_predictions = NULL;
>
> -  estimate_bb_frequencies (false);
> +  estimate_bb_frequencies (redo);
>    free_dominance_info (CDI_POST_DOMINATORS);
>    remove_fake_exit_edges ();
>  }
>
>  /* Predict branch probabilities and estimate profile of the tree CFG.
> -   This is the driver function for PASS_PROFILE.  */
> +   When REDO is true, we are forcing reestimation of the probabilities.  */
>
> -static unsigned int
> -tree_estimate_probability_driver (void)
> +static void
> +tree_estimate_probability_worker (bool redo)
>  {
>    unsigned nb_loops;
>
> @@ -2435,7 +2438,7 @@ void
>    if (nb_loops > 1)
>      scev_initialize ();
>
> -  tree_estimate_probability ();
> +  tree_estimate_probability (redo);
>
>    if (nb_loops > 1)
>      scev_finalize ();
> @@ -2445,6 +2448,34 @@ void
>      gimple_dump_cfg (dump_file, dump_flags);
>    if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>      profile_status_for_fn (cfun) = PROFILE_GUESSED;
> +}
> +
> +/* Force re-estimation of the probabilities, because the profile was
> +   deemed insufficient.  */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> +  basic_block bb;
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      bb->count = 0;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = 0;
> +    }
> +  tree_estimate_probability_worker (true);
> +}
> +
> +/* Estimate probabilities.
> +   This is the driver function for PASS_PROFILE.
*/
> +static unsigned int
> +tree_estimate_probability_driver (void)
> +{
> +  tree_estimate_probability_worker (false);
>    return 0;
>  }
> ^L
> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
>
> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
> +   with each bb's frequency.  Used when NODE has an entry count that
> +   is much lower than the caller edges reaching it.  See the comments
> +   for handle_missing_profiles() for when this can happen for COMDATs.  */
> +
> +void
> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
> +{
> +  basic_block bb;
> +  edge_iterator ei;
> +  edge e;
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  FOR_ALL_BB_FN(bb, fn)
> +    {
> +      bb->count = apply_scale (count,
> +                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = apply_probability (e->src->count, e->probability);
> +    }
> +}
> +
>  /* Drop the profile for NODE to guessed, and update its frequency based on
>     whether it is expected to be hot given the CALL_COUNT.  */
>
> @@ -2772,6 +2825,9 @@ static void
>  drop_profile (struct cgraph_node *node, gcov_type call_count)
>  {
>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
> +    return;
>    /* In the case where this was called by another function with a
>       dropped profile, call_count will be 0. Since there are no
>       non-zero call counts to this function, we don't know for sure
> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>
>    if (dump_file)
>      fprintf (dump_file,
> -             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             "Dropping %ld profile for %s/%i. %s based on calls.\n",
> +             node->count,
>              node->name (), node->order,
>              hot ?
"Function is hot" : "Function is normal");
>    /* We only expect to miss profiles for functions that are reached
> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
>               node->name (), node->order);
>      }
>
> +  /* Re-estimate the probabilities for function and use the estimated
> +     frequencies to compute the counts.  */
> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> +  tree_reestimate_probability ();
> +  freqs_to_counts (node, call_count);
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "After re-estimating probabilities and counts\n");
> +      gimple_dump_cfg (dump_file, dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
> +    }
> +  pop_cfun ();
> +
>    profile_status_for_fn (fn)
>        = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>    node->frequency
> @@ -2815,15 +2884,29 @@ drop_profile (struct cgraph_node *node, gcov_type
>  /* In the case of COMDAT routines, multiple object files will contain the same
>     function and the linker will select one for the binary. In that case
>     all the other copies from the profile instrument binary will be missing
> -   profile counts.  Look for cases where this happened, due to non-zero
> +   profile counts.  This can confuse downstream optimizations such as
> +   function splitting.
> +
> +   Look for cases where this happened, due to non-zero
>     call counts going to 0-count functions, and drop the profile to guessed
>     so that we can use the estimated probabilities and avoid optimizing only
> -   for size.
> +   for size.  In the case where the COMDAT was inlined in some locations
> +   within the file and not others, the profile count will be non-zero due
> +   to the inlined instances, but may still be significantly smaller than the
> +   call edges for the non-inlined instances.  Detect that case when requested
> +   and reestimate probabilities, since the counts will not necessarily reflect
> +   the behavior along the more frequent call paths.
>
>     The other case where the profile may be missing is when the routine
>     is not going to be emitted to the object file, e.g. for "extern template"
>     class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> -   all other cases of non-zero calls to 0-count functions.  */
> +   all other cases of non-zero calls to 0-count functions.
> +
> +   This is now invoked before rebuilding the cgraph after reading profile
> +   counts, so the cgraph edge and node counts are still 0.  Therefore we
> +   need to look at the counts on the entry bbs and the call stmt bbs.
> +   That way we can avoid needing to rebuild the cgraph again to reflect
> +   the nodes with dropped profiles.  */
>
>  void
>  handle_missing_profiles (void)
> @@ -2832,9 +2915,11 @@ handle_missing_profiles (void)
>    int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>    vec<struct cgraph_node *> worklist;
>    worklist.create (64);
> +  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>
> -  /* See if 0 count function has non-0 count callers. In this case we
> -     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  /* See if 0 or low count function has higher count caller edges.  In this
> +     case we lost some profile.  Drop its function profile to
> +     PROFILE_GUESSED.  */
>    FOR_EACH_DEFINED_FUNCTION (node)
>      {
>        struct cgraph_edge *e;
> @@ -2842,48 +2927,75 @@ handle_missing_profiles (void)
>        gcov_type max_tp_first_run = 0;
>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> -      if (node->count)
> -        continue;
>        for (e = node->callers; e; e = e->next_caller)
>          {
> -          call_count += e->count;
> +          call_count += gimple_bb (e->call_stmt)->count;
>
>            if (e->caller->tp_first_run > max_tp_first_run)
>              max_tp_first_run = e->caller->tp_first_run;
>          }
Should non-comdat functions be skipped?

>
> +      if (!fn || !fn->cfg)
> +        continue;
> +
> +      gcov_type node_count = ENTRY_BLOCK_PTR_FOR_FN (fn)->count;
> +
> +      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
> +         profiles for 0-count functions called by non-zero call edges.  */
> +      if ((!min_reest_ratio && node_count > 0)
> +          || (min_reest_ratio && node_count * min_reest_ratio > call_count))
> +        continue;
> +
>        /* If time profile is missing, let assign the maximum that comes from
>           caller functions.  */
>        if (!node->tp_first_run && max_tp_first_run)
>          node->tp_first_run = max_tp_first_run + 1;
>
>        if (call_count
> -          && fn && fn->cfg
>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>          {
>            drop_profile (node, call_count);
>            worklist.safe_push (node);
>          }
>      }
> -
> -  /* Propagate the profile dropping to other 0-count COMDATs that are
> +  /* Propagate the profile dropping to other low-count COMDATs that are
>       potentially called by COMDATs we already dropped the profile on.  */
>    while (worklist.length () > 0)
>      {
>        struct cgraph_edge *e;
>
>        node = worklist.pop ();
> +      struct function *node_fn = DECL_STRUCT_FUNCTION (node->decl);
> +      gcc_assert (node_fn && node_fn->cfg);
> +      gcov_type node_count = ENTRY_BLOCK_PTR_FOR_FN (node_fn)->count;
>        for (e = node->callees; e; e = e->next_caller)
>          {
>            struct cgraph_node *callee = e->callee;
>            struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +          if (!fn || !fn->cfg)
> +            continue;
> +          gcov_type callee_count = ENTRY_BLOCK_PTR_FOR_FN (fn)->count;
>
> -          if (callee->count > 0)
> +          /* When min_reest_ratio is non-zero, if we get here we dropped
> +             a caller's profile since it was significantly smaller than its
> +             call edge.  Drop the profile on any callees whose node count is
> +             now exceeded by the new caller node count.
*/
> +          if ((!min_reest_ratio && callee_count > 0)
> +              || (min_reest_ratio && callee_count >= node_count))
>              continue;
> -          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +
> +          gcov_type call_count = 0;
> +          if (min_reest_ratio > 0)
> +            {
> +              struct cgraph_edge *e2;
> +              for (e2 = node->callers; e2; e2 = e2->next_caller)
> +                call_count += gimple_bb (e2->call_stmt)->count;
> +            }
> +
> +          if (DECL_COMDAT (callee->decl)
>               && profile_status_for_fn (fn) == PROFILE_READ)
>             {
> -              drop_profile (node, 0);
> +              drop_profile (node, call_count);
>               worklist.safe_push (callee);
>             }

Should the comdat check be done earlier?

David

>         }
> @@ -2900,12 +3012,6 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> -  /* Don't overwrite the estimated frequencies when the profile for
> -     the function is missing. We may drop this function PROFILE_GUESSED
> -     later in drop_profile ().  */
> -  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
> -    return 0;
> -
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h (revision 207436)
> +++ predict.h (working copy)
> @@ -51,7 +51,7 @@ extern void handle_missing_profiles (void);
>  extern void estimate_bb_frequencies (bool);
>  extern const char *predictor_name (enum br_predictor);
>  extern tree build_predict_expr (enum br_predictor, enum prediction);
> -extern void tree_estimate_probability (void);
> +extern void tree_estimate_probability (bool);
>  extern void compute_function_frequency (void);
>  extern void rebuild_frequencies (void);
>
> Index: tree-inline.c
> ===================================================================
> --- tree-inline.c (revision 207436)
> +++ tree-inline.c (working copy)
> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
>      }
>  }
>
> -/* Convert estimated frequencies into counts for NODE, scaling
COUNT
> -   with each bb's frequency.  Used when NODE has a 0-weight entry
> -   but we are about to inline it into a non-zero count call bb.
> -   See the comments for handle_missing_profiles() in predict.c for
> -   when this can happen for COMDATs.  */
> -
> -void
> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
> -{
> -  basic_block bb;
> -  edge_iterator ei;
> -  edge e;
> -  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> -
> -  FOR_ALL_BB_FN(bb, fn)
> -    {
> -      bb->count = apply_scale (count,
> -                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> -      FOR_EACH_EDGE (e, ei, bb->succs)
> -        e->count = apply_probability (e->src->count, e->probability);
> -    }
> -}
> -
>  /* Make a copy of the body of FN so that it can be inserted inline in
>     another function.  Walks FN via CFG, returns new fndecl.  */
>
> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>    int incoming_frequency = 0;
>    gcov_type incoming_count = 0;
>
> -  /* This can happen for COMDAT routines that end up with 0 counts
> -     despite being called (see the comments for handle_missing_profiles()
> -     in predict.c as to why).  Apply counts to the blocks in the callee
> -     before inlining, using the guessed edge frequencies, so that we don't
> -     end up with a 0-count inline body which can confuse downstream
> -     optimizations such as function splitting.  */
> -  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
> -    {
> -      /* Apply the larger of the call bb count and the total incoming
> -         call edge count to the callee.  */
> -      gcov_type in_count = 0;
> -      struct cgraph_edge *in_edge;
> -      for (in_edge = id->src_node->callers; in_edge;
> -           in_edge = in_edge->next_caller)
> -        in_count += in_edge->count;
> -      freqs_to_counts (id->src_node, count > in_count ?
count : in_count);
> -    }
> -
>    if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>      count_scale
>          = GCOV_COMPUTE_SCALE (count,
> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>    else
>      count_scale = REG_BR_PROB_BASE;
>
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file,
> +             "Scaling entry count %ld to %ld with scale %ld while inlining "
> +             "%s into %s\n",
> +             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
> +             id->src_node->name (), id->dst_node->name ());
> +
>    /* Register specific tree functions.  */
>    gimple_register_cfg_hooks ();
>
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c (revision 207436)
> +++ tree-profile.c (working copy)
> @@ -621,6 +621,8 @@ tree_profiling (void)
>          cgraph_set_pure_flag (node, false, false);
>      }
>
> +  handle_missing_profiles ();
> +
>    /* Update call statements and rebuild the cgraph.  */
>    FOR_EACH_DEFINED_FUNCTION (node)
>      {
> @@ -657,8 +659,6 @@ tree_profiling (void)
>        pop_cfun ();
>      }
>
> -  handle_missing_profiles ();
> -
>    del_node_map ();
>    return 0;
>  }
>
>>
>> Thanks,
>> Teresa
>>
>>>
>>> David
>>>>
>>>> Teresa
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson <tejohn...@google.com>
>>>>> wrote:
>>>>>> On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <davi...@google.com>
>>>>>> wrote:
>>>>>>> Is it better to add some logic in counts_to_freq to determine if the
>>>>>>> profile count needs to be dropped completely to force profile
>>>>>>> estimation?
>>>>>>
>>>>>> This is the problem I was mentioning below where we call
>>>>>> counts_to_freqs before we have the cgraph and can tell that we will
>>>>>> need to drop the profile.
>>>>>> When we were only dropping the profile for
>>>>>> functions with all 0 counts, we simply avoided doing the
>>>>>> counts_to_freqs when the counts were all 0, which works since the 0
>>>>>> counts don't leave things in an inconsistent state (counts vs
>>>>>> estimated frequencies).
>>>>>>
>>>>>> Teresa
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohn...@google.com>
>>>>>>> wrote:
>>>>>>>> This patch attempts to address the lost profile issue for COMDATs in
>>>>>>>> more circumstances, exposed by function splitting.
>>>>>>>>
>>>>>>>> My earlier patch handled the case where the comdat had 0 counts since
>>>>>>>> the linker kept the copy in a different module. In that case we
>>>>>>>> prevent the guessed frequencies on 0-count functions from being
>>>>>>>> dropped by counts_to_freq, and then later mark any reached via
>>>>>>>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>>>>>>>> comdat is inlined the call count is propagated to the callee blocks
>>>>>>>> using the guessed probabilities.
>>>>>>>>
>>>>>>>> However, in this case, there was a comdat that had a very small
>>>>>>>> non-zero count, that was being inlined to a much hotter callsite. This
>>>>>>>> could happen when there was a copy that was ipa-inlined
>>>>>>>> in the profile gen compile, so the copy in that module gets some
>>>>>>>> non-zero counts from the ipa inlined instance, but when the out of
>>>>>>>> line copy was eliminated by the linker (selected from a different
>>>>>>>> module). In this case the inliner was scaling the bb counts up quite a
>>>>>>>> lot when inlining. The problem is that you most likely can't trust
>>>>>>>> that the 0 count bbs in such a case are really not executed by the
>>>>>>>> callsite it is being inlined into, since the counts are very small and
>>>>>>>> correspond to a different callsite.
>>>>>>>> In some internal C++ applications
>>>>>>>> I am seeing that we execute out of the split cold portion of COMDATs
>>>>>>>> for this reason.
>>>>>>>>
>>>>>>>> This problem is more complicated to address than the 0-count instance,
>>>>>>>> because we need the cgraph to determine which functions to drop the
>>>>>>>> profile on, and at that point the estimated frequencies have already
>>>>>>>> been overwritten by counts_to_freqs. To handle this broader case, I
>>>>>>>> have changed the drop_profile routine to simply re-estimate the
>>>>>>>> probabilities/frequencies (and translate these into counts scaled from
>>>>>>>> the incoming call edge counts). This unfortunately necessitates
>>>>>>>> rebuilding the cgraph, to propagate the new synthesized counts and
>>>>>>>> avoid checking failures downstream. But it will only be rebuilt if we
>>>>>>>> dropped any profiles. With this solution, some of the older approach
>>>>>>>> can be removed (e.g. propagating counts using the guessed
>>>>>>>> probabilities during inlining).
>>>>>>>>
>>>>>>>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>>>>>>>> Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when
>>>>>>>> stage 1 reopens?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Teresa
>>>>>>>>
>>>>>>>> 2014-02-10  Teresa Johnson  <tejohn...@google.com>
>>>>>>>>
>>>>>>>>         * graphite.c (graphite_finalize): Pass new parameter.
>>>>>>>>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>>>>>>>>         * predict.c (tree_estimate_probability): New parameter.
>>>>>>>>         (tree_estimate_probability_worker): Renamed from
>>>>>>>>         tree_estimate_probability_driver.
>>>>>>>>         (tree_reestimate_probability): New function.
>>>>>>>>         (tree_estimate_probability_driver): Invoke
>>>>>>>>         tree_estimate_probability_worker.
>>>>>>>>         (freqs_to_counts): Move from tree-inline.c.
>>>>>>>>         (drop_profile): Re-estimate profiles when dropping counts.
>>>>>>>>         (handle_missing_profiles): Drop for some non-zero functions as
>>>>>>>>         well.
>>>>>>>>         (counts_to_freqs): Remove code obviated by reestimation.
>>>>>>>>         * predict.h (handle_missing_profiles): Update declaration.
>>>>>>>>         (tree_estimate_probability): Ditto.
>>>>>>>>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>>>>>>>>         (copy_cfg_body): Remove code obviated by reestimation.
>>>>>>>>         * tree-profile.c (gimple_gen_ior_profiler):
>>>>>>>>         (rebuild_cgraph): Code extracted from tree_profiling to
>>>>>>>>         rebuild cgraph.
>>>>>>>>         (tree_profiling): Invoke rebuild_cgraph as needed.
>>>>>>>>
>>>>>>>> Index: graphite.c
>>>>>>>> ===================================================================
>>>>>>>> --- graphite.c (revision 207436)
>>>>>>>> +++ graphite.c (working copy)
>>>>>>>> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>>>>>>>>        cleanup_tree_cfg ();
>>>>>>>>        profile_status_for_fn (cfun) = PROFILE_ABSENT;
>>>>>>>>        release_recorded_exits ();
>>>>>>>> -      tree_estimate_probability ();
>>>>>>>> +      tree_estimate_probability (false);
>>>>>>>>      }
>>>>>>>>
>>>>>>>>    cloog_state_free (cloog_state);
>>>>>>>> Index: params.def
>>>>>>>> ===================================================================
>>>>>>>> --- params.def (revision 207436)
>>>>>>>> +++ params.def (working copy)
>>>>>>>> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>>>>>>>>           "Maximal estimated outcome of branch considered predictable",
>>>>>>>>           2, 0, 50)
>>>>>>>>
>>>>>>>> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
>>>>>>>> +          "min-caller-reestimate-ratio",
>>>>>>>> +          "Minimum caller-to-callee node count ratio to force reestimated branch "
>>>>>>>> +          "probabilities in callee (where 0 means only when callee count is 0)",
>>>>>>>> +          10, 0, 0)
>>>>>>>> +
>>>>>>>>  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>>>>>>>>           "inline-min-speedup",
>>>>>>>>           "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto",
>>>>>>>> Index: predict.c
>>>>>>>> ===================================================================
>>>>>>>> --- predict.c (revision 207436)
>>>>>>>> +++ predict.c (working copy)
>>>>>>>> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>>>>>>>>
>>>>>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>>>>     This function can be called from the loop optimizers to recompute
>>>>>>>> -   the profile information.  */
>>>>>>>> +   the profile information.  When REDO is true then we are forcing
>>>>>>>> +   re-estimation of the probabilities because the profile was deemed
>>>>>>>> +   insufficient.  */
>>>>>>>>
>>>>>>>>  void
>>>>>>>> -tree_estimate_probability (void)
>>>>>>>> +tree_estimate_probability (bool redo)
>>>>>>>>  {
>>>>>>>>    basic_block bb;
>>>>>>>>
>>>>>>>> @@ -2390,7 +2392,8 @@ void
>>>>>>>>    connect_infinite_loops_to_exit ();
>>>>>>>>    /* We use loop_niter_by_eval, which requires that the loops have
>>>>>>>>       preheaders.  */
>>>>>>>> -  create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>>>> +  if (!redo)
>>>>>>>> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>>>>    calculate_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>>
>>>>>>>>    bb_predictions = pointer_map_create ();
>>>>>>>> @@ -2412,16 +2415,16 @@ void
>>>>>>>>    pointer_map_destroy (bb_predictions);
>>>>>>>>    bb_predictions = NULL;
>>>>>>>>
>>>>>>>> -  estimate_bb_frequencies (false);
>>>>>>>> +  estimate_bb_frequencies (redo);
>>>>>>>>    free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>>    remove_fake_exit_edges ();
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>>>> -   This is the driver function for PASS_PROFILE.  */
>>>>>>>> +   When REDO is true, we are forcing reestimation of the probabilities.
*/
>>>>>>>>
>>>>>>>> -static unsigned int
>>>>>>>> -tree_estimate_probability_driver (void)
>>>>>>>> +static void
>>>>>>>> +tree_estimate_probability_worker (bool redo)
>>>>>>>>  {
>>>>>>>>    unsigned nb_loops;
>>>>>>>>
>>>>>>>> @@ -2435,7 +2438,7 @@ void
>>>>>>>>    if (nb_loops > 1)
>>>>>>>>      scev_initialize ();
>>>>>>>>
>>>>>>>> -  tree_estimate_probability ();
>>>>>>>> +  tree_estimate_probability (redo);
>>>>>>>>
>>>>>>>>    if (nb_loops > 1)
>>>>>>>>      scev_finalize ();
>>>>>>>> @@ -2445,6 +2448,34 @@ void
>>>>>>>>      gimple_dump_cfg (dump_file, dump_flags);
>>>>>>>>    if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>>>>>>>>      profile_status_for_fn (cfun) = PROFILE_GUESSED;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Force re-estimation of the probabilities, because the profile was
>>>>>>>> +   deemed insufficient.  */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +tree_reestimate_probability (void)
>>>>>>>> +{
>>>>>>>> +  basic_block bb;
>>>>>>>> +  edge e;
>>>>>>>> +  edge_iterator ei;
>>>>>>>> +
>>>>>>>> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
>>>>>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> +    {
>>>>>>>> +      bb->count = 0;
>>>>>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> +        e->count = 0;
>>>>>>>> +    }
>>>>>>>> +  tree_estimate_probability_worker (true);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Estimate probabilities.
>>>>>>>> +   This is the driver function for PASS_PROFILE.  */
>>>>>>>> +static unsigned int
>>>>>>>> +tree_estimate_probability_driver (void)
>>>>>>>> +{
>>>>>>>> +  tree_estimate_probability_worker (false);
>>>>>>>>    return 0;
>>>>>>>>  }
>>>>>>>> ^L
>>>>>>>> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>>>>>>>>    BITMAP_FREE (tovisit);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>>>> +   with each bb's frequency.  Used when NODE has an entry count that
>>>>>>>> +   is much lower than the caller edges reaching it.
See the comments
>>>>>>>> +   for handle_missing_profiles() for when this can happen for COMDATs.  */
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>>>> +{
>>>>>>>> +  basic_block bb;
>>>>>>>> +  edge_iterator ei;
>>>>>>>> +  edge e;
>>>>>>>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> +
>>>>>>>> +  FOR_ALL_BB_FN(bb, fn)
>>>>>>>> +    {
>>>>>>>> +      bb->count = apply_scale (count,
>>>>>>>> +                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
>>>>>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> +        e->count = apply_probability (e->src->count, e->probability);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  /* Drop the profile for NODE to guessed, and update its frequency based on
>>>>>>>>     whether it is expected to be hot given the CALL_COUNT.  */
>>>>>>>>
>>>>>>>> @@ -2772,6 +2825,9 @@ static void
>>>>>>>>  drop_profile (struct cgraph_node *node, gcov_type call_count)
>>>>>>>>  {
>>>>>>>>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> +
>>>>>>>> +  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
>>>>>>>> +    return;
>>>>>>>>    /* In the case where this was called by another function with a
>>>>>>>>       dropped profile, call_count will be 0. Since there are no
>>>>>>>>       non-zero call counts to this function, we don't know for sure
>>>>>>>> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>>>
>>>>>>>>    if (dump_file)
>>>>>>>>      fprintf (dump_file,
>>>>>>>> -             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>>>>>>>> +             "Dropping %ld profile for %s/%i. %s based on calls.\n",
>>>>>>>> +             node->count,
>>>>>>>>              node->name (), node->order,
>>>>>>>>              hot ?
"Function is hot" : "Function is normal");
>>>>>>>>    /* We only expect to miss profiles for functions that are reached
>>>>>>>> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>>>               node->name (), node->order);
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +  /* Re-estimate the probabilities for function and use the estimated
>>>>>>>> +     frequencies to compute the counts.  */
>>>>>>>> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> +  tree_reestimate_probability ();
>>>>>>>> +  freqs_to_counts (node, call_count);
>>>>>>>> +  if (dump_file)
>>>>>>>> +    {
>>>>>>>> +      fprintf (dump_file, "After re-estimating probabilities and counts\n");
>>>>>>>> +      gimple_dump_cfg (dump_file, dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
>>>>>>>> +    }
>>>>>>>> +  pop_cfun ();
>>>>>>>> +
>>>>>>>>    profile_status_for_fn (fn)
>>>>>>>>        = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>>>>>>>    node->frequency
>>>>>>>> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>>>  /* In the case of COMDAT routines, multiple object files will contain the same
>>>>>>>>     function and the linker will select one for the binary. In that case
>>>>>>>>     all the other copies from the profile instrument binary will be missing
>>>>>>>> -   profile counts.  Look for cases where this happened, due to non-zero
>>>>>>>> +   profile counts.  This can confuse downstream optimizations such as
>>>>>>>> +   function splitting.
>>>>>>>> +
>>>>>>>> +   Look for cases where this happened, due to non-zero
>>>>>>>>     call counts going to 0-count functions, and drop the profile to guessed
>>>>>>>>     so that we can use the estimated probabilities and avoid optimizing only
>>>>>>>> -   for size.
>>>>>>>> In the case where the COMDAT was inlined in some locations
>>>>>>>> +   within the file and not others, the profile count will be non-zero due
>>>>>>>> +   to the inlined instances, but may still be significantly smaller than the
>>>>>>>> +   call edges for the non-inlined instances.  Detect that case when requested
>>>>>>>> +   and reestimate probabilities, since the counts will not necessarily reflect
>>>>>>>> +   the behavior along the more frequent call paths.
>>>>>>>>
>>>>>>>>     The other case where the profile may be missing is when the routine
>>>>>>>>     is not going to be emitted to the object file, e.g. for "extern template"
>>>>>>>>     class methods.  Those will be marked DECL_EXTERNAL.  Emit a warning in
>>>>>>>>     all other cases of non-zero calls to 0-count functions.  */
>>>>>>>>
>>>>>>>> -void
>>>>>>>> +bool
>>>>>>>>  handle_missing_profiles (void)
>>>>>>>>  {
>>>>>>>>    struct cgraph_node *node;
>>>>>>>>    int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>>>>>>>    vec<struct cgraph_node *> worklist;
>>>>>>>>    worklist.create (64);
>>>>>>>> +  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>>>>>>>> +  bool changed = false;
>>>>>>>>
>>>>>>>> -  /* See if 0 count function has non-0 count callers.  In this case we
>>>>>>>> -     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>>>>>>> +  /* See if 0 or low count function has higher count caller edges.  In this
>>>>>>>> +     case we lost some profile.  Drop its function profile to
>>>>>>>> +     PROFILE_GUESSED.
>>>>>>>>  */
>>>>>>>>    FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>>      {
>>>>>>>>        struct cgraph_edge *e;
>>>>>>>> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
>>>>>>>>        gcov_type max_tp_first_run = 0;
>>>>>>>>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>>
>>>>>>>> -      if (node->count)
>>>>>>>> -        continue;
>>>>>>>>        for (e = node->callers; e; e = e->next_caller)
>>>>>>>>          {
>>>>>>>>            call_count += e->count;
>>>>>>>> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void)
>>>>>>>>            max_tp_first_run = e->caller->tp_first_run;
>>>>>>>>          }
>>>>>>>>
>>>>>>>> +      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
>>>>>>>> +         profiles for 0-count functions called by non-zero call edges.  */
>>>>>>>> +      if ((!min_reest_ratio && node->count > 0)
>>>>>>>> +          || (min_reest_ratio && node->count * min_reest_ratio > call_count))
>>>>>>>> +        continue;
>>>>>>>> +
>>>>>>>>        /* If time profile is missing, let assign the maximum that comes from
>>>>>>>>           caller functions.  */
>>>>>>>>        if (!node->tp_first_run && max_tp_first_run)
>>>>>>>> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
>>>>>>>>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>>>>>>>>          {
>>>>>>>>            drop_profile (node, call_count);
>>>>>>>> +          changed = true;
>>>>>>>>            worklist.safe_push (node);
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -  /* Propagate the profile dropping to other 0-count COMDATs that are
>>>>>>>> +  /* Propagate the profile dropping to other low-count COMDATs that are
>>>>>>>>       potentially called by COMDATs we already dropped the profile on.
>>>>>>>>  */
>>>>>>>>    while (worklist.length () > 0)
>>>>>>>>      {
>>>>>>>> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void)
>>>>>>>>            struct cgraph_node *callee = e->callee;
>>>>>>>>            struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>>>>>>>>
>>>>>>>> -          if (callee->count > 0)
>>>>>>>> +          /* When min_reest_ratio is non-zero, if we get here we dropped
>>>>>>>> +             a caller's profile since it was significantly smaller than its
>>>>>>>> +             call edge.  Drop the profile on any callees whose node count is
>>>>>>>> +             now exceeded by the new caller node count.  */
>>>>>>>> +          if ((!min_reest_ratio && callee->count > 0)
>>>>>>>> +              || (min_reest_ratio && callee->count >= node->count))
>>>>>>>>              continue;
>>>>>>>> +
>>>>>>>> +          gcov_type call_count = 0;
>>>>>>>> +          if (min_reest_ratio > 0)
>>>>>>>> +            {
>>>>>>>> +              struct cgraph_edge *e2;
>>>>>>>> +              for (e2 = node->callers; e2; e2 = e2->next_caller)
>>>>>>>> +                call_count += e2->count;
>>>>>>>> +            }
>>>>>>>> +
>>>>>>>>            if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>>>>>>>>                && profile_status_for_fn (fn) == PROFILE_READ)
>>>>>>>>              {
>>>>>>>> -              drop_profile (node, 0);
>>>>>>>> +              drop_profile (node, call_count);
>>>>>>>> +              changed = true;
>>>>>>>>                worklist.safe_push (callee);
>>>>>>>>              }
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>>    worklist.release ();
>>>>>>>> +  return changed;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  /* Convert counts measured by profile driven feedback to frequencies.
>>>>>>>> @@ -2900,12 +3001,6 @@ counts_to_freqs (void)
>>>>>>>>    gcov_type count_max, true_count_max = 0;
>>>>>>>>    basic_block bb;
>>>>>>>>
>>>>>>>> -  /* Don't overwrite the estimated frequencies when the profile for
>>>>>>>> -     the function is missing.  We may drop this function PROFILE_GUESSED
>>>>>>>> -     later in drop_profile ().
>>>>>>>>  */
>>>>>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
>>>>>>>> -    return 0;
>>>>>>>> -
>>>>>>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>>>>>>>>      true_count_max = MAX (bb->count, true_count_max);
>>>>>>>>
>>>>>>>> Index: predict.h
>>>>>>>> ===================================================================
>>>>>>>> --- predict.h   (revision 207436)
>>>>>>>> +++ predict.h   (working copy)
>>>>>>>> @@ -47,11 +47,11 @@ enum prediction
>>>>>>>>
>>>>>>>>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>>>>>>>>  extern int counts_to_freqs (void);
>>>>>>>> -extern void handle_missing_profiles (void);
>>>>>>>> +extern bool handle_missing_profiles (void);
>>>>>>>>  extern void estimate_bb_frequencies (bool);
>>>>>>>>  extern const char *predictor_name (enum br_predictor);
>>>>>>>>  extern tree build_predict_expr (enum br_predictor, enum prediction);
>>>>>>>> -extern void tree_estimate_probability (void);
>>>>>>>> +extern void tree_estimate_probability (bool);
>>>>>>>>  extern void compute_function_frequency (void);
>>>>>>>>  extern void rebuild_frequencies (void);
>>>>>>>>
>>>>>>>> Index: tree-inline.c
>>>>>>>> ===================================================================
>>>>>>>> --- tree-inline.c       (revision 207436)
>>>>>>>> +++ tree-inline.c       (working copy)
>>>>>>>> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>>>> -   with each bb's frequency.  Used when NODE has a 0-weight entry
>>>>>>>> -   but we are about to inline it into a non-zero count call bb.
>>>>>>>> -   See the comments for handle_missing_profiles() in predict.c for
>>>>>>>> -   when this can happen for COMDATs.
>>>>>>>>  */
>>>>>>>> -
>>>>>>>> -void
>>>>>>>> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>>>> -{
>>>>>>>> -  basic_block bb;
>>>>>>>> -  edge_iterator ei;
>>>>>>>> -  edge e;
>>>>>>>> -  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> -
>>>>>>>> -  FOR_ALL_BB_FN(bb, fn)
>>>>>>>> -    {
>>>>>>>> -      bb->count = apply_scale (count,
>>>>>>>> -                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
>>>>>>>> -      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> -        e->count = apply_probability (e->src->count, e->probability);
>>>>>>>> -    }
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>  /* Make a copy of the body of FN so that it can be inserted inline in
>>>>>>>>     another function.  Walks FN via CFG, returns new fndecl.  */
>>>>>>>>
>>>>>>>> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>>>    int incoming_frequency = 0;
>>>>>>>>    gcov_type incoming_count = 0;
>>>>>>>>
>>>>>>>> -  /* This can happen for COMDAT routines that end up with 0 counts
>>>>>>>> -     despite being called (see the comments for handle_missing_profiles()
>>>>>>>> -     in predict.c as to why).  Apply counts to the blocks in the callee
>>>>>>>> -     before inlining, using the guessed edge frequencies, so that we don't
>>>>>>>> -     end up with a 0-count inline body which can confuse downstream
>>>>>>>> -     optimizations such as function splitting.  */
>>>>>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
>>>>>>>> -    {
>>>>>>>> -      /* Apply the larger of the call bb count and the total incoming
>>>>>>>> -         call edge count to the callee.  */
>>>>>>>> -      gcov_type in_count = 0;
>>>>>>>> -      struct cgraph_edge *in_edge;
>>>>>>>> -      for (in_edge = id->src_node->callers; in_edge;
>>>>>>>> -           in_edge = in_edge->next_caller)
>>>>>>>> -        in_count += in_edge->count;
>>>>>>>> -      freqs_to_counts (id->src_node, count > in_count ?
>>>>>>>> count : in_count);
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>>    if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>>>>>>>>      count_scale
>>>>>>>>          = GCOV_COMPUTE_SCALE (count,
>>>>>>>> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>>>    else
>>>>>>>>      count_scale = REG_BR_PROB_BASE;
>>>>>>>>
>>>>>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>> +    fprintf (dump_file,
>>>>>>>> +             "Scaling entry count %ld to %ld with scale %ld while inlining "
>>>>>>>> +             "%s into %s\n",
>>>>>>>> +             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
>>>>>>>> +             id->src_node->name (), id->dst_node->name ());
>>>>>>>> +
>>>>>>>>    /* Register specific tree functions.  */
>>>>>>>>    gimple_register_cfg_hooks ();
>>>>>>>>
>>>>>>>> Index: tree-profile.c
>>>>>>>> ===================================================================
>>>>>>>> --- tree-profile.c      (revision 207436)
>>>>>>>> +++ tree-profile.c      (working copy)
>>>>>>>> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
>>>>>>>>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges.  */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +rebuild_cgraph (bool update_calls)
>>>>>>>> +{
>>>>>>>> +  struct cgraph_node *node;
>>>>>>>> +
>>>>>>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>> +    {
>>>>>>>> +      basic_block bb;
>>>>>>>> +
>>>>>>>> +      if (!gimple_has_body_p (node->decl)
>>>>>>>> +          || !(!node->clone_of
>>>>>>>> +               || node->decl != node->clone_of->decl))
>>>>>>>> +        continue;
>>>>>>>> +
>>>>>>>> +      /* Don't profile functions produced for builtin stuff.
>>>>>>>>  */
>>>>>>>> +      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>>>> +        continue;
>>>>>>>> +
>>>>>>>> +      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> +
>>>>>>>> +      if (update_calls)
>>>>>>>> +        {
>>>>>>>> +          FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> +            {
>>>>>>>> +              gimple_stmt_iterator gsi;
>>>>>>>> +              for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> +                {
>>>>>>>> +                  gimple stmt = gsi_stmt (gsi);
>>>>>>>> +                  if (is_gimple_call (stmt))
>>>>>>>> +                    update_stmt (stmt);
>>>>>>>> +                }
>>>>>>>> +            }
>>>>>>>> +
>>>>>>>> +          /* re-merge split blocks.  */
>>>>>>>> +          cleanup_tree_cfg ();
>>>>>>>> +          update_ssa (TODO_update_ssa);
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +      rebuild_cgraph_edges ();
>>>>>>>> +
>>>>>>>> +      pop_cfun ();
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  /* Profile all functions in the callgraph.  */
>>>>>>>>
>>>>>>>>  static unsigned int
>>>>>>>> @@ -622,43 +668,14 @@ tree_profiling (void)
>>>>>>>>      }
>>>>>>>>
>>>>>>>>    /* Update call statements and rebuild the cgraph.  */
>>>>>>>> -  FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>> -    {
>>>>>>>> -      basic_block bb;
>>>>>>>> +  rebuild_cgraph (true);
>>>>>>>>
>>>>>>>> -      if (!gimple_has_body_p (node->decl)
>>>>>>>> -          || !(!node->clone_of
>>>>>>>> -               || node->decl != node->clone_of->decl))
>>>>>>>> -        continue;
>>>>>>>> +  /* If the profiles were dropped on any functions, unfortunately we
>>>>>>>> +     need to rebuild the cgraph to propagate the new reestimated counts
>>>>>>>> +     and avoid sanity failures due to inconsistencies.  */
>>>>>>>> +  if (handle_missing_profiles ())
>>>>>>>> +    rebuild_cgraph (false);
>>>>>>>>
>>>>>>>> -      /* Don't profile functions produced for builtin stuff.
>>>>>>>>  */
>>>>>>>> -      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>>>> -        continue;
>>>>>>>> -
>>>>>>>> -      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> -
>>>>>>>> -      FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> -        {
>>>>>>>> -          gimple_stmt_iterator gsi;
>>>>>>>> -          for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> -            {
>>>>>>>> -              gimple stmt = gsi_stmt (gsi);
>>>>>>>> -              if (is_gimple_call (stmt))
>>>>>>>> -                update_stmt (stmt);
>>>>>>>> -            }
>>>>>>>> -        }
>>>>>>>> -
>>>>>>>> -      /* re-merge split blocks.  */
>>>>>>>> -      cleanup_tree_cfg ();
>>>>>>>> -      update_ssa (TODO_update_ssa);
>>>>>>>> -
>>>>>>>> -      rebuild_cgraph_edges ();
>>>>>>>> -
>>>>>>>> -      pop_cfun ();
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>> -  handle_missing_profiles ();
>>>>>>>> -
>>>>>>>>    del_node_map ();
>>>>>>>>    return 0;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> --
>>>>>>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413