On Tue, Feb 11, 2014 at 6:13 PM, Xinliang David Li <davi...@google.com> wrote: > On Tue, Feb 11, 2014 at 5:36 PM, Teresa Johnson <tejohn...@google.com> wrote: >> On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li <davi...@google.com> >> wrote: >>> Why is the call graph needed to determine whether to drop the profile? >> >> Because we detect this situation by looking for cases where the call >> edge counts greatly exceed the callee node count. >> >>> >>> If that is needed, it might be possible to leverage the ipa_profile >>> pass as it will walk through all function nodes to do profile >>> annotation. With this you can make a decision to drop the callee profile in >>> the caller's context. >> >> There are 2 ipa profiling passes, which are somewhat confusingly named >> (to me at least. =). This is being done during the first. >> >> The first is pass_ipa_tree_profile in tree-profile.c, but is a >> SIMPLE_IPA_PASS and has the name "profile" in the dump. The second is >> pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the >> name "profile_estimate" in the dump. I assume you are suggesting to >> move this into the latter? But I'm not clear on what benefit that >> gives - the functions are not being traversed in order, so there is >> still the issue of needing to rebuild the cgraph after dropping >> profiles, which might be best done earlier as I have in the patch. > > > I meant the tree-profile one. I think this might work: after all the > function's profile counts are annotated, add another walk of the > call graph nodes to drop bad profiles before the > call graph is > rebuilt (the call graph does exist at that point).
Ok, so it is already done in tree-profile. But it sounds like you are suggesting reordering it to just above where we update the calls and rebuild the cgraph the first time? As you noted in a follow-on email to me, the cgraph edges don't have the profile counts at that point (and neither do the nodes), so I would need to compare the count on the call's bb to the entry bb count of the callee. That should be doable, let me take a stab at it. Thanks, Teresa > > David >> >> Teresa >> >>> >>> David >>> >>> On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson <tejohn...@google.com> >>> wrote: >>>> On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <davi...@google.com> >>>> wrote: >>>>> Is it better to add some logic in counts_to_freq to determine if the >>>>> profile count needs to be dropped completely to force profile >>>>> estimation? >>>> >>>> This is the problem I was mentioning below where we call >>>> counts_to_freqs before we have the cgraph and can tell that we will >>>> need to drop the profile. When we were only dropping the profile for >>>> functions with all 0 counts, we simply avoided doing the >>>> counts_to_freqs when the counts were all 0, which works since the 0 >>>> counts don't leave things in an inconsistent state (counts vs >>>> estimated frequencies). >>>> >>>> Teresa >>>> >>>>> >>>>> David >>>>> >>>>> On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohn...@google.com> >>>>> wrote: >>>>>> This patch attempts to address the lost profile issue for COMDATs in >>>>>> more circumstances, exposed by function splitting. >>>>>> >>>>>> My earlier patch handled the case where the comdat had 0 counts since >>>>>> the linker kept the copy in a different module. In that case we >>>>>> prevent the guessed frequencies on 0-count functions from being >>>>>> dropped by counts_to_freq, and then later mark any reached via >>>>>> non-zero callgraph edges as guessed. 
Finally, when one such 0-count >>>>>> comdat is inlined, the call count is propagated to the callee blocks >>>>>> using the guessed probabilities. >>>>>> >>>>>> However, in this case, there was a comdat that had a very small >>>>>> non-zero count that was being inlined into a much hotter callsite. This >>>>>> could happen when there was a copy that was ipa-inlined >>>>>> in the profile-gen compile, so the copy in that module gets some >>>>>> non-zero counts from the ipa-inlined instance, but the out-of-line >>>>>> copy was eliminated by the linker (which selected the copy from a >>>>>> different module). In this case the inliner was scaling the bb counts up quite a >>>>>> lot when inlining. The problem is that you most likely can't trust >>>>>> that the 0-count bbs in such a case are really not executed by the >>>>>> callsite it is being inlined into, since the counts are very small and >>>>>> correspond to a different callsite. In some internal C++ applications >>>>>> I am seeing that we execute out of the split cold portion of COMDATs >>>>>> for this reason. >>>>>> >>>>>> This problem is more complicated to address than the 0-count instance, >>>>>> because we need the cgraph to determine which functions to drop the >>>>>> profile on, and at that point the estimated frequencies have already >>>>>> been overwritten by counts_to_freqs. To handle this broader case, I >>>>>> have changed the drop_profile routine to simply re-estimate the >>>>>> probabilities/frequencies (and translate these into counts scaled from >>>>>> the incoming call edge counts). This unfortunately necessitates >>>>>> rebuilding the cgraph, to propagate the new synthesized counts and >>>>>> avoid checking failures downstream. But it will only be rebuilt if we >>>>>> dropped any profiles. With this solution, some of the older approach >>>>>> can be removed (e.g. propagating counts using the guessed >>>>>> probabilities during inlining). >>>>>> >>>>>> Patch is below.
Bootstrapped and tested on x86-64-unknown-linux-gnu. >>>>>> Also tested on >>>>>> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? >>>>>> >>>>>> Thanks, >>>>>> Teresa >>>>>> >>>>>> 2014-02-10 Teresa Johnson <tejohn...@google.com> >>>>>> >>>>>> * graphite.c (graphite_finalize): Pass new parameter. >>>>>> * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. >>>>>> * predict.c (tree_estimate_probability): New parameter. >>>>>> (tree_estimate_probability_worker): Renamed from >>>>>> tree_estimate_probability_driver. >>>>>> (tree_reestimate_probability): New function. >>>>>> (tree_estimate_probability_driver): Invoke >>>>>> tree_estimate_probability_worker. >>>>>> (freqs_to_counts): Move from tree-inline.c. >>>>>> (drop_profile): Re-estimate profiles when dropping counts. >>>>>> (handle_missing_profiles): Drop for some non-zero functions as >>>>>> well. >>>>>> (counts_to_freqs): Remove code obviated by reestimation. >>>>>> * predict.h (handle_missing_profiles): Update declaration. >>>>>> (tree_estimate_probability): Ditto. >>>>>> * tree-inline.c (freqs_to_counts): Move to predict.c. >>>>>> (copy_cfg_body): Remove code obviated by reestimation. >>>>>> * tree-profile.c (gimple_gen_ior_profiler): >>>>>> (rebuild_cgraph): Code extracted from tree_profiling to rebuild >>>>>> cgraph. >>>>>> (tree_profiling): Invoke rebuild_cgraph as needed.
>>>>>> >>>>>> Index: graphite.c >>>>>> =================================================================== >>>>>> --- graphite.c (revision 207436) >>>>>> +++ graphite.c (working copy) >>>>>> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p) >>>>>> cleanup_tree_cfg (); >>>>>> profile_status_for_fn (cfun) = PROFILE_ABSENT; >>>>>> release_recorded_exits (); >>>>>> - tree_estimate_probability (); >>>>>> + tree_estimate_probability (false); >>>>>> } >>>>>> >>>>>> cloog_state_free (cloog_state); >>>>>> Index: params.def >>>>>> =================================================================== >>>>>> --- params.def (revision 207436) >>>>>> +++ params.def (working copy) >>>>>> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, >>>>>> "Maximal estimated outcome of branch considered predictable", >>>>>> 2, 0, 50) >>>>>> >>>>>> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO, >>>>>> + "min-caller-reestimate-ratio", >>>>>> + "Minimum caller-to-callee node count ratio to force >>>>>> reestimated branch " >>>>>> + "probabilities in callee (where 0 means only when callee >>>>>> count is 0)", >>>>>> + 10, 0, 0) >>>>>> + >>>>>> DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, >>>>>> "inline-min-speedup", >>>>>> "The minimal estimated speedup allowing inliner to ignore >>>>>> inline-insns-single and inline-isnsns-auto", >>>>>> Index: predict.c >>>>>> =================================================================== >>>>>> --- predict.c (revision 207436) >>>>>> +++ predict.c (working copy) >>>>>> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb) >>>>>> >>>>>> /* Predict branch probabilities and estimate profile of the tree CFG. >>>>>> This function can be called from the loop optimizers to recompute >>>>>> - the profile information. */ >>>>>> + the profile information. When REDO is true then we are forcing >>>>>> + re-estimation of the probabilities because the profile was deemed >>>>>> + insufficient. 
*/ >>>>>> >>>>>> void >>>>>> -tree_estimate_probability (void) >>>>>> +tree_estimate_probability (bool redo) >>>>>> { >>>>>> basic_block bb; >>>>>> >>>>>> @@ -2390,7 +2392,8 @@ void >>>>>> connect_infinite_loops_to_exit (); >>>>>> /* We use loop_niter_by_eval, which requires that the loops have >>>>>> preheaders. */ >>>>>> - create_preheaders (CP_SIMPLE_PREHEADERS); >>>>>> + if (!redo) >>>>>> + create_preheaders (CP_SIMPLE_PREHEADERS); >>>>>> calculate_dominance_info (CDI_POST_DOMINATORS); >>>>>> >>>>>> bb_predictions = pointer_map_create (); >>>>>> @@ -2412,16 +2415,16 @@ void >>>>>> pointer_map_destroy (bb_predictions); >>>>>> bb_predictions = NULL; >>>>>> >>>>>> - estimate_bb_frequencies (false); >>>>>> + estimate_bb_frequencies (redo); >>>>>> free_dominance_info (CDI_POST_DOMINATORS); >>>>>> remove_fake_exit_edges (); >>>>>> } >>>>>> >>>>>> /* Predict branch probabilities and estimate profile of the tree CFG. >>>>>> - This is the driver function for PASS_PROFILE. */ >>>>>> + When REDO is true, we are forcing reestimation of the probabilities. >>>>>> */ >>>>>> >>>>>> -static unsigned int >>>>>> -tree_estimate_probability_driver (void) >>>>>> +static void >>>>>> +tree_estimate_probability_worker (bool redo) >>>>>> { >>>>>> unsigned nb_loops; >>>>>> >>>>>> @@ -2435,7 +2438,7 @@ void >>>>>> if (nb_loops > 1) >>>>>> scev_initialize (); >>>>>> >>>>>> - tree_estimate_probability (); >>>>>> + tree_estimate_probability (redo); >>>>>> >>>>>> if (nb_loops > 1) >>>>>> scev_finalize (); >>>>>> @@ -2445,6 +2448,34 @@ void >>>>>> gimple_dump_cfg (dump_file, dump_flags); >>>>>> if (profile_status_for_fn (cfun) == PROFILE_ABSENT) >>>>>> profile_status_for_fn (cfun) = PROFILE_GUESSED; >>>>>> +} >>>>>> + >>>>>> +/* Force re-estimation of the probabilities, because the profile was >>>>>> + deemed insufficient. 
*/ >>>>>> + >>>>>> +static void >>>>>> +tree_reestimate_probability (void) >>>>>> +{ >>>>>> + basic_block bb; >>>>>> + edge e; >>>>>> + edge_iterator ei; >>>>>> + >>>>>> + /* Need to reset the counts to ensure probabilities are recomputed. >>>>>> */ >>>>>> + FOR_EACH_BB_FN (bb, cfun) >>>>>> + { >>>>>> + bb->count = 0; >>>>>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>>>> + e->count = 0; >>>>>> + } >>>>>> + tree_estimate_probability_worker (true); >>>>>> +} >>>>>> + >>>>>> +/* Estimate probabilities. >>>>>> + This is the driver function for PASS_PROFILE. */ >>>>>> +static unsigned int >>>>>> +tree_estimate_probability_driver (void) >>>>>> +{ >>>>>> + tree_estimate_probability_worker (false); >>>>>> return 0; >>>>>> } >>>>>> ^L >>>>>> @@ -2765,6 +2796,28 @@ estimate_loops (void) >>>>>> BITMAP_FREE (tovisit); >>>>>> } >>>>>> >>>>>> +/* Convert estimated frequencies into counts for NODE, scaling COUNT >>>>>> + with each bb's frequency. Used when NODE has an entry count that >>>>>> + is much lower than the caller edges reaching it. See the comments >>>>>> + for handle_missing_profiles() for when this can happen for COMDATs. >>>>>> */ >>>>>> + >>>>>> +void >>>>>> +freqs_to_counts (struct cgraph_node *node, gcov_type count) >>>>>> +{ >>>>>> + basic_block bb; >>>>>> + edge_iterator ei; >>>>>> + edge e; >>>>>> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >>>>>> + >>>>>> + FOR_ALL_BB_FN(bb, fn) >>>>>> + { >>>>>> + bb->count = apply_scale (count, >>>>>> + GCOV_COMPUTE_SCALE (bb->frequency, >>>>>> BB_FREQ_MAX)); >>>>>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>>>> + e->count = apply_probability (e->src->count, e->probability); >>>>>> + } >>>>>> +} >>>>>> + >>>>>> /* Drop the profile for NODE to guessed, and update its frequency based >>>>>> on >>>>>> whether it is expected to be hot given the CALL_COUNT. 
*/ >>>>>> >>>>>> @@ -2772,6 +2825,9 @@ static void >>>>>> drop_profile (struct cgraph_node *node, gcov_type call_count) >>>>>> { >>>>>> struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >>>>>> + >>>>>> + if (profile_status_for_fn (fn) == PROFILE_GUESSED) >>>>>> + return; >>>>>> /* In the case where this was called by another function with a >>>>>> dropped profile, call_count will be 0. Since there are no >>>>>> non-zero call counts to this function, we don't know for sure >>>>>> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type >>>>>> >>>>>> if (dump_file) >>>>>> fprintf (dump_file, >>>>>> - "Dropping 0 profile for %s/%i. %s based on calls.\n", >>>>>> + "Dropping %ld profile for %s/%i. %s based on calls.\n", >>>>>> + node->count, >>>>>> node->name (), node->order, >>>>>> hot ? "Function is hot" : "Function is normal"); >>>>>> /* We only expect to miss profiles for functions that are reached >>>>>> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type >>>>>> node->name (), node->order); >>>>>> } >>>>>> >>>>>> + /* Re-estimate the probabilities for the function and use the estimated >>>>>> + frequencies to compute the counts. */ >>>>>> + push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >>>>>> + tree_reestimate_probability (); >>>>>> + freqs_to_counts (node, call_count); >>>>>> + if (dump_file) >>>>>> + { >>>>>> + fprintf (dump_file, "After re-estimating probabilities and >>>>>> counts\n"); >>>>>> + gimple_dump_cfg (dump_file, >>>>>> dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS); >>>>>> + } >>>>>> + pop_cfun (); >>>>>> + >>>>>> profile_status_for_fn (fn) >>>>>> = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >>>>>> node->frequency >>>>>> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type >>>>>> /* In the case of COMDAT routines, multiple object files will contain >>>>>> the same >>>>>> function and the linker will select one for the binary.
In that case >>>>>> all the other copies from the profile instrument binary will be >>>>>> missing >>>>>> - profile counts. Look for cases where this happened, due to non-zero >>>>>> + profile counts. This can confuse downstream optimizations such as >>>>>> + function splitting. >>>>>> + >>>>>> + Look for cases where this happened, due to non-zero >>>>>> call counts going to 0-count functions, and drop the profile to >>>>>> guessed >>>>>> so that we can use the estimated probabilities and avoid optimizing >>>>>> only >>>>>> - for size. >>>>>> + for size. In the case where the COMDAT was inlined in some locations >>>>>> + within the file and not others, the profile count will be non-zero >>>>>> due >>>>>> + to the inlined instances, but may still be significantly smaller >>>>>> than the >>>>>> + call edges for the non-inlined instances. Detect that case when >>>>>> requested >>>>>> + and reestimate probabilities, since the counts will not necessarily >>>>>> reflect >>>>>> + the behavior along the more frequent call paths. >>>>>> >>>>>> The other case where the profile may be missing is when the routine >>>>>> is not going to be emitted to the object file, e.g. for "extern >>>>>> template" >>>>>> class methods. Those will be marked DECL_EXTERNAL. Emit a warning in >>>>>> all other cases of non-zero calls to 0-count functions. */ >>>>>> >>>>>> -void >>>>>> +bool >>>>>> handle_missing_profiles (void) >>>>>> { >>>>>> struct cgraph_node *node; >>>>>> int unlikely_count_fraction = PARAM_VALUE >>>>>> (UNLIKELY_BB_COUNT_FRACTION); >>>>>> vec<struct cgraph_node *> worklist; >>>>>> worklist.create (64); >>>>>> + int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO); >>>>>> + bool changed = false; >>>>>> >>>>>> - /* See if 0 count function has non-0 count callers. In this case we >>>>>> - lost some profile. Drop its function profile to PROFILE_GUESSED. >>>>>> */ >>>>>> + /* See if 0 or low count function has higher count caller edges. 
In >>>>>> this >>>>>> + case we lost some profile. Drop its function profile to >>>>>> + PROFILE_GUESSED. */ >>>>>> FOR_EACH_DEFINED_FUNCTION (node) >>>>>> { >>>>>> struct cgraph_edge *e; >>>>>> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void) >>>>>> gcov_type max_tp_first_run = 0; >>>>>> struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >>>>>> >>>>>> - if (node->count) >>>>>> - continue; >>>>>> for (e = node->callers; e; e = e->next_caller) >>>>>> { >>>>>> call_count += e->count; >>>>>> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void) >>>>>> max_tp_first_run = e->caller->tp_first_run; >>>>>> } >>>>>> >>>>>> + /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only >>>>>> drop >>>>>> + profiles for 0-count functions called by non-zero call edges. >>>>>> */ >>>>>> + if ((!min_reest_ratio && node->count > 0) >>>>>> + || (min_reest_ratio && node->count * min_reest_ratio > >>>>>> call_count)) >>>>>> + continue; >>>>>> + >>>>>> /* If time profile is missing, let assign the maximum that comes >>>>>> from >>>>>> caller functions. */ >>>>>> if (!node->tp_first_run && max_tp_first_run) >>>>>> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void) >>>>>> && (call_count * unlikely_count_fraction >= >>>>>> profile_info->runs)) >>>>>> { >>>>>> drop_profile (node, call_count); >>>>>> + changed = true; >>>>>> worklist.safe_push (node); >>>>>> } >>>>>> } >>>>>> >>>>>> - /* Propagate the profile dropping to other 0-count COMDATs that are >>>>>> + /* Propagate the profile dropping to other low-count COMDATs that are >>>>>> potentially called by COMDATs we already dropped the profile on. 
>>>>>> */ >>>>>> while (worklist.length () > 0) >>>>>> { >>>>>> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void) >>>>>> struct cgraph_node *callee = e->callee; >>>>>> struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >>>>>> >>>>>> - if (callee->count > 0) >>>>>> + /* When min_reest_ratio is non-zero, if we get here we dropped >>>>>> + a caller's profile since it was significantly smaller than >>>>>> its >>>>>> + call edge. Drop the profile on any callees whose node >>>>>> count is >>>>>> + now exceeded by the new caller node count. */ >>>>>> + if ((!min_reest_ratio && callee->count > 0) >>>>>> + || (min_reest_ratio && callee->count >= node->count)) >>>>>> continue; >>>>>> + >>>>>> + gcov_type call_count = 0; >>>>>> + if (min_reest_ratio > 0) >>>>>> + { >>>>>> + struct cgraph_edge *e2; >>>>>> + for (e2 = node->callers; e2; e2 = e2->next_caller) >>>>>> + call_count += e2->count; >>>>>> + } >>>>>> + >>>>>> if (DECL_COMDAT (callee->decl) && fn && fn->cfg >>>>>> && profile_status_for_fn (fn) == PROFILE_READ) >>>>>> { >>>>>> - drop_profile (node, 0); >>>>>> + drop_profile (node, call_count); >>>>>> + changed = true; >>>>>> worklist.safe_push (callee); >>>>>> } >>>>>> } >>>>>> } >>>>>> worklist.release (); >>>>>> + return changed; >>>>>> } >>>>>> >>>>>> /* Convert counts measured by profile driven feedback to frequencies. >>>>>> @@ -2900,12 +3001,6 @@ counts_to_freqs (void) >>>>>> gcov_type count_max, true_count_max = 0; >>>>>> basic_block bb; >>>>>> >>>>>> - /* Don't overwrite the estimated frequencies when the profile for >>>>>> - the function is missing. We may drop this function PROFILE_GUESSED >>>>>> - later in drop_profile (). 
*/ >>>>>> - if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count) >>>>>> - return 0; >>>>>> - >>>>>> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) >>>>>> true_count_max = MAX (bb->count, true_count_max); >>>>>> >>>>>> Index: predict.h >>>>>> =================================================================== >>>>>> --- predict.h (revision 207436) >>>>>> +++ predict.h (working copy) >>>>>> @@ -47,11 +47,11 @@ enum prediction >>>>>> >>>>>> extern void predict_insn_def (rtx, enum br_predictor, enum prediction); >>>>>> extern int counts_to_freqs (void); >>>>>> -extern void handle_missing_profiles (void); >>>>>> +extern bool handle_missing_profiles (void); >>>>>> extern void estimate_bb_frequencies (bool); >>>>>> extern const char *predictor_name (enum br_predictor); >>>>>> extern tree build_predict_expr (enum br_predictor, enum prediction); >>>>>> -extern void tree_estimate_probability (void); >>>>>> +extern void tree_estimate_probability (bool); >>>>>> extern void compute_function_frequency (void); >>>>>> extern void rebuild_frequencies (void); >>>>>> >>>>>> Index: tree-inline.c >>>>>> =================================================================== >>>>>> --- tree-inline.c (revision 207436) >>>>>> +++ tree-inline.c (working copy) >>>>>> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo >>>>>> } >>>>>> } >>>>>> >>>>>> -/* Convert estimated frequencies into counts for NODE, scaling COUNT >>>>>> - with each bb's frequency. Used when NODE has a 0-weight entry >>>>>> - but we are about to inline it into a non-zero count call bb. >>>>>> - See the comments for handle_missing_profiles() in predict.c for >>>>>> - when this can happen for COMDATs. 
*/ >>>>>> - >>>>>> -void >>>>>> -freqs_to_counts (struct cgraph_node *node, gcov_type count) >>>>>> -{ >>>>>> - basic_block bb; >>>>>> - edge_iterator ei; >>>>>> - edge e; >>>>>> - struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >>>>>> - >>>>>> - FOR_ALL_BB_FN(bb, fn) >>>>>> - { >>>>>> - bb->count = apply_scale (count, >>>>>> - GCOV_COMPUTE_SCALE (bb->frequency, >>>>>> BB_FREQ_MAX)); >>>>>> - FOR_EACH_EDGE (e, ei, bb->succs) >>>>>> - e->count = apply_probability (e->src->count, e->probability); >>>>>> - } >>>>>> -} >>>>>> - >>>>>> /* Make a copy of the body of FN so that it can be inserted inline in >>>>>> another function. Walks FN via CFG, returns new fndecl. */ >>>>>> >>>>>> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun >>>>>> int incoming_frequency = 0; >>>>>> gcov_type incoming_count = 0; >>>>>> >>>>>> - /* This can happen for COMDAT routines that end up with 0 counts >>>>>> - despite being called (see the comments for >>>>>> handle_missing_profiles() >>>>>> - in predict.c as to why). Apply counts to the blocks in the callee >>>>>> - before inlining, using the guessed edge frequencies, so that we >>>>>> don't >>>>>> - end up with a 0-count inline body which can confuse downstream >>>>>> - optimizations such as function splitting. */ >>>>>> - if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count) >>>>>> - { >>>>>> - /* Apply the larger of the call bb count and the total incoming >>>>>> - call edge count to the callee. */ >>>>>> - gcov_type in_count = 0; >>>>>> - struct cgraph_edge *in_edge; >>>>>> - for (in_edge = id->src_node->callers; in_edge; >>>>>> - in_edge = in_edge->next_caller) >>>>>> - in_count += in_edge->count; >>>>>> - freqs_to_counts (id->src_node, count > in_count ? 
count : >>>>>> in_count); >>>>>> - } >>>>>> - >>>>>> if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count) >>>>>> count_scale >>>>>> = GCOV_COMPUTE_SCALE (count, >>>>>> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun >>>>>> else >>>>>> count_scale = REG_BR_PROB_BASE; >>>>>> >>>>>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>>>>> + fprintf (dump_file, >>>>>> + "Scaling entry count %ld to %ld with scale %ld while >>>>>> inlining " >>>>>> + "%s into %s\n", >>>>>> + count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, >>>>>> count_scale, >>>>>> + id->src_node->name (), id->dst_node->name ()); >>>>>> + >>>>>> /* Register specific tree functions. */ >>>>>> gimple_register_cfg_hooks (); >>>>>> >>>>>> Index: tree-profile.c >>>>>> =================================================================== >>>>>> --- tree-profile.c (revision 207436) >>>>>> +++ tree-profile.c (working copy) >>>>>> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un >>>>>> gsi_insert_before (&gsi, call, GSI_NEW_STMT); >>>>>> } >>>>>> >>>>>> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph >>>>>> edges. */ >>>>>> + >>>>>> +static void >>>>>> +rebuild_cgraph (bool update_calls) >>>>>> +{ >>>>>> + struct cgraph_node *node; >>>>>> + >>>>>> + FOR_EACH_DEFINED_FUNCTION (node) >>>>>> + { >>>>>> + basic_block bb; >>>>>> + >>>>>> + if (!gimple_has_body_p (node->decl) >>>>>> + || !(!node->clone_of >>>>>> + || node->decl != node->clone_of->decl)) >>>>>> + continue; >>>>>> + >>>>>> + /* Don't profile functions produced for builtin stuff. 
*/ >>>>>> + if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION) >>>>>> + continue; >>>>>> + >>>>>> + push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >>>>>> + >>>>>> + if (update_calls) >>>>>> + { >>>>>> + FOR_EACH_BB_FN (bb, cfun) >>>>>> + { >>>>>> + gimple_stmt_iterator gsi; >>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next >>>>>> (&gsi)) >>>>>> + { >>>>>> + gimple stmt = gsi_stmt (gsi); >>>>>> + if (is_gimple_call (stmt)) >>>>>> + update_stmt (stmt); >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /* re-merge split blocks. */ >>>>>> + cleanup_tree_cfg (); >>>>>> + update_ssa (TODO_update_ssa); >>>>>> + } >>>>>> + >>>>>> + rebuild_cgraph_edges (); >>>>>> + >>>>>> + pop_cfun (); >>>>>> + } >>>>>> +} >>>>>> + >>>>>> /* Profile all functions in the callgraph. */ >>>>>> >>>>>> static unsigned int >>>>>> @@ -622,43 +668,14 @@ tree_profiling (void) >>>>>> } >>>>>> >>>>>> /* Update call statements and rebuild the cgraph. */ >>>>>> - FOR_EACH_DEFINED_FUNCTION (node) >>>>>> - { >>>>>> - basic_block bb; >>>>>> + rebuild_cgraph (true); >>>>>> >>>>>> - if (!gimple_has_body_p (node->decl) >>>>>> - || !(!node->clone_of >>>>>> - || node->decl != node->clone_of->decl)) >>>>>> - continue; >>>>>> + /* If the profiles were dropped on any functions, unfortunately we >>>>>> + need to rebuild the cgraph to propagate the new reestimated counts >>>>>> + and avoid sanity failures due to inconsistencies. */ >>>>>> + if (handle_missing_profiles ()) >>>>>> + rebuild_cgraph (false); >>>>>> >>>>>> - /* Don't profile functions produced for builtin stuff. 
*/ >>>>>> - if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION) >>>>>> - continue; >>>>>> - >>>>>> - push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >>>>>> - >>>>>> - FOR_EACH_BB_FN (bb, cfun) >>>>>> - { >>>>>> - gimple_stmt_iterator gsi; >>>>>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next >>>>>> (&gsi)) >>>>>> - { >>>>>> - gimple stmt = gsi_stmt (gsi); >>>>>> - if (is_gimple_call (stmt)) >>>>>> - update_stmt (stmt); >>>>>> - } >>>>>> - } >>>>>> - >>>>>> - /* re-merge split blocks. */ >>>>>> - cleanup_tree_cfg (); >>>>>> - update_ssa (TODO_update_ssa); >>>>>> - >>>>>> - rebuild_cgraph_edges (); >>>>>> - >>>>>> - pop_cfun (); >>>>>> - } >>>>>> - >>>>>> - handle_missing_profiles (); >>>>>> - >>>>>> del_node_map (); >>>>>> return 0; >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413