Is it better to add some logic in counts_to_freqs to determine if the profile count needs to be dropped completely to force profile estimation?
David

On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohn...@google.com> wrote:
> This patch attempts to address the lost profile issue for COMDATs in
> more circumstances, exposed by function splitting.
>
> My earlier patch handled the case where the COMDAT had 0 counts since
> the linker kept the copy in a different module. In that case we
> prevent the guessed frequencies on 0-count functions from being
> dropped by counts_to_freqs, and then later mark any reached via
> non-zero callgraph edges as guessed. Finally, when one such 0-count
> COMDAT is inlined, the call count is propagated to the callee blocks
> using the guessed probabilities.
>
> However, in this case there was a COMDAT that had a very small
> non-zero count but was being inlined into a much hotter call site.
> This can happen when a copy was ipa-inlined in the profile-gen
> compile: the copy in that module gets some non-zero counts from the
> ipa-inlined instance, but the out-of-line copy was eliminated by the
> linker (selected from a different module). In this case the inliner
> was scaling the bb counts up quite a lot when inlining. The problem
> is that you most likely can't trust that the 0-count bbs in such a
> case are really not executed by the call site it is being inlined
> into, since the counts are very small and correspond to a different
> call site. In some internal C++ applications I am seeing that we
> execute out of the split cold portion of COMDATs for this reason.
>
> This problem is more complicated to address than the 0-count instance,
> because we need the cgraph to determine which functions to drop the
> profile on, and at that point the estimated frequencies have already
> been overwritten by counts_to_freqs. To handle this broader case, I
> have changed the drop_profile routine to simply re-estimate the
> probabilities/frequencies (and translate these into counts scaled from
> the incoming call edge counts).
> This unfortunately necessitates
> rebuilding the cgraph, to propagate the new synthesized counts and
> avoid checking failures downstream. But it will only be rebuilt if we
> dropped any profiles. With this solution, some of the older approach
> can be removed (e.g. propagating counts using the guessed
> probabilities during inlining).
>
> The patch is below. Bootstrapped and tested on x86_64-unknown-linux-gnu.
> Also tested on a profile-use build of SPEC CPU2006. Ok for trunk when
> stage 1 reopens?
>
> Thanks,
> Teresa
>
> 2014-02-10  Teresa Johnson  <tejohn...@google.com>
>
> 	* graphite.c (graphite_finalize): Pass new parameter.
> 	* params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
> 	* predict.c (tree_estimate_probability): New parameter.
> 	(tree_estimate_probability_worker): Renamed from
> 	tree_estimate_probability_driver.
> 	(tree_reestimate_probability): New function.
> 	(tree_estimate_probability_driver): Invoke
> 	tree_estimate_probability_worker.
> 	(freqs_to_counts): Move from tree-inline.c.
> 	(drop_profile): Re-estimate profiles when dropping counts.
> 	(handle_missing_profiles): Drop for some non-zero functions as well.
> 	(counts_to_freqs): Remove code obviated by reestimation.
> 	* predict.h (handle_missing_profiles): Update declaration.
> 	(tree_estimate_probability): Ditto.
> 	* tree-inline.c (freqs_to_counts): Move to predict.c.
> 	(copy_cfg_body): Remove code obviated by reestimation.
> 	* tree-profile.c (rebuild_cgraph): Code extracted from
> 	tree_profiling to rebuild cgraph.
> 	(tree_profiling): Invoke rebuild_cgraph as needed.
>
> Index: graphite.c
> ===================================================================
> --- graphite.c	(revision 207436)
> +++ graphite.c	(working copy)
> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>        cleanup_tree_cfg ();
>        profile_status_for_fn (cfun) = PROFILE_ABSENT;
>        release_recorded_exits ();
> -      tree_estimate_probability ();
> +      tree_estimate_probability (false);
>      }
>
>    cloog_state_free (cloog_state);
> Index: params.def
> ===================================================================
> --- params.def	(revision 207436)
> +++ params.def	(working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>  	  "Maximal estimated outcome of branch considered predictable",
>  	  2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> +	  "min-caller-reestimate-ratio",
> +	  "Minimum caller-to-callee node count ratio to force reestimated branch "
> +	  "probabilities in callee (where 0 means only when callee count is 0)",
> +	  10, 0, 0)
> +
>  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>  	  "inline-min-speedup",
>  	  "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto",
> Index: predict.c
> ===================================================================
> --- predict.c	(revision 207436)
> +++ predict.c	(working copy)
> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>
>  /* Predict branch probabilities and estimate profile of the tree CFG.
>     This function can be called from the loop optimizers to recompute
> -   the profile information.  */
> +   the profile information.  When REDO is true then we are forcing
> +   re-estimation of the probabilities because the profile was deemed
> +   insufficient.  */
>
>  void
> -tree_estimate_probability (void)
> +tree_estimate_probability (bool redo)
>  {
>    basic_block bb;
>
> @@ -2390,7 +2392,8 @@ void
>    connect_infinite_loops_to_exit ();
>    /* We use loop_niter_by_eval, which requires that the loops have
>       preheaders.  */
> -  create_preheaders (CP_SIMPLE_PREHEADERS);
> +  if (!redo)
> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>    calculate_dominance_info (CDI_POST_DOMINATORS);
>
>    bb_predictions = pointer_map_create ();
>
> @@ -2412,16 +2415,16 @@ void
>    pointer_map_destroy (bb_predictions);
>    bb_predictions = NULL;
>
> -  estimate_bb_frequencies (false);
> +  estimate_bb_frequencies (redo);
>    free_dominance_info (CDI_POST_DOMINATORS);
>    remove_fake_exit_edges ();
>  }
>
>  /* Predict branch probabilities and estimate profile of the tree CFG.
> -   This is the driver function for PASS_PROFILE.  */
> +   When REDO is true, we are forcing reestimation of the probabilities.  */
>
> -static unsigned int
> -tree_estimate_probability_driver (void)
> +static void
> +tree_estimate_probability_worker (bool redo)
>  {
>    unsigned nb_loops;
>
> @@ -2435,7 +2438,7 @@ void
>    if (nb_loops > 1)
>      scev_initialize ();
>
> -  tree_estimate_probability ();
> +  tree_estimate_probability (redo);
>
>    if (nb_loops > 1)
>      scev_finalize ();
>
> @@ -2445,6 +2448,34 @@ void
>    gimple_dump_cfg (dump_file, dump_flags);
>    if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>      profile_status_for_fn (cfun) = PROFILE_GUESSED;
> +}
> +
> +/* Force re-estimation of the probabilities, because the profile was
> +   deemed insufficient.  */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> +  basic_block bb;
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      bb->count = 0;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = 0;
> +    }
> +  tree_estimate_probability_worker (true);
> +}
> +
> +/* Estimate probabilities.
> +   This is the driver function for PASS_PROFILE.  */
> +static unsigned int
> +tree_estimate_probability_driver (void)
> +{
> +  tree_estimate_probability_worker (false);
>    return 0;
>  }
> ^L
> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
>
> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
> +   with each bb's frequency.  Used when NODE has an entry count that
> +   is much lower than the caller edges reaching it.  See the comments
> +   for handle_missing_profiles () for when this can happen for COMDATs.  */
> +
> +void
> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
> +{
> +  basic_block bb;
> +  edge_iterator ei;
> +  edge e;
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  FOR_ALL_BB_FN (bb, fn)
> +    {
> +      bb->count = apply_scale (count,
> +                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = apply_probability (e->src->count, e->probability);
> +    }
> +}
> +
>  /* Drop the profile for NODE to guessed, and update its frequency based on
>     whether it is expected to be hot given the CALL_COUNT.  */
>
> @@ -2772,6 +2825,9 @@ static void
>  drop_profile (struct cgraph_node *node, gcov_type call_count)
>  {
>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
> +    return;
>    /* In the case where this was called by another function with a
>       dropped profile, call_count will be 0.  Since there are no
>       non-zero call counts to this function, we don't know for sure
> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>
>    if (dump_file)
>      fprintf (dump_file,
> -             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             "Dropping %ld profile for %s/%i. %s based on calls.\n",
> +             node->count,
>              node->name (), node->order,
>              hot ? "Function is hot" : "Function is normal");
>
>    /* We only expect to miss profiles for functions that are reached
> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
>                 node->name (), node->order);
>      }
>
> +  /* Re-estimate the probabilities for function and use the estimated
> +     frequencies to compute the counts.  */
> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> +  tree_reestimate_probability ();
> +  freqs_to_counts (node, call_count);
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "After re-estimating probabilities and counts\n");
> +      gimple_dump_cfg (dump_file,
> +                       dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
> +    }
> +  pop_cfun ();
> +
>    profile_status_for_fn (fn)
>      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>    node->frequency
> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
>  /* In the case of COMDAT routines, multiple object files will contain the same
>     function and the linker will select one for the binary.  In that case
>     all the other copies from the profile instrument binary will be missing
> -   profile counts.  Look for cases where this happened, due to non-zero
> +   profile counts.  This can confuse downstream optimizations such as
> +   function splitting.
> +
> +   Look for cases where this happened, due to non-zero
>     call counts going to 0-count functions, and drop the profile to guessed
>     so that we can use the estimated probabilities and avoid optimizing only
> -   for size.
> +   for size.  In the case where the COMDAT was inlined in some locations
> +   within the file and not others, the profile count will be non-zero due
> +   to the inlined instances, but may still be significantly smaller than the
> +   call edges for the non-inlined instances.  Detect that case when requested
> +   and reestimate probabilities, since the counts will not necessarily reflect
> +   the behavior along the more frequent call paths.
>
>     The other case where the profile may be missing is when the routine
>     is not going to be emitted to the object file, e.g. for "extern template"
>     class methods.  Those will be marked DECL_EXTERNAL.  Emit a warning in
>     all other cases of non-zero calls to 0-count functions.  */
>
> -void
> +bool
>  handle_missing_profiles (void)
>  {
>    struct cgraph_node *node;
>    int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>    vec<struct cgraph_node *> worklist;
>    worklist.create (64);
> +  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
> +  bool changed = false;
>
> -  /* See if 0 count function has non-0 count callers. In this case we
> -     lost some profile. Drop its function profile to PROFILE_GUESSED. */
> +  /* See if 0 or low count function has higher count caller edges.  In this
> +     case we lost some profile.  Drop its function profile to
> +     PROFILE_GUESSED.  */
>    FOR_EACH_DEFINED_FUNCTION (node)
>      {
>        struct cgraph_edge *e;
> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
>        gcov_type max_tp_first_run = 0;
>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> -      if (node->count)
> -        continue;
>        for (e = node->callers; e; e = e->next_caller)
>          {
>            call_count += e->count;
>
>            if (e->caller->tp_first_run > max_tp_first_run)
>              max_tp_first_run = e->caller->tp_first_run;
>          }
>
> +      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
> +         profiles for 0-count functions called by non-zero call edges.  */
> +      if ((!min_reest_ratio && node->count > 0)
> +          || (min_reest_ratio && node->count * min_reest_ratio > call_count))
> +        continue;
> +
>        /* If time profile is missing, let assign the maximum that comes from
>           caller functions.  */
>        if (!node->tp_first_run && max_tp_first_run)
> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>          {
>            drop_profile (node, call_count);
> +          changed = true;
>            worklist.safe_push (node);
>          }
>      }
>
> -  /* Propagate the profile dropping to other 0-count COMDATs that are
> +  /* Propagate the profile dropping to other low-count COMDATs that are
>       potentially called by COMDATs we already dropped the profile on.  */
>    while (worklist.length () > 0)
>      {
> @@ -2878,17 +2963,33 @@
>            struct cgraph_node *callee = e->callee;
>            struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>
> -          if (callee->count > 0)
> +          /* When min_reest_ratio is non-zero, if we get here we dropped
> +             a caller's profile since it was significantly smaller than its
> +             call edge.  Drop the profile on any callees whose node count is
> +             now exceeded by the new caller node count.  */
> +          if ((!min_reest_ratio && callee->count > 0)
> +              || (min_reest_ratio && callee->count >= node->count))
>              continue;
> +
> +          gcov_type call_count = 0;
> +          if (min_reest_ratio > 0)
> +            {
> +              struct cgraph_edge *e2;
> +              for (e2 = node->callers; e2; e2 = e2->next_caller)
> +                call_count += e2->count;
> +            }
> +
>            if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>                && profile_status_for_fn (fn) == PROFILE_READ)
>              {
> -              drop_profile (node, 0);
> +              drop_profile (node, call_count);
> +              changed = true;
>                worklist.safe_push (callee);
>              }
>          }
>      }
>    worklist.release ();
> +  return changed;
>  }
>
>  /* Convert counts measured by profile driven feedback to frequencies.
> @@ -2900,12 +3001,6 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> -  /* Don't overwrite the estimated frequencies when the profile for
> -     the function is missing. We may drop this function PROFILE_GUESSED
> -     later in drop_profile ().  */
> -  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
> -    return 0;
> -
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h	(revision 207436)
> +++ predict.h	(working copy)
> @@ -47,11 +47,11 @@ enum prediction
>
>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>  extern int counts_to_freqs (void);
> -extern void handle_missing_profiles (void);
> +extern bool handle_missing_profiles (void);
>  extern void estimate_bb_frequencies (bool);
>  extern const char *predictor_name (enum br_predictor);
>  extern tree build_predict_expr (enum br_predictor, enum prediction);
> -extern void tree_estimate_probability (void);
> +extern void tree_estimate_probability (bool);
>  extern void compute_function_frequency (void);
>  extern void rebuild_frequencies (void);
>
> Index: tree-inline.c
> ===================================================================
> --- tree-inline.c	(revision 207436)
> +++ tree-inline.c	(working copy)
> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
>      }
>  }
>
> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
> -   with each bb's frequency.  Used when NODE has a 0-weight entry
> -   but we are about to inline it into a non-zero count call bb.
> -   See the comments for handle_missing_profiles() in predict.c for
> -   when this can happen for COMDATs.  */
> -
> -void
> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
> -{
> -  basic_block bb;
> -  edge_iterator ei;
> -  edge e;
> -  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> -
> -  FOR_ALL_BB_FN (bb, fn)
> -    {
> -      bb->count = apply_scale (count,
> -                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> -      FOR_EACH_EDGE (e, ei, bb->succs)
> -        e->count = apply_probability (e->src->count, e->probability);
> -    }
> -}
> -
>  /* Make a copy of the body of FN so that it can be inserted inline in
>     another function.  Walks FN via CFG, returns new fndecl.  */
>
> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>    int incoming_frequency = 0;
>    gcov_type incoming_count = 0;
>
> -  /* This can happen for COMDAT routines that end up with 0 counts
> -     despite being called (see the comments for handle_missing_profiles()
> -     in predict.c as to why).  Apply counts to the blocks in the callee
> -     before inlining, using the guessed edge frequencies, so that we don't
> -     end up with a 0-count inline body which can confuse downstream
> -     optimizations such as function splitting.  */
> -  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
> -    {
> -      /* Apply the larger of the call bb count and the total incoming
> -         call edge count to the callee.  */
> -      gcov_type in_count = 0;
> -      struct cgraph_edge *in_edge;
> -      for (in_edge = id->src_node->callers; in_edge;
> -           in_edge = in_edge->next_caller)
> -        in_count += in_edge->count;
> -      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
> -    }
> -
>    if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>      count_scale
>        = GCOV_COMPUTE_SCALE (count,
> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>    else
>      count_scale = REG_BR_PROB_BASE;
>
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file,
> +             "Scaling entry count %ld to %ld with scale %ld while inlining "
> +             "%s into %s\n",
> +             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
> +             id->src_node->name (), id->dst_node->name ());
> +
>    /* Register specific tree functions.  */
>    gimple_register_cfg_hooks ();
>
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c	(revision 207436)
> +++ tree-profile.c	(working copy)
> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>  }
>
> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges.  */
> +
> +static void
> +rebuild_cgraph (bool update_calls)
> +{
> +  struct cgraph_node *node;
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      basic_block bb;
> +
> +      if (!gimple_has_body_p (node->decl)
> +          || !(!node->clone_of
> +               || node->decl != node->clone_of->decl))
> +        continue;
> +
> +      /* Don't profile functions produced for builtin stuff.  */
> +      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> +        continue;
> +
> +      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> +
> +      if (update_calls)
> +        {
> +          FOR_EACH_BB_FN (bb, cfun)
> +            {
> +              gimple_stmt_iterator gsi;
> +              for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +                {
> +                  gimple stmt = gsi_stmt (gsi);
> +                  if (is_gimple_call (stmt))
> +                    update_stmt (stmt);
> +                }
> +            }
> +
> +          /* Re-merge split blocks.  */
> +          cleanup_tree_cfg ();
> +          update_ssa (TODO_update_ssa);
> +        }
> +
> +      rebuild_cgraph_edges ();
> +
> +      pop_cfun ();
> +    }
> +}
> +
> +/* Profile all functions in the callgraph.  */
>
>  static unsigned int
> @@ -622,43 +668,14 @@ tree_profiling (void)
>      }
>
>    /* Update call statements and rebuild the cgraph.  */
> -  FOR_EACH_DEFINED_FUNCTION (node)
> -    {
> -      basic_block bb;
> +  rebuild_cgraph (true);
>
> -      if (!gimple_has_body_p (node->decl)
> -          || !(!node->clone_of
> -               || node->decl != node->clone_of->decl))
> -        continue;
> +  /* If the profiles were dropped on any functions, unfortunately we
> +     need to rebuild the cgraph to propagate the new reestimated counts
> +     and avoid sanity failures due to inconsistencies.  */
> +  if (handle_missing_profiles ())
> +    rebuild_cgraph (false);
>
> -      /* Don't profile functions produced for builtin stuff.  */
> -      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> -        continue;
> -
> -      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> -
> -      FOR_EACH_BB_FN (bb, cfun)
> -        {
> -          gimple_stmt_iterator gsi;
> -          for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -            {
> -              gimple stmt = gsi_stmt (gsi);
> -              if (is_gimple_call (stmt))
> -                update_stmt (stmt);
> -            }
> -        }
> -
> -      /* re-merge split blocks.  */
> -      cleanup_tree_cfg ();
> -      update_ssa (TODO_update_ssa);
> -
> -      rebuild_cgraph_edges ();
> -
> -      pop_cfun ();
> -    }
> -
> -  handle_missing_profiles ();
> -
>    del_node_map ();
>    return 0;
>  }
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413