On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <davi...@google.com> wrote: > Is it better to add some logic in counts_to_freq to determine if the > profile count needs to be dropped completely to force profile > estimation?
This is the problem I was mentioning below where we call counts_to_freqs before we have the cgraph and can tell that we will need to drop the profile. When we were only dropping the profile for functions with all 0 counts, we simply avoided doing the counts_to_freqs when the counts were all 0, which works since the 0 counts don't leave things in an inconsistent state (counts vs estimated frequencies). Teresa > > David > > On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohn...@google.com> wrote: >> This patch attempts to address the lost profile issue for COMDATs in >> more circumstances, exposed by function splitting. >> >> My earlier patch handled the case where the comdat had 0 counts since >> the linker kept the copy in a different module. In that case we >> prevent the guessed frequencies on 0-count functions from being >> dropped by counts_to_freqs, and then later mark any functions reached via >> non-zero callgraph edges as guessed. Finally, when one such 0-count >> comdat is inlined, the call count is propagated to the callee blocks >> using the guessed probabilities. >> >> However, in this case, there was a comdat with a very small >> non-zero count that was being inlined into a much hotter callsite. This >> could happen when a copy was ipa-inlined >> in the profile-gen compile, so the copy in that module got some >> non-zero counts from the ipa-inlined instance, but the out-of-line >> copy was eliminated by the linker (which selected the copy from a different >> module). In this case the inliner was scaling the bb counts up quite a >> lot when inlining. The problem is that you most likely can't trust >> that the 0-count bbs in such a case are really not executed by the >> callsite it is being inlined into, since the counts are very small and >> correspond to a different callsite. In some internal C++ applications >> I am seeing that we execute out of the split cold portion of COMDATs >> for this reason. 
>> >> This problem is more complicated to address than the 0-count instance, >> because we need the cgraph to determine which functions to drop the >> profile on, and at that point the estimated frequencies have already >> been overwritten by counts_to_freqs. To handle this broader case, I >> have changed the drop_profile routine to simply re-estimate the >> probabilities/frequencies (and translate these into counts scaled from >> the incoming call edge counts). This unfortunately necessitates >> rebuilding the cgraph, to propagate the new synthesized counts and >> avoid checking failures downstream. But it will only be rebuilt if we >> dropped any profiles. With this solution, some of the older approach >> can be removed (e.g. propagating counts using the guessed >> probabilities during inlining). >> >> Patch is below. Bootstrapped and tested on x86_64-unknown-linux-gnu. >> Also tested on >> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? >> >> Thanks, >> Teresa >> >> 2014-02-10 Teresa Johnson <tejohn...@google.com> >> >> * graphite.c (graphite_finalize): Pass new parameter. >> * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. >> * predict.c (tree_estimate_probability): New parameter. >> (tree_estimate_probability_worker): Renamed from >> tree_estimate_probability_driver. >> (tree_reestimate_probability): New function. >> (tree_estimate_probability_driver): Invoke >> tree_estimate_probability_worker. >> (freqs_to_counts): Move from tree-inline.c. >> (drop_profile): Re-estimate profiles when dropping counts. >> (handle_missing_profiles): Drop for some non-zero functions as well. >> (counts_to_freqs): Remove code obviated by reestimation. >> * predict.h (handle_missing_profiles): Update declaration. >> (tree_estimate_probability): Ditto. >> * tree-inline.c (freqs_to_counts): Move to predict.c. >> (copy_cfg_body): Remove code obviated by reestimation. 
>> * tree-profile.c (rebuild_cgraph): Code extracted from tree_profiling to rebuild >> cgraph. >> (tree_profiling): Invoke rebuild_cgraph as needed. >> >> Index: graphite.c >> =================================================================== >> --- graphite.c (revision 207436) >> +++ graphite.c (working copy) >> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p) >> cleanup_tree_cfg (); >> profile_status_for_fn (cfun) = PROFILE_ABSENT; >> release_recorded_exits (); >> - tree_estimate_probability (); >> + tree_estimate_probability (false); >> } >> >> cloog_state_free (cloog_state); >> Index: params.def >> =================================================================== >> --- params.def (revision 207436) >> +++ params.def (working copy) >> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, >> "Maximal estimated outcome of branch considered predictable", >> 2, 0, 50) >> >> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO, >> + "min-caller-reestimate-ratio", >> + "Minimum caller-to-callee node count ratio to force >> reestimated branch " >> + "probabilities in callee (where 0 means only when callee >> count is 0)", >> + 10, 0, 0) >> + >> DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, >> "inline-min-speedup", >> "The minimal estimated speedup allowing inliner to ignore >> inline-insns-single and inline-isnsns-auto", >> Index: predict.c >> =================================================================== >> --- predict.c (revision 207436) >> +++ predict.c (working copy) >> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb) >> >> /* Predict branch probabilities and estimate profile of the tree CFG. >> This function can be called from the loop optimizers to recompute >> - the profile information. */ >> + the profile information. When REDO is true then we are forcing >> + re-estimation of the probabilities because the profile was deemed >> + insufficient. 
*/ >> >> void >> -tree_estimate_probability (void) >> +tree_estimate_probability (bool redo) >> { >> basic_block bb; >> >> @@ -2390,7 +2392,8 @@ void >> connect_infinite_loops_to_exit (); >> /* We use loop_niter_by_eval, which requires that the loops have >> preheaders. */ >> - create_preheaders (CP_SIMPLE_PREHEADERS); >> + if (!redo) >> + create_preheaders (CP_SIMPLE_PREHEADERS); >> calculate_dominance_info (CDI_POST_DOMINATORS); >> >> bb_predictions = pointer_map_create (); >> @@ -2412,16 +2415,16 @@ void >> pointer_map_destroy (bb_predictions); >> bb_predictions = NULL; >> >> - estimate_bb_frequencies (false); >> + estimate_bb_frequencies (redo); >> free_dominance_info (CDI_POST_DOMINATORS); >> remove_fake_exit_edges (); >> } >> >> /* Predict branch probabilities and estimate profile of the tree CFG. >> - This is the driver function for PASS_PROFILE. */ >> + When REDO is true, we are forcing reestimation of the probabilities. */ >> >> -static unsigned int >> -tree_estimate_probability_driver (void) >> +static void >> +tree_estimate_probability_worker (bool redo) >> { >> unsigned nb_loops; >> >> @@ -2435,7 +2438,7 @@ void >> if (nb_loops > 1) >> scev_initialize (); >> >> - tree_estimate_probability (); >> + tree_estimate_probability (redo); >> >> if (nb_loops > 1) >> scev_finalize (); >> @@ -2445,6 +2448,34 @@ void >> gimple_dump_cfg (dump_file, dump_flags); >> if (profile_status_for_fn (cfun) == PROFILE_ABSENT) >> profile_status_for_fn (cfun) = PROFILE_GUESSED; >> +} >> + >> +/* Force re-estimation of the probabilities, because the profile was >> + deemed insufficient. */ >> + >> +static void >> +tree_reestimate_probability (void) >> +{ >> + basic_block bb; >> + edge e; >> + edge_iterator ei; >> + >> + /* Need to reset the counts to ensure probabilities are recomputed. 
*/ >> + FOR_EACH_BB_FN (bb, cfun) >> + { >> + bb->count = 0; >> + FOR_EACH_EDGE (e, ei, bb->succs) >> + e->count = 0; >> + } >> + tree_estimate_probability_worker (true); >> +} >> + >> +/* Estimate probabilities. >> + This is the driver function for PASS_PROFILE. */ >> +static unsigned int >> +tree_estimate_probability_driver (void) >> +{ >> + tree_estimate_probability_worker (false); >> return 0; >> } >> ^L >> @@ -2765,6 +2796,28 @@ estimate_loops (void) >> BITMAP_FREE (tovisit); >> } >> >> +/* Convert estimated frequencies into counts for NODE, scaling COUNT >> + with each bb's frequency. Used when NODE has an entry count that >> + is much lower than the caller edges reaching it. See the comments >> + for handle_missing_profiles() for when this can happen for COMDATs. */ >> + >> +void >> +freqs_to_counts (struct cgraph_node *node, gcov_type count) >> +{ >> + basic_block bb; >> + edge_iterator ei; >> + edge e; >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + FOR_ALL_BB_FN(bb, fn) >> + { >> + bb->count = apply_scale (count, >> + GCOV_COMPUTE_SCALE (bb->frequency, >> BB_FREQ_MAX)); >> + FOR_EACH_EDGE (e, ei, bb->succs) >> + e->count = apply_probability (e->src->count, e->probability); >> + } >> +} >> + >> /* Drop the profile for NODE to guessed, and update its frequency based on >> whether it is expected to be hot given the CALL_COUNT. */ >> >> @@ -2772,6 +2825,9 @@ static void >> drop_profile (struct cgraph_node *node, gcov_type call_count) >> { >> struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (profile_status_for_fn (fn) == PROFILE_GUESSED) >> + return; >> /* In the case where this was called by another function with a >> dropped profile, call_count will be 0. Since there are no >> non-zero call counts to this function, we don't know for sure >> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type >> >> if (dump_file) >> fprintf (dump_file, >> - "Dropping 0 profile for %s/%i. 
%s based on calls.\n", >> + "Dropping %ld profile for %s/%i. %s based on calls.\n", >> + node->count, >> node->name (), node->order, >> hot ? "Function is hot" : "Function is normal"); >> /* We only expect to miss profiles for functions that are reached >> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type >> node->name (), node->order); >> } >> >> + /* Re-estimate the probabilities for the function and use the estimated >> + frequencies to compute the counts. */ >> + push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >> + tree_reestimate_probability (); >> + freqs_to_counts (node, call_count); >> + if (dump_file) >> + { >> + fprintf (dump_file, "After re-estimating probabilities and counts\n"); >> + gimple_dump_cfg (dump_file, >> dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS); >> + } >> + pop_cfun (); >> + >> profile_status_for_fn (fn) >> = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >> node->frequency >> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type >> /* In the case of COMDAT routines, multiple object files will contain the >> same >> function and the linker will select one for the binary. In that case >> all the other copies from the profile instrument binary will be missing >> - profile counts. Look for cases where this happened, due to non-zero >> + profile counts. This can confuse downstream optimizations such as >> + function splitting. >> + >> + Look for cases where this happened, due to non-zero >> call counts going to 0-count functions, and drop the profile to guessed >> so that we can use the estimated probabilities and avoid optimizing only >> - for size. >> + for size. In the case where the COMDAT was inlined in some locations >> + within the file and not others, the profile count will be non-zero due >> + to the inlined instances, but may still be significantly smaller than the >> + call edges for the non-inlined instances. 
Detect that case when requested >> + and reestimate probabilities, since the counts will not necessarily >> reflect >> + the behavior along the more frequent call paths. >> >> The other case where the profile may be missing is when the routine >> is not going to be emitted to the object file, e.g. for "extern template" >> class methods. Those will be marked DECL_EXTERNAL. Emit a warning in >> all other cases of non-zero calls to 0-count functions. */ >> >> -void >> +bool >> handle_missing_profiles (void) >> { >> struct cgraph_node *node; >> int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >> vec<struct cgraph_node *> worklist; >> worklist.create (64); >> + int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO); >> + bool changed = false; >> >> - /* See if 0 count function has non-0 count callers. In this case we >> - lost some profile. Drop its function profile to PROFILE_GUESSED. */ >> + /* See if 0 or low count function has higher count caller edges. In this >> + case we lost some profile. Drop its function profile to >> + PROFILE_GUESSED. */ >> FOR_EACH_DEFINED_FUNCTION (node) >> { >> struct cgraph_edge *e; >> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void) >> gcov_type max_tp_first_run = 0; >> struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> >> - if (node->count) >> - continue; >> for (e = node->callers; e; e = e->next_caller) >> { >> call_count += e->count; >> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void) >> max_tp_first_run = e->caller->tp_first_run; >> } >> >> + /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop >> + profiles for 0-count functions called by non-zero call edges. */ >> + if ((!min_reest_ratio && node->count > 0) >> + || (min_reest_ratio && node->count * min_reest_ratio > >> call_count)) >> + continue; >> + >> /* If time profile is missing, let assign the maximum that comes from >> caller functions. 
*/ >> if (!node->tp_first_run && max_tp_first_run) >> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void) >> && (call_count * unlikely_count_fraction >= profile_info->runs)) >> { >> drop_profile (node, call_count); >> + changed = true; >> worklist.safe_push (node); >> } >> } >> >> - /* Propagate the profile dropping to other 0-count COMDATs that are >> + /* Propagate the profile dropping to other low-count COMDATs that are >> potentially called by COMDATs we already dropped the profile on. */ >> while (worklist.length () > 0) >> { >> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void) >> struct cgraph_node *callee = e->callee; >> struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >> >> - if (callee->count > 0) >> + /* When min_reest_ratio is non-zero, if we get here we dropped >> + a caller's profile since it was significantly smaller than its >> + call edge. Drop the profile on any callees whose node count is >> + now exceeded by the new caller node count. */ >> + if ((!min_reest_ratio && callee->count > 0) >> + || (min_reest_ratio && callee->count >= node->count)) >> continue; >> + >> + gcov_type call_count = 0; >> + if (min_reest_ratio > 0) >> + { >> + struct cgraph_edge *e2; >> + for (e2 = node->callers; e2; e2 = e2->next_caller) >> + call_count += e2->count; >> + } >> + >> if (DECL_COMDAT (callee->decl) && fn && fn->cfg >> && profile_status_for_fn (fn) == PROFILE_READ) >> { >> - drop_profile (node, 0); >> + drop_profile (node, call_count); >> + changed = true; >> worklist.safe_push (callee); >> } >> } >> } >> worklist.release (); >> + return changed; >> } >> >> /* Convert counts measured by profile driven feedback to frequencies. >> @@ -2900,12 +3001,6 @@ counts_to_freqs (void) >> gcov_type count_max, true_count_max = 0; >> basic_block bb; >> >> - /* Don't overwrite the estimated frequencies when the profile for >> - the function is missing. We may drop this function PROFILE_GUESSED >> - later in drop_profile (). 
*/ >> - if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count) >> - return 0; >> - >> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb) >> true_count_max = MAX (bb->count, true_count_max); >> >> Index: predict.h >> =================================================================== >> --- predict.h (revision 207436) >> +++ predict.h (working copy) >> @@ -47,11 +47,11 @@ enum prediction >> >> extern void predict_insn_def (rtx, enum br_predictor, enum prediction); >> extern int counts_to_freqs (void); >> -extern void handle_missing_profiles (void); >> +extern bool handle_missing_profiles (void); >> extern void estimate_bb_frequencies (bool); >> extern const char *predictor_name (enum br_predictor); >> extern tree build_predict_expr (enum br_predictor, enum prediction); >> -extern void tree_estimate_probability (void); >> +extern void tree_estimate_probability (bool); >> extern void compute_function_frequency (void); >> extern void rebuild_frequencies (void); >> >> Index: tree-inline.c >> =================================================================== >> --- tree-inline.c (revision 207436) >> +++ tree-inline.c (working copy) >> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo >> } >> } >> >> -/* Convert estimated frequencies into counts for NODE, scaling COUNT >> - with each bb's frequency. Used when NODE has a 0-weight entry >> - but we are about to inline it into a non-zero count call bb. >> - See the comments for handle_missing_profiles() in predict.c for >> - when this can happen for COMDATs. 
*/ >> - >> -void >> -freqs_to_counts (struct cgraph_node *node, gcov_type count) >> -{ >> - basic_block bb; >> - edge_iterator ei; >> - edge e; >> - struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> - >> - FOR_ALL_BB_FN(bb, fn) >> - { >> - bb->count = apply_scale (count, >> - GCOV_COMPUTE_SCALE (bb->frequency, >> BB_FREQ_MAX)); >> - FOR_EACH_EDGE (e, ei, bb->succs) >> - e->count = apply_probability (e->src->count, e->probability); >> - } >> -} >> - >> /* Make a copy of the body of FN so that it can be inserted inline in >> another function. Walks FN via CFG, returns new fndecl. */ >> >> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun >> int incoming_frequency = 0; >> gcov_type incoming_count = 0; >> >> - /* This can happen for COMDAT routines that end up with 0 counts >> - despite being called (see the comments for handle_missing_profiles() >> - in predict.c as to why). Apply counts to the blocks in the callee >> - before inlining, using the guessed edge frequencies, so that we don't >> - end up with a 0-count inline body which can confuse downstream >> - optimizations such as function splitting. */ >> - if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count) >> - { >> - /* Apply the larger of the call bb count and the total incoming >> - call edge count to the callee. */ >> - gcov_type in_count = 0; >> - struct cgraph_edge *in_edge; >> - for (in_edge = id->src_node->callers; in_edge; >> - in_edge = in_edge->next_caller) >> - in_count += in_edge->count; >> - freqs_to_counts (id->src_node, count > in_count ? 
count : in_count); >> - } >> - >> if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count) >> count_scale >> = GCOV_COMPUTE_SCALE (count, >> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun >> else >> count_scale = REG_BR_PROB_BASE; >> >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, >> + "Scaling entry count %ld to %ld with scale %ld while inlining " >> + "%s into %s\n", >> + count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale, >> + id->src_node->name (), id->dst_node->name ()); >> + >> /* Register specific tree functions. */ >> gimple_register_cfg_hooks (); >> >> Index: tree-profile.c >> =================================================================== >> --- tree-profile.c (revision 207436) >> +++ tree-profile.c (working copy) >> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un >> gsi_insert_before (&gsi, call, GSI_NEW_STMT); >> } >> >> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges. >> */ >> + >> +static void >> +rebuild_cgraph (bool update_calls) >> +{ >> + struct cgraph_node *node; >> + >> + FOR_EACH_DEFINED_FUNCTION (node) >> + { >> + basic_block bb; >> + >> + if (!gimple_has_body_p (node->decl) >> + || !(!node->clone_of >> + || node->decl != node->clone_of->decl)) >> + continue; >> + >> + /* Don't profile functions produced for builtin stuff. */ >> + if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION) >> + continue; >> + >> + push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >> + >> + if (update_calls) >> + { >> + FOR_EACH_BB_FN (bb, cfun) >> + { >> + gimple_stmt_iterator gsi; >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next >> (&gsi)) >> + { >> + gimple stmt = gsi_stmt (gsi); >> + if (is_gimple_call (stmt)) >> + update_stmt (stmt); >> + } >> + } >> + >> + /* re-merge split blocks. 
*/ >> + cleanup_tree_cfg (); >> + update_ssa (TODO_update_ssa); >> + } >> + >> + rebuild_cgraph_edges (); >> + >> + pop_cfun (); >> + } >> +} >> + >> /* Profile all functions in the callgraph. */ >> >> static unsigned int >> @@ -622,43 +668,14 @@ tree_profiling (void) >> } >> >> /* Update call statements and rebuild the cgraph. */ >> - FOR_EACH_DEFINED_FUNCTION (node) >> - { >> - basic_block bb; >> + rebuild_cgraph (true); >> >> - if (!gimple_has_body_p (node->decl) >> - || !(!node->clone_of >> - || node->decl != node->clone_of->decl)) >> - continue; >> + /* If the profiles were dropped on any functions, unfortunately we >> + need to rebuild the cgraph to propagate the new reestimated counts >> + and avoid sanity failures due to inconsistencies. */ >> + if (handle_missing_profiles ()) >> + rebuild_cgraph (false); >> >> - /* Don't profile functions produced for builtin stuff. */ >> - if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION) >> - continue; >> - >> - push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >> - >> - FOR_EACH_BB_FN (bb, cfun) >> - { >> - gimple_stmt_iterator gsi; >> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> - { >> - gimple stmt = gsi_stmt (gsi); >> - if (is_gimple_call (stmt)) >> - update_stmt (stmt); >> - } >> - } >> - >> - /* re-merge split blocks. */ >> - cleanup_tree_cfg (); >> - update_ssa (TODO_update_ssa); >> - >> - rebuild_cgraph_edges (); >> - >> - pop_cfun (); >> - } >> - >> - handle_missing_profiles (); >> - >> del_node_map (); >> return 0; >> } >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413