On 7/31/19 7:29 AM, luoxhu wrote: > This patch aims to fix PR69678 caused by PGO indirect call profiling > performance issues. > The bug that profiling data is never working was fixed by Martin's pull > back of topN patches, performance got GEOMEAN ~1% improvement. > Still, currently the default profile only generates SINGLE indirect target > that called more than 75%. This patch leverages MULTIPLE indirect > targets use in LTO-WPA and LTO-LTRANS stage, as a result, function > specialization, profiling, partial devirtualization, inlining and > cloning could be done successfully based on it. > Performance can get improved from 0.70 sec to 0.38 sec on simple tests. > Details are: > 1. PGO with topn is enbaled by default now, but only one indirect > target edge will be generated in ipa-profile pass, so add variables to > enable > multiple speculative edges through passes, speculative_id will record the > direct edge > index bind to the indirect edge, num_of_ics records how many direct edges > owned by > the indirect edge, postpone gimple_ic to ipa-profile like default as inline > pass will decide whether it is benefit to transform indirect call. > 2. Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for > profile full support in ipa passes and cgraph_edge functions. > speculative_id > can be set by make_speculative id when multiple targets are binded to > one indirect edge, and cloned if new edge is cloned. speculative_id > is streamed out and stream int by lto like lto_stmt_uid. > 3. Add 1 in module testcase and 2 cross module testcases. > 4. Bootstrap and regression test passed on Power8-LE. > > v3 Changes: > 1. Rebase to trunk. > 2. Use speculative_id to track and search the reference node matched > with the direct edge's callee for multiple targets. This could > eliminate the workaround strstr before. Actually, it is the caller's > response to handle the direct edges mapped to same indirect edge. > speculative_call_info will still return one of the direct edge > specified, this will leverage current IPA edge process framework mostly.
Thank you for the next version of the patch. Inline comments follow: > > gcc/ChangeLog > > 2019-07-31 Xiong Hu Luo <luo...@linux.ibm.com> > > PR ipa/69678 > * cgraph.c (symbol_table::create_edge): Init speculative_id. > (cgraph_edge::make_speculative): Add param for setting speculative_id. > (cgraph_edge::speculative_call_info): Find reference by > speculative_id for multiple indirect targets. > (cgraph_edge::resolve_speculation): Decrease the speculations > for indirect edge, drop it's speculative if not direct target > left. > (cgraph_edge::redirect_call_stmt_to_callee): Likewise. > (cgraph_node::verify_node): Don't report error if speculative > edge not include statement. > * cgraph.h (struct indirect_target_info): New struct. > (indirect_call_targets): New vector variable. > (num_of_ics): New variable. > (make_speculative): Add param for setting speculative_id. > (speculative_id): New variable. > * cgraphclones.c (cgraph_node::create_clone): Clone speculative_id. > * ipa-inline.c (inline_small_functions): Add iterator update. > * ipa-profile.c (ipa_profile_generate_summary): Add indirect > multiple targets logic. > (ipa_profile): Likewise. > * ipa-ref.h (speculative_id): New variable. > * ipa.c (process_references): Fix typo. > * lto-cgraph.c (lto_output_edge): Add indirect multiple targets > logic. Stream out speculative_id. > (input_edge): Likewise. > * predict.c (dump_prediction): Revome edges count assert to be > precise. > * symtab.c (symtab_node::create_reference): Init speculative_id. > (symtab_node::clone_references): Clone speculative_id. > (symtab_node::clone_referring): Clone speculative_id. > (symtab_node::clone_reference): Clone speculative_id. > (symtab_node::clear_stmts_in_references): Clear speculative_id. > * tree-inline.c (copy_bb): Duplicate all the speculative edges > if indirect call contains multiple speculative targets. > * tree-profile.c (gimple_gen_ic_profiler): Use the new variable > __gcov_indirect_call.counters and __gcov_indirect_call.callee. > (gimple_gen_ic_func_profiler): Likewise. > (pass_ipa_tree_profile::gate): Fix comment typos. > * value-prof.c (gimple_ic_transform): Handle topn case. > Fix comment typos. > > gcc/testsuite/ChangeLog > > 2019-07-31 Xiong Hu Luo <luo...@linux.ibm.com> > > PR ipa/69678 > * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase. > * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase. > * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase. > * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase. > --- > gcc/cgraph.c | 70 +++++- > gcc/cgraph.h | 28 ++- > gcc/cgraphclones.c | 1 + > gcc/ipa-inline.c | 15 +- > gcc/ipa-profile.c | 238 +++++++++++------- > gcc/ipa-ref.h | 1 + > gcc/ipa.c | 2 +- > gcc/lto-cgraph.c | 54 +++- > gcc/predict.c | 1 - > gcc/symtab.c | 5 + > .../tree-prof/crossmodule-indir-call-topn-1.c | 35 +++ > .../crossmodule-indir-call-topn-1a.c | 22 ++ > .../tree-prof/crossmodule-indir-call-topn-2.c | 42 ++++ > .../gcc.dg/tree-prof/indir-call-prof-topn.c | 38 +++ > gcc/tree-inline.c | 20 ++ > gcc/tree-profile.c | 12 +- > gcc/value-prof.c | 80 +++--- > 17 files changed, 500 insertions(+), 164 deletions(-) > create mode 100644 > gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c > create mode 100644 > gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c > create mode 100644 > gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c > create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c > index a7e3bcf2132..d7be9e9c545 100644 > --- a/gcc/cgraph.c > +++ b/gcc/cgraph.c > @@ -869,6 +869,7 @@ symbol_table::create_edge (cgraph_node *caller, > cgraph_node *callee, > edge->prev_callee = NULL; > edge->next_callee = NULL; > edge->lto_stmt_uid = 0; > + edge->speculative_id = 0; > > edge->count = count; > > @@ -1063,7 +1064,8 @@ cgraph_edge::remove (void) > Return direct edge created. */ > > cgraph_edge * > -cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count) > +cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count, > + unsigned int speculative_id) > { > cgraph_node *n = caller; > ipa_ref *ref = NULL; > @@ -1081,11 +1083,13 @@ cgraph_edge::make_speculative (cgraph_node *n2, > profile_count direct_count) > else > e2->can_throw_external = can_throw_external; > e2->lto_stmt_uid = lto_stmt_uid; > + e2->speculative_id = speculative_id; > e2->in_polymorphic_cdtor = in_polymorphic_cdtor; > count -= e2->count; > symtab->call_edge_duplication_hooks (this, e2); > ref = n->create_reference (n2, IPA_REF_ADDR, call_stmt); > ref->lto_stmt_uid = lto_stmt_uid; > + ref->speculative_id = speculative_id; > ref->speculative = speculative; > n2->mark_address_taken (); > return e2; > @@ -1099,6 +1103,38 @@ cgraph_edge::make_speculative (cgraph_node *n2, > profile_count direct_count) > call) and if one of them exists, all of them must exist. > > Given speculative call edge, return all three components. > + > + For some indirect edge, it may maps to multiple direct edges, i.e. 1:N. > + check the speculative_id to return all the three components for specified > + direct edge or indirect edge. > + If input is indirect, caller of this function will get the direct edge > one by > + one, get_edge will just return one of the direct edge mapped to the > indirect > + edge, the returned direct edge will be resolved or redirected by the > caller, > + then num_of_ics (speculations) is deceased in each access. > + If input is direct, this function will get the indirect edge and reference > + with matched speculative_id, the returned edge will also be resolved or > + redirected, decrease the speculations accordingly. > + Speculations of indirect edge will be dropped only if all direct edges > + be handled. > + > + e.g. for indirect edge E statement "call call_dest": > + > + Redirect N3 after redirected N2: > + > + if (call_dest == N2) > + n2 (); > + else if (call_dest == N3) > + n3 (); > + else > + call call_dest > + > + Resolve N3 and only redirect N2: > + > + if (call_dest == N2) > + n2 (); > + else > + call call_dest > + > */ > > void > @@ -1138,7 +1174,7 @@ cgraph_edge::speculative_call_info (cgraph_edge > *&direct, > > reference = NULL; > for (i = 0; e->caller->iterate_reference (i, ref); i++) > - if (ref->speculative > + if (ref->speculative && ref->speculative_id == e->speculative_id > && ((ref->stmt && ref->stmt == e->call_stmt) > || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid))) > { > @@ -1199,7 +1235,19 @@ cgraph_edge::resolve_speculation (tree callee_decl) > in the functions inlined through it. */ > } > edge->count += e2->count; > - edge->speculative = false; > + /* edge is indirect, e2 is direct. If edge contains multiple speculations, > + remove one of speculations for this indirect edge, then if edge still > + contains direct target, keep the speculation, next direct target > + will continue use it. Give up speculation completely if no direct > + target is left for this indirect edge. */ > + if (edge->indirect_info && edge->indirect_info->num_of_ics) > + { > + edge->indirect_info->num_of_ics--; > + if (edge->indirect_info->num_of_ics == 0) > + edge->speculative = false; > + } > + else > + edge->speculative = false; > e2->speculative = false; > ref->remove_reference (); > if (e2->indirect_unknown_callee || e2->inline_failed) > @@ -1333,7 +1381,19 @@ cgraph_edge::redirect_call_stmt_to_callee (void) > e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt, > false); > e->count = gimple_bb (e->call_stmt)->count; > - e2->speculative = false; > + /* edge is direct, e2 is indirect here. If e2 contains multiple > + speculations, remove one of speculations for this indirect edge, > + then if e2 still contains direct target, keep the speculation, > + next direct target will continue use it. Give up speculation > + completely if no direct target is left for this indirect e2. */ > + if (e2->indirect_info && e2->indirect_info->num_of_ics) > + { > + e2->indirect_info->num_of_ics--; > + if (e2->indirect_info->num_of_ics == 0) > + e2->speculative = false; > + } > + else > + e2->speculative = false; > e2->count = gimple_bb (e2->call_stmt)->count; > ref->speculative = false; > ref->stmt = NULL; > @@ -3434,7 +3494,7 @@ cgraph_node::verify_node (void) > > for (e = callees; e; e = e->next_callee) > { > - if (!e->aux) > + if (!e->aux && !e->speculative) > { > error ("edge %s->%s has no corresponding call_stmt", > identifier_to_locale (e->caller->name ()), > diff --git a/gcc/cgraph.h b/gcc/cgraph.h > index fa5224fb3a5..0b453c3534c 100644 > --- a/gcc/cgraph.h > +++ b/gcc/cgraph.h > @@ -1630,6 +1630,16 @@ private: > void make_speculative (tree otr_type = NULL); > }; > > +/* Structure containing indirect target information from profile. */ > + > +struct GTY (()) indirect_target_info > +{ > + /* Profile_id of common target obtained from profile. */ > + unsigned int common_target_id; > + /* Probability that call will land in function with COMMON_TARGET_ID. */ > + int common_target_probability; > +}; > + > /* Structure containing additional information about an indirect call. */ > > class GTY(()) cgraph_indirect_call_info > @@ -1648,10 +1658,14 @@ public: > int param_index; > /* ECF flags determined from the caller. */ > int ecf_flags; > - /* Profile_id of common target obtrained from profile. */ > - int common_target_id; > - /* Probability that call will land in function with COMMON_TARGET_ID. */ > - int common_target_probability; > + > + /* An indirect call may contain one or multiple call targets. */ > + vec<indirect_target_info, va_gc> *indirect_call_targets; > + > + /* Number of indirect targets (speculations) for the indirect call. > + Increased by ipa-profile, decreased by resolve_speculation or redirect > + statement to callee. */ > + unsigned num_of_ics; Would it be possible to use just the vector and replace all usages of num_of_ics with indirect_call_targets->length () ? It would make the code easier. > > /* Set when the call is a virtual call with the parameter being the > associated object pointer rather than a simple direct call. */ > @@ -1708,7 +1722,8 @@ public: > /* Turn edge into speculative call calling N2. Update > the profile so the direct call is taken COUNT times > with FREQUENCY. */ > - cgraph_edge *make_speculative (cgraph_node *n2, profile_count > direct_count); > + cgraph_edge *make_speculative (cgraph_node *n2, profile_count direct_count, > + unsigned int speculative_id = 0); > > /* Given speculative call edge, return all three components. */ > void speculative_call_info (cgraph_edge *&direct, cgraph_edge *&indirect, > @@ -1786,6 +1801,9 @@ public: > /* The stmt_uid of call_stmt. This is used by LTO to recover the call_stmt > when the function is serialized in. */ > unsigned int lto_stmt_uid; > + /* speculative id is used by multiple indirect targets when the function is > + speculated. */ > + unsigned int speculative_id; > /* Whether this edge was made direct by indirect inlining. */ > unsigned int indirect_inlining_edge : 1; > /* Whether this edge describes an indirect call with an undetermined > diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c > index fd867ecac91..ab6b9dd86c8 100644 > --- a/gcc/cgraphclones.c > +++ b/gcc/cgraphclones.c > @@ -128,6 +128,7 @@ cgraph_edge::clone (cgraph_node *n, gcall *call_stmt, > unsigned stmt_uid, > new_edge->inline_failed = inline_failed; > new_edge->indirect_inlining_edge = indirect_inlining_edge; > new_edge->lto_stmt_uid = stmt_uid; > + new_edge->speculative_id = speculative_id; > /* Clone flags that depend on call_stmt availability manually. */ > new_edge->can_throw_external = can_throw_external; > new_edge->call_stmt_cannot_inline_p = call_stmt_cannot_inline_p; > diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c > index 939d86ef94a..f5654198877 100644 > --- a/gcc/ipa-inline.c > +++ b/gcc/ipa-inline.c > @@ -1877,12 +1877,15 @@ inline_small_functions (void) > } > if (has_speculative) > for (edge = node->callees; edge; edge = next) > - if (edge->speculative && !speculation_useful_p (edge, > - edge->aux != NULL)) > - { > - edge->resolve_speculation (); > - update = true; > - } > + { > + next = edge->next_callee; > + if (edge->speculative > + && !speculation_useful_p (edge, edge->aux != NULL)) > + { > + edge->resolve_speculation (); > + update = true; > + } > + } > if (update) > { > struct cgraph_node *where = node->global.inlined_to > diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c > index 970dba39c80..26214eba27e 100644 > --- a/gcc/ipa-profile.c > +++ b/gcc/ipa-profile.c > @@ -192,23 +192,39 @@ ipa_profile_generate_summary (void) > if (h) > { > gcov_type val, count, all; > - if (get_nth_most_common_value (NULL, "indirect call", h, > - &val, &count, &all)) > + struct cgraph_edge *e = node->get_edge (stmt); > + if (e && !e->indirect_unknown_callee) > + continue; > + > + e->indirect_info->num_of_ics = 0; > + for (unsigned j = 0; j < GCOV_TOPN_VALUES; j++) > { > - struct cgraph_edge * e = node->get_edge (stmt); > - if (e && !e->indirect_unknown_callee) > + if (!get_nth_most_common_value (NULL, "indirect call", > + h, &val, &count, &all, > + j)) > + continue; > + > + if (val == 0) > continue; > > - e->indirect_info->common_target_id = val; > - e->indirect_info->common_target_probability > + struct indirect_target_info item; > + item.common_target_id = val; > + item.common_target_probability > = GCOV_COMPUTE_SCALE (count, all); > - if (e->indirect_info->common_target_probability > > REG_BR_PROB_BASE) > + if (item.common_target_probability > REG_BR_PROB_BASE) > { > if (dump_file) > - fprintf (dump_file, "Probability capped to > 1\n"); > - e->indirect_info->common_target_probability = > REG_BR_PROB_BASE; > + fprintf (dump_file, > + "Probability capped to 1\n"); > + item.common_target_probability = REG_BR_PROB_BASE; > } > + vec_safe_push ( > + e->indirect_info->indirect_call_targets, item); > + e->indirect_info->num_of_ics++; > } > + gcc_assert (e->indirect_info->num_of_ics > + <= GCOV_TOPN_VALUES); > + > gimple_remove_histogram_value (DECL_STRUCT_FUNCTION > (node->decl), > stmt, h); > } > @@ -492,6 +508,7 @@ ipa_profile (void) > int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0; > int nmismatch = 0, nimpossible = 0; > bool node_map_initialized = false; > + gcov_type threshold; > > if (dump_file) > dump_histogram (dump_file, histogram); > @@ -500,14 +517,12 @@ ipa_profile (void) > overall_time += histogram[i]->count * histogram[i]->time; > overall_size += histogram[i]->size; > } > + threshold = 0; > if (overall_time) > { > - gcov_type threshold; > - > gcc_assert (overall_size); > > cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) > / 1000; > - threshold = 0; > for (i = 0; cumulated < cutoff; i++) > { > cumulated += histogram[i]->count * histogram[i]->time; > @@ -543,7 +558,7 @@ ipa_profile (void) > histogram.release (); > histogram_pool.release (); > > - /* Produce speculative calls: we saved common traget from porfiling into > + /* Produce speculative calls: we saved common target from profiling into > e->common_target_id. Now, at link time, we can look up corresponding > function node and produce speculative call. */ > > @@ -558,104 +573,133 @@ ipa_profile (void) > { > if (n->count.initialized_p ()) > nindirect++; > - if (e->indirect_info->common_target_id) > + if (e->indirect_info && e->indirect_info->num_of_ics) > { > - if (!node_map_initialized) > - init_node_map (false); > - node_map_initialized = true; > - ncommon++; > - n2 = find_func_by_profile_id (e->indirect_info->common_target_id); > - if (n2) > + if (in_lto_p) > { Starting from here, it's very difficult to read the patch. Reason is that you wrapped all this in a FOR_EACH_VEC_SAFE_ELT. Would it be please possible to make a refactoring patch that will put code with all the if-else-else-else.. into a separate function? Then you'll be able to call the function in a small loop: FOR_EACH_VEC_SAFE_ELT (e->indirect_info->indirect_call_targets, i, item) new_function_call... It will simplify the patch review a lot. > if (dump_file) > { > - fprintf (dump_file, "Indirect call -> direct call from" > - " other module %s => %s, prob %3.2f\n", > - n->dump_name (), > - n2->dump_name (), > - e->indirect_info->common_target_probability > - / (float)REG_BR_PROB_BASE); > - } > - if (e->indirect_info->common_target_probability > - < REG_BR_PROB_BASE / 2) > - { > - nuseless++; > - if (dump_file) > - fprintf (dump_file, > - "Not speculating: probability is too low.\n"); > - } > - else if (!e->maybe_hot_p ()) > - { > - nuseless++; > - if (dump_file) > - fprintf (dump_file, > - "Not speculating: call is cold.\n"); > - } > - else if (n2->get_availability () <= AVAIL_INTERPOSABLE > - && n2->can_be_discarded_p ()) > - { > - nuseless++; > - if (dump_file) > - fprintf (dump_file, > - "Not speculating: target is overwritable " > - "and can be discarded.\n"); > + fprintf (dump_file, > + "Updating hotness threshold in LTO mode.\n"); > + fprintf (dump_file, "Updated min count: %" PRId64 "\n", > + (int64_t) threshold); > } > - else if (ipa_node_params_sum && ipa_edge_args_sum > - && (!vec_safe_is_empty > - (IPA_NODE_REF (n2)->descriptors)) > - && ipa_get_param_count (IPA_NODE_REF (n2)) > - != ipa_get_cs_argument_count (IPA_EDGE_REF (e)) > - && (ipa_get_param_count (IPA_NODE_REF (n2)) > - >= ipa_get_cs_argument_count (IPA_EDGE_REF (e)) > - || !stdarg_p (TREE_TYPE (n2->decl)))) > + set_hot_bb_threshold (threshold > + / e->indirect_info->num_of_ics); > + } > + if (!node_map_initialized) > + init_node_map (false); > + node_map_initialized = true; > + ncommon++; > + unsigned speculative_id = 0; > + struct indirect_target_info *item; > + FOR_EACH_VEC_SAFE_ELT (e->indirect_info->indirect_call_targets, i, > + item) > + { > + if ((unsigned) i > e->indirect_info->num_of_ics) > + break; > + n2 = find_func_by_profile_id (item->common_target_id); > + if (n2) > { > - nmismatch++; > if (dump_file) > - fprintf (dump_file, > - "Not speculating: " > - "parameter count mistmatch\n"); > + { > + fprintf (dump_file, > + "Indirect call -> direct call from" > + " other module %s => %s, prob %3.2f\n", > + n->dump_name (), n2->dump_name (), > + item->common_target_probability > + / (float) REG_BR_PROB_BASE); > + } > + if (item->common_target_probability > + < REG_BR_PROB_BASE / 2) > + { > + nuseless++; > + if (dump_file) > + fprintf ( > + dump_file, > + "Not speculating: probability is too low.\n"); > + } > + else if (!e->maybe_hot_p ()) > + { > + nuseless++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: call is cold.\n"); > + } > + else if (n2->get_availability () <= AVAIL_INTERPOSABLE > + && n2->can_be_discarded_p ()) > + { > + nuseless++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: target is overwritable " > + "and can be discarded.\n"); > + } > + else if (ipa_node_params_sum && ipa_edge_args_sum > + && (!vec_safe_is_empty ( > + IPA_NODE_REF (n2)->descriptors)) > + && ipa_get_param_count (IPA_NODE_REF (n2)) > + != ipa_get_cs_argument_count ( > + IPA_EDGE_REF (e)) > + && (ipa_get_param_count (IPA_NODE_REF (n2)) > + >= ipa_get_cs_argument_count ( > + IPA_EDGE_REF (e)) > + || !stdarg_p (TREE_TYPE (n2->decl)))) > + { > + nmismatch++; > + if (dump_file) > + fprintf (dump_file, "Not speculating: " > + "parameter count mismatch\n"); > + } > + else if (e->indirect_info->polymorphic > + && !opt_for_fn (n->decl, flag_devirtualize) > + && !possible_polymorphic_call_target_p (e, n2)) > + { > + nimpossible++; > + if (dump_file) > + fprintf (dump_file, > + "Not speculating: " > + "function is not in the polymorphic " > + "call target list\n"); > + } > + else > + { > + /* Target may be overwritable, but profile says that > + control flow goes to this particular implementation > + of N2. Speculate on the local alias to allow > + inlining. > + */ > + if (!n2->can_be_discarded_p ()) > + { > + cgraph_node *alias; > + alias = dyn_cast<cgraph_node *> ( > + n2->noninterposable_alias ()); > + if (alias) > + n2 = alias; > + } > + nconverted++; > + e->make_speculative ( > + n2, > + e->count.apply_probability ( > + item->common_target_probability), > + speculative_id); > + update = true; > + speculative_id++; > + } > } > - else if (e->indirect_info->polymorphic > - && !opt_for_fn (n->decl, flag_devirtualize) > - && !possible_polymorphic_call_target_p (e, n2)) > + else > { > - nimpossible++; > if (dump_file) > fprintf (dump_file, > - "Not speculating: " > - "function is not in the polymorphic " > - "call target list\n"); > + "Function with profile-id %i not found.\n", > + item->common_target_id); > + nunknown++; > } > - else > - { > - /* Target may be overwritable, but profile says that > - control flow goes to this particular implementation > - of N2. Speculate on the local alias to allow inlining. > - */ > - if (!n2->can_be_discarded_p ()) > - { > - cgraph_node *alias; > - alias = dyn_cast<cgraph_node *> > (n2->noninterposable_alias ()); > - if (alias) > - n2 = alias; > - } > - nconverted++; > - e->make_speculative > - (n2, > - e->count.apply_probability > - > (e->indirect_info->common_target_probability)); > - update = true; > - } > - } > - else > - { > - if (dump_file) > - fprintf (dump_file, "Function with profile-id %i not > found.\n", > - e->indirect_info->common_target_id); > - nunknown++; > } > + if (speculative_id < e->indirect_info->num_of_ics) > + e->indirect_info->num_of_ics = speculative_id; > } > - } > + } > if (update) > ipa_update_overall_fn_summary (n); > } > diff --git a/gcc/ipa-ref.h b/gcc/ipa-ref.h > index 0d8e509c932..3e6562ec9d1 100644 > --- a/gcc/ipa-ref.h > +++ b/gcc/ipa-ref.h > @@ -59,6 +59,7 @@ public: > symtab_node *referred; > gimple *stmt; > unsigned int lto_stmt_uid; > + unsigned int speculative_id; > unsigned int referred_index; > ENUM_BITFIELD (ipa_ref_use) use:3; > unsigned int speculative:1; > diff --git a/gcc/ipa.c b/gcc/ipa.c > index 2496694124c..c1fe081a72d 100644 > --- a/gcc/ipa.c > +++ b/gcc/ipa.c > @@ -166,7 +166,7 @@ process_references (symtab_node *snode, > devirtualization happens. After inlining still keep their declarations > around, so we can devirtualize to a direct call. > > - Also try to make trivial devirutalization when no or only one target is > + Also try to make trivial devirtualization when no or only one target is > possible. */ > > static void > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c > index bc0f0107333..1dc856d3c64 100644 > --- a/gcc/lto-cgraph.c > +++ b/gcc/lto-cgraph.c > @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, > struct cgraph_edge *edge, > unsigned int uid; > intptr_t ref; > struct bitpack_d bp; > + unsigned len; > > if (edge->indirect_unknown_callee) > streamer_write_enum (ob->main_stream, LTO_symtab_tags, > LTO_symtab_last_tag, > @@ -265,6 +266,7 @@ lto_output_edge (struct lto_simple_output_block *ob, > struct cgraph_edge *edge, > bp_pack_enum (&bp, cgraph_inline_failed_t, > CIF_N_REASONS, edge->inline_failed); > bp_pack_var_len_unsigned (&bp, uid); > + bp_pack_var_len_unsigned (&bp, edge->speculative_id); > bp_pack_value (&bp, edge->indirect_inlining_edge, 1); > bp_pack_value (&bp, edge->speculative, 1); > bp_pack_value (&bp, edge->call_stmt_cannot_inline_p, 1); > @@ -291,11 +293,28 @@ lto_output_edge (struct lto_simple_output_block *ob, > struct cgraph_edge *edge, > streamer_write_bitpack (&bp); > if (edge->indirect_unknown_callee) > { > + struct indirect_target_info *item; > + unsigned int i; > + len = edge->indirect_info->num_of_ics; > + gcc_assert (len <= GCOV_TOPN_VALUES); > + > streamer_write_hwi_stream (ob->main_stream, > - edge->indirect_info->common_target_id); > - if (edge->indirect_info->common_target_id) > - streamer_write_hwi_stream > - (ob->main_stream, edge->indirect_info->common_target_probability); > + edge->indirect_info->num_of_ics); > + > + if (len) > + { > + FOR_EACH_VEC_SAFE_ELT (edge->indirect_info->indirect_call_targets, i, > + item) > + { > + if (i == edge->indirect_info->num_of_ics) > + break; This is one of the places which will simplify with the vec::length. > + streamer_write_hwi_stream (ob->main_stream, > + item->common_target_id); > + if (item->common_target_id) > + streamer_write_hwi_stream (ob->main_stream, > + item->common_target_probability); > + } > + } > } > } > > @@ -688,6 +707,7 @@ lto_output_ref (struct lto_simple_output_block *ob, > struct ipa_ref *ref, > if (ref->stmt) > uid = gimple_uid (ref->stmt) + 1; > streamer_write_hwi_stream (ob->main_stream, uid); > + streamer_write_hwi_stream (ob->main_stream, ref->speculative_id); > } > } > > @@ -1420,6 +1440,8 @@ input_ref (class lto_input_block *ib, > ref->speculative = speculative; > if (is_a <cgraph_node *> (referring_node)) > ref->lto_stmt_uid = streamer_read_hwi (ib); > + if (is_a <cgraph_node *> (referring_node)) > + ref->speculative_id = streamer_read_hwi (ib); Here you want to write the condition just once. > } > > /* Read an edge from IB. NODES points to a vector of previously read nodes > for > @@ -1433,11 +1455,12 @@ input_edge (class lto_input_block *ib, > vec<symtab_node *> nodes, > { > struct cgraph_node *caller, *callee; > struct cgraph_edge *edge; > - unsigned int stmt_id; > + unsigned int stmt_id, speculative_id; > profile_count count; > cgraph_inline_failed_t inline_failed; > struct bitpack_d bp; > int ecf_flags = 0; > + unsigned i, len; > > caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]); > if (caller == NULL || caller->decl == NULL_TREE) > @@ -1457,6 +1480,7 @@ input_edge (class lto_input_block *ib, vec<symtab_node > *> nodes, > bp = streamer_read_bitpack (ib); > inline_failed = bp_unpack_enum (&bp, cgraph_inline_failed_t, > CIF_N_REASONS); > stmt_id = bp_unpack_var_len_unsigned (&bp); > + speculative_id = bp_unpack_var_len_unsigned (&bp); > > if (indirect) > edge = caller->create_indirect_edge (NULL, 0, count); > @@ -1466,6 +1490,7 @@ input_edge (class lto_input_block *ib, vec<symtab_node > *> nodes, > edge->indirect_inlining_edge = bp_unpack_value (&bp, 1); > edge->speculative = bp_unpack_value (&bp, 1); > edge->lto_stmt_uid = stmt_id; > + edge->speculative_id = speculative_id; > edge->inline_failed = inline_failed; > edge->call_stmt_cannot_inline_p = bp_unpack_value (&bp, 1); > edge->can_throw_external = bp_unpack_value (&bp, 1); > @@ -1485,9 +1510,22 @@ input_edge (class lto_input_block *ib, vec<symtab_node > *> nodes, > if (bp_unpack_value (&bp, 1)) > ecf_flags |= ECF_RETURNS_TWICE; > edge->indirect_info->ecf_flags = ecf_flags; > - edge->indirect_info->common_target_id = streamer_read_hwi (ib); > - if (edge->indirect_info->common_target_id) > - edge->indirect_info->common_target_probability = streamer_read_hwi > (ib); > + > + len = streamer_read_hwi (ib); > + edge->indirect_info->num_of_ics = len; > + > + gcc_assert (len <= GCOV_TOPN_VALUES); > + > + if (len) > + { > + for (i = 0; i < len; i++) > + { > + indirect_target_info item; I would also suggest a constructor for indirect_target_info. You initialize it multiple times. > + item.common_target_id = streamer_read_hwi (ib); > + item.common_target_probability = streamer_read_hwi (ib); > + vec_safe_push (edge->indirect_info->indirect_call_targets, item); > + } > + } > } > } > > diff --git a/gcc/predict.c b/gcc/predict.c > index 07f66aab7a3..67a859ea436 100644 > --- a/gcc/predict.c > +++ b/gcc/predict.c > @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, > int probability, > && bb->count.precise_p () > && reason == REASON_NONE) > { > - gcc_assert (e->count ().precise_p ()); > fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n", > predictor_info[predictor].name, > bb->count.to_gcov_type (), e->count ().to_gcov_type (), > diff --git a/gcc/symtab.c b/gcc/symtab.c > index b1589ea8a40..90a1f62d6a9 100644 > --- a/gcc/symtab.c > +++ b/gcc/symtab.c > @@ -603,6 +603,7 @@ symtab_node::create_reference (symtab_node *referred_node, > ref->referred = referred_node; > ref->stmt = stmt; > ref->lto_stmt_uid = 0; > + ref->speculative_id = 0; > ref->use = use_type; > ref->speculative = 0; > > @@ -660,6 +661,7 @@ symtab_node::clone_references (symtab_node *node) > ref2 = create_reference (ref->referred, ref->use, ref->stmt); > ref2->speculative = speculative; > ref2->lto_stmt_uid = stmt_uid; > + ref2->speculative_id = ref->speculative_id; > } > } > > @@ -678,6 +680,7 @@ symtab_node::clone_referring (symtab_node *node) > ref2 = ref->referring->create_reference (this, ref->use, ref->stmt); > ref2->speculative = speculative; > ref2->lto_stmt_uid = stmt_uid; > + ref2->speculative_id = ref->speculative_id; > } > } > > @@ -693,6 +696,7 @@ symtab_node::clone_reference (ipa_ref *ref, gimple *stmt) > ref2 = create_reference (ref->referred, ref->use, stmt); > ref2->speculative = speculative; > ref2->lto_stmt_uid = stmt_uid; > + ref2->speculative_id = ref->speculative_id; > return ref2; > } > > @@ -747,6 +751,7 @@ symtab_node::clear_stmts_in_references (void) > { > r->stmt = NULL; > r->lto_stmt_uid = 0; > + r->speculative_id = 0; > } > } > > diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c > b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c > new file mode 100644 > index 00000000000..e0a83c2e067 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c > @@ -0,0 +1,35 @@ > +/* { dg-require-effective-target lto } */ > +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */ > +/* { dg-require-profiling "-fprofile-generate" } */ > +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param > indir-call-topn-profile=1" } */ > + > +#include <stdio.h> > + > +typedef int (*fptr) (int); > +int > +one (int a); > + > +int > +two (int a); > + > +fptr table[] = {&one, &two}; > + > +int > +main() > +{ > + int i, x; > + fptr p = &one; > + > + x = one (3); > + > + for (i = 0; i < 350000000; i++) > + { > + x = (*p) (3); > + p = table[x]; > + } > + printf ("done:%d\n", x); > +} > + > +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct > call.* one transformation on insn" "profile_estimate" } } */ > +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct > call.* two transformation on insn" "profile_estimate" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c > b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c > new file mode 100644 > index 00000000000..a8c6e365fb9 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c > @@ -0,0 +1,22 @@ > +/* It seems there is no way to avoid the other source of mulitple > + source testcase from being compiled independently. Just avoid > + error. */ > +#ifdef DOJOB > +int > +one (int a) > +{ > + return 1; > +} > + > +int > +two (int a) > +{ > + return 0; > +} > +#else > +int > +main() > +{ > + return 0; > +} > +#endif > diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c > b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c > new file mode 100644 > index 00000000000..aa3887fde83 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c > @@ -0,0 +1,42 @@ > +/* { dg-require-effective-target lto } */ > +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */ > +/* { dg-require-profiling "-fprofile-generate" } */ > +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param > indir-call-topn-profile=1" } */ > + > +#include <stdio.h> > + > +typedef int (*fptr) (int); > +int > +one (int a); > + > +int > +two (int a); > + > +fptr table[] = {&one, &two}; > + > +int foo () > +{ > + int i, x; > + fptr p = &one; > + > + x = one (3); > + > + for (i = 0; i < 350000000; i++) > + { > + x = (*p) (3); > + p = table[x]; > + } > + return x; > +} > + > +int > +main() > +{ > + int x = foo (); > + printf ("done:%d\n", x); > +} > + > +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct > call.* one transformation on insn" "profile_estimate" } } */ > +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct > call.* two transformation on insn" "profile_estimate" } } */ > + > + > diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c > b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c > new file mode 100644 > index 00000000000..951bc7ddd19 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c > @@ -0,0 +1,38 @@ > +/* { dg-require-profiling "-fprofile-generate" } */ > +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } > */ > + > +#include <stdio.h> > + > +typedef int (*fptr) (int); > +int > +one (int a) > +{ > + return 1; > +} > + > +int > +two (int a) > +{ > + return 0; > +} > + > +fptr table[] = {&one, &two}; > + > +int > +main() > +{ > + int i, x; > + fptr p = &one; > + > + one (3); > + > + for (i = 0; i < 350000000; i++) > + { > + x = (*p) (3); > + p = table[x]; > + } > + printf ("done:%d\n", x); > +} > + > +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct > call.* one transformation on insn" "profile" } } */ > +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct > call.* two transformation on insn" "profile" } } */ > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c > index 4311309acce..93d9269aa8f 100644 > --- a/gcc/tree-inline.c > +++ b/gcc/tree-inline.c > @@ -2167,6 +2167,26 @@ copy_bb (copy_body_data *id, basic_block bb, > > gcc_assert (!edge->indirect_unknown_callee); > old_edge->speculative_call_info (direct, indirect, > ref); > + while (old_edge->next_callee > + && old_edge->next_callee->speculative > + && indirect->indirect_info > + && indirect->indirect_info->num_of_ics > 1) Maybe indirect->indirect_info && indirect->indirect_info->num_of_ics can be put into a function indirect->has_indirect_call_p () ? > + { > + /* Some speculative calls may contain more than > + one direct target, loop iterate it to clone all > + related direct edges before cloning the related > + indirect edge. */ > + id->dst_node->clone_reference (ref, stmt); > + > + edge = old_edge->next_callee; > + edge = edge->clone (id->dst_node, call_stmt, > + gimple_uid (stmt), num, den, > + true); > + old_edge = old_edge->next_callee; > + gcc_assert (!edge->indirect_unknown_callee); > + old_edge->speculative_call_info (direct, indirect, > + ref); > + } > > profile_count indir_cnt = indirect->count; > indirect = indirect->clone (id->dst_node, call_stmt, > diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c > index 554a8c98419..f35bd3f9d7a 100644 > --- a/gcc/tree-profile.c > +++ b/gcc/tree-profile.c > @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field; > /* Do initialization work for the edge profiler. */ > > /* Add code: > - __thread gcov* __gcov_indirect_call_counters; // pointer to actual > counter > - __thread void* __gcov_indirect_call_callee; // actual callee address > + __thread gcov* __gcov_indirect_call.counters; // pointer to actual > counter > + __thread void* __gcov_indirect_call.callee; // actual callee address > __thread int __gcov_function_counter; // time profiler function counter > */ > static void > @@ -384,7 +384,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned > tag, unsigned base) > f_1 = foo; > __gcov_indirect_call.counters = &__gcov4.main[0]; > PROF_9 = f_1; > - __gcov_indirect_call_callee = PROF_9; > + __gcov_indirect_call.callee = PROF_9; > _4 = f_1 (); > */ > > @@ -447,11 +447,11 @@ gimple_gen_ic_func_profiler (void) > > /* Insert code: > > - if (__gcov_indirect_call_callee != NULL) > + if (__gcov_indirect_call.callee != NULL) > __gcov_indirect_call_profiler_v3 (profile_id, ¤t_function_decl); > > The function __gcov_indirect_call_profiler_v3 is responsible for > - resetting __gcov_indirect_call_callee to NULL. */ > + resetting __gcov_indirect_call.callee to NULL. */ > > gimple_stmt_iterator gsi = gsi_start_bb (cond_bb); > void0 = build_int_cst (ptr_type_node, 0); > @@ -893,7 +893,7 @@ pass_ipa_tree_profile::gate (function *) > { > /* When profile instrumentation, use or test coverage shall be performed. > But for AutoFDO, this there is no instrumentation, thus this pass is > - diabled. */ > + disabled. */ > return (!in_lto_p && !flag_auto_profile > && (flag_branch_probabilities || flag_test_coverage > || profile_arc_flag)); > diff --git a/gcc/value-prof.c b/gcc/value-prof.c > index 759458868a8..cbcc104bd5c 100644 > --- a/gcc/value-prof.c > +++ b/gcc/value-prof.c Please rebase changes in this file. I installed recently changes that collide with your patch. Thanks, Martin > @@ -1404,11 +1404,10 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node > *direct_call, > return dcall_stmt; > } > > -/* > - For every checked indirect/virtual call determine if most common pid of > - function/class method has probability more than 50%. If yes modify code of > - this call to: > - */ > +/* There maybe multiple indirect targets in histogram. Check every > + indirect/virtual call if callee function exists, if not exist, leave it to > + LTO stage for later process. Modify code of this indirect call to an > if-else > + structure in ipa-profile finally. */ > > static bool > gimple_ic_transform (gimple_stmt_iterator *gsi) > @@ -1432,54 +1431,65 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) > if (!histogram) > return false; > > - if (!get_nth_most_common_value (NULL, "indirect call", histogram, &val, > - &count, &all)) > - return false; > + count = 0; > + all = histogram->hvalue.counters[0]; > > - if (4 * count <= 3 * all) > - return false; > + for (unsigned j = 0; j < GCOV_TOPN_VALUES; j++) > + { > + if (!get_nth_most_common_value (NULL, "indirect call", histogram, &val, > + &count, &all, j)) > + continue; > > - direct_call = find_func_by_profile_id ((int)val); > + if (4 * count <= all) > + continue; > > - if (direct_call == NULL) > - { > - if (val) > + direct_call = find_func_by_profile_id ((int) val); > + > + if (direct_call == NULL) > + { > + if (val) > + { > + if (dump_file) > + { > + fprintf (dump_file, > + "Indirect call -> direct call from other module"); > + print_generic_expr (dump_file, gimple_call_fn (stmt), > + TDF_SLIM); > + fprintf (dump_file, "=> %i (will resolve only with LTO)\n", > + (int) val); > + } > + } > + continue; > + } > + > + if (!check_ic_target (stmt, direct_call)) > { > if (dump_file) > { > - fprintf (dump_file, "Indirect call -> direct call from other > module"); > + fprintf (dump_file, "Indirect call -> direct call "); > print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); > - fprintf (dump_file, "=> %i (will resolve only with LTO)\n", > (int)val); > + fprintf (dump_file, "=> "); > + print_generic_expr (dump_file, direct_call->decl, TDF_SLIM); > + fprintf (dump_file, > + " transformation skipped because of type mismatch"); > + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > } > + gimple_remove_histogram_value (cfun, stmt, histogram); > + return false; > } > - return false; > - } > > - if (!check_ic_target (stmt, direct_call)) > - { > if (dump_file) > { > fprintf (dump_file, "Indirect call -> direct call "); > print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); > fprintf (dump_file, "=> "); > print_generic_expr (dump_file, direct_call->decl, TDF_SLIM); > - fprintf (dump_file, " transformation skipped because of type > mismatch"); > + fprintf (dump_file, > + " transformation on insn postponed to ipa-profile"); > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > + fprintf (dump_file, "hist->count %" PRId64 " hist->all %" PRId64 "\n", > + count, all); > } > - gimple_remove_histogram_value (cfun, stmt, histogram); > - return false; > - } > - > - if (dump_file) > - { > - fprintf (dump_file, "Indirect call -> direct call "); > - print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); > - fprintf (dump_file, "=> "); > - print_generic_expr (dump_file, direct_call->decl, TDF_SLIM); > - fprintf (dump_file, " transformation on insn postponned to > ipa-profile"); > - print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > - fprintf (dump_file, "hist->count %" PRId64 > - " hist->all %" PRId64"\n", count, all); > } > > return true; >