> For non-local nodes which can have unknown callers, the algorithm just
> takes half of the counts - we may decide that taking just a third or
> some other portion is more reasonable, but I do not think we can
> attempt anything more clever.

Can't you just sum the counts of the calling edges and subtract that sum from the callee's count?

> 2021-08-23  Martin Jambor  <mjam...@suse.cz>
>
>   * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
>   n_nonrec_calls and itself, document all fields.
>   (init_caller_stats): Initialize the above new fields.
>   (gather_caller_stats): Gather self-recursive counts and calls number.
>   (get_info_about_necessary_edges): Gather counts of self-recursive and
>   other edges bringing in the requested value separately.
>   (dump_profile_updates): Rework to dump info about a single node only.
>   (lenient_count_portion_handling): New function.
>   (struct gather_other_count_struct): New type.
>   (gather_count_of_non_rec_edges): New function.
>   (struct desc_incoming_count_struct): New type.
>   (analyze_clone_icoming_counts): New function.
>   (adjust_clone_incoming_counts): Likewise.
>   (update_counts_for_self_gen_clones): Likewise.
>   (update_profiling_info): Rewritten.
>   (update_specialized_profile): Adjust call to dump_profile_updates.
>   (create_specialized_node): Do not update profiling info.
>   (decide_about_value): New parameter self_gen_clones, either push new
>   clones into it or updat their profile counts. For self-recursively
>   generated values, use a portion of the node count instead of count
>   from self-recursive edges to estimate goodness.
>   (decide_whether_version_node): Gather clones for self-generated values
>   in a new vector, update their profiles at once at the end.
> ---
>  gcc/ipa-cp.c | 543 +++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 457 insertions(+), 86 deletions(-)
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index b987d975793..53cca7aa804 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -701,20 +701,36 @@ ipcp_versionable_function_p (struct cgraph_node *node)
>
>  struct caller_statistics
>  {
> +  /* If requested (see below), self-recursive call counts are summed into this
> +     field. */
> +  profile_count rec_count_sum;
> +  /* The sum of all ipa counts of all the other (non-recursive) calls. */
>    profile_count count_sum;
> +  /* Sum of all frequencies for all calls. */
>    sreal freq_sum;
> +  /* Number of calls and hot calls respectively. */
>    int n_calls, n_hot_calls;
> +  /* If itself is set up, also count the number of non-self-recursive
> +     calls. */
> +  int n_nonrec_calls;
> +  /* If non-NULL, this is the node itself and calls from it should have their
> +     counts included in rec_count_sum and not count_sum. */
> +  cgraph_node *itself;
>  };
>
> +/* With partial train run we do not want to assume that original's count is
> +   zero whenever we redurect all executed edges to clone. Simply drop profile
> +   to local one in this case. In eany case, return the new value. ORIG_NODE
> +   is the original node and its count has not been updaed yet. */
> +
> +profile_count
> +lenient_count_portion_handling (profile_count remainder, cgraph_node *orig_node)
> +{
> +  if (remainder.ipa_p () && !remainder.ipa ().nonzero_p ()
> +      && orig_node->count.ipa_p () && orig_node->count.ipa ().nonzero_p ()
> +      && opt_for_fn (orig_node->decl, flag_profile_partial_training))
> +    remainder = remainder.guessed_local ();

I do not think you need the partial training flag here. You should be able to see that the IPA profile is missing simply by testing the ipa_p predicate on the relevant counts.

> +
> +/* If caller edge counts of a clone created for a self-recursive arithmetic jump
> +   function must be adjusted, do so. NODE is the node or its thunk. */

I would add a comment explaining why the counts need to be adjusted and how.

> +
> +static void
> +adjust_clone_incoming_counts (cgraph_node *node,
> +                              desc_incoming_count_struct *desc)
> +{
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +    if (cs->caller->thunk)
> +      {
> +        adjust_clone_incoming_counts (cs->caller, desc);
> +        profile_count sum = profile_count::zero ();
> +        for (cgraph_edge *e = cs->caller->callers; e; e = e->next_caller)
> +          if (e->count.initialized_p ())
> +            sum += e->count.ipa ();
> +        cs->count = cs->count.combine_with_ipa_count (sum);
> +      }
> +    else if (!desc->processed_edges->contains (cs)
> +             && cs->caller->clone_of == desc->orig)
> +      {
> +        cs->count += desc->count;
> +        if (dump_file)
> +          {
> +            fprintf (dump_file, " Adjusted count of an incoming edge of "
> +                     "a clone %s -> %s to ", cs->caller->dump_name (),
> +                     cs->callee->dump_name ());
> +            cs->count.dump (dump_file);
> +            fprintf (dump_file, "\n");
> +          }
> +      }
> +}

Otherwise the patch looks OK.

Honza
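
P.S. The first question above, about summing the caller edges, can be sketched in isolation as follows. This is only a toy illustration: toy_node, unknown_caller_count and half_of_count are hypothetical names, plain integers stand in for profile_count, and clamping at zero is an assumption about how to treat inconsistent profiles. It is not code from ipa-cp.c.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a call graph node; NOT GCC's cgraph_node.
struct toy_node
{
  std::int64_t count;                       // execution count of the function
  std::vector<std::int64_t> known_callers;  // counts of the visible incoming edges
};

// Count attributable to callers we cannot see: the node count minus the
// sum of the known incoming edge counts.  Clamped at zero (an assumption)
// because real profiles can be inconsistent.
std::int64_t
unknown_caller_count (const toy_node &n)
{
  std::int64_t sum = 0;
  for (std::int64_t c : n.known_callers)
    sum += c;
  return std::max<std::int64_t> (n.count - sum, 0);
}

// The heuristic from the quoted description: simply take half of the count.
std::int64_t
half_of_count (const toy_node &n)
{
  return n.count / 2;
}
```

The two estimates agree only by coincidence. For a node with count 100 and visible edges of 80 and 50, the subtraction clamps to 0 while halving gives 50, which is the kind of case where the choice of estimate matters.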