> For non-local nodes which can have unknown callers, the algorithm just
> takes half of the counts - we may decide that taking just a third or
> some other portion is more reasonable, but I do not think we can
> attempt anything more clever.

Can't you just sum the counts of the calling edges and subtract the sum from
the callee's count?
> 2021-08-23  Martin Jambor  <mjam...@suse.cz>
> 
>       * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
>       n_nonrec_calls and itself, document all fields.
>       (init_caller_stats): Initialize the above new fields.
>       (gather_caller_stats): Gather self-recursive counts and calls number.
>       (get_info_about_necessary_edges): Gather counts of self-recursive and
>       other edges bringing in the requested value separately.
>       (dump_profile_updates): Rework to dump info about a single node only.
>       (lenient_count_portion_handling): New function.
>       (struct gather_other_count_struct): New type.
>       (gather_count_of_non_rec_edges): New function.
>       (struct desc_incoming_count_struct): New type.
>       (analyze_clone_icoming_counts): New function.
>       (adjust_clone_incoming_counts): Likewise.
>       (update_counts_for_self_gen_clones): Likewise.
>       (update_profiling_info): Rewritten.
>       (update_specialized_profile): Adjust call to dump_profile_updates.
>       (create_specialized_node): Do not update profiling info.
>       (decide_about_value): New parameter self_gen_clones, either push new
>       clones into it or update their profile counts.  For self-recursively
>       generated values, use a portion of the node count instead of count
>       from self-recursive edges to estimate goodness.
>       (decide_whether_version_node): Gather clones for self-generated values
>       in a new vector, update their profiles at once at the end.
> ---
>  gcc/ipa-cp.c | 543 +++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 457 insertions(+), 86 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index b987d975793..53cca7aa804 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -701,20 +701,36 @@ ipcp_versionable_function_p (struct cgraph_node *node)
>  
>  struct caller_statistics
>  {
> +  /* If requested (see below), self-recursive call counts are summed into this
> +     field.  */
> +  profile_count rec_count_sum;
> +  /* The sum of all ipa counts of all the other (non-recursive) calls.  */
>    profile_count count_sum;
> +  /* Sum of all frequencies for all calls.  */
>    sreal freq_sum;
> +  /* Number of calls and hot calls respectively.  */
>    int n_calls, n_hot_calls;
> +  /* If itself is set up, also count the number of non-self-recursive
> +     calls.  */
> +  int n_nonrec_calls;
> +  /* If non-NULL, this is the node itself and calls from it should have their
> +     counts included in rec_count_sum and not count_sum.  */
> +  cgraph_node *itself;
>  };
>  
> +/* With a partial training run we do not want to assume that the original's
> +   count is zero whenever we redirect all executed edges to the clone.  Simply
> +   drop the profile to a local one in this case.  In any case, return the new
> +   value.  ORIG_NODE is the original node and its count has not been updated
> +   yet.  */
> +
> +profile_count
> +lenient_count_portion_handling (profile_count remainder,
> +                                cgraph_node *orig_node)
> +{
> +  if (remainder.ipa_p () && !remainder.ipa ().nonzero_p ()
> +      && orig_node->count.ipa_p () && orig_node->count.ipa ().nonzero_p ()
> +      && opt_for_fn (orig_node->decl, flag_profile_partial_training))
> +    remainder = remainder.guessed_local ();

I do not think you need the partial training flag here.  You should be able
to tell that the IPA profile is missing simply by testing the ipa_p predicate
on the relevant counts.
> +
> +/* If caller edge counts of a clone created for a self-recursive arithmetic
> +   jump function must be adjusted, do so.  NODE is the node or its thunk.  */

I would add a comment explaining why the counts need to be adjusted and how.
> +
> +static void
> +adjust_clone_incoming_counts (cgraph_node *node,
> +                              desc_incoming_count_struct *desc)
> +{
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +    if (cs->caller->thunk)
> +      {
> +        adjust_clone_incoming_counts (cs->caller, desc);
> +        profile_count sum = profile_count::zero ();
> +        for (cgraph_edge *e = cs->caller->callers; e; e = e->next_caller)
> +          if (e->count.initialized_p ())
> +            sum += e->count.ipa ();
> +        cs->count = cs->count.combine_with_ipa_count (sum);
> +      }
> +    else if (!desc->processed_edges->contains (cs)
> +             && cs->caller->clone_of == desc->orig)
> +      {
> +        cs->count += desc->count;
> +        if (dump_file)
> +          {
> +            fprintf (dump_file, "       Adjusted count of an incoming edge of "
> +                     "a clone %s -> %s to ", cs->caller->dump_name (),
> +                     cs->callee->dump_name ());
> +            cs->count.dump (dump_file);
> +            fprintf (dump_file, "\n");
> +          }
> +      }
> +}

Otherwise the patch looks OK.

Honza
