Re: [PATCH] loop: Fix profile updates after unrolling [PR102385]

2021-10-08 Thread Richard Biener via Gcc-patches
On Tue, Oct 5, 2021 at 3:39 PM Richard Sandiford via Gcc-patches
 wrote:
>
> In g:62acc72a957b5614 I'd stopped the unroller from using
> an epilogue loop in cases where the iteration count was
> known to be a multiple of the unroll factor.  The epilogue
> and non-epilogue cases still shared this (preexisting) code
> to update the edge frequencies:
>
>   basic_block exit_bb = single_pred (loop->latch);
>   new_exit = find_edge (exit_bb, rest);
>   new_exit->probability = profile_probability::always ()
>.apply_scale (1, new_est_niter + 1);
>   [etc]
>
> But of course (in hindsight) that only makes sense for the
> epilogue case, where we've already moved the main loop's exit edge
> to be a sibling of the latch edge.  For the non-epilogue case,
> the exit edge stays (and needs to stay) in its original position.
>
> I don't really understand what the code is trying to do for
> the epilogue case.  It has:
>
>   /* Ensure that the frequencies in the loop match the new estimated
>  number of iterations, and change the probability of the new
>  exit edge.  */
>
>   profile_count freq_h = loop->header->count;
>   profile_count freq_e = (loop_preheader_edge (loop))->count ();
>   if (freq_h.nonzero_p ())
> {
>   ...
>   scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
> }
>
> Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
> header block, this has the effect of:
>
>   new header count = freq_h * (freq_e / freq_h)
>
> i.e. we say that the header executes exactly as often as the
> preheader edge, which would only make sense if the loop never
> iterates.  Also, after setting the probability of the nonexit edge
> (correctly) to new_est_niter / (new_est_niter + 1), the code does:
>
> scale_bbs_frequencies (>latch, 1, prob);
>
> for this new probability.  I think that only makes sense if the
> nonexit edge was previously unconditional (100%).  But the code
> carefully preserved the probability of the original exit edge
> when creating the new one.
>
> All I'm trying to do here though is fix the mess I created
> and get the probabilities right for the non-epilogue case.
> Things are simpler there since we don't have to worry about
> loop versioning.  Hopefully the comments explain the approach.
>
> The function's current interface implies that it can cope with
> multiple exit edges and that the function only needs the iteration
> count relative to one of those edges in order to work correctly.
> In practice that's not the case: it assumes there is exactly one
> exit edge and all current callers also ensure that the exit test
> dominates the latch.  I think the function is easier to follow
> if we remove the implied generality.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/102385
> * predict.h (change_edge_frequency): Declare.
> * predict.c (change_edge_frequency): New function.
> * tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
> edge argument.
> (tree_unroll_loop): Likewise.
> * gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
> * tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
> Likewise.
> * tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
> (tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
> to retrieve the exit edges.  Make all the old profile update code
> conditional on !single_loop_p -- the case it was written for --
> and use a different approach for the single-loop case.
>
> gcc/testsuite/
> * testsuite/gcc.dg/pr102385.c: New test.
> ---
>  gcc/gimple-loop-jam.c   |   3 +-
>  gcc/predict.c   |  37 +++
>  gcc/predict.h   |   1 +
>  gcc/testsuite/gcc.dg/pr102385.c |  14 
>  gcc/tree-predcom.c  |   3 +-
>  gcc/tree-ssa-loop-manip.c   | 111 
>  gcc/tree-ssa-loop-manip.h   |   5 +-
>  gcc/tree-ssa-loop-prefetch.c|   3 +-
>  8 files changed, 140 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102385.c
>
> diff --git a/gcc/predict.h b/gcc/predict.h
> index 8860cafa31c..4df51bd615c 100644
> --- a/gcc/predict.h
> +++ b/gcc/predict.h
> @@ -100,6 +100,7 @@ extern void rebuild_frequencies (void);
>  extern void report_predictor_hitrates (void);
>  extern void force_edge_cold (edge, bool);
>  extern void propagate_unlikely_bbs_forward (void);
> +extern void change_edge_frequency (edge, profile_probability);
>
>  extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index d9c7249831e..68b11135680 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -4481,6 +4481,43 @@ force_edge_cold (edge e, bool impossible)
>  }
>  }
>

[PATCH] loop: Fix profile updates after unrolling [PR102385]

2021-10-05 Thread Richard Sandiford via Gcc-patches
In g:62acc72a957b5614 I'd stopped the unroller from using
an epilogue loop in cases where the iteration count was
known to be a multiple of the unroll factor.  The epilogue
and non-epilogue cases still shared this (preexisting) code
to update the edge frequencies:

  basic_block exit_bb = single_pred (loop->latch);
  new_exit = find_edge (exit_bb, rest);
  new_exit->probability = profile_probability::always ()
   .apply_scale (1, new_est_niter + 1);
  [etc]

But of course (in hindsight) that only makes sense for the
epilogue case, where we've already moved the main loop's exit edge
to be a sibling of the latch edge.  For the non-epilogue case,
the exit edge stays (and needs to stay) in its original position.

I don't really understand what the code is trying to do for
the epilogue case.  It has:

  /* Ensure that the frequencies in the loop match the new estimated
 number of iterations, and change the probability of the new
 exit edge.  */

  profile_count freq_h = loop->header->count;
  profile_count freq_e = (loop_preheader_edge (loop))->count ();
  if (freq_h.nonzero_p ())
{
  ...
  scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
}

Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
header block, this has the effect of:

  new header count = freq_h * (freq_e / freq_h)

i.e. we say that the header executes exactly as often as the
preheader edge, which would only make sense if the loop never
iterates.  Also, after setting the probability of the nonexit edge
(correctly) to new_est_niter / (new_est_niter + 1), the code does:

scale_bbs_frequencies (>latch, 1, prob);

for this new probability.  I think that only makes sense if the
nonexit edge was previously unconditional (100%).  But the code
carefully preserved the probability of the original exit edge
when creating the new one.

All I'm trying to do here though is fix the mess I created
and get the probabilities right for the non-epilogue case.
Things are simpler there since we don't have to worry about
loop versioning.  Hopefully the comments explain the approach.

The function's current interface implies that it can cope with
multiple exit edges and that the function only needs the iteration
count relative to one of those edges in order to work correctly.
In practice that's not the case: it assumes there is exactly one
exit edge and all current callers also ensure that the exit test
dominates the latch.  I think the function is easier to follow
if we remove the implied generality.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/102385
* predict.h (change_edge_frequency): Declare.
* predict.c (change_edge_frequency): New function.
* tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
edge argument.
(tree_unroll_loop): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
* tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
Likewise.
* tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
(tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
to retrieve the exit edges.  Make all the old profile update code
conditional on !single_loop_p -- the case it was written for --
and use a different approach for the single-loop case.

gcc/testsuite/
* testsuite/gcc.dg/pr102385.c: New test.
---
 gcc/gimple-loop-jam.c   |   3 +-
 gcc/predict.c   |  37 +++
 gcc/predict.h   |   1 +
 gcc/testsuite/gcc.dg/pr102385.c |  14 
 gcc/tree-predcom.c  |   3 +-
 gcc/tree-ssa-loop-manip.c   | 111 
 gcc/tree-ssa-loop-manip.h   |   5 +-
 gcc/tree-ssa-loop-prefetch.c|   3 +-
 8 files changed, 140 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102385.c

diff --git a/gcc/predict.h b/gcc/predict.h
index 8860cafa31c..4df51bd615c 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -100,6 +100,7 @@ extern void rebuild_frequencies (void);
 extern void report_predictor_hitrates (void);
 extern void force_edge_cold (edge, bool);
 extern void propagate_unlikely_bbs_forward (void);
+extern void change_edge_frequency (edge, profile_probability);
 
 extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
 
diff --git a/gcc/predict.c b/gcc/predict.c
index d9c7249831e..68b11135680 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -4481,6 +4481,43 @@ force_edge_cold (edge e, bool impossible)
 }
 }
 
+/* Change E's probability to NEW_E_PROB, redistributing the probabilities
+   of other outgoing edges proportionally.
+
+   Note that this function does not change the profile counts of any
+   basic blocks.  The caller must do that instead, using whatever
+   information it has about the region that