> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, January 8, 2024 11:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina <tamar.christ...@arm.com>
> Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
> 
> The following avoids creating a niter peeling epilog more consistently,
> matching what peeling later uses for the skip_vector condition, in
> particular when versioning is required which then also ensures the
> vector loop is entered unless the epilog is vectorized.  This should
> ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
> later, some refactoring could make that better matching.
> 
> The patch also makes sure to adjust the upper bound of the epilogues
> when we do not have a skip edge around the vector loop.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, does
> that look OK wrt early-breaks?

Yeah the value looks correct, I did find a few cases where the niters should 
actually be
higher for skip_vector, namely when of the breaks forces ncopies > 1 and we 
have a
break condition that requires all values to be true to continue.

The code is not wrong in that case, just executes a completely useless vector 
iters.

But that's unrelated, this looks correct because it means bound_scalar is not 
set, in
which case there's no difference between one and multiple exits.

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
>       PR tree-optimization/113026
>       * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>       Avoid an epilog in more cases.
>       * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
>       epilogues niter upper bounds and estimates.
> 
>       * gcc.dg/torture/pr113026-1.c: New testcase.
>       * gcc.dg/torture/pr113026-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +++++++++++++
>  gcc/tree-vect-loop-manip.cc               | 32 +++++++++++++++++++++++
>  gcc/tree-vect-loop.cc                     |  6 ++++-
>  4 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> new file mode 100644
> index 00000000000..56dfef3b36c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst[16];
> +
> +void
> +foo (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> new file mode 100644
> index 00000000000..b9d5857a403
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst1[17];
> +void
> +foo1 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst1[i] = src[i]; /* { dg-bogus "" } */
> +}
> +
> +char dst2[18];
> +void
> +foo2 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst2[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 9330183bfb9..927f76a0947 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
>           bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count
> ();
>         bb_before_epilog = loop_preheader_edge (epilog)->src;
>       }
> +      else
> +     {
> +       /* When we do not have a loop-around edge to the epilog we know
> +          the vector loop covered at least VF scalar iterations unless
> +          we have early breaks and the epilog will cover at most
> +          VF - 1 + gap peeling iterations.
> +          Update any known upper bound with this knowledge.  */
> +       if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +         {
> +           if (epilog->any_upper_bound)
> +             epilog->nb_iterations_upper_bound -= lowest_vf;
> +           if (epilog->any_likely_upper_bound)
> +             epilog->nb_iterations_likely_upper_bound -= lowest_vf;
> +           if (epilog->any_estimate)
> +             epilog->nb_iterations_estimate -= lowest_vf;
> +         }
> +       unsigned HOST_WIDE_INT const_vf;
> +       if (vf.is_constant (&const_vf))
> +         {
> +           const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1;
> +           if (epilog->any_upper_bound)
> +             epilog->nb_iterations_upper_bound
> +               = wi::umin (epilog->nb_iterations_upper_bound, const_vf);
> +           if (epilog->any_likely_upper_bound)
> +             epilog->nb_iterations_likely_upper_bound
> +               = wi::umin (epilog->nb_iterations_likely_upper_bound,
> +                           const_vf);
> +           if (epilog->any_estimate)
> +             epilog->nb_iterations_estimate
> +               = wi::umin (epilog->nb_iterations_estimate, const_vf);
> +         }
> +     }
> 
>        /* If loop is peeled for non-zero constant times, now niters refers to
>        orig_niters - prolog_peeling, it won't overflow even the orig_niters
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a06771611ac..9dd573ef125 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1261,7 +1261,11 @@ vect_need_peeling_or_partial_vectors_p
> (loop_vec_info loop_vinfo)
>            the epilogue is unnecessary.  */
>         && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>             || ((unsigned HOST_WIDE_INT) max_niter
> -               > (th / const_vf) * const_vf))))
> +               /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
> +                  but that's only computed later based on our result.
> +                  The following is the most conservative approximation.  */
> +               > (std::max ((unsigned HOST_WIDE_INT) th,
> +                            const_vf) / const_vf) * const_vf))))
>      return true;
> 
>    return false;
> --
> 2.35.3

Reply via email to