> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, January 8, 2024 11:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina <tamar.christ...@arm.com>
> Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
>
> The following avoids creating a niter peeling epilog more consistently,
> matching what peeling later uses for the skip_vector condition, in
> particular when versioning is required which then also ensures the
> vector loop is entered unless the epilog is vectorized. This should
> ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
> later, some refactoring could make that better matching.
>
> The patch also makes sure to adjust the upper bound of the epilogues
> when we do not have a skip edge around the vector loop.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Tamar, does
> that look OK wrt early-breaks?

Yeah the value looks correct, I did find a few cases where the niters should
actually be higher for skip_vector, namely when one of the breaks forces
ncopies > 1 and we have a break condition that requires all values to be true
to continue.

The code is not wrong in that case, it just executes a completely useless
vector iteration.

But that's unrelated, this looks correct because it means bound_scalar is not
set, in which case there's no difference between one and multiple exits.

Thanks,
Tamar

>
> Thanks,
> Richard.
>
>     PR tree-optimization/113026
>     * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>     Avoid an epilog in more cases.
>     * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
>     epilogues niter upper bounds and estimates.
>
>     * gcc.dg/torture/pr113026-1.c: New testcase.
>     * gcc.dg/torture/pr113026-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 ++++++++
>  gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +++++++++++++
>  gcc/tree-vect-loop-manip.cc               | 32 +++++++++++++++++++++++
>  gcc/tree-vect-loop.cc                     |  6 ++++-
>  4 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> new file mode 100644
> index 00000000000..56dfef3b36c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst[16];
> +
> +void
> +foo (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> new file mode 100644
> index 00000000000..b9d5857a403
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst1[17];
> +void
> +foo1 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst1[i] = src[i]; /* { dg-bogus "" } */
> +}
> +
> +char dst2[18];
> +void
> +foo2 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +    dst2[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 9330183bfb9..927f76a0947 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
>            bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count
> ();
>            bb_before_epilog = loop_preheader_edge (epilog)->src;
>          }
> +      else
> +        {
> +          /* When we do not have a loop-around edge to the epilog we know
> +             the vector loop covered at least VF scalar iterations unless
> +             we have early breaks and the epilog will cover at most
> +             VF - 1 + gap peeling iterations.
> +             Update any known upper bound with this knowledge. */
> +          if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +            {
> +              if (epilog->any_upper_bound)
> +                epilog->nb_iterations_upper_bound -= lowest_vf;
> +              if (epilog->any_likely_upper_bound)
> +                epilog->nb_iterations_likely_upper_bound -= lowest_vf;
> +              if (epilog->any_estimate)
> +                epilog->nb_iterations_estimate -= lowest_vf;
> +            }
> +          unsigned HOST_WIDE_INT const_vf;
> +          if (vf.is_constant (&const_vf))
> +            {
> +              const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1;
> +              if (epilog->any_upper_bound)
> +                epilog->nb_iterations_upper_bound
> +                  = wi::umin (epilog->nb_iterations_upper_bound, const_vf);
> +              if (epilog->any_likely_upper_bound)
> +                epilog->nb_iterations_likely_upper_bound
> +                  = wi::umin (epilog->nb_iterations_likely_upper_bound,
> +                              const_vf);
> +              if (epilog->any_estimate)
> +                epilog->nb_iterations_estimate
> +                  = wi::umin (epilog->nb_iterations_estimate, const_vf);
> +            }
> +        }
>
>        /* If loop is peeled for non-zero constant times, now niters refers to
>           orig_niters - prolog_peeling, it won't overflow even the orig_niters
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a06771611ac..9dd573ef125 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1261,7 +1261,11 @@ vect_need_peeling_or_partial_vectors_p
> (loop_vec_info loop_vinfo)
>               the epilogue is unnecessary. */
>            && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>                || ((unsigned HOST_WIDE_INT) max_niter
> -                  > (th / const_vf) * const_vf))))
> +                  /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
> +                     but that's only computed later based on our result.
> +                     The following is the most conservative approximation. */
> +                  > (std::max ((unsigned HOST_WIDE_INT) th,
> +                                const_vf) / const_vf) * const_vf))))
>      return true;
>
>    return false;
> --
> 2.35.3
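
For illustration only (this is not part of the patch): a small stand-alone sketch of the arithmetic that the tree-vect-loop.cc hunk changes. The helper names old_bound, new_bound and exceeds_bound are hypothetical and are not GCC functions; th, const_vf and max_niter are plain integers here, and the numbers in the checks are made up rather than taken from a real compile. The sketch only shows the versioning sub-condition, not the whole of vect_need_peeling_or_partial_vectors_p.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers for illustration; none of these exist in GCC.

// Bound used before the patch: th rounded down to a multiple of const_vf.
// When th < const_vf this collapses to 0.
constexpr std::uint64_t
old_bound (std::uint64_t th, std::uint64_t const_vf)
{
  return (th / const_vf) * const_vf;
}

// Bound used after the patch: th is first raised to at least const_vf,
// so the rounded value never drops below one full vector iteration.
constexpr std::uint64_t
new_bound (std::uint64_t th, std::uint64_t const_vf)
{
  return (std::max (th, const_vf) / const_vf) * const_vf;
}

// The comparison from the hunk: true means this sub-condition still
// asks for peeling or partial vectors.
constexpr bool
exceeds_bound (std::uint64_t max_niter, std::uint64_t bound)
{
  return max_niter > bound;
}

// Made-up numbers: th = 4, const_vf = 16, max_niter = 16.
static_assert (old_bound (4, 16) == 0, "old bound collapses to zero");
static_assert (new_bound (4, 16) == 16, "new bound is one full vector");
static_assert (exceeds_bound (16, old_bound (4, 16)), "old: comparison holds");
static_assert (!exceeds_bound (16, new_bound (4, 16)), "new: comparison fails");

// Once th >= const_vf the two bounds agree.
static_assert (old_bound (40, 16) == new_bound (40, 16), "same for large th");
```

With these assumed values the old comparison is always true whenever max_niter is non-zero, while the new one is only true when max_niter is larger than one full vector iteration. For th >= const_vf nothing changes.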