On Wed, 27 Apr 2022, Richard Biener wrote:

> The following makes sure to take into account prologue peeling
> when trying to narrow down the maximum number of iterations
> computed for the epilogue of a vectorized epilogue.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> I did not verify this solves the original aarch64 testcase yet
> but it looks like a simpler fix and explains why I don't see
> the issue on the 11 branch which does otherwise the same transforms.
So the following also includes ->peeling_for_gaps, which should have
a similar issue.  We discussed being more precise here, but I think
we should do that as a followup during stage1.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Can you verify that the original aarch64 case is fixed, and maybe add
a testcase for it to gcc.target/aarch64?

OK for trunk?

Thanks,
Richard.

From 03f437c9b2d5260888535e0dad570daee694e4b3 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Wed, 27 Apr 2022 14:06:12 +0200
Subject: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
To: gcc-patches@gcc.gnu.org

The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the vectorized epilogue.  A similar issue exists
when peeling for gaps.

2022-04-27  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/105219
	* tree-vect-loop.cc (vect_transform_loop): Disable
	special code narrowing the vectorized epilogue max
	iterations when peeling for alignment or gaps was in effect.

	* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29 ++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  6 +++++-
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 00000000000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..f53a634a390 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                             lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo
+          /* Both peeling for alignment and peeling for gaps can end up
+             with the scalar epilogue running for more than VF-1 iterations.  */
+          && !main_vinfo->peeling_for_alignment
+          && !main_vinfo->peeling_for_gaps)
         {
           unsigned int bound;
           poly_uint64 main_iters
-- 
2.34.1