Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
On Thu, 28 Apr 2022, Andre Vieira (lists) wrote:

>
> On 27/04/2022 15:03, Richard Biener wrote:
> > On Wed, 27 Apr 2022, Richard Biener wrote:
> >
> >> The following makes sure to take into account prologue peeling
> >> when trying to narrow down the maximum number of iterations
> >> computed for the epilogue of a vectorized epilogue.
> >>
> >> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >>
> >> I did not verify this solves the original aarch64 testcase yet
> >> but it looks like a simpler fix and explains why I don't see
> >> the issue on the 11 branch which otherwise does the same transforms.
> > Can you verify the original aarch64 case is fixed and maybe add
> > a testcase for that to gcc.target/aarch64?
>
> Your committed patch fixes the original testcase. I'll stick to the
> testcase you have, as it's less fiddly, but I will add the options
> required to make it fail on aarch64 before your fix.
>
> Is the attached patch OK for trunk?

Yes.

Thanks,
Richard.

>
> gcc/ChangeLog:
>
> 	PR tree-optimization/105219
> 	* testsuite/gcc.dg/vect/pr105219.c: Add aarch64 target option.
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
On 27/04/2022 15:03, Richard Biener wrote:
> On Wed, 27 Apr 2022, Richard Biener wrote:
>
>> The following makes sure to take into account prologue peeling
>> when trying to narrow down the maximum number of iterations
>> computed for the epilogue of a vectorized epilogue.
>>
>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> I did not verify this solves the original aarch64 testcase yet
>> but it looks like a simpler fix and explains why I don't see
>> the issue on the 11 branch which otherwise does the same transforms.
> Can you verify the original aarch64 case is fixed and maybe add
> a testcase for that to gcc.target/aarch64?

Your committed patch fixes the original testcase. I'll stick to the
testcase you have, as it's less fiddly, but I will add the options
required to make it fail on aarch64 before your fix.

Is the attached patch OK for trunk?

gcc/ChangeLog:

	PR tree-optimization/105219
	* testsuite/gcc.dg/vect/pr105219.c: Add aarch64 target option.

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
index 0cb7ae2f4d6dc2a236740d34b59b771255b067b7..4bca5bbba30a9740a54e6205bc0d0c8011070977 100644
--- a/gcc/testsuite/gcc.dg/vect/pr105219.c
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3" } */
 /* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+/* { dg-additional-options "-mtune=thunderx" { target aarch64*-*-* } } */
 
 #include "tree-vect.h"
Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
On Wed, 27 Apr 2022, Richard Biener wrote:

> The following makes sure to take into account prologue peeling
> when trying to narrow down the maximum number of iterations
> computed for the epilogue of a vectorized epilogue.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> I did not verify this solves the original aarch64 testcase yet
> but it looks like a simpler fix and explains why I don't see
> the issue on the 11 branch which otherwise does the same transforms.

So the following also includes ->peeling_for_gaps, which should
have a similar issue. We discussed being more precise here, but
I think we should do this as a followup during stage1.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Can you verify the original aarch64 case is fixed and maybe add
a testcase for that to gcc.target/aarch64?

OK for trunk?

Thanks,
Richard.

From 03f437c9b2d5260888535e0dad570daee694e4b3 Mon Sep 17 00:00:00 2001
From: Richard Biener
Date: Wed, 27 Apr 2022 14:06:12 +0200
Subject: [PATCH] tree-optimization/105219 - bogus max iters for vectorized
 epilogue
To: gcc-patches@gcc.gnu.org

The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the vectorized epilogue. A similar issue exists
when peeling for gaps.

2022-04-27  Richard Biener

	PR tree-optimization/105219
	* tree-vect-loop.cc (vect_transform_loop): Disable
	special code narrowing the vectorized epilogue max
	iterations when peeling for alignment or gaps was in effect.

	* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29
 gcc/tree-vect-loop.cc                |  6 +-
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..f53a634a390 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                         lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo
+          /* Both peeling for alignment and peeling for gaps can end up
+             with the scalar epilogue running for more than VF-1 iterations.  */
+          && !main_vinfo->peeling_for_alignment
+          && !main_vinfo->peeling_for_gaps)
         {
           unsigned int bound;
           poly_uint64 main_iters
-- 
2.34.1
[PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue
The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the epilogue of a vectorized epilogue.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

I did not verify this solves the original aarch64 testcase yet
but it looks like a simpler fix and explains why I don't see
the issue on the 11 branch which otherwise does the same transforms.

Richard.

2022-04-27  Richard Biener

	PR tree-optimization/105219
	* tree-vect-loop.cc (vect_transform_loop): Disable
	special code narrowing the vectorized epilogue max
	iterations when peeling for alignment was in effect.

	* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29
 gcc/tree-vect-loop.cc                |  2 +-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..217abab814b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                         lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo && !main_vinfo->peeling_for_alignment)
         {
           unsigned int bound;
           poly_uint64 main_iters
-- 
2.34.1