Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue

2022-04-29 Thread Richard Biener via Gcc-patches
On Thu, 28 Apr 2022, Andre Vieira (lists) wrote:

> 
> On 27/04/2022 15:03, Richard Biener wrote:
> > On Wed, 27 Apr 2022, Richard Biener wrote:
> >
> >> The following makes sure to take into account prologue peeling
> >> when trying to narrow down the maximum number of iterations
> >> computed for the epilogue of a vectorized epilogue.
> >>
> >> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >>
> >> I did not verify this solves the original aarch64 testcase yet
> >> but it looks like a simpler fix and explains why I don't see
> >> the issue on the 11 branch which does otherwise the same transforms.
> > Can you verify the original aarch64 case is fixed and maybe add
> > a testcase for that to gcc.target/aarch64?
> 
> Your committed patch fixes the original testcase, but I'll stick to the
> testcase you have as it's less fiddly, but I will add the options required to
> make it originally fail for aarch64.
> 
> Is the attached patch OK for trunk?

Yes.

Thanks,
Richard.

> 
>     gcc/ChangeLog:
> 
>     PR tree-optimization/105219
>     * testsuite/gcc.dg/vect/pr105219.c: Add aarch64 target option.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue

2022-04-28 Thread Andre Vieira (lists) via Gcc-patches


On 27/04/2022 15:03, Richard Biener wrote:
> On Wed, 27 Apr 2022, Richard Biener wrote:
>
>> The following makes sure to take into account prologue peeling
>> when trying to narrow down the maximum number of iterations
>> computed for the epilogue of a vectorized epilogue.
>>
>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> I did not verify this solves the original aarch64 testcase yet
>> but it looks like a simpler fix and explains why I don't see
>> the issue on the 11 branch which does otherwise the same transforms.
> Can you verify the original aarch64 case is fixed and maybe add
> a testcase for that to gcc.target/aarch64?

Your committed patch fixes the original testcase, but I'll stick to the
testcase you have as it's less fiddly, but I will add the options
required to make it originally fail for aarch64.

Is the attached patch OK for trunk?


    gcc/ChangeLog:

    PR tree-optimization/105219
    * testsuite/gcc.dg/vect/pr105219.c: Add aarch64 target option.
diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
index 0cb7ae2f4d6dc2a236740d34b59b771255b067b7..4bca5bbba30a9740a54e6205bc0d0c8011070977 100644
--- a/gcc/testsuite/gcc.dg/vect/pr105219.c
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3" } */
 /* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+/* { dg-additional-options "-mtune=thunderx" { target aarch64*-*-* } } */
 
 #include "tree-vect.h"
 


Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue

2022-04-27 Thread Richard Biener via Gcc-patches
On Wed, 27 Apr 2022, Richard Biener wrote:

> The following makes sure to take into account prologue peeling
> when trying to narrow down the maximum number of iterations
> computed for the epilogue of a vectorized epilogue.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> I did not verify this solves the original aarch64 testcase yet
> but it looks like a simpler fix and explains why I don't see
> the issue on the 11 branch which does otherwise the same transforms.

So the following also includes ->peeling_for_gaps which should have
a similar issue.  We discussed being more precise here but I think
we should do this as followup during stage1.
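
The peeling-for-gaps case can be illustrated with a small arithmetic
model. This is only a sketch under simplifying assumptions, not code
from tree-vect-loop.cc: scalar_tail_iters and its parameters are
hypothetical names, and the model assumes the vector loop must leave at
least one scalar iteration behind when peeling for gaps is in effect.
It does not model peeling for alignment.

```c
#include <assert.h>

/* Simplified model, not GCC code: number of scalar iterations left
   over after a vector loop with vectorization factor VF has consumed
   as much of an N-iteration loop as it is allowed to.  When
   PEEL_FOR_GAPS is nonzero, the vector loop has to leave at least one
   iteration for the tail.  */
static unsigned
scalar_tail_iters (unsigned n, unsigned vf, int peel_for_gaps)
{
  assert (vf > 0);
  /* One iteration is withheld from the vector loop when peeling for
     gaps (assumption of this model).  */
  unsigned reserved = (peel_for_gaps && n > 0) ? 1 : 0;
  unsigned vector_iters = (n - reserved) / vf;
  return n - vector_iters * vf;
}
```

In this model, n = 16 and vf = 8 leave 0 tail iterations without
peeling for gaps but 8 with it, that is VF rather than at most VF-1, so
a bound derived from "fewer than VF iterations remain" would be one too
small.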

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Can you verify the original aarch64 case is fixed and maybe add
a testcase for that to gcc.target/aarch64?

OK for trunk?

Thanks,
Richard.
From 03f437c9b2d5260888535e0dad570daee694e4b3 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 27 Apr 2022 14:06:12 +0200
Subject: [PATCH] tree-optimization/105219 - bogus max iters for vectorized
 epilogue
To: gcc-patches@gcc.gnu.org

The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the vectorized epilogue.  A similar issue exists when
peeling for gaps.

2022-04-27  Richard Biener  

PR tree-optimization/105219
* tree-vect-loop.cc (vect_transform_loop): Disable
special code narrowing the vectorized epilogue max
iterations when peeling for alignment or gaps was in effect.

* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29 +++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  6 +++++-
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..f53a634a390 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                             lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo
+          /* Both peeling for alignment and peeling for gaps can end up
+             with the scalar epilogue running for more than VF-1 iterations.  */
+          && !main_vinfo->peeling_for_alignment
+          && !main_vinfo->peeling_for_gaps)
         {
           unsigned int bound;
           poly_uint64 main_iters
-- 
2.34.1
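
For trying the sweep outside the testsuite, the testcase can be
restated without tree-vect.h. This is a sketch, not the committed
test: fill, count_failures and DATA_LEN are hypothetical names, the
check_vect () call is dropped, and noinline stands in for the
GCC-specific noipa attribute used in the original.

```c
#include <assert.h>
#include <string.h>

#define DATA_LEN 128

static int data[DATA_LEN];

/* Loop under test.  noinline discourages the compiler from
   specializing the loop for a known start offset or trip count.  */
static void __attribute__ ((noinline))
fill (int *p, int n)
{
  for (int i = 0; i < n; ++i)
    p[i] = i;
}

/* Run fill for every start offset in [0, 16) and every length in
   [1, 48).  Return the number of (start, n) pairs for which some
   element was written incorrectly.  */
static int
count_failures (void)
{
  int failures = 0;
  for (int start = 0; start < 16; ++start)
    for (int n = 1; n < 3 * 16; ++n)
      {
        assert (start + n <= DATA_LEN);
        memset (data, 0, sizeof (data));
        fill (&data[start], n);
        for (int j = 0; j < n; ++j)
          if (data[start + j] != j)
            {
              ++failures;
              break;
            }
      }
  return failures;
}
```

Varying the start offset changes the pointer alignment seen by the
loop, and varying n covers trip counts on both sides of typical
vectorization factors, which is what lets the sweep reach the prologue,
main-loop and epilogue combinations discussed in this thread. A correct
compiler should make count_failures () return 0.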



[PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue

2022-04-27 Thread Richard Biener via Gcc-patches
The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the epilogue of a vectorized epilogue.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

I did not verify this solves the original aarch64 testcase yet
but it looks like a simpler fix and explains why I don't see
the issue on the 11 branch which does otherwise the same transforms.

Richard.

2022-04-27  Richard Biener  

PR tree-optimization/105219
* tree-vect-loop.cc (vect_transform_loop): Disable
special code narrowing the vectorized epilogue
max iterations when peeling for alignment was in effect.

* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29 +++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  2 +-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..217abab814b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                             lowest_vf) - 1
            : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                              lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo && !main_vinfo->peeling_for_alignment)
         {
           unsigned int bound;
           poly_uint64 main_iters
-- 
2.34.1