Re: [PATCH] vect: Fix regression for PR104116

Avinash Jayakar Tue, 21 Oct 2025 01:12:00 -0700

Hi Thomas,

Thanks for reviewing the patch.


On Tue, 2025-10-21 at 09:24 +0200, Thomas Schwinge wrote:
> Hi Avinash!
> 
> On 2025-10-21T11:46:04+0530, Avinash Jayakar <[email protected]>
> wrote:
> > Some targets (aarch64 and x86_64 with multilib) reported regression
> > for some
> > test cases made for PR104116.
> 
> Thanks for looking into this.
> 
> I've similarly observed for '--target=amdgcn-amdhsa':
I hope the issue is the same and that this patch fixes it. Is it possible
to reproduce this on x86_64? If so, I can run the tests and check.
> 
>     +PASS: gcc.dg/vect/pr104116-ceil-umod-2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-ceil-umod-2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-ceil-umod-2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-ceil-umod-pow2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-ceil-umod-pow2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-ceil-umod-pow2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-div-2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-round-div-2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-div-2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-div-pow2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-round-div-pow2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-div-pow2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-div.c (test for excess errors)
>     +PASS: gcc.dg/vect/pr104116-round-div.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-div.c scan-tree-dump-times vect
> "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-mod-2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-round-mod-2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-mod-2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-mod-pow2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-round-mod-pow2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-mod-pow2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-mod.c (test for excess errors)
>     +PASS: gcc.dg/vect/pr104116-round-mod.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-mod.c scan-tree-dump-times vect
> "optimized: loop vectorized" 1
> 
>     +PASS: gcc.dg/vect/pr104116-round-umod-2.c (test for excess
> errors)
>     +PASS: gcc.dg/vect/pr104116-round-umod-2.c execution test
>     +FAIL: gcc.dg/vect/pr104116-round-umod-2.c scan-tree-dump-times
> vect "optimized: loop vectorized" 1
> 
> > Turned out an extra loop which was for checking
> > results in run-time was also being vectorized and the count of vect
> > loop was 2
> > instead of 1. In this patch I have made sure no other loop other
> > than the one
> > in interest of test case is vectorized. Ok for master?
> 
> > The commit gcc-16-4464-g6883d51304f added 30 new tests for testing
> > vectorization of {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR. A few of them
> > failed
> > for certain targets due to the vectorization of runtime-check loop
> > which
> > was not intended.
> > This patch disables optimization for all of the run-time check
> > loops so
> > that the count of vectorized loop is always 1.
> > 
> > 2025-10-21  Avinash Jayakar  <[email protected]>
> > 
> > gcc/testsuite/ChangeLog:
> >     PR target/104116
> >         * gcc.dg/vect/pr104116.h: disable optimizations.
> 
> Here, you should list the individual functions that you're modifying.
> 
> > --- a/gcc/testsuite/gcc.dg/vect/pr104116.h
> > +++ b/gcc/testsuite/gcc.dg/vect/pr104116.h
> > @@ -106,6 +106,7 @@ int cl_div (int x, int y)
> >    return q;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  unsigned int cl_udiv (unsigned int x, unsigned int y)
> >  {
> >    unsigned int r = x % y;
> 
> As far as I can tell, the standard idiom is to put '#pragma GCC
> novector'
> in front of the loop that's not to be vectorized.  That's more
> expressive
> than enforcing '-O0'.  Or is '-O0' necessary for other reasons?

Thanks for this info. I will use this instead, for the other loops as
well. I was using the attribute since it required changing only one file,
and I was not aware of this idiom.

I will send the reworked patch as v2 in a separate thread.

Thanks and regards,
Avinash Jayakar
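
For reference, the idiom suggested above might look roughly like the
following. This is only a minimal sketch, not the contents of pr104116.h:
the names ref_ceil_udiv, ceil_udiv_by_2 and check_results, the arrays and
N are all hypothetical, and it assumes a GCC recent enough to accept
'#pragma GCC novector'. The checking loop here counts mismatches rather
than aborting, purely to keep the sketch easy to exercise.

```c
#include <stdlib.h>

#define N 1024

unsigned int in[N];
unsigned int out[N];

/* Hypothetical scalar reference; NOT the helper from pr104116.h.  */
static unsigned int
ref_ceil_udiv (unsigned int x, unsigned int y)
{
  return x / y + (x % y != 0);
}

/* Loop under test: the only loop meant to show up as
   "optimized: loop vectorized" in the vect dump.  */
void
ceil_udiv_by_2 (void)
{
  for (int i = 0; i < N; i++)
    out[i] = in[i] / 2 + (in[i] % 2 != 0);
}

/* Run-time check.  The pragma asks the vectorizer to leave this loop
   alone, so it should not add to the vectorized-loop count.
   Returns the number of mismatching elements.  */
int
check_results (void)
{
  int mismatches = 0;
#pragma GCC novector
  for (int i = 0; i < N; i++)
    if (out[i] != ref_ceil_udiv (in[i], 2))
      mismatches++;
  return mismatches;
}
```

Compared with the optimize("O0") attribute, the pragma is attached to the
one loop that must stay scalar, and the rest of the function is still
compiled with the options of the test.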
> 
> 
> Grüße
>  Thomas
> 
> 
> > @@ -123,6 +124,7 @@ int cl_mod (int x, int y)
> >    return r;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  unsigned int cl_umod (unsigned int x, unsigned int y)
> >  {
> >    unsigned int r = x % y;
> > @@ -141,7 +143,7 @@ int fl_div (int x, int y)
> >    return q;
> >  }
> >  
> > -
> > +__attribute__((optimize("O0")))
> >  int fl_mod (int x, int y)
> >  {
> >    int r = x % y;
> > @@ -150,12 +152,14 @@ int fl_mod (int x, int y)
> >    return r;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  int abs(int x)
> >  {
> >    if (x < 0) return -x;
> >    return x;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  int rd_mod (int x, int y)
> >  {
> >    int r = x % y;
> > @@ -169,6 +173,7 @@ int rd_mod (int x, int y)
> >    return r;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  int rd_div (int x, int y)
> >  {
> >    int r = x % y;
> > @@ -183,6 +188,7 @@ int rd_div (int x, int y)
> >    return q;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  unsigned int rd_umod (unsigned int x, unsigned int y)
> >  {
> >    unsigned int r = x % y;
> > @@ -191,6 +197,7 @@ unsigned int rd_umod (unsigned int x, unsigned
> > int y)
> >    return r;
> >  }
> >  
> > +__attribute__((optimize("O0")))
> >  unsigned int rd_udiv (unsigned int x, unsigned int y)
> >  {
> >    unsigned int r = x % y;
> > -- 
> > 2.51.0
