On Fri, 21 Jun 2024 at 12:14, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
>
> On 21/06/2024 08:57, Alexandre Oliva wrote:
> > On Jun 20, 2024, Christophe Lyon <christophe.l...@linaro.org> wrote:
> >
> >> Maybe using
> >> if ((unsigned)b[i] >= BITS) \
> >> would be clearer?
> >
> > Heh.  Why make it simpler if we can make it unreadable, right? :-D
> >
> > Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> >
> > I'm not sure how to credit your suggestion.  It's not like you pretty
> > much wrote the entire patch, as in Richard's case, but it's still a
> > sizable chunk of this two-liner.  Any preferences?
>
> How about mentioning Christophe's simplification in the commit log?

For the avoidance of doubt: it's OK with me (but you don't need to
mention my name, in fact ;-)

Thanks,

Christophe

> >
> >
> > The test was too optimistic, alas.  We used to vectorize shifts
> > involving 8-bit and 16-bit integral types by clamping the shift count
> > at the highest in-range shift count, but that was not correct: such
> > narrow shifts expect integral promotion, so larger shift counts should
> > be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> > before the fix).
>
> This is OK, but you might wish to revisit this statement before committing.
> I think the above is a mis-summary of the original bug report, which had a
> test picking between 0 and 1 as the result of a shift operation.
>
> If I've understood what's going on here correctly, then we have
>
> (int16_t)32768 >> (int16_t) 16
>
> but shift is always done at int precision, so this is (due to default 
> promotions)
>
> (int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.
>
> which then simplifies to
>
> -32768 >> 16;  // 0xffff8000 >> 16
>
> = -1;
>
> I think the original bug was that we were losing the cast to short (and hence 
> the sign extension of the intermediate value), so effectively we simplified 
> this to
>
> 32768 >> 16; // 0x00008000 >> 16
>
> = 0;
>
> And the other part of the observation was that it had to be done this way 
> (and couldn't be narrowed for vectorization) because 16 is larger than the 
> maximum shift for a short (actually you say that just below).
>
> R.
>
> >
> > Unfortunately, in the gimple model of vector units, such large shift
> > counts wouldn't be well-defined, so we won't vectorize such shifts any
> > more, unless we can tell they're in range or undefined.
> >
> > So the test that expected the incorrect clamping we no longer perform
> > needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> > suggested annotating the test with the expected ranges so as to enable
> > the optimization.
> >
> >
> > Co-Authored-By: Richard Earnshaw <richard.earns...@arm.com>
> >
> > for  gcc/testsuite/ChangeLog
> >
> >       PR tree-optimization/113281
> >       * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > index 8c7adef9ed8f1..03078de49c65e 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > @@ -9,6 +9,8 @@
> >    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> >      int i;                                                          \
> >      for (i=0; i<NB; i++) {                                          \
> > +      if ((unsigned)b[i] >= (unsigned)(BITS))                       \
> > +        __builtin_unreachable();                                    \
> >        dest[i] = a[i] OP b[i];                                       \
> >      }                                                               \
> >  }
> >
> >
>
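Expanded out of the test macro, the annotation idiom the patch adds looks
like the scalar function below (a hypothetical instantiation for the
int16_t case; the function name and loop bound are mine):

```c
#include <stdint.h>

/* Sketch of the patch's range annotation, expanded from the macro.
   The __builtin_unreachable() call promises GCC that every shift count
   is in [0, 15], so the shift can legitimately be narrowed to 16-bit
   lanes and vectorized: no in-range execution can ever observe an
   out-of-range count.  */
void
test_vshr_s16 (int16_t *__restrict__ dest, int16_t *a, int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      if ((unsigned) b[i] >= 16)
        __builtin_unreachable ();
      dest[i] = a[i] >> b[i];
    }
}
```

Without the promise, a count like 16 is well-defined on the promoted int
operand but unrepresentable in a 16-bit vector lane, which is exactly why
the vectorizer now refuses to narrow such shifts.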
