On Fri, 21 Jun 2024 at 12:14, Richard Earnshaw (lists) <richard.earns...@arm.com> wrote:
>
> On 21/06/2024 08:57, Alexandre Oliva wrote:
> > On Jun 20, 2024, Christophe Lyon <christophe.l...@linaro.org> wrote:
> >
> >> Maybe using
> >>   if ((unsigned)b[i] >= BITS) \
> >> would be clearer?
> >
> > Heh.  Why make it simpler if we can make it unreadable, right? :-D
> >
> > Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> >
> > I'm not sure how to credit your suggestion.  It's not like you pretty
> > much wrote the entire patch, as in Richard's case, but it's still a
> > sizable chunk of this two-liner.  Any preferences?
>
> How about mentioning Christophe's simplification in the commit log?
For the avoidance of doubt: it's OK for me (but you don't need to
mention my name in fact ;-)

Thanks,

Christophe

> >
> > The test was too optimistic, alas.  We used to vectorize shifts
> > involving 8-bit and 16-bit integral types by clamping the shift count
> > at the highest in-range shift count, but that was not correct: such
> > narrow shifts expect integral promotion, so larger shift counts should
> > be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> > before the fix).
>
> This is OK, but you might wish to revisit this statement before committing.
> I think the above is a mis-summary of the original bug report, which had a
> test to pick between 0 and 1 as the result of a shift operation.
>
> If I've understood what's going on here correctly, then we have
>
>   (int16_t)32768 >> (int16_t)16
>
> but shift is always done at int precision, so this is (due to default
> promotions)
>
>   (int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.
>
> which then simplifies to
>
>   -32768 >> 16;  // 0xffff8000 >> 16
>
>   = -1;
>
> I think the original bug was that we were losing the cast to short (and hence
> the sign extension of the intermediate value), so effectively we simplified
> this to
>
>   32768 >> 16;  // 0x00008000 >> 16
>
>   = 0;
>
> And the other part of the observation was that it had to be done this way
> (and couldn't be narrowed for vectorization) because 16 is larger than the
> maximum shift for a short (actually you say that just below).
>
> R.
>
> >
> > Unfortunately, in the gimple model of vector units, such large shift
> > counts wouldn't be well-defined, so we won't vectorize such shifts any
> > more, unless we can tell they're in range or undefined.
> >
> > So the test that expected the incorrect clamping we no longer perform
> > needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> > suggested annotating the test with the expected ranges so as to enable
> > the optimization.
> >
> > Co-Authored-By: Richard Earnshaw <richard.earns...@arm.com>
> >
> > for gcc/testsuite/ChangeLog
> >
> >     PR tree-optimization/113281
> >     * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > index 8c7adef9ed8f1..03078de49c65e 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > @@ -9,6 +9,8 @@
> >    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> >      int i; \
> >      for (i=0; i<NB; i++) { \
> > +      if ((unsigned)b[i] >= (unsigned)(BITS)) \
> > +	__builtin_unreachable(); \
> >        dest[i] = a[i] OP b[i]; \
> >      } \
> >    }
> >
> >
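[Editor's note: for readers unfamiliar with the idiom, here is a standalone sketch of the range-annotation pattern the patch adds, written out for the 16-bit case. The function name and the driver are illustrative, not from the test suite; the GCC-specific `__builtin_unreachable()` tells the compiler that out-of-range shift counts never occur, which is what re-enables narrow vectorization.]

```c
#include <stdint.h>

/* Illustrative expansion of the macro body for int16_t >> : promising the
   compiler that every shift count is in [0, 15] makes the narrowed vector
   shift well-defined, so the loop can be vectorized again.  */
void shift_i16(int16_t *restrict dest, int16_t *a, int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* The unsigned compare folds "b[i] < 0 || b[i] >= 16" into one test.  */
      if ((unsigned)b[i] >= 16u)
        __builtin_unreachable();  /* GCC builtin: this path never happens.  */
      dest[i] = a[i] >> b[i];
    }
}
```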