Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> This patch adds tests to verify that Neon narrowing-shift instructions
> clear the top half of the result vector. It is sufficient to show that a
> subsequent combine with a zero-vector is optimized away - leaving
> just the narrowing-shift instruction.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/testsuite/ChangeLog:
>
> 2021-06-15  Jonathan Wright  <jonathan.wri...@arm.com>
>
> 	* gcc.target/aarch64/narrow_zero_high_half.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..27fa0e640ab2b37781376c40ce4ca37602c72393
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> @@ -0,0 +1,60 @@
> +/* { dg-skip-if "" { arm*-*-* } } */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include <arm_neon.h>
> +
> +#define TEST_SHIFT(name, rettype, intype, fs, rs) \
> +  rettype test_ ## name ## _ ## fs ## _zero_high \
> +		(intype a) \
> +	{ \
> +		return vcombine_ ## rs (name ## _ ## fs (a, 4), \
> +					vdup_n_ ## rs (0)); \
> +	}
> +
> +TEST_SHIFT (vshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +TEST_SHIFT (vqrshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqrshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqrshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */
> +
> +/* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} } */
> +/* { dg-final { scan-assembler-times "\\tshrn\\tv" 6} } */
> +/* { dg-final { scan-assembler-times "\\tsqshrun\\tv" 3} } */
> +/* { dg-final { scan-assembler-times "\\tsqrshrun\\tv" 3} } */
> +/* { dg-final { scan-assembler-times "\\tsqshrn\\tv" 3} } */
> +/* { dg-final { scan-assembler-times "\\tuqshrn\\tv" 3} } */
> +/* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} } */
> +/* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} } */
Very minor, but it would be good to keep the scans in the same order
as the functions, to make comparisons easier.

OK with or without that change, thanks.

Richard
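
For reference, reordering the scan directives to follow the function order in
the file (vshrn, vrshrn, vqshrn, vqrshrn, vqshrun, vqrshrun) might look like
the sketch below; the counts and patterns are unchanged, only the order
differs.  Note the signed/unsigned variants of vqshrn_n and vqrshrn_n map to
separate sqshrn/uqshrn and sqrshrn/uqrshrn instructions, hence two scans each:

    /* { dg-final { scan-assembler-times "\\tshrn\\tv" 6} } */
    /* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} } */
    /* { dg-final { scan-assembler-times "\\tsqshrn\\tv" 3} } */
    /* { dg-final { scan-assembler-times "\\tuqshrn\\tv" 3} } */
    /* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} } */
    /* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} } */
    /* { dg-final { scan-assembler-times "\\tsqshrun\\tv" 3} } */
    /* { dg-final { scan-assembler-times "\\tsqrshrun\\tv" 3} } */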