Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> This patch adds tests to verify that Neon narrowing-shift instructions
> clear the top half of the result vector. It is sufficient to show that a
> subsequent combine with a zero-vector is optimized away - leaving
> just the narrowing-shift instruction.
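>
> For example, the first TEST_SHIFT instantiation expands to roughly the
> following (hand-expanded here for illustration):
>
>   int8x16_t test_vshrn_n_s16_zero_high (int16x8_t a)
>   {
>     return vcombine_s8 (vshrn_n_s16 (a, 4), vdup_n_s8 (0));
>   }
>
> which at -O3 should compile to a single SHRN (writing the low 64 bits
> of the destination already zeroes the high half), with no DUP or MOV
> left over for the zero vector.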
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/testsuite/ChangeLog:
>
> 2021-06-15  Jonathan Wright  <jonathan.wri...@arm.com>
>
>       * gcc.target/aarch64/narrow_zero_high_half.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..27fa0e640ab2b37781376c40ce4ca37602c72393
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> @@ -0,0 +1,60 @@
> +/* { dg-skip-if "" { arm*-*-* } } */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include <arm_neon.h>
> +
> +#define TEST_SHIFT(name, rettype, intype, fs, rs) \
> +  rettype test_ ## name ## _ ## fs ## _zero_high \
> +             (intype a) \
> +     { \
> +             return vcombine_ ## rs (name ## _ ## fs (a, 4), \
> +                                     vdup_n_ ## rs (0)); \
> +     }
> +
> +TEST_SHIFT (vshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +TEST_SHIFT (vqrshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqrshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqrshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */
> +
> +/* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} }  */
> +/* { dg-final { scan-assembler-times "\\tshrn\\tv" 6} }  */
> +/* { dg-final { scan-assembler-times "\\tsqshrun\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqrshrun\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tuqshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} }  */

Very minor, but it would be good to keep the scans in the same
order as the functions, to make comparisons easier.
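E.g., keeping them in source order would give:

/* { dg-final { scan-assembler-times "\\tshrn\\tv" 6} }  */
/* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} }  */
/* { dg-final { scan-assembler-times "\\tsqshrn\\tv" 3} }  */
/* { dg-final { scan-assembler-times "\\tuqshrn\\tv" 3} }  */
/* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} }  */
/* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} }  */
/* { dg-final { scan-assembler-times "\\tsqshrun\\tv" 3} }  */
/* { dg-final { scan-assembler-times "\\tsqrshrun\\tv" 3} }  */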

OK with or without that change, thanks.

Richard
