Re: [PATCH 11/15][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

James Greenhalgh Wed, 29 Jul 2015 05:33:10 -0700

On Tue, Jul 28, 2015 at 12:26:22PM +0100, Alan Lawrence wrote:
> gcc/ChangeLog:
> 
>       * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
>       vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
>       vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
>       vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
>       vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
>       vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
>       vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
>       vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
>       vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
>       vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
>       vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
>       vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
>       vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
>       vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
>       vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
>       vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
>       vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
>       vld1q_dup_f16): New.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
>       * gcc.target/aarch64/vget_low_1.c: Likewise.


> @@ -14871,6 +15171,13 @@ vld1q_u64 (const uint64_t *a)
>  
>  /* vld1_dup  */
>  
> +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
> +vld1_dup_f16 (const float16_t* __a)
> +{
> +  float16_t __f = *__a;
> +  return (float16x4_t) { __f, __f, __f, __f };
> +}
> +
>  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
>  vld1_dup_f32 (const float32_t* __a)
>  {
> @@ -14945,6 +15252,13 @@ vld1_dup_u64 (const uint64_t* __a)
>  
>  /* vld1q_dup  */
>  
> +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
> +vld1q_dup_f16 (const float16_t* __a)
> +{
> +  float16_t __f = *__a;
> +  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
> +}
> +

Did you check that these actually emit the expected instruction (LD1R)?

Applying your patch set, I see some fairly unpleasant code generation,
but I may have made an error, or perhaps you have another patch in
waiting?
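For reference, here is a minimal host-side sketch of the lane semantics both new intrinsics must preserve: one load from memory, replicated into every lane. On AArch64 this is what a single LD1R does. It assumes plain float as a stand-in for float16_t so that it builds without arm_neon.h, and the ref_* types and functions are hypothetical names, not part of arm_neon.h. It only models the result. Whether GCC actually emits LD1R, rather than a scalar load followed by lane inserts, still has to be checked in the generated assembly, for example with a scan-assembler test under gcc.target/aarch64.

```c
#include <assert.h>

/* Hypothetical host-side stand-ins: plain float replaces float16_t so the
   model builds without arm_neon.h.  Lane counts match float16x4_t and
   float16x8_t.  */
typedef struct { float val[4]; } ref_float16x4;
typedef struct { float val[8]; } ref_float16x8;

/* Model of vld1_dup_f16: one load from *a, replicated into all 4 lanes.
   On AArch64 the expected code is a single LD1R {vN.4H}, [xM].  */
static ref_float16x4 ref_vld1_dup_f16 (const float *a)
{
  ref_float16x4 r;
  float f = *a;               /* exactly one memory access */
  for (int i = 0; i < 4; i++)
    r.val[i] = f;
  return r;
}

/* Model of vld1q_dup_f16: same, but for 8 lanes (LD1R {vN.8H}).  */
static ref_float16x8 ref_vld1q_dup_f16 (const float *a)
{
  ref_float16x8 r;
  float f = *a;
  for (int i = 0; i < 8; i++)
    r.val[i] = f;
  return r;
}

/* Return 1 if all n lanes compare equal to expected, else 0.
   Uses ==, so a NaN input would always report a mismatch.  */
static int all_lanes_equal (const float *lanes, int n, float expected)
{
  for (int i = 0; i < n; i++)
    if (lanes[i] != expected)
      return 0;
  return 1;
}

/* Check both models against one (non-NaN) input; aborts on mismatch.  */
static int check_dup_semantics (float x)
{
  ref_float16x4 d = ref_vld1_dup_f16 (&x);
  ref_float16x8 q = ref_vld1q_dup_f16 (&x);
  assert (all_lanes_equal (d.val, 4, x));
  assert (all_lanes_equal (q.val, 8, x));
  return 1;
}
```

A runtime test built on this model would catch wrong lane values but not poor code generation. That is why the assembly check is the one that matters for this question.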

Thanks,
James
