Hi,

As per the subject line, this patch uses __builtin_memcpy to copy
vector structures, instead of using a union or constructing a new
opaque structure one vector at a time, in each of the vst[234][q] and
vst1[q]_x[234] bfloat Neon intrinsics in arm_neon.h.
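
To illustrate the shape of the change, here is a minimal, target-independent
sketch. It is not the code from the patch: vec128_t, vec128x2_t, opaque_oi_t
and set_vector are hypothetical stand-ins for bfloat16x8_t, bfloat16x8x2_t,
__builtin_aarch64_simd_oi and the per-vector "set register" step, so that the
example builds without arm_neon.h. It only assumes a compiler that provides
__builtin_memcpy (GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds on any target; the real
   types come from arm_neon.h and the AArch64 back end.  */
typedef struct { uint16_t lane[8]; } vec128_t;            /* ~ bfloat16x8_t */
typedef struct { vec128_t val[2]; } vec128x2_t;           /* ~ bfloat16x8x2_t */
typedef struct { unsigned char bytes[32]; } opaque_oi_t;  /* ~ __builtin_aarch64_simd_oi */

static_assert (sizeof (opaque_oi_t) == sizeof (vec128x2_t),
               "opaque type must match the vector structure in size");

/* Hypothetical helper standing in for a per-vector "set register" step.  */
static opaque_oi_t
set_vector (opaque_oi_t o, vec128_t v, int idx)
{
  memcpy (o.bytes + idx * sizeof (vec128_t), &v, sizeof (vec128_t));
  return o;
}

/* Old shape: build the opaque value one vector at a time.  */
opaque_oi_t
pack_per_vector (vec128x2_t val)
{
  opaque_oi_t o;
  memset (&o, 0, sizeof (o));
  o = set_vector (o, val.val[0], 0);
  o = set_vector (o, val.val[1], 1);
  return o;
}

/* New shape: copy the whole vector structure in a single step.  */
opaque_oi_t
pack_memcpy (vec128x2_t val)
{
  opaque_oi_t o;
  __builtin_memcpy (&o, &val, sizeof (o));
  return o;
}

/* Returns 1 when both packings produce identical bytes, 0 otherwise.  */
int
packings_agree (void)
{
  vec128x2_t val;
  for (int i = 0; i < 8; i++)
    {
      val.val[0].lane[i] = (uint16_t) (0x3f80 + i);
      val.val[1].lane[i] = (uint16_t) (0x4000 + i);
    }
  opaque_oi_t a = pack_per_vector (val);
  opaque_oi_t b = pack_memcpy (val);
  return memcmp (a.bytes, b.bytes, sizeof (a.bytes)) == 0;
}
```

Both functions produce the same bytes; the single copy simply gives the
compiler one whole-structure move to optimise rather than a chain of
per-vector updates.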

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst[234]q or vst1q_x[234] bfloat
intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/arm_neon.h (vst1_bf16_x2): Use
        __builtin_memcpy instead of constructing an additional
        __builtin_aarch64_simd_oi one vector at a time.
        (vst1q_bf16_x2): Likewise.
        (vst1_bf16_x3): Use __builtin_memcpy instead of constructing
        an additional __builtin_aarch64_simd_ci one vector at a time.
        (vst1q_bf16_x3): Likewise.
        (vst1_bf16_x4): Use __builtin_memcpy instead of a union.
        (vst1q_bf16_x4): Likewise.
        (vst2_bf16): Use __builtin_memcpy instead of constructing an
        additional __builtin_aarch64_simd_oi one vector at a time.
        (vst2q_bf16): Likewise.
        (vst3_bf16): Use __builtin_memcpy instead of constructing an
        additional __builtin_aarch64_simd_ci one vector at a time.
        (vst3q_bf16): Likewise.
        (vst4_bf16): Use __builtin_memcpy instead of constructing an
        additional __builtin_aarch64_simd_xi one vector at a time.
        (vst4q_bf16): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
        tests.

Attachment: rb14731.patch