Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vst2[q] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation: previously, superfluous move
instructions were emitted for every register extraction/set performed
on this additional structure.
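
As a rough illustration of the idea (not the code in the patch), the
portable sketch below contrasts the two ways of filling an opaque
value. The names pair_t, opaque_t, pack_per_vector and pack_whole are
hypothetical stand-ins: pair_t plays the role of a two-vector type such
as int32x4x2_t and opaque_t the role of __builtin_aarch64_simd_oi. It
assumes a GCC-compatible compiler that provides __builtin_memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a two-vector structure (e.g. int32x4x2_t). */
typedef struct { int32_t val[2][4]; } pair_t;

/* Hypothetical stand-in for the opaque register-tuple type. */
typedef struct { unsigned char bytes[sizeof (pair_t)]; } opaque_t;

/* Old style (by analogy): fill the opaque value one vector at a time. */
static opaque_t
pack_per_vector (pair_t in)
{
  opaque_t o;
  memcpy (o.bytes, in.val[0], sizeof in.val[0]);
  memcpy (o.bytes + sizeof in.val[0], in.val[1], sizeof in.val[1]);
  return o;
}

/* New style (by analogy): a single whole-structure copy. */
static opaque_t
pack_whole (pair_t in)
{
  opaque_t o;
  __builtin_memcpy (&o, &in, sizeof (o));
  return o;
}
```

Both helpers produce the same bytes. The single copy gives the compiler
one simple operation to optimise rather than a chain of per-vector
insertions.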

It also adds new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-21  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
        instead of constructing __builtin_aarch64_simd_oi one vector
        at a time.
        (vst2_u64): Likewise.
        (vst2_f64): Likewise.
        (vst2_s8): Likewise.
        (vst2_p8): Likewise.
        (vst2_s16): Likewise.
        (vst2_p16): Likewise.
        (vst2_s32): Likewise.
        (vst2_u8): Likewise.
        (vst2_u16): Likewise.
        (vst2_u32): Likewise.
        (vst2_f16): Likewise.
        (vst2_f32): Likewise.
        (vst2_p64): Likewise.
        (vst2q_s8): Likewise.
        (vst2q_p8): Likewise.
        (vst2q_s16): Likewise.
        (vst2q_p16): Likewise.
        (vst2q_s32): Likewise.
        (vst2q_s64): Likewise.
        (vst2q_u8): Likewise.
        (vst2q_u16): Likewise.
        (vst2q_u32): Likewise.
        (vst2q_u64): Likewise.
        (vst2q_f16): Likewise.
        (vst2q_f32): Likewise.
        (vst2q_f64): Likewise.
        (vst2q_p64): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
        tests.

Attachment: rb14689.patch
