Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures, instead of using a union or constructing a new opaque structure one vector at a time, in each of the vst3[q]_lane Neon intrinsics in arm_neon.h.
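To illustrate the idea outside arm_neon.h, here is a minimal, portable C sketch. It assumes GCC's generic vector extensions in place of the real Neon types. The names v4si, v4si_x3, opaque_ci, fake_st3_lane_builtin, old_vst3q_lane_s32 and new_vst3q_lane_s32 are hypothetical stand-ins for int32x4_t, int32x4x3_t, __builtin_aarch64_simd_ci, the st3 lane builtin, and the before/after shapes of vst3q_lane_s32. They are not the actual header code.

```c
#include <assert.h>

/* Hypothetical stand-ins so the sketch builds on any GCC/Clang target:
   v4si      ~ int32x4_t
   v4si_x3   ~ int32x4x3_t (the public three-vector structure)
   opaque_ci ~ __builtin_aarch64_simd_ci (the opaque tuple type)  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef struct { v4si val[3]; } v4si_x3;
typedef struct { v4si regs[3]; } opaque_ci;

/* Copying bytes between the two types is only valid if they match.  */
_Static_assert (sizeof (v4si_x3) == sizeof (opaque_ci),
                "structure and opaque tuple must have the same size");

/* Stand-in for the st3 lane builtin: store element LANE of each of the
   three registers to consecutive addresses.  (The real intrinsic
   requires LANE to be a compile-time constant.)  */
static void
fake_st3_lane_builtin (int *ptr, opaque_ci regs, int lane)
{
  assert (lane >= 0 && lane < 4);
  for (int i = 0; i < 3; i++)
    ptr[i] = regs.regs[i][lane];
}

/* Before: reinterpret the structure through a union.  */
static void
old_vst3q_lane_s32 (int *ptr, v4si_x3 val, int lane)
{
  union { v4si_x3 in; opaque_ci out; } u;
  u.in = val;
  fake_st3_lane_builtin (ptr, u.out, lane);
}

/* After: copy the structure into the opaque type with __builtin_memcpy.  */
static void
new_vst3q_lane_s32 (int *ptr, v4si_x3 val, int lane)
{
  opaque_ci regs;
  __builtin_memcpy (&regs, &val, sizeof val);
  fake_st3_lane_builtin (ptr, regs, lane);
}
```

Both wrappers store the same three elements; only the way the structure reaches the builtin differs. The stated aim of the patch is that the plain byte copy avoids the superfluous register moves that the union form could produce, which is what the new code generation tests check.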
It also adds new code generation tests to verify that superfluous move instructions are not generated for the vst3q_lane intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu; no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wri...@arm.com>

	* config/aarch64/arm_neon.h (__ST3_LANE_FUNC): Delete.
	(__ST3Q_LANE_FUNC): Delete.
	(vst3_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of constructing __builtin_aarch64_simd_ci
	one vector at a time.
	(vst3_lane_f32): Likewise.
	(vst3_lane_f64): Likewise.
	(vst3_lane_p8): Likewise.
	(vst3_lane_p16): Likewise.
	(vst3_lane_p64): Likewise.
	(vst3_lane_s8): Likewise.
	(vst3_lane_s16): Likewise.
	(vst3_lane_s32): Likewise.
	(vst3_lane_s64): Likewise.
	(vst3_lane_u8): Likewise.
	(vst3_lane_u16): Likewise.
	(vst3_lane_u32): Likewise.
	(vst3_lane_u64): Likewise.
	(vst3_lane_bf16): Likewise.
	(vst3q_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of using a union.
	(vst3q_lane_f32): Likewise.
	(vst3q_lane_f64): Likewise.
	(vst3q_lane_p8): Likewise.
	(vst3q_lane_p16): Likewise.
	(vst3q_lane_p64): Likewise.
	(vst3q_lane_s8): Likewise.
	(vst3q_lane_s16): Likewise.
	(vst3q_lane_s32): Likewise.
	(vst3q_lane_s64): Likewise.
	(vst3q_lane_u8): Likewise.
	(vst3q_lane_u16): Likewise.
	(vst3q_lane_u32): Likewise.
	(vst3q_lane_u64): Likewise.
	(vst3q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
rb14729.patch