On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi,
>
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
>
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-08  Jonathan Wright  <jonathan.wri...@arm.com>
>
>         * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
>         instead of constructing __builtin_aarch64_simd_oi one vector
>         at a time.
>         (vqtbl2_u8): Likewise.
>         (vqtbl2_p8): Likewise.
>         (vqtbl2q_s8): Likewise.
>         (vqtbl2q_u8): Likewise.
>         (vqtbl2q_p8): Likewise.
>         (vqtbl3_s8): Use __builtin_memcpy instead of constructing
>         __builtin_aarch64_simd_ci one vector at a time.
>         (vqtbl3_u8): Likewise.
>         (vqtbl3_p8): Likewise.
>         (vqtbl3q_s8): Likewise.
>         (vqtbl3q_u8): Likewise.
>         (vqtbl3q_p8): Likewise.
>         (vqtbl4_s8): Use __builtin_memcpy instead of constructing
>         __builtin_aarch64_simd_xi one vector at a time.
>         (vqtbl4_u8): Likewise.
>         (vqtbl4_p8): Likewise.
>         (vqtbl4q_s8): Likewise.
>         (vqtbl4q_u8): Likewise.
>         (vqtbl4q_p8): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vector_structure_intrinsics.c: New test.
>

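For context, the change described in the quoted patch can be sketched in
portable C. This is only an illustration under stated assumptions: vec16_t,
vec16x2_t, opaque_oi_t and set_qreg are hypothetical stand-ins for
uint8x16_t, uint8x16x2_t, __builtin_aarch64_simd_oi and the per-register
set builtin, not the actual arm_neon.h code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h types. */
typedef struct { uint8_t lane[16]; } vec16_t;      /* role of uint8x16_t */
typedef struct { vec16_t val[2]; } vec16x2_t;      /* role of uint8x16x2_t */
typedef struct { uint8_t bytes[32]; } opaque_oi_t; /* role of the opaque OI type */

/* The whole-structure copy is only valid if both layouts have the same size. */
_Static_assert(sizeof(opaque_oi_t) == sizeof(vec16x2_t), "layouts must match");

/* Hypothetical helper modelling "set register idx of the opaque structure". */
static opaque_oi_t set_qreg(opaque_oi_t o, vec16_t v, int idx)
{
  for (int i = 0; i < 16; i++)
    o.bytes[idx * 16 + i] = v.lane[i];
  return o;
}

/* Old shape: build the opaque structure one vector at a time. */
static opaque_oi_t pack_per_vector(vec16x2_t tab)
{
  opaque_oi_t o = { { 0 } };
  o = set_qreg(o, tab.val[0], 0);
  o = set_qreg(o, tab.val[1], 1);
  return o;
}

/* New shape: copy the whole vector structure in a single memcpy. */
static opaque_oi_t pack_memcpy(vec16x2_t tab)
{
  opaque_oi_t o;
  memcpy(&o, &tab, sizeof tab);
  return o;
}

/* Check that both shapes produce the same bytes for a given table. */
static void check_packing(vec16x2_t tab)
{
  opaque_oi_t a = pack_per_vector(tab);
  opaque_oi_t b = pack_memcpy(tab);
  assert(memcmp(&a, &b, sizeof a) == 0);
}
```

In this model both functions produce identical bytes; the sketch does not
model register allocation, move instructions, or target endianness.
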
Hi,

This new test fails on aarch64_be:
 FAIL: gcc.target/aarch64/vector_structure_intrinsics.c scan-assembler-not mov\\t

Can you check?

Thanks

Christophe
