[PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-07-23 Thread Jonathan Wright via Gcc-patches
Hi,

This patch uses __builtin_memcpy to copy vector structures instead of
building a new opaque structure one vector at a time in each of the
vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
and also improves code generation: previously, superfluous move
instructions were emitted for every register extraction/set in this
additional structure.
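
For readers without the patch to hand, the difference between the two
approaches can be sketched in portable GNU C. This is only an illustration
under stated assumptions, not the arm_neon.h code: vec16, table2, opaque_oi,
set_vector and the pack_* functions are hypothetical stand-ins for
int8x16_t, int8x16x2_t, __builtin_aarch64_simd_oi and the per-register set
builtins, chosen so the sketch builds on any GCC or Clang target.

```c
#include <assert.h>

/* Stand-in for a 128-bit Neon vector such as int8x16_t
   (uses the GCC/Clang vector_size extension).  */
typedef signed char vec16 __attribute__ ((vector_size (16)));

/* Stand-in for the user-visible two-vector table type (int8x16x2_t).  */
typedef struct { vec16 val[2]; } table2;

/* Hypothetical stand-in for the opaque __builtin_aarch64_simd_oi type.  */
typedef struct { unsigned char bytes[32]; } opaque_oi;

/* The whole-structure copy is only valid if both layouts have the same size.  */
static_assert (sizeof (opaque_oi) == sizeof (table2),
               "opaque and user-visible structures must have the same size");

/* Hypothetical stand-in for a per-register "set" builtin: writes one
   vector into slot INDEX of the opaque structure and returns the result.  */
static opaque_oi
set_vector (opaque_oi o, vec16 v, int index)
{
  __builtin_memcpy (o.bytes + 16 * index, &v, sizeof (v));
  return o;
}

/* Old style: build the opaque structure one vector at a time.  */
opaque_oi
pack_one_at_a_time (table2 tab)
{
  opaque_oi o = { { 0 } };
  o = set_vector (o, tab.val[0], 0);
  o = set_vector (o, tab.val[1], 1);
  return o;
}

/* New style: copy the whole structure in a single __builtin_memcpy.  */
opaque_oi
pack_with_memcpy (table2 tab)
{
  opaque_oi o;
  __builtin_memcpy (&o, &tab, sizeof (o));
  return o;
}

/* Returns 1 if both packing styles produce identical bytes.  */
int
packs_agree (void)
{
  table2 tab;
  for (int i = 0; i < 16; i++)
    {
      tab.val[0][i] = (signed char) i;
      tab.val[1][i] = (signed char) (100 - i);
    }
  opaque_oi a = pack_one_at_a_time (tab);
  opaque_oi b = pack_with_memcpy (tab);
  return __builtin_memcmp (&a, &b, sizeof (a)) == 0;
}
```

Both functions yield the same bytes; the sketch shows only the shape of the
source change and says nothing about the instructions an aarch64 compiler
would emit for the real intrinsics.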

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-08  Jonathan Wright  

* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New test.


Attachment: rb14639.patch


Re: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-08-03 Thread Christophe Lyon via Gcc-patches
On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi,
>
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
>
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-08  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
> instead of constructing __builtin_aarch64_simd_oi one vector
> at a time.
> (vqtbl2_u8): Likewise.
> (vqtbl2_p8): Likewise.
> (vqtbl2q_s8): Likewise.
> (vqtbl2q_u8): Likewise.
> (vqtbl2q_p8): Likewise.
> (vqtbl3_s8): Use __builtin_memcpy instead of constructing
> __builtin_aarch64_simd_ci one vector at a time.
> (vqtbl3_u8): Likewise.
> (vqtbl3_p8): Likewise.
> (vqtbl3q_s8): Likewise.
> (vqtbl3q_u8): Likewise.
> (vqtbl3q_p8): Likewise.
> (vqtbl4_s8): Use __builtin_memcpy instead of constructing
> __builtin_aarch64_simd_xi one vector at a time.
> (vqtbl4_u8): Likewise.
> (vqtbl4_p8): Likewise.
> (vqtbl4q_s8): Likewise.
> (vqtbl4q_u8): Likewise.
> (vqtbl4q_p8): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: New test.
>

Hi,

This new test fails on aarch64_be:
 FAIL: gcc.target/aarch64/vector_structure_intrinsics.c scan-assembler-not mov\\t

Can you check?

Thanks

Christophe


RE: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

2021-07-23 Thread Kyrylo Tkachov via Gcc-patches
Hi Jonathan,

> -----Original Message-----
> From: Jonathan Wright 
> Sent: 23 July 2021 09:22
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Sandiford
> Subject: [PATCH 1/8] aarch64: Use memcpy to copy vector tables in
> vqtbl[234] intrinsics
> 
> Hi,
> 
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
> 
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?
> 

In the testcase:
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
new file mode 100644
index ..2fab0f2947b7fa28e4e3a77bd365dcfdf30a9b28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
@@ -0,0 +1,45 @@
+/* { dg-skip-if "" { arm*-*-* } } */

Tests in gcc.target/aarch64 are never run on arm* targets, so the dg-skip-if
isn't needed (it is only required for tests in
gcc.target/aarch64/advsimd-intrinsics/).

Ok with that directive removed, thanks for doing this!
Kyrill

