Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov wrote:
>
> Hi Delia,
>
> On 3/5/20 4:38 PM, Delia Burduv wrote:
> > Hi,
> >
> > This is the latest version of the patch. I am forcing -mfloat-abi=hard
> > because the code generated is slightly different depending on the
> > float-abi used.
>
>
> Thanks, I've pushed it with an updated ChangeLog.
>
> 2020-03-06 Delia Burduv
>
>     * config/arm/arm_neon.h (vld2_bf16): New.
>     (vld2q_bf16): New.
>     (vld3_bf16): New.
>     (vld3q_bf16): New.
>     (vld4_bf16): New.
>     (vld4q_bf16): New.
>     (vld2_dup_bf16): New.
>     (vld2q_dup_bf16): New.
>     (vld3_dup_bf16): New.
>     (vld3q_dup_bf16): New.
>     (vld4_dup_bf16): New.
>     (vld4q_dup_bf16): New.
>     * config/arm/arm_neon_builtins.def
>     (vld2): Changed to VAR13 and added v4bf, v8bf
>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld3): Changed to VAR13 and added v4bf, v8bf
>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld4): Changed to VAR13 and added v4bf, v8bf
>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>     * config/arm/iterators.md (VDXBF2): New iterator.
>     * config/arm/neon.md (neon_vld2): Use new iterators.
>     (neon_vld2_dup): Likewise.
>     (neon_vld3): Likewise.
>     (neon_vld3qa): Likewise.
>     (neon_vld3qb): Likewise.
>     (neon_vld3_dup): Likewise.
>     (neon_vld4): Likewise.
>     (neon_vld4qa): Likewise.
>     (neon_vld4qb): Likewise.
>     (neon_vld4_dup): Likewise.
>     (neon_vld2_dupv8bf): New.
>     (neon_vld3_dupv8bf): Likewise.
>     (neon_vld4_dupv8bf): Likewise.
>
> Kyrill

Hi!

There's a problem with the arm_neon.h update.
On arm-none-linux-gnueabihf, there is a regression on
g++.dg/other/pr54300.C and g++.dg/other/pr55073.C, because:

FAIL: g++.dg/other/pr54300.C -std=gnu++98 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39: error: cannot convert 'const short int*' to 'const __bf16*'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39: error: cannot convert 'const short int*' to 'const __bf16*'
[]

The same problem makes a lot (~365) of tests become unsupported on
arm-none-linux-gnueabi:
g++.dg/abi/mangle-arm-crypto.C
g++.dg/abi/mangle-neon.C

Can you fix it?

Thanks

Christophe

> >
> >
> > Thanks,
> > Delia
> >
> > On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> >> Hi Delia,
> >>
> >> On 3/4/20 2:05 PM, Delia Burduv wrote:
> >>> Hi,
> >>>
> >>> The previous version of this patch shared part of its code with the
> >>> store intrinsics patch
> >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> >>> any duplicated code. This patch now depends on the previously mentioned
> >>> store intrinsics patch.
> >>>
> >>> Here is the latest version and the updated ChangeLog.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-03-04 Delia Burduv
> >>>
> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>     (vld2_bf16): New.
> >>>     (vld2q_bf16): New.
> >>>     (vld3_bf16): New.
> >>>     (vld3q_bf16): New.
> >>>     (vld4_bf16): New.
> >>>     (vld4q_bf16): New.
> >>>     (vld2_dup_bf16): New.
> >>>     (vld2q_dup_bf16): New.
> >>>     (vld3_dup_bf16): New.
> >>>     (vld3q_dup_bf16): New.
> >>>     (vld4_dup_bf16): New.
> >>>     (vld4q_dup_bf16): New.
> >>>     * config/arm/arm_neon_builtins.def
> >>>     (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     * config/arm/iterators.md (VDXBF): New iterator.
> >>>     (VQ2BF): New iterator.
> >>>     * config/arm/neon.md (vld2): Used new iterators.
> >>>     (vld2_dup): Used new iterators.
> >>>     (vld2_dupv8bf): New.
> >>>     (vst3): Used new iterators.
> >>>     (vst3qa): Used new iterators.
> >>>     (vst3qb): Used new iterators.
> >>>     (vld3_dup): Used new iterators.
> >>>     (vld3_dupv8bf): New.
> >>>     (vst4): Used new iterators.
> >>>     (vst4qa): Used new iterators.
> >>>     (vst4qb): Used new iterators.
> >>>     (vld4_dup): Used new iterators.
> >>>     (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-03-04 Delia Burduv
> >>>
> >>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>>
> >>> Thanks,
> >>> Delia
> >>>
> >>> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Here is the latest version of the patch. It just has some minor
> >>> > formatting changes that were brought up by Richard Sandiford in the
> >>> > AArch64 patches
> >>> >
> >>> >
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi Delia,

On 3/5/20 4:38 PM, Delia Burduv wrote:
> Hi,
>
> This is the latest version of the patch. I am forcing -mfloat-abi=hard
> because the code generated is slightly different depending on the
> float-abi used.

Thanks, I've pushed it with an updated ChangeLog.

2020-03-06 Delia Burduv

    * config/arm/arm_neon.h (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
    (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
    * config/arm/arm_neon_builtins.def
    (vld2): Changed to VAR13 and added v4bf, v8bf
    (vld2_dup): Changed to VAR8 and added v4bf, v8bf
    (vld3): Changed to VAR13 and added v4bf, v8bf
    (vld3_dup): Changed to VAR8 and added v4bf, v8bf
    (vld4): Changed to VAR13 and added v4bf, v8bf
    (vld4_dup): Changed to VAR8 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF2): New iterator.
    * config/arm/neon.md (neon_vld2): Use new iterators.
    (neon_vld2_dup): Likewise.
    (neon_vld3): Likewise.
    (neon_vld3qa): Likewise.
    (neon_vld3qb): Likewise.
    (neon_vld3_dup): Likewise.
    (neon_vld4): Likewise.
    (neon_vld4qa): Likewise.
    (neon_vld4qb): Likewise.
    (neon_vld4_dup): Likewise.
    (neon_vld2_dupv8bf): New.
    (neon_vld3_dupv8bf): Likewise.
    (neon_vld4_dupv8bf): Likewise.

Kyrill

> Thanks,
> Delia
>
> On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
>> Hi Delia,
>>
>> On 3/4/20 2:05 PM, Delia Burduv wrote:
>>> Hi,
>>>
>>> The previous version of this patch shared part of its code with the
>>> store intrinsics patch
>>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
>>> any duplicated code. This patch now depends on the previously mentioned
>>> store intrinsics patch.
>>>
>>> Here is the latest version and the updated ChangeLog.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-03-04 Delia Burduv
>>>
>>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>     (vld2_bf16): New.
>>>     (vld2q_bf16): New.
>>>     (vld3_bf16): New.
>>>     (vld3q_bf16): New.
>>>     (vld4_bf16): New.
>>>     (vld4q_bf16): New.
>>>     (vld2_dup_bf16): New.
>>>     (vld2q_dup_bf16): New.
>>>     (vld3_dup_bf16): New.
>>>     (vld3q_dup_bf16): New.
>>>     (vld4_dup_bf16): New.
>>>     (vld4q_dup_bf16): New.
>>>     * config/arm/arm_neon_builtins.def
>>>     (vld2): Changed to VAR13 and added v4bf, v8bf
>>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>     (vld3): Changed to VAR13 and added v4bf, v8bf
>>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>     (vld4): Changed to VAR13 and added v4bf, v8bf
>>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>     * config/arm/iterators.md (VDXBF): New iterator.
>>>     (VQ2BF): New iterator.
>>>     * config/arm/neon.md (vld2): Used new iterators.
>>>     (vld2_dup): Used new iterators.
>>>     (vld2_dupv8bf): New.
>>>     (vst3): Used new iterators.
>>>     (vst3qa): Used new iterators.
>>>     (vst3qb): Used new iterators.
>>>     (vld3_dup): Used new iterators.
>>>     (vld3_dupv8bf): New.
>>>     (vst4): Used new iterators.
>>>     (vst4qa): Used new iterators.
>>>     (vst4qb): Used new iterators.
>>>     (vld4_dup): Used new iterators.
>>>     (vld4_dupv8bf): New.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-03-04 Delia Burduv
>>>
>>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>>
>>> Thanks,
>>> Delia
>>>
>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>> >
>>> > Hi,
>>> >
>>> > Here is the latest version of the patch. It just has some minor
>>> > formatting changes that were brought up by Richard Sandiford in the
>>> > AArch64 patches
>>> >
>>> > Thanks,
>>> > Delia
>>> >
>>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
>>> >> Ping.
>>> >>
>>> >> I will change the tests to use the exact input and output registers as
>>> >> Richard Sandiford suggested for the AArch64 patches.
>>> >>
>>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> >>> vld{q}_bf16 as part of the BFloat16 extension.
>>> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>> >>>
>>> >>> The intrinsics are declared in arm_neon.h.
>>> >>> A new test is added to check assembler output.
>>> >>>
>>> >>> This patch depends on the Arm back-end patch.
>>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>> >>>
>>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> >>> have commit rights, so if this is ok can someone please commit it for
>>> >>> me?
>>> >>>
>>> >>> gcc/ChangeLog:
>>> >>>
>>> >>> 2019-11-14 Delia Burduv
>>> >>>
>>> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>> >>>     (bfloat16x4x2_t): New typedef.
>>> >>>     (bfloat16x8x2_t): New typedef.
>>> >>>     (bfloat16x4x3_t): New typedef.
>>> >>>     (bfloat16x8x3_t): New typedef.
>>> >>>     (bfloat16x4x4_t): New typedef.
>>> >>>     (bfloat16x8x4_t): New typedef.
>>> >>>     (vld2_bf16): New.
>>> >>>     (vld2q_bf16): New.
>>> >>>     (vld3_bf16): New.
>>> >>>     (vld3q_bf16): New.
>>> >>>     (vld4_bf16): New.
>>> >>>     (vld4q_bf16): New.
>>> >>>
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard
because the code generated is slightly different depending on the
float-abi used.

Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> Hi Delia,
>
> On 3/4/20 2:05 PM, Delia Burduv wrote:
>> Hi,
>>
>> The previous version of this patch shared part of its code with the
>> store intrinsics patch
>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
>> any duplicated code. This patch now depends on the previously mentioned
>> store intrinsics patch.
>>
>> Here is the latest version and the updated ChangeLog.
>>
>> gcc/ChangeLog:
>>
>> 2019-03-04 Delia Burduv
>>
>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>     (vld2_bf16): New.
>>     (vld2q_bf16): New.
>>     (vld3_bf16): New.
>>     (vld3q_bf16): New.
>>     (vld4_bf16): New.
>>     (vld4q_bf16): New.
>>     (vld2_dup_bf16): New.
>>     (vld2q_dup_bf16): New.
>>     (vld3_dup_bf16): New.
>>     (vld3q_dup_bf16): New.
>>     (vld4_dup_bf16): New.
>>     (vld4q_dup_bf16): New.
>>     * config/arm/arm_neon_builtins.def
>>     (vld2): Changed to VAR13 and added v4bf, v8bf
>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>     (vld3): Changed to VAR13 and added v4bf, v8bf
>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>     (vld4): Changed to VAR13 and added v4bf, v8bf
>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>     * config/arm/iterators.md (VDXBF): New iterator.
>>     (VQ2BF): New iterator.
>>     * config/arm/neon.md (vld2): Used new iterators.
>>     (vld2_dup): Used new iterators.
>>     (vld2_dupv8bf): New.
>>     (vst3): Used new iterators.
>>     (vst3qa): Used new iterators.
>>     (vst3qb): Used new iterators.
>>     (vld3_dup): Used new iterators.
>>     (vld3_dupv8bf): New.
>>     (vst4): Used new iterators.
>>     (vst4qa): Used new iterators.
>>     (vst4qb): Used new iterators.
>>     (vld4_dup): Used new iterators.
>>     (vld4_dupv8bf): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-03-04 Delia Burduv
>>
>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>
>> Thanks,
>> Delia
>>
>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>> >
>> > Hi,
>> >
>> > Here is the latest version of the patch. It just has some minor
>> > formatting changes that were brought up by Richard Sandiford in the
>> > AArch64 patches
>> >
>> > Thanks,
>> > Delia
>> >
>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
>> >> Ping.
>> >>
>> >> I will change the tests to use the exact input and output registers as
>> >> Richard Sandiford suggested for the AArch64 patches.
>> >>
>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>> >>> vld{q}_bf16 as part of the BFloat16 extension.
>> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>> >>>
>> >>> The intrinsics are declared in arm_neon.h.
>> >>> A new test is added to check assembler output.
>> >>>
>> >>> This patch depends on the Arm back-end patch.
>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>> >>>
>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> >>> have commit rights, so if this is ok can someone please commit it for
>> >>> me?
>> >>>
>> >>> gcc/ChangeLog:
>> >>>
>> >>> 2019-11-14 Delia Burduv
>> >>>
>> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>> >>>     (bfloat16x4x2_t): New typedef.
>> >>>     (bfloat16x8x2_t): New typedef.
>> >>>     (bfloat16x4x3_t): New typedef.
>> >>>     (bfloat16x8x3_t): New typedef.
>> >>>     (bfloat16x4x4_t): New typedef.
>> >>>     (bfloat16x8x4_t): New typedef.
>> >>>     (vld2_bf16): New.
>> >>>     (vld2q_bf16): New.
>> >>>     (vld3_bf16): New.
>> >>>     (vld3q_bf16): New.
>> >>>     (vld4_bf16): New.
>> >>>     (vld4q_bf16): New.
>> >>>     (vld2_dup_bf16): New.
>> >>>     (vld2q_dup_bf16): New.
>> >>>     (vld3_dup_bf16): New.
>> >>>     (vld3q_dup_bf16): New.
>> >>>     (vld4_dup_bf16): New.
>> >>>     (vld4q_dup_bf16): New.
>> >>>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>> >>>     (VAR13): New.
>> >>>     (arm_simd_types[Bfloat16x2_t]): New type.
>> >>>     * config/arm/arm-modes.def (V2BF): New mode.
>> >>>     * config/arm/arm-simd-builtin-types.def
>> >>>     (Bfloat16x2_t): New entry.
>> >>>     * config/arm/arm_neon_builtins.def
>> >>>     (vld2): Changed to VAR13 and added v4bf, v8bf
>> >>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>> >>>     (vld3): Changed to VAR13 and added v4bf, v8bf
>> >>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>> >>>     (vld4): Changed to VAR13 and added v4bf, v8bf
>> >>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>> >>>     * config/arm/iterators.md (VDXBF): New iterator.
>> >>>     (VQ2BF): New iterator.
>> >>>     (V_elem): Added V4BF, V8BF.
>> >>>     (V_sz_elem): Added V4BF, V8BF.
>> >>>     (V_mode_nunits): Added V4BF, V8BF.
>> >>>     (q): Added V4BF, V8BF.
>> >>>     * config/arm/neon.md (vld2): Used new iterators.
>> >>>     (vld2_dup): Used new ite
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:
> Hi,
>
> The previous version of this patch shared part of its code with the
> store intrinsics patch
> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> any duplicated code. This patch now depends on the previously mentioned
> store intrinsics patch.
>
> Here is the latest version and the updated ChangeLog.
>
> gcc/ChangeLog:
>
> 2019-03-04 Delia Burduv
>
>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>     (vld2_bf16): New.
>     (vld2q_bf16): New.
>     (vld3_bf16): New.
>     (vld3q_bf16): New.
>     (vld4_bf16): New.
>     (vld4q_bf16): New.
>     (vld2_dup_bf16): New.
>     (vld2q_dup_bf16): New.
>     (vld3_dup_bf16): New.
>     (vld3q_dup_bf16): New.
>     (vld4_dup_bf16): New.
>     (vld4q_dup_bf16): New.
>     * config/arm/arm_neon_builtins.def
>     (vld2): Changed to VAR13 and added v4bf, v8bf
>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld3): Changed to VAR13 and added v4bf, v8bf
>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld4): Changed to VAR13 and added v4bf, v8bf
>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>     * config/arm/iterators.md (VDXBF): New iterator.
>     (VQ2BF): New iterator.
>     * config/arm/neon.md (vld2): Used new iterators.
>     (vld2_dup): Used new iterators.
>     (vld2_dupv8bf): New.
>     (vst3): Used new iterators.
>     (vst3qa): Used new iterators.
>     (vst3qb): Used new iterators.
>     (vld3_dup): Used new iterators.
>     (vld3_dupv8bf): New.
>     (vst4): Used new iterators.
>     (vst4qa): Used new iterators.
>     (vst4qb): Used new iterators.
>     (vld4_dup): Used new iterators.
>     (vld4_dupv8bf): New.
>
> gcc/testsuite/ChangeLog:
>
> 2019-03-04 Delia Burduv
>
>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>
> Thanks,
> Delia
>
> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >
> > Hi,
> >
> > Here is the latest version of the patch. It just has some minor
> > formatting changes that were brought up by Richard Sandiford in the
> > AArch64 patches
> >
> > Thanks,
> > Delia
> >
> > On 1/22/20 5:31 PM, Delia Burduv wrote:
> >> Ping.
> >>
> >> I will change the tests to use the exact input and output registers as
> >> Richard Sandiford suggested for the AArch64 patches.
> >>
> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
> >>> vld{q}_bf16 as part of the BFloat16 extension.
> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> >>>
> >>> The intrinsics are declared in arm_neon.h.
> >>> A new test is added to check assembler output.
> >>>
> >>> This patch depends on the Arm back-end patch.
> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> >>>
> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
> >>> have commit rights, so if this is ok can someone please commit it for
> >>> me?
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-11-14 Delia Burduv
> >>>
> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>     (bfloat16x4x2_t): New typedef.
> >>>     (bfloat16x8x2_t): New typedef.
> >>>     (bfloat16x4x3_t): New typedef.
> >>>     (bfloat16x8x3_t): New typedef.
> >>>     (bfloat16x4x4_t): New typedef.
> >>>     (bfloat16x8x4_t): New typedef.
> >>>     (vld2_bf16): New.
> >>>     (vld2q_bf16): New.
> >>>     (vld3_bf16): New.
> >>>     (vld3q_bf16): New.
> >>>     (vld4_bf16): New.
> >>>     (vld4q_bf16): New.
> >>>     (vld2_dup_bf16): New.
> >>>     (vld2q_dup_bf16): New.
> >>>     (vld3_dup_bf16): New.
> >>>     (vld3q_dup_bf16): New.
> >>>     (vld4_dup_bf16): New.
> >>>     (vld4q_dup_bf16): New.
> >>>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
> >>>     (VAR13): New.
> >>>     (arm_simd_types[Bfloat16x2_t]): New type.
> >>>     * config/arm/arm-modes.def (V2BF): New mode.
> >>>     * config/arm/arm-simd-builtin-types.def
> >>>     (Bfloat16x2_t): New entry.
> >>>     * config/arm/arm_neon_builtins.def
> >>>     (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     * config/arm/iterators.md (VDXBF): New iterator.
> >>>     (VQ2BF): New iterator.
> >>>     (V_elem): Added V4BF, V8BF.
> >>>     (V_sz_elem): Added V4BF, V8BF.
> >>>     (V_mode_nunits): Added V4BF, V8BF.
> >>>     (q): Added V4BF, V8BF.
> >>>     * config/arm/neon.md (vld2): Used new iterators.
> >>>     (vld2_dup): Used new iterators.
> >>>     (vld2_dupv8bf): New.
> >>>     (vst3): Used new iterators.
> >>>     (vst3qa): Used new iterators.
> >>>     (vst3qb): Used new iterators.
> >>>     (vld3_dup): Used new iterators.
> >>>
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04 Delia Burduv

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
    (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
    (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
    * config/arm/arm_neon_builtins.def
    (vld2): Changed to VAR13 and added v4bf, v8bf
    (vld2_dup): Changed to VAR8 and added v4bf, v8bf
    (vld3): Changed to VAR13 and added v4bf, v8bf
    (vld3_dup): Changed to VAR8 and added v4bf, v8bf
    (vld4): Changed to VAR13 and added v4bf, v8bf
    (vld4_dup): Changed to VAR8 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF): New iterator.
    (VQ2BF): New iterator.
    * config/arm/neon.md (vld2): Used new iterators.
    (vld2_dup): Used new iterators.
    (vld2_dupv8bf): New.
    (vst3): Used new iterators.
    (vst3qa): Used new iterators.
    (vst3qb): Used new iterators.
    (vld3_dup): Used new iterators.
    (vld3_dupv8bf): New.
    (vst4): Used new iterators.
    (vst4qa): Used new iterators.
    (vst4qb): Used new iterators.
    (vld4_dup): Used new iterators.
    (vld4_dupv8bf): New.

gcc/testsuite/ChangeLog:

2019-03-04 Delia Burduv

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>
>>> The intrinsics are declared in arm_neon.h.
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14 Delia Burduv
>>>
>>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>     (bfloat16x4x2_t): New typedef.
>>>     (bfloat16x8x2_t): New typedef.
>>>     (bfloat16x4x3_t): New typedef.
>>>     (bfloat16x8x3_t): New typedef.
>>>     (bfloat16x4x4_t): New typedef.
>>>     (bfloat16x8x4_t): New typedef.
>>>     (vld2_bf16): New.
>>>     (vld2q_bf16): New.
>>>     (vld3_bf16): New.
>>>     (vld3q_bf16): New.
>>>     (vld4_bf16): New.
>>>     (vld4q_bf16): New.
>>>     (vld2_dup_bf16): New.
>>>     (vld2q_dup_bf16): New.
>>>     (vld3_dup_bf16): New.
>>>     (vld3q_dup_bf16): New.
>>>     (vld4_dup_bf16): New.
>>>     (vld4q_dup_bf16): New.
>>>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>     (VAR13): New.
>>>     (arm_simd_types[Bfloat16x2_t]): New type.
>>>     * config/arm/arm-modes.def (V2BF): New mode.
>>>     * config/arm/arm-simd-builtin-types.def
>>>     (Bfloat16x2_t): New entry.
>>>     * config/arm/arm_neon_builtins.def
>>>     (vld2): Changed to VAR13 and added v4bf, v8bf
>>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>     (vld3): Changed to VAR13 and added v4bf, v8bf
>>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>     (vld4): Changed to VAR13 and added v4bf, v8bf
>>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>     * config/arm/iterators.md (VDXBF): New iterator.
>>>     (VQ2BF): New iterator.
>>>     (V_elem): Added V4BF, V8BF.
>>>     (V_sz_elem): Added V4BF, V8BF.
>>>     (V_mode_nunits): Added V4BF, V8BF.
>>>     (q): Added V4BF, V8BF.
>>>     * config/arm/neon.md (vld2): Used new iterators.
>>>     (vld2_dup): Used new iterators.
>>>     (vld2_dupv8bf): New.
>>>     (vst3): Used new iterators.
>>>     (vst3qa): Used new iterators.
>>>     (vst3qb): Used new iterators.
>>>     (vld3_dup): Used new iterators.
>>>     (vld3_dupv8bf): New.
>>>     (vst4): Used new iterators.
>>>     (vst4qa): Used new iterators.
>>>     (vst4qb): Used new iterators.
>>>     (vld4_dup): Used new iterators.
>>>     (vld4_dupv8bf): New.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-11-14 Delia Burduv
>>>
>>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.

diff --git a/gcc/conf
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches.

Thanks,
Delia

On 1/22/20 5:31 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:48 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>> vld{q}_bf16 as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>
>> The intrinsics are declared in arm_neon.h.
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it for
>> me?
>>
>> gcc/ChangeLog:
>>
>> 2019-11-14 Delia Burduv
>>
>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>     (bfloat16x4x2_t): New typedef.
>>     (bfloat16x8x2_t): New typedef.
>>     (bfloat16x4x3_t): New typedef.
>>     (bfloat16x8x3_t): New typedef.
>>     (bfloat16x4x4_t): New typedef.
>>     (bfloat16x8x4_t): New typedef.
>>     (vld2_bf16): New.
>>     (vld2q_bf16): New.
>>     (vld3_bf16): New.
>>     (vld3q_bf16): New.
>>     (vld4_bf16): New.
>>     (vld4q_bf16): New.
>>     (vld2_dup_bf16): New.
>>     (vld2q_dup_bf16): New.
>>     (vld3_dup_bf16): New.
>>     (vld3q_dup_bf16): New.
>>     (vld4_dup_bf16): New.
>>     (vld4q_dup_bf16): New.
>>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>     (VAR13): New.
>>     (arm_simd_types[Bfloat16x2_t]): New type.
>>     * config/arm/arm-modes.def (V2BF): New mode.
>>     * config/arm/arm-simd-builtin-types.def
>>     (Bfloat16x2_t): New entry.
>>     * config/arm/arm_neon_builtins.def
>>     (vld2): Changed to VAR13 and added v4bf, v8bf
>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>     (vld3): Changed to VAR13 and added v4bf, v8bf
>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>     (vld4): Changed to VAR13 and added v4bf, v8bf
>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>     * config/arm/iterators.md (VDXBF): New iterator.
>>     (VQ2BF): New iterator.
>>     (V_elem): Added V4BF, V8BF.
>>     (V_sz_elem): Added V4BF, V8BF.
>>     (V_mode_nunits): Added V4BF, V8BF.
>>     (q): Added V4BF, V8BF.
>>     * config/arm/neon.md (vld2): Used new iterators.
>>     (vld2_dup): Used new iterators.
>>     (vld2_dupv8bf): New.
>>     (vst3): Used new iterators.
>>     (vst3qa): Used new iterators.
>>     (vst3qb): Used new iterators.
>>     (vld3_dup): Used new iterators.
>>     (vld3_dupv8bf): New.
>>     (vst4): Used new iterators.
>>     (vst4qa): Used new iterators.
>>     (vst4qb): Used new iterators.
>>     (vld4_dup): Used new iterators.
>>     (vld4_dupv8bf): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14 Delia Burduv
>>
>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7f279cca6688c6f11948159666ee647ae533c61d..44c6f46fd63d5eaa1c3c84340d9acd017bb663e4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -318,6 +318,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v4bf_UP E_V4BFmode
 #define v2si_UP E_V2SImode
 #define v2sf_UP E_V2SFmode
+#define v2bf_UP E_V2BFmode
 #define di_UP E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP E_V8HImode
@@ -381,6 +382,9 @@ typedef struct {
 #define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
+#define VAR13(T, N, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, M)
 
 /* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
    and arm_acle_builtins.def. The entries in arm_neon_builtins.def require
@@ -1013,6 +1017,7 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 scalar type. */
+  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index ea92ef35723f979c8bb1f6bfb4fbeb6cd1e4b6e9..6e48223b63d98fcbe38960700dd0949d74629f7f 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -80,6 +80,7 @@ VECTOR_MODE (FLOAT, HF, 2); /* V2HF */
 FLOAT_MODE (BF, 2, 0);
 ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+VECTOR_MODE (FLOAT, BF, 2); /* V2BF. */
 VECTOR_MODE (FLOAT, BF, 4); /* V4BF. */
 VECTOR_MODE (FLOAT, BF, 8); /* V8BF. */
diff --git a/gcc/config
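
For readers unfamiliar with the VARn pattern that the VAR13 hunk extends, here is a minimal, self-contained sketch of the same chaining idea. It is not the GCC code: `demo_entry`, `demo_table` and `demo_table_size` are hypothetical names, the chain stops at VAR3 for brevity, and the `T` argument is carried along but unused here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one builtin table entry: intrinsic name
   plus the vector mode it is instantiated for.  */
struct demo_entry
{
  const char *name;
  const char *mode;
};

/* Base case: emit exactly one table entry (T is unused in this sketch).  */
#define VAR1(T, N, A) { #N, #A },

/* Each VARn emits the first n-1 modes through VAR(n-1) and the last
   mode through VAR1, which is the shape VAR13 follows in the patch.  */
#define VAR2(T, N, A, B) \
  VAR1 (T, N, A) \
  VAR1 (T, N, B)
#define VAR3(T, N, A, B, C) \
  VAR2 (T, N, A, B) \
  VAR1 (T, N, C)

/* One macro call expands to three entries, one per mode.  */
static const struct demo_entry demo_table[] = {
  VAR3 (LOAD1, vld2, v4hf, v4bf, v8bf)
};

/* Number of entries produced by the expansion above.  */
static size_t
demo_table_size (void)
{
  return sizeof demo_table / sizeof demo_table[0];
}
```

Adding one more mode to an intrinsic therefore means moving it to the next VARn, which is why the ChangeLog describes vld2 as "Changed to VAR13".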
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Ping.

From: Delia Burduv
Sent: 22 January 2020 17:31
To: gcc-patches@gcc.gnu.org
Cc: ni...@redhat.com ; Richard Earnshaw ; Kyrylo Tkachov ; Ramana Radhakrishnan
Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

Ping.

I will change the tests to use the exact input and output registers as
Richard Sandiford suggested for the AArch64 patches.

On 12/20/19 6:48 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics vld{q}_bf16
> as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>
> The intrinsics are declared in arm_neon.h.
> A new test is added to check assembler output.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have
> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-14 Delia Burduv
>
>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>     (bfloat16x4x2_t): New typedef.
>     (bfloat16x8x2_t): New typedef.
>     (bfloat16x4x3_t): New typedef.
>     (bfloat16x8x3_t): New typedef.
>     (bfloat16x4x4_t): New typedef.
>     (bfloat16x8x4_t): New typedef.
>     (vld2_bf16): New.
>     (vld2q_bf16): New.
>     (vld3_bf16): New.
>     (vld3q_bf16): New.
>     (vld4_bf16): New.
>     (vld4q_bf16): New.
>     (vld2_dup_bf16): New.
>     (vld2q_dup_bf16): New.
>     (vld3_dup_bf16): New.
>     (vld3q_dup_bf16): New.
>     (vld4_dup_bf16): New.
>     (vld4q_dup_bf16): New.
>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>     (VAR13): New.
>     (arm_simd_types[Bfloat16x2_t]): New type.
>     * config/arm/arm-modes.def (V2BF): New mode.
>     * config/arm/arm-simd-builtin-types.def
>     (Bfloat16x2_t): New entry.
>     * config/arm/arm_neon_builtins.def
>     (vld2): Changed to VAR13 and added v4bf, v8bf
>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld3): Changed to VAR13 and added v4bf, v8bf
>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld4): Changed to VAR13 and added v4bf, v8bf
>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>     * config/arm/iterators.md (VDXBF): New iterator.
>     (VQ2BF): New iterator.
>     (V_elem): Added V4BF, V8BF.
>     (V_sz_elem): Added V4BF, V8BF.
>     (V_mode_nunits): Added V4BF, V8BF.
>     (q): Added V4BF, V8BF.
>     * config/arm/neon.md (vld2): Used new iterators.
>     (vld2_dup): Used new iterators.
>     (vld2_dupv8bf): New.
>     (vst3): Used new iterators.
>     (vst3qa): Used new iterators.
>     (vst3qb): Used new iterators.
>     (vld3_dup): Used new iterators.
>     (vld3_dupv8bf): New.
>     (vst4): Used new iterators.
>     (vst4qa): Used new iterators.
>     (vst4qb): Used new iterators.
>     (vld4_dup): Used new iterators.
>     (vld4_dupv8bf): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-14 Delia Burduv
>
>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Ping.

I will change the tests to use the exact input and output registers as
Richard Sandiford suggested for the AArch64 patches.

On 12/20/19 6:48 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics vld{q}_bf16
> as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>
>
> The intrinsics are declared in arm_neon.h.
> A new test is added to check assembler output.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have
> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-14 Delia Burduv
>
>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
>     (bfloat16x4x2_t): New typedef.
>     (bfloat16x8x2_t): New typedef.
>     (bfloat16x4x3_t): New typedef.
>     (bfloat16x8x3_t): New typedef.
>     (bfloat16x4x4_t): New typedef.
>     (bfloat16x8x4_t): New typedef.
>     (vld2_bf16): New.
>     (vld2q_bf16): New.
>     (vld3_bf16): New.
>     (vld3q_bf16): New.
>     (vld4_bf16): New.
>     (vld4q_bf16): New.
>     (vld2_dup_bf16): New.
>     (vld2q_dup_bf16): New.
>     (vld3_dup_bf16): New.
>     (vld3q_dup_bf16): New.
>     (vld4_dup_bf16): New.
>     (vld4q_dup_bf16): New.
>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>     (VAR13): New.
>     (arm_simd_types[Bfloat16x2_t]): New type.
>     * config/arm/arm-modes.def (V2BF): New mode.
>     * config/arm/arm-simd-builtin-types.def
>     (Bfloat16x2_t): New entry.
>     * config/arm/arm_neon_builtins.def
>     (vld2): Changed to VAR13 and added v4bf, v8bf
>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld3): Changed to VAR13 and added v4bf, v8bf
>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld4): Changed to VAR13 and added v4bf, v8bf
>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>     * config/arm/iterators.md (VDXBF): New iterator.
>     (VQ2BF): New iterator.
>     (V_elem): Added V4BF, V8BF.
>     (V_sz_elem): Added V4BF, V8BF.
>     (V_mode_nunits): Added V4BF, V8BF.
>     (q): Added V4BF, V8BF.
>     * config/arm/neon.md (vld2): Used new iterators.
>     (vld2_dup): Used new iterators.
>     (vld2_dupv8bf): New.
>     (vst3): Used new iterators.
>     (vst3qa): Used new iterators.
>     (vst3qb): Used new iterators.
>     (vld3_dup): Used new iterators.
>     (vld3_dupv8bf): New.
>     (vst4): Used new iterators.
>     (vst4qa): Used new iterators.
>     (vst4qb): Used new iterators.
>     (vld4_dup): Used new iterators.
>     (vld4_dupv8bf): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-14 Delia Burduv
>
>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
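
As a rough illustration of what the vld2-style loads in this patch do, the sketch below models their data movement in portable C rather than calling the real intrinsics, which need an Arm target with BFloat16 support. Everything here is an assumption made for illustration: `model_bf16x4x2`, `model_vld2` and `model_vld2_dup` are hypothetical names, bfloat16 values are kept as raw 16-bit patterns, and only the 64-bit (non-q) two-vector form is modelled.

```c
#include <stdint.h>

/* Hypothetical model of a two-vector result: two vectors of four
   16-bit lanes each, holding raw bfloat16 bit patterns.  */
typedef struct
{
  uint16_t val[2][4];
} model_bf16x4x2;

/* Model of a vld2-style load: reads 8 interleaved elements
   {a0, b0, a1, b1, ...} and de-interleaves them, so val[0] gets the
   a elements and val[1] gets the b elements.  */
static model_bf16x4x2
model_vld2 (const uint16_t *ptr)
{
  model_bf16x4x2 r;
  for (int lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[2 * lane];
      r.val[1][lane] = ptr[2 * lane + 1];
    }
  return r;
}

/* Model of a vld2_dup-style load: reads a single 2-element structure
   {a0, b0} and replicates each element across all lanes.  */
static model_bf16x4x2
model_vld2_dup (const uint16_t *ptr)
{
  model_bf16x4x2 r;
  for (int lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[0];
      r.val[1][lane] = ptr[1];
    }
  return r;
}
```

The vld3 and vld4 forms follow the same idea with a stride of three or four elements, and the q forms use eight lanes per vector instead of four.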