On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
> Hi Delia,
>
> On 3/5/20 4:38 PM, Delia Burduv wrote:
> > Hi,
> >
> > This is the latest version of the patch. I am forcing -mfloat-abi=hard
> > because the code generated is slightly different depending on the
> > float-abi used.
> >
>
> Thanks, I've pushed it with an updated ChangeLog.
>
> 2020-03-06  Delia Burduv  <delia.bur...@arm.com>
>
>     * config/arm/arm_neon.h (vld2_bf16): New.
>     (vld2q_bf16): New.
>     (vld3_bf16): New.
>     (vld3q_bf16): New.
>     (vld4_bf16): New.
>     (vld4q_bf16): New.
>     (vld2_dup_bf16): New.
>     (vld2q_dup_bf16): New.
>     (vld3_dup_bf16): New.
>     (vld3q_dup_bf16): New.
>     (vld4_dup_bf16): New.
>     (vld4q_dup_bf16): New.
>     * config/arm/arm_neon_builtins.def
>     (vld2): Changed to VAR13 and added v4bf, v8bf
>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld3): Changed to VAR13 and added v4bf, v8bf
>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>     (vld4): Changed to VAR13 and added v4bf, v8bf
>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>     * config/arm/iterators.md (VDXBF2): New iterator.
>     * config/arm/neon.md (neon_vld2): Use new iterators.
>     (neon_vld2_dup<mode>): Use new iterators.
>     (neon_vld3<mode>): Likewise.
>     (neon_vld3qa<mode>): Likewise.
>     (neon_vld3qb<mode>): Likewise.
>     (neon_vld3_dup<mode>): Likewise.
>     (neon_vld4<mode>): Likewise.
>     (neon_vld4qa<mode>): Likewise.
>     (neon_vld4qb<mode>): Likewise.
>     (neon_vld4_dup<mode>): Likewise.
>     (neon_vld2_dupv8bf): New.
>     (neon_vld3_dupv8bf): Likewise.
>     (neon_vld4_dupv8bf): Likewise.
>
> Kyrill

Hi!

There's a problem with the arm_neon.h update. On arm-none-linux-gnueabihf,
there is a regression on g++.dg/other/pr54300.C and
g++.dg/other/pr55073.C, because:

FAIL: g++.dg/other/pr54300.C  -std=gnu++98 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39: error: cannot convert 'const short int*' to 'const __bf16*'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39: error: cannot convert 'const short int*' to 'const __bf16*'
[....]

The same problem makes a lot (~365) of tests become unsupported on
arm-none-linux-gnueabi:
g++.dg/abi/mangle-arm-crypto.C
g++.dg/abi/mangle-neon.C

Can you fix it?

Thanks

Christophe

> >
> >
> > Thanks,
> > Delia
> >
> > On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> >> Hi Delia,
> >>
> >> On 3/4/20 2:05 PM, Delia Burduv wrote:
> >>> Hi,
> >>>
> >>> The previous version of this patch shared part of its code with the
> >>> store intrinsics patch
> >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> >>> any duplicated code. This patch now depends on the previously mentioned
> >>> store intrinsics patch.
> >>>
> >>> Here is the latest version and the updated ChangeLog.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  <delia.bur...@arm.com>
> >>>
> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>     (vld2_bf16): New.
> >>>     (vld2q_bf16): New.
> >>>     (vld3_bf16): New.
> >>>     (vld3q_bf16): New.
> >>>     (vld4_bf16): New.
> >>>     (vld4q_bf16): New.
> >>>     (vld2_dup_bf16): New.
> >>>     (vld2q_dup_bf16): New.
> >>>     (vld3_dup_bf16): New.
> >>>     (vld3q_dup_bf16): New.
> >>>     (vld4_dup_bf16): New.
> >>>     (vld4q_dup_bf16): New.
> >>>     * config/arm/arm_neon_builtins.def
> >>>     (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>     * config/arm/iterators.md (VDXBF): New iterator.
> >>>     (VQ2BF): New iterator.
> >>>     * config/arm/neon.md (vld2): Used new iterators.
> >>>     (vld2_dup<mode>): Used new iterators.
> >>>     (vld2_dupv8bf): New.
> >>>     (vst3): Used new iterators.
> >>>     (vst3qa): Used new iterators.
> >>>     (vst3qb): Used new iterators.
> >>>     (vld3_dup<mode>): Used new iterators.
> >>>     (vld3_dupv8bf): New.
> >>>     (vst4): Used new iterators.
> >>>     (vst4qa): Used new iterators.
> >>>     (vst4qb): Used new iterators.
> >>>     (vld4_dup<mode>): Used new iterators.
> >>>     (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  <delia.bur...@arm.com>
> >>>
> >>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>>
> >>> Thanks,
> >>> Delia
> >>>
> >>> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Here is the latest version of the patch. It just has some minor
> >>> > formatting changes that were brought up by Richard Sandiford in the
> >>> > AArch64 patches
> >>> >
> >>> > Thanks,
> >>> > Delia
> >>> >
> >>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
> >>> >> Ping.
> >>> >>
> >>> >> I will change the tests to use the exact input and output registers as
> >>> >> Richard Sandiford suggested for the AArch64 patches.
> >>> >>
> >>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
> >>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
> >>> >>> vld<n>{q}_bf16 as part of the BFloat16 extension.
> >>> >>>
> >>> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> >>> >>>
> >>> >>> The intrinsics are declared in arm_neon.h .
> >>> >>> A new test is added to check assembler output.
> >>> >>>
> >>> >>> This patch depends on the Arm back-end patch.
> >>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> >>> >>>
> >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
> >>> >>> have commit rights, so if this is ok can someone please commit it for
> >>> >>> me?
> >>> >>>
> >>> >>> gcc/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv  <delia.bur...@arm.com>
> >>> >>>
> >>> >>>     * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>> >>>     (bfloat16x4x2_t): New typedef.
> >>> >>>     (bfloat16x8x2_t): New typedef.
> >>> >>>     (bfloat16x4x3_t): New typedef.
> >>> >>>     (bfloat16x8x3_t): New typedef.
> >>> >>>     (bfloat16x4x4_t): New typedef.
> >>> >>>     (bfloat16x8x4_t): New typedef.
> >>> >>>     (vld2_bf16): New.
> >>> >>>     (vld2q_bf16): New.
> >>> >>>     (vld3_bf16): New.
> >>> >>>     (vld3q_bf16): New.
> >>> >>>     (vld4_bf16): New.
> >>> >>>     (vld4q_bf16): New.
> >>> >>>     (vld2_dup_bf16): New.
> >>> >>>     (vld2q_dup_bf16): New.
> >>> >>>     (vld3_dup_bf16): New.
> >>> >>>     (vld3q_dup_bf16): New.
> >>> >>>     (vld4_dup_bf16): New.
> >>> >>>     (vld4q_dup_bf16): New.
> >>> >>>     * config/arm/arm-builtins.c (E_V2BFmode): New mode.
> >>> >>>     (VAR13): New.
> >>> >>>     (arm_simd_types[Bfloat16x2_t]): New type.
> >>> >>>     * config/arm/arm-modes.def (V2BF): New mode.
> >>> >>>     * config/arm/arm-simd-builtin-types.def
> >>> >>>     (Bfloat16x2_t): New entry.
> >>> >>>     * config/arm/arm_neon_builtins.def
> >>> >>>     (vld2): Changed to VAR13 and added v4bf, v8bf
> >>> >>>     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>     (vld3): Changed to VAR13 and added v4bf, v8bf
> >>> >>>     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>     (vld4): Changed to VAR13 and added v4bf, v8bf
> >>> >>>     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>     * config/arm/iterators.md (VDXBF): New iterator.
> >>> >>>     (VQ2BF): New iterator.
> >>> >>>     (V_elem): Added V4BF, V8BF.
> >>> >>>     (V_sz_elem): Added V4BF, V8BF.
> >>> >>>     (V_mode_nunits): Added V4BF, V8BF.
> >>> >>>     (q): Added V4BF, V8BF.
> >>> >>>     * config/arm/neon.md (vld2): Used new iterators.
> >>> >>>     (vld2_dup<mode>): Used new iterators.
> >>> >>>     (vld2_dupv8bf): New.
> >>> >>>     (vst3): Used new iterators.
> >>> >>>     (vst3qa): Used new iterators.
> >>> >>>     (vst3qb): Used new iterators.
> >>> >>>     (vld3_dup<mode>): Used new iterators.
> >>> >>>     (vld3_dupv8bf): New.
> >>> >>>     (vst4): Used new iterators.
> >>> >>>     (vst4qa): Used new iterators.
> >>> >>>     (vst4qb): Used new iterators.
> >>> >>>     (vld4_dup<mode>): Used new iterators.
> >>> >>>     (vld4_dupv8bf): New.
> >>> >>>
> >>> >>>
> >>> >>> gcc/testsuite/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv  <delia.bur...@arm.com>
> >>> >>>
> >>> >>>     * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>
> >>
> >> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> @@ -0,0 +1,152 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-save-temps" } */
> >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> >> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>
> >> I think this should include an optimisation option like -O2 because...
> >>
> >> +
> >> +#include "arm_neon.h"
> >> +
> >> +
> >> +/*
> >> +**test_vld2_bf16:
> >> +** ...
> >> +** vld2.16 {d16-d17}, \[r3\]
> >>
> >> ... this is unstable codegen depending on the -O0 register allocator
> >> moving the ptr argument to r3 from its initial r0.
> >> This should really be r0 and the load instruction should load the low
> >> D regs.
> >> So let's add an -O2 to the dg-options and scan for the result of that.
> >>
> >>
> >> Otherwise this is ok.
> >> Thanks!
> >> Kyrill
> >>
> >>
> >> +** ...
> >> +*/
> >> +bfloat16x4x2_t
> >> +test_vld2_bf16 (bfloat16_t * ptr)
> >> +{
> >> + vld2_bf16 (ptr);
> >> +}
> >> +
> >>
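
For readers following the thread who have not used the vld<n> family, here is a rough scalar model in portable C of what a vld2-style and a vld2_dup-style load compute, assuming the usual Neon de-interleaving semantics. It is only a sketch: the names bf16_bits, model_bf16x4x2, model_vld2, model_vld2_dup and model_check are hypothetical and appear nowhere in the patch, lanes are modelled as raw 16-bit values rather than __bf16, and nothing here reflects how arm_neon.h or neon.md actually implement the intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins (not from the patch): raw 16-bit storage for one
   bfloat16 lane, and two 4-lane vectors shaped like bfloat16x4x2_t.  */
typedef uint16_t bf16_bits;

typedef struct
{
  bf16_bits val[2][4];
} model_bf16x4x2;

/* Model of a vld2-style load: read 8 consecutive elements and
   de-interleave them, even indices into val[0], odd indices into val[1].  */
model_bf16x4x2
model_vld2 (const bf16_bits *ptr)
{
  model_bf16x4x2 r;
  for (size_t lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[2 * lane];
      r.val[1][lane] = ptr[2 * lane + 1];
    }
  return r;
}

/* Model of a vld2_dup-style load: read one 2-element structure and
   replicate each element across every lane of its vector.  */
model_bf16x4x2
model_vld2_dup (const bf16_bits *ptr)
{
  model_bf16x4x2 r;
  for (size_t lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[0];
      r.val[1][lane] = ptr[1];
    }
  return r;
}

/* Small usage check on made-up input values.  */
void
model_check (void)
{
  const bf16_bits in[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

  model_bf16x4x2 a = model_vld2 (in);
  assert (a.val[0][1] == 12 && a.val[1][1] == 13);

  model_bf16x4x2 d = model_vld2_dup (in);
  assert (d.val[0][3] == 10 && d.val[1][3] == 11);
}
```

The model returns its result by value, in the same way the intrinsics return a bfloat16x4x2_t; the vld3 and vld4 variants would follow the same pattern with three or four vectors.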