Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

Christophe Lyon Mon, 09 Mar 2020 03:19:31 -0700

On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
>
> Hi Delia,
>
> On 3/5/20 4:38 PM, Delia Burduv wrote:
> > Hi,
> >
> > This is the latest version of the patch. I am forcing -mfloat-abi=hard
> > because the generated code differs slightly depending on the
> > float-abi used.
>
>
> Thanks, I've pushed it with an updated ChangeLog.
>
> 2020-03-06  Delia Burduv  <delia.bur...@arm.com>
>
>      * config/arm/arm_neon.h (vld2_bf16): New.
>      (vld2q_bf16): New.
>      (vld3_bf16): New.
>      (vld3q_bf16): New.
>      (vld4_bf16): New.
>      (vld4q_bf16): New.
>      (vld2_dup_bf16): New.
>      (vld2q_dup_bf16): New.
>      (vld3_dup_bf16): New.
>      (vld3q_dup_bf16): New.
>      (vld4_dup_bf16): New.
>      (vld4q_dup_bf16): New.
>      * config/arm/arm_neon_builtins.def
>      (vld2): Changed to VAR13 and added v4bf, v8bf
>      (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld3): Changed to VAR13 and added v4bf, v8bf
>      (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld4): Changed to VAR13 and added v4bf, v8bf
>      (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>      * config/arm/iterators.md (VDXBF2): New iterator.
>      * config/arm/neon.md (neon_vld2): Use new iterators.
>      (neon_vld2_dup<mode>): Use new iterators.
>      (neon_vld3<mode>): Likewise.
>      (neon_vld3qa<mode>): Likewise.
>      (neon_vld3qb<mode>): Likewise.
>      (neon_vld3_dup<mode>): Likewise.
>      (neon_vld4<mode>): Likewise.
>      (neon_vld4qa<mode>): Likewise.
>      (neon_vld4qb<mode>): Likewise.
>      (neon_vld4_dup<mode>): Likewise.
>      (neon_vld2_dupv8bf): New.
>      (neon_vld3_dupv8bf): Likewise.
>      (neon_vld4_dupv8bf): Likewise.
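The _dup intrinsics listed above behave differently from the plain vld<n>{q}_bf16 loads: they read a single structure of n elements and replicate each member across every lane of the corresponding result vector. A minimal scalar sketch of that behaviour, assuming raw bfloat16 bit patterns held in uint16_t (not the arm_neon.h bfloat16_t type); the helper vldn_dup_bf16_model and the MAX_LANES constant are hypothetical, for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: lanes per vector are 4 for the D-register forms
   (vld2_dup_bf16 etc.) and 8 for the Q-register forms (vld2q_dup_bf16 etc.).  */
#define MAX_LANES 8

/* Hypothetical scalar model of vld<n>{q}_dup_bf16: read ONE structure of
   N consecutive 16-bit elements from SRC and broadcast member J to every
   lane of result vector J.  */
static void
vldn_dup_bf16_model (const uint16_t *src, size_t n, size_t lanes,
                     uint16_t out[][MAX_LANES])
{
  assert (n >= 2 && n <= 4);          /* vld2, vld3 or vld4.  */
  assert (lanes == 4 || lanes == 8);  /* D or Q register form.  */

  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < lanes; i++)
      out[j][i] = src[j];
}
```

Under this model, applying the vld3q_dup_bf16 shape (n = 3, lanes = 8) to the bf16 patterns for 1.0, 2.0 and 3.0 (0x3F80, 0x4000, 0x4040) fills all eight lanes of val[0] with 0x3F80, val[1] with 0x4000 and val[2] with 0x4040.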
>
> Kyrill


Hi!

There's a problem with the arm_neon.h update.
On arm-none-linux-gnueabihf, there is a regression on
g++.dg/other/pr54300.C and g++.dg/other/pr55073.C, because:
FAIL: g++.dg/other/pr54300.C  -std=gnu++98 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39:
error: cannot convert 'const short int*' to 'const __bf16*'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39:
error: cannot convert 'const short int*' to 'const __bf16*'
[....]

The same problem causes many (~365) tests to become unsupported on
arm-none-linux-gnueabi, including:
g++.dg/abi/mangle-arm-crypto.C
g++.dg/abi/mangle-neon.C

Can you fix it?

Thanks

Christophe
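The diagnostic above is a pointer-type mismatch: short int and __bf16 are both 16 bits wide, but __bf16 is a distinct floating-point type whose bits are the upper half of an IEEE-754 binary32 value (sign bit, 8-bit exponent, top 7 fraction bits), and C++ does not implicitly convert between pointers to unrelated types. The sketch below only illustrates that bit layout in portable C; the helper names are hypothetical, uint16_t holds the raw bits instead of __bf16, and the float-to-bf16 direction simply truncates rather than rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The memcpy-based reinterpretation below assumes a 32-bit binary32 float.  */
static_assert (sizeof (float) == sizeof (uint32_t), "binary32 float assumed");

/* Hypothetical helper: keep the upper 16 bits of a binary32 value
   (truncation, not round-to-nearest).  */
static uint16_t
float_to_bf16_bits_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);  /* Reinterpret without aliasing UB.  */
  return (uint16_t) (bits >> 16);   /* Drop the low 16 fraction bits.  */
}

/* Hypothetical helper: widening bf16 to binary32 is exact; the low
   16 fraction bits are simply zero.  */
static float
bf16_bits_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, 1.0f has the binary32 pattern 0x3F800000, so its bf16 pattern is 0x3F80, and widening 0x4049 gives 3.140625f, the nearest bf16 value below pi.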

>
>
> >
> > Thanks,
> > Delia
> >
> > On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> >> Hi Delia,
> >>
> >> On 3/4/20 2:05 PM, Delia Burduv wrote:
> >>> Hi,
> >>>
> >>> The previous version of this patch shared part of its code with the
> >>> store intrinsics patch
> >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> >>> any duplicated code. This patch now depends on the previously mentioned
> >>> store intrinsics patch.
> >>>
> >>> Here is the latest version and the updated ChangeLog.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2020-03-04  Delia Burduv  <delia.bur...@arm.com>
> >>>
> >>>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>          (vld2_bf16): New.
> >>>         (vld2q_bf16): New.
> >>>         (vld3_bf16): New.
> >>>         (vld3q_bf16): New.
> >>>         (vld4_bf16): New.
> >>>         (vld4q_bf16): New.
> >>>         (vld2_dup_bf16): New.
> >>>         (vld2q_dup_bf16): New.
> >>>          (vld3_dup_bf16): New.
> >>>         (vld3q_dup_bf16): New.
> >>>         (vld4_dup_bf16): New.
> >>>         (vld4q_dup_bf16): New.
> >>>          * config/arm/arm_neon_builtins.def
> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          * config/arm/iterators.md (VDXBF): New iterator.
> >>>          (VQ2BF): New iterator.
> >>>          * config/arm/neon.md (vld2): Used new iterators.
> >>>          (vld2_dup<mode>): Used new iterators.
> >>>          (vld2_dupv8bf): New.
> >>>          (vst3): Used new iterators.
> >>>          (vst3qa): Used new iterators.
> >>>          (vst3qb): Used new iterators.
> >>>          (vld3_dup<mode>): Used new iterators.
> >>>          (vld3_dupv8bf): New.
> >>>          (vst4): Used new iterators.
> >>>          (vst4qa): Used new iterators.
> >>>          (vst4qb): Used new iterators.
> >>>          (vld4_dup<mode>): Used new iterators.
> >>>          (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2020-03-04  Delia Burduv  <delia.bur...@arm.com>
> >>>
> >>>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>>
> >>> Thanks,
> >>> Delia
> >>>
> >>> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Here is the latest version of the patch. It only contains some minor
> >>> > formatting changes that Richard Sandiford raised in his review of the
> >>> > AArch64 patches.
> >>> >
> >>> > Thanks,
> >>> > Delia
> >>> >
> >>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
> >>> >> Ping.
> >>> >>
> >>> >> I will change the tests to use the exact input and output registers,
> >>> >> as Richard Sandiford suggested for the AArch64 patches.
> >>> >>
> >>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
> >>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
> >>> >>> vld<n>{q}_bf16 as part of the BFloat16 extension.
> >>> >>>
> >>> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> >>> >>>
> >>> >>> The intrinsics are declared in arm_neon.h.
> >>> >>> A new test is added to check assembler output.
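For reference, the de-interleaving that vld<n>{q}_bf16 performs can be modelled with scalar C. This is a minimal sketch, not arm_neon.h code: the helper vldn_bf16_model and the MAX_LANES constant are hypothetical, and raw bfloat16 bit patterns are held in uint16_t rather than bfloat16_t so that it builds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: lanes per vector are 4 for the D-register forms
   (vld2_bf16 etc.) and 8 for the Q-register forms (vld2q_bf16 etc.).  */
#define MAX_LANES 8

/* Hypothetical scalar model of vld<n>{q}_bf16: read LANES structures of
   N consecutive 16-bit elements from SRC and de-interleave them, so that
   member J of structure I lands in lane I of result vector J.  */
static void
vldn_bf16_model (const uint16_t *src, size_t n, size_t lanes,
                 uint16_t out[][MAX_LANES])
{
  assert (n >= 2 && n <= 4);          /* vld2, vld3 or vld4.  */
  assert (lanes == 4 || lanes == 8);  /* D or Q register form.  */

  for (size_t i = 0; i < lanes; i++)
    for (size_t j = 0; j < n; j++)
      out[j][i] = src[i * n + j];
}
```

For example, with n = 2 and lanes = 4, memory holding a0 b0 a1 b1 a2 b2 a3 b3 gives out[0] = {a0, a1, a2, a3} and out[1] = {b0, b1, b2, b3}, matching the two vectors that vld2_bf16 returns in a bfloat16x4x2_t.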
> >>> >>>
> >>> >>> This patch depends on the Arm back-end patch.
> >>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> >>> >>>
> >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
> >>> >>> have commit rights, so if this is OK, could someone please commit it
> >>> >>> for me?
> >>> >>>
> >>> >>> gcc/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
> >>> >>>
> >>> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>> >>>          (bfloat16x4x2_t): New typedef.
> >>> >>>          (bfloat16x8x2_t): New typedef.
> >>> >>>          (bfloat16x4x3_t): New typedef.
> >>> >>>          (bfloat16x8x3_t): New typedef.
> >>> >>>          (bfloat16x4x4_t): New typedef.
> >>> >>>          (bfloat16x8x4_t): New typedef.
> >>> >>>          (vld2_bf16): New.
> >>> >>>      (vld2q_bf16): New.
> >>> >>>      (vld3_bf16): New.
> >>> >>>      (vld3q_bf16): New.
> >>> >>>      (vld4_bf16): New.
> >>> >>>      (vld4q_bf16): New.
> >>> >>>      (vld2_dup_bf16): New.
> >>> >>>      (vld2q_dup_bf16): New.
> >>> >>>       (vld3_dup_bf16): New.
> >>> >>>      (vld3q_dup_bf16): New.
> >>> >>>      (vld4_dup_bf16): New.
> >>> >>>      (vld4q_dup_bf16): New.
> >>> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
> >>> >>>          (VAR13): New.
> >>> >>>          (arm_simd_types[Bfloat16x2_t]): New type.
> >>> >>>          * config/arm/arm-modes.def (V2BF): New mode.
> >>> >>>          * config/arm/arm-simd-builtin-types.def
> >>> >>>          (Bfloat16x2_t): New entry.
> >>> >>>          * config/arm/arm_neon_builtins.def
> >>> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          * config/arm/iterators.md (VDXBF): New iterator.
> >>> >>>          (VQ2BF): New iterator.
> >>> >>>          (V_elem): Added V4BF, V8BF.
> >>> >>>          (V_sz_elem): Added V4BF, V8BF.
> >>> >>>          (V_mode_nunits): Added V4BF, V8BF.
> >>> >>>          (q): Added V4BF, V8BF.
> >>> >>>          * config/arm/neon.md (vld2): Used new iterators.
> >>> >>>          (vld2_dup<mode>): Used new iterators.
> >>> >>>          (vld2_dupv8bf): New.
> >>> >>>          (vst3): Used new iterators.
> >>> >>>          (vst3qa): Used new iterators.
> >>> >>>          (vst3qb): Used new iterators.
> >>> >>>          (vld3_dup<mode>): Used new iterators.
> >>> >>>          (vld3_dupv8bf): New.
> >>> >>>          (vst4): Used new iterators.
> >>> >>>          (vst4qa): Used new iterators.
> >>> >>>          (vst4qb): Used new iterators.
> >>> >>>          (vld4_dup<mode>): Used new iterators.
> >>> >>>          (vld4_dupv8bf): New.
> >>> >>>
> >>> >>>
> >>> >>> gcc/testsuite/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
> >>> >>>
> >>> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>
> >>
> >> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> new file mode 100644
> >> index 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> @@ -0,0 +1,152 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-save-temps" }  */
> >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> >> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>
> >> I think this should include an optimisation option like -O2 because...
> >>
> >> +
> >> +#include "arm_neon.h"
> >> +
> >> +
> >> +/*
> >> +**test_vld2_bf16:
> >> +**    ...
> >> +**    vld2.16    {d16-d17}, \[r3\]
> >>
> >> ... this codegen is unstable: it depends on the -O0 register allocator
> >> moving the ptr argument from its initial r0 to r3.
> >> This should really be r0, and the load instruction should load the low
> >> D registers.
> >> So let's add -O2 to the dg-options and scan for the resulting code.
> >>
> >>
> >> Otherwise this is ok.
> >> Thanks!
> >> Kyrill
> >>
> >>
> >> +**    ...
> >> +*/
> >> +bfloat16x4x2_t
> >> +test_vld2_bf16 (bfloat16_t * ptr)
> >> +{
> >> +  return vld2_bf16 (ptr);
> >> +}
> >> +
> >>
