Thanks for the update.  The new patch looks really good, just some
minor comments.

Stam Markianos-Wright <stam.markianos-wri...@arm.com> writes:
> [...]
> Also I've updated the filenames of all our tests to make them a bit clearer:
>
> C tests:
>
> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of 
> scalar moves/loads/stores with "-march=armv8.2-a+bf16", "-march=armv8.2-a and 
> +bf16 target pragma", and "-march=armv8.2-a" (now does not error out at all). 
> These now include register asms to check more MOV alternatives.
>
> __ bfloat16_scalar_compile_4.c: The _Complex error test.
>
> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to 
> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler 
> scan).

Sounds good to me, although TBH the "_compile" feels a bit redundant.

> I had also done a small c++ test, but have chosen to shift that to the [2/2] 
> patch because it is currently being blocked by target_invalid_conversion.

OK.  Does that include the mangling test?

> [...]
>>>> - a test that involves moving constants, for both scalars and vectors.
>>>>     You can create zero scalar constants in C++ using bfloat16_t() etc.
>>>>     For vectors it's possible to do things like:
>>>>
>>>>       typedef short v2hi __attribute__((vector_size(4)));
>>>>       v2hi foo (void) { return (v2hi) 0x12345678; }
>>>>
>>>>     The same sort of things should work for bfloat16x4_t and bfloat16x8_t.
>>>
>>> Leaving this as an open issue for now because I'm not 100% sure what we
>>> should/shouldn't be allowing past the tree-level target hooks.
>>>
>>> If we do want to block this we would do it in the [2/2] patch.
>>> I will come back to it and create a scan-assembler test when I'm clearer
>>> on what we should and shouldn't allow at the higher level :)
>> 
>> FWIW, I'm not sure we should go out of our way to disallow this.
>> Preventing bfloat16_t() in C++ would IMO be unnatural.  And the
>> "(vector) vector-sized-integer" syntax specifically treats the vector
>> as a bundle of bits without really caring what the element type is.
>> Even if we did manage to forbid the conversion in that context,
>> it would still be possible to achieve the same thing using:
>> 
>>     v2hi
>>     foo (void)
>>     {
>>       union { v2hi v; unsigned int i; } u;
>>       u.i = 0x12345678;
>>       return u.v;
>>     }
>> 
> Added the compilation of "(vector) vector-sized-integer" in the vector tests.
>
> But target_invalid_conversion in the [2/2] patch is a complication to this 
> (as with bfloat16_t() in C++).
>
> I was under the impression that the original intent of bfloat was for it to 
> be storage only, with any initialisation happening through the float32 
> convert intrinsic.
>
> Either way, I'd be happy to allow it, but it does feel like we'd be going 
> slightly against what the ACLE currently says.
> However, looking back at it now, it only mentions using ACLE intrinsics over 
> C operators, so I'd be happy to allow this for vectors.
>
> For scalars though, if we e.g. were to allow:
>
> bfloat16_t (0x1234);
>
> on a single bfloat, I don't see how we could still block conversions like:
>
> bfloat16_t scalar1 = 0.1;
> bfloat16_t scalar2 = 0;
> bfloat16_t scalar3 = is_a_float;
>
> Agreed that the union {} would still always slip through, though.

Sorry, I wasn't clear: I meant literally "bfloat16_t()", i.e.
construction with zero initialisation.  I agree we don't want to
support "bfloat16_t(0.25)" etc.
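To spell out the distinction (my sketch, using plain float as a stand-in
since bfloat16_t needs <arm_bf16.h> and +bf16): "T()" is C++
value-initialisation and always yields zero without performing any
arithmetic conversion, whereas "T(0.25)" is the conversion form, which is
the case we'd steer users away from towards the convert intrinsics:

```cpp
#include <cassert>

// Hypothetical illustration with float standing in for bfloat16_t.

// T() is value-initialisation: for arithmetic types it yields zero.
// This form should keep working even if float->bfloat16_t conversions
// are rejected.
static float value_init ()
{
  return float ();
}

// T(0.25) is the conversion form -- for bfloat16_t this is the case
// we'd rather reject in favour of the float32 convert intrinsic.
static float convert_init ()
{
  return float (0.25);
}
```

So blocking the conversion form need not affect "bfloat16_t()" itself.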

> [...]
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c 
>>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>> new file mode 100644
>>> index 00000000000..f2bef671deb
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>> @@ -0,0 +1,51 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>> +
>>> +#include <arm_neon.h>
>>> +
>>> +/*
>>> +**stacktest1:
>>> +** ...
>>> +** str     h0, \[sp, [0-9]+\]
>>> +** ldr     h0, \[sp, [0-9]+\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16_t stacktest1 (bfloat16_t __a)
>>> +{
>>> +  volatile bfloat16_t b = __a;
>>> +  return b;
>>> +}
>>> +
>>> +/*
>>> +**stacktest2:
>>> +** ...
>>> +** str     d0, \[sp, [0-9]+\]
>>> +** ldr     d0, \[sp, [0-9]+\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16x4_t stacktest2 (bfloat16x4_t __a)
>>> +{
>>> +  volatile bfloat16x4_t b = __a;
>>> +  return b;
>>> +}
>>> +
>>> +/*
>>> +**stacktest3:
>>> +** ...
>>> +** str     q0, \[sp\]
>>> +** ldr     q0, \[sp\]
>>> +** ...
>>> +** ret
>>> +*/
>>> +bfloat16x8_t stacktest3 (bfloat16x8_t __a)
>>> +{
>>> +  volatile bfloat16x8_t b = __a;
>>> +  return b;
>>> +}
>> 
>> Might be a daft question, but why do we have an offset for the first
>> two and not for the last one?  Might be worth hard-coding whatever
>> offset we use.

I should have realised first time, but it's because we allocate the
local variable area downwards from the soft frame pointer.  So the
area gets padded downwards rather than upwards.

> [...]
> @@ -97,6 +107,12 @@
>  ;; Copy of the above.
>  (define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V4SF V2DF])
>  
> +;; Quad vector modes suitable for moving.  Includes BFmode.
> +(define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
> +
> +;; Quad vector modes suitable for moving.  Includes BFmode.
> +(define_mode_iterator VQMOV_NO2E [V16QI V8HI V4SI V8HF V8BF V4SF])

Comment pasto for VQMOV_NO2E.  Think it should be:

;; VQMOV without 2-element modes.

>  ;; Quad integer vector modes.
>  (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI])
>  
> @@ -160,6 +176,11 @@
>  (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
>                               V4HF V8HF V2SF V4SF V2DF])
>  
> +;; All Advanced SIMD modes suitable for moving, loading, and storing,
> +;; including special Bfloat vector types.
> +(define_mode_iterator VALL_F16MOV [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> +                             V4HF V8HF V4BF V8BF V2SF V4SF V2DF])

Nit: line should be indented below "V8QI".

> @@ -226,6 +247,9 @@
>  ;; Advanced SIMD modes for Q and H types.
>  (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
>  
> +;; Advanced SIMD modes for BF vector types.
> +(define_mode_iterator VBF [V4BF V8BF])

Nothing in this patch uses VBF, so probably best to leave it until later.

> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c 
> b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
> new file mode 100644
> index 00000000000..5186d0e3d24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c
> @@ -0,0 +1,118 @@
> [...]
> +/*
> +**bfloat_mov_rm:
> +**   ...
> +**   strh    w2, \[sp, 14\]
> +**   ...
> +**   ret
> +*/
> +void bfloat_mov_rm (void)
> +{
> +  register bfloat16_t x asm ("w2");
> +  volatile bfloat16_t y;
> +  asm volatile ("#foo" : "=r" (x));
> +  y = x;
> +  asm volatile ("#foo" : : : "memory");
> +}

Probably simpler as:

/*
**bfloat_mov_rm:
**      strh    w2, \[x0\]
**      ret
*/
void bfloat_mov_rm (bfloat16_t *ptr)
{
  register bfloat16_t x asm ("w2");
  asm volatile ("#foo" : "=r" (x));
  *ptr = x;
}

> +/*
> +**bfloat_mov_mr:
> +**   ...
> +**   ldrh    w2, \[sp, 14\]
> +**   ...
> +**   ret
> +*/
> +void bfloat_mov_mr (void)
> +{
> +  volatile bfloat16_t x;
> +  register bfloat16_t y asm ("w2");
> +  asm volatile ("#foo" : : : "memory");
> +  y = x;
> +  asm volatile ("#foo" :: "r" (y));
> +}

Similarly here:

/*
**bfloat_mov_mr:
**      ldrh    w2, \[x0\]
**      ret
*/
void bfloat_mov_mr (bfloat16_t *ptr)
{
  register bfloat16_t y asm ("w2");
  y = *ptr;
  asm volatile ("#foo" :: "r" (y));
}

Same for _2.c and _3.c

> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c 
> b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
> new file mode 100644
> index 00000000000..02656d32f14
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c
> @@ -0,0 +1,122 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-additional-options "-march=armv8.2-a -O3 --save-temps -std=gnu90" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#pragma GCC push_options
> +#pragma GCC target ("+bf16")
> +
> +#include <arm_bf16.h>

This effectively tests the same thing as bfloat16_scalar_compile_1.c.
IMO the more interesting way round is:

#include <arm_bf16.h>

#pragma GCC push_options
#pragma GCC target ("+bf16")

like for the simd tests.  So _1.c is the normal "enable before include"
case, _2.c is "enable after include" and _3.c is "don't enable at all".

Thanks,
Richard
