Thanks for the update. The new patch looks really good, just some minor comments.
Stam Markianos-Wright <stam.markianos-wri...@arm.com> writes: > [...] > Also I've update the filenames of all our tests to make them a bit clearer: > > C tests: > > __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of > scalar moves/loads/stores with "-march8.2-a+bf16", "-march8.2-a and +bf16 > target > pragma", "-march8.2-a" (now does not error out at all). There now include > register asms to check more MOV alternatives. > > __ bfloat16_scalar_compile_4.c: The _Complex error test. > > __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to > x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler > scan). Sounds good to me, although TBH the "_compile" feels a bit redundant. > I had also done a small c++ test, but have chosen to shift that to the [2/2] > patch because it is currently being blocked by target_invalid_conversion. OK. Does that include the mangling test? > [...] >>>> - a test that involves moving constants, for both scalars and vectors. >>>> You can create zero scalar constants in C++ using bfloat16_t() etc. >>>> For vectors it's possible to do things like: >>>> >>>> typedef short v2bf __attribute__((vector_size(4))); >>>> v2hi foo (void) { return (v2hi) 0x12345678; } >>>> >>>> The same sort of things should work for bfloat16x4_t and bfloat16x8_t. >>> >>> Leaving this as an open issue for now because I'm not 100% sure what we >>> should/shouldn't be allowing past the tree-level target hooks. >>> >>> If we do want to block this we would do this in the [2/2] patch. >>> I will come back to it and create a scan-assembler test when I'm more clear >>> on >>> what we should and shouldn't allow at the higher level :) >> >> FWIW, I'm not sure we should go out of our way to disallow this. >> Preventing bfloat16_t() in C++ would IMO be unnatural. And the >> "(vector) vector-sized-integer" syntax specifically treats the vector >> as a bundle of bits without really caring what the element type is. >> Even if we did manage to forbid the conversion in that context, >> it would still be possible to achieve the same thing using: >> >> v2hi >> foo (void) >> { >> union { v2hi v; unsigned int i; } u; >> u.i = 0x12345678; >> return u.v; >> } >> > Added the compilation of "(vector) vector-sized-integer" in the vector tests. > > But target_invalid_conversion in the [2/2] patch is a complication to this > (as > with bfloat_16t() in c++. > > I was under the impression that the original intent of bfloat was for it to > be > storage only, with any initialisation happening through the float32 convert > intrinsic. > > Either I'd be happy to allow it, but it does feel like we'd slightly be going > against what's the ACLE currently. > However, looking back at it now, it only mentions using ACLE intrinsics over > C > operators, so I'd be happy to allow this for vectors. > > For scalars though, if we e.g. were to allow: > > bfloat16_t (0x1234); > > on a single bfloat, I don't see how we could still block conversions like: > > bfloat16_t scalar1 = 0.1; > bfloat16_t scalar2 = 0; > bfloat16_t scalar3 = is_a_float; > > Agreed that the union {} would still always slip through, though. It wasn't clear sorry, but I meant literally "bfloat16_t()", i.e. construction with zero initialisation. I agree we don't want to support "bfloat16_t(0.25)" etc. > [...] >>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c >>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c >>> new file mode 100644 >>> index 00000000000..f2bef671deb >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c >>> @@ -0,0 +1,51 @@ >>> +/* { dg-do assemble { target { aarch64*-*-* } } } */ >>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ >>> +/* { dg-add-options arm_v8_2a_bf16_neon } */ >>> +/* { dg-additional-options "-O3 --save-temps" } */ >>> +/* { dg-final { check-function-bodies "**" "" } } */ >>> + >>> +#include <arm_neon.h> >>> + >>> +/* >>> +**stacktest1: >>> +** ... >>> +** str h0, \[sp, [0-9]+\] >>> +** ldr h0, \[sp, [0-9]+\] >>> +** ... >>> +** ret >>> +*/ >>> +bfloat16_t stacktest1 (bfloat16_t __a) >>> +{ >>> + volatile bfloat16_t b = __a; >>> + return b; >>> +} >>> + >>> +/* >>> +**stacktest2: >>> +** ... >>> +** str d0, \[sp, [0-9]+\] >>> +** ldr d0, \[sp, [0-9]+\] >>> +** ... >>> +** ret >>> +*/ >>> +bfloat16x4_t stacktest2 (bfloat16x4_t __a) >>> +{ >>> + volatile bfloat16x4_t b = __a; >>> + return b; >>> +} >>> + >>> +/* >>> +**stacktest3: >>> +** ... >>> +** str q0, \[sp\] >>> +** ldr q0, \[sp\] >>> +** ... >>> +** ret >>> +*/ >>> +bfloat16x8_t stacktest3 (bfloat16x8_t __a) >>> +{ >>> + volatile bfloat16x8_t b = __a; >>> + return b; >>> +} >> >> Might be a daft question, but why do we have an offset for the first >> two and not for the last one? Might be worth hard-coding whatever >> offset we use. I should have realised first time, but it's because we allocate the local variable area downwards from the soft frame pointer. So the area gets padded downwards rather than upwards. > [...] > @@ -97,6 +107,12 @@ > ;; Copy of the above. > (define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V4SF V2DF]) > > +;; Quad vector modes suitable for moving. Includes BFmode. > +(define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF]) > + > +;; Quad vector modes suitable for moving. Includes BFmode. > +(define_mode_iterator VQMOV_NO2E [V16QI V8HI V4SI V8HF V8BF V4SF]) Comment pasto for VQMOV_NO2E. Think it should be: ;; VQMOV without 2-element modes. > ;; Quad integer vector modes. > (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI]) > > @@ -160,6 +176,11 @@ > (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI > V4HF V8HF V2SF V4SF V2DF]) > > +;; All Advanced SIMD modes suitable for moving, loading, and storing, > +;; including special Bfloat vector types. > +(define_mode_iterator VALL_F16MOV [V8QI V16QI V4HI V8HI V2SI V4SI V2DI > + V4HF V8HF V4BF V8BF V2SF V4SF V2DF]) Nit: line should be indented below "V8QI". > @@ -226,6 +247,9 @@ > ;; Advanced SIMD modes for Q and H types. > (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI]) > > +;; Advanced SIMD modes for BF vector types. > +(define_mode_iterator VBF [V4BF V8BF]) Nothing in this patch uses VBF, so probably best to leave it until later. > diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c > b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c > new file mode 100644 > index 00000000000..5186d0e3d24 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_1.c > @@ -0,0 +1,118 @@ > [...] > +/* > +**bfloat_mov_rm: > +** ... > +** strh w2, \[sp, 14\] > +** ... > +** ret > +*/ > +void bfloat_mov_rm (void) > +{ > + register bfloat16_t x asm ("w2"); > + volatile bfloat16_t y; > + asm volatile ("#foo" : "=r" (x)); > + y = x; > + asm volatile ("#foo" : : : "memory"); > +} Probably simpler as: /* **bfloat_mov_rm: ** strh w2, \[x0\] ** ret */ void bfloat_mov_rm (bfloat16_t *ptr) { register bfloat16_t x asm ("w2"); asm volatile ("#foo" : "=r" (x)); *ptr = x; } > +/* > +**bfloat_mov_mr: > +** ... > +** ldrh w2, \[sp, 14\] > +** ... > +** ret > +*/ > +void bfloat_mov_mr (void) > +{ > + volatile bfloat16_t x; > + register bfloat16_t y asm ("w2"); > + asm volatile ("#foo" : : : "memory"); > + y = x; > + asm volatile ("#foo" :: "r" (y)); > +} Similarly here: /* **bfloat_mov_mr: ** ldrh w2, \[x0\] ** ret */ void bfloat_mov_mr (bfloat16_t *ptr) { register bfloat16_t y asm ("w2"); y = *ptr; asm volatile ("#foo" :: "r" (y)); } Same for _2.d and _3.c > diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c > b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c > new file mode 100644 > index 00000000000..02656d32f14 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_compile_2.c > @@ -0,0 +1,122 @@ > +/* { dg-do assemble { target { aarch64*-*-* } } } */ > +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ > +/* { dg-additional-options "-march=armv8.2-a -O3 --save-temps -std=gnu90" } > */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +#pragma GCC push_options > +#pragma GCC target ("+bf16") > + > +#include <arm_bf16.h> This effectively tests the same thing as bfloat16_scalar_compile_1.c. IMO the more interesting way round is: #include <arm_bf16.h> #pragma GCC push_options #pragma GCC target ("+bf16") like for the simd tests. So _1.c is the normal "enable before include" case, _2.c is "enable after include" and _3.c is "don't enable at all". Thanks, Richard