Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

Delia Burduv Wed, 04 Mar 2020 09:22:23 -0800

Hi,

This is the latest version of the patch.

Thanks,
Delia

On 2/21/20 11:41 AM, Kyrill Tkachov wrote:
Hi Delia,

On 2/19/20 5:23 PM, Delia Burduv wrote:
Hi,

Here is the latest version of the patch. It just has some minor formatting changes that were brought up by Richard Sandiford in the AArch64 patches.

Thanks,
Delia

On 1/31/20 3:23 PM, Delia Burduv wrote:
Here is the updated patch. The changes are minor, so let me know if there is anything else to fix or if it can be committed.

Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:
Hi Delia,

On 1/28/20 4:44 PM, Delia Burduv wrote:
Ping.
------------------------------------------------------------------------
*From:* Delia Burduv <delia.bur...@arm.com>
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
*Cc:* ni...@redhat.com <ni...@redhat.com>; Richard Earnshaw <richard.earns...@arm.com>; Ramana Radhakrishnan <ramana.radhakrish...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma<b/t> for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches and I will apply what is relevant to this patch as well. Particularly, I will change the tests to use the exact input and output registers and I will change the types of the rtl patterns.

Please send the updated patches so that someone can commit them for you once they're reviewed.

Thanks,
Kyrill

On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat
> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane indices.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have
> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-12  Delia Burduv  <delia.bur...@arm.com>
>
>     * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>     (vbfmlalbq_f32): New.
>     (vbfmlaltq_f32): New.
>     (vbfmlalbq_lane_f32): New.
>     (vbfmlaltq_lane_f32): New.
>     (vbfmlalbq_laneq_f32): New.
>     (vbfmlaltq_laneq_f32): New.
>     * config/arm/arm_neon_builtins.def (vbfmmla): New.
>     (vbfmab): New.
>     (vbfmat): New.
>     (vbfmab_lane): New.
>     (vbfmat_lane): New.
>     (vbfmab_laneq): New.
>     (vbfmat_laneq): New.
>     * config/arm/iterators.md (BF_MA): New int iterator.
>     (bt): New int attribute.
>     (VQXBF): Copy of VQX with V8BF.
>     (V_HALF): Added V8BF.
>     * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>     (neon_vbfma<bt>v8hi): New insn.
>     (neon_vbfma<bt>_lanev8hi): New insn.
>     (neon_vbfma<bt>_laneqv8hi): New expand.
>     (neon_vget_high<mode>): Changed iterator to VQXBF.
>     * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>     (UNSPEC_BFMAB): New UNSPEC.
>     (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12  Delia Burduv  <delia.bur...@arm.com>
>
>     * gcc.target/arm/simd/bf16_ma_1.c: New test.
>     * gcc.target/arm/simd/bf16_ma_2.c: New test.
>     * gcc.target/arm/simd/bf16_mmla_1.c: New test.

This looks good, a few minor things though...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmabv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmatv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+                    const int __index)
+{
+  return __builtin_neon_vbfmab_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+                    const int __index)
+{
+  return __builtin_neon_vbfmat_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+                     const int __index)
+{
+  return __builtin_neon_vbfmab_laneqv8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+                     const int __index)
+{
+  return __builtin_neon_vbfmat_laneqv8bf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
 #pragma GCC pop_options
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cbb5d16b9211f5bc96db376ddf21afc..cc06783daf393f7166fd922f86b3db79c02ba188 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,12 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR1 (TERNOP, vbfmmla, v8bf)
+
+VAR1 (TERNOP, vbfmab, v8bf)
+VAR1 (TERNOP, vbfmat, v8bf)
+VAR1 (MAC_LANE, vbfmab_lane, v8bf)
+VAR1 (MAC_LANE, vbfmat_lane, v8bf)
+VAR1 (MAC_LANE, vbfmab_laneq, v8bf)
+VAR1 (MAC_LANE, vbfmat_laneq, v8bf)

The instructions produced from these intrinsics have the form vmmla, vfmab, vfmat. Let's use those names here rather than the "vbf*" ones to avoid confusion in the future.

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 33e29509f00a89fa23d0546687c0e4643f0b32d2..72b8ce0bb26dcd520603b907b4f86a74d0164332 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -106,6 +106,9 @@
 ;; Quad-width vector modes plus 64-bit elements.
 (define_mode_iterator VQX [V16QI V8HI V8HF V4SI V4SF V2DI])
+;; Quad-width vector modes plus 64-bit elements and V8BF.
+(define_mode_iterator VQXBF [V16QI V8HI V8HF (V8BF "TARGET_BF16_SIMD") V4SI V4SF V2DI])
+
 ;; Quad-width vector modes without floating-point elements.
 (define_mode_iterator VQI [V16QI V8HI V4SI])
@@ -485,6 +488,8 @@
 (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
 (define_int_iterator VCMLA [UNSPEC_VCMLA UNSPEC_VCMLA90 UNSPEC_VCMLA180 UNSPEC_VCMLA270])
+(define_int_iterator BF_MA [UNSPEC_BFMAB UNSPEC_BFMAT])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -609,7 +614,8 @@
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
                           (V8HF "V4HF") (V4SI  "V2SI")
                           (V4SF "V2SF") (V2DF "DF")
-                          (V2DI "DI") (V4HF "HF")])
+                          (V2DI "DI") (V4HF "HF")
+                          (V8BF "V4BF")])
 ;; Same, but lower-case.
 (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
@@ -1171,4 +1177,7 @@
 (define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
                            (UNSPEC_DOT_U "u8")])
+;; An iterator for VFMA<bt>
+(define_int_attr bt [(UNSPEC_BFMAB "b") (UNSPEC_BFMAT "t")])
+
 (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6087ca6f2badde6a492bb515a2cb5846f3d4ad8e..4e0d0b5c317a81839de9dee581c5e351d3193dfa 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3875,7 +3875,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vget_high<mode>"
   [(match_operand:<V_HALF> 0 "s_register_operand")
-   (match_operand:VQX 1 "s_register_operand")]
+   (match_operand:VQXBF 1 "s_register_operand")]
   "TARGET_NEON"
 {
   emit_move_insn (operands[0],
@@ -6552,3 +6552,64 @@ if (BYTES_BIG_ENDIAN)
  "vabd.<V_if_elem> %<V_reg>0, %<V_reg>1, %<V_reg>2"
  [(set_attr "type" "neon_fp_abd_s<q>")]
 )
+
+(define_insn "neon_vbfmmlav8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "vmmla.bf16\\t%q0, %q2, %q3"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "neon_vbfma<bt>v8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "w")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  "vfma<bt>.bf16\\t%q0, %q2, %q3"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "neon_vbfma<bt>_lanev8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V4BF 3 "register_operand" "x")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  "vfma<bt>.bf16\\t%q0, %q2, %P3[%c4]"
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
+
+(define_expand "neon_vbfma<bt>_laneqv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "x")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  {
+    int lane = INTVAL (operands[4]);
+    gcc_assert (lane >=0 && lane <=7);

Let's use the IN_RANGE macro to assert this.

+    if (lane < 4)
+    {
+    emit_insn (gen_neon_vbfma<bt>_lanev8bf (operands[0], operands[1], operands[2], operands[3], operands[4]));
+    }
+    else
+      {
+    rtx op_highpart = gen_reg_rtx (V4BFmode);
+    emit_insn (gen_neon_vget_highv8bf (op_highpart, operands[3]));
+    operands[4] = GEN_INT (lane - 4);
+    emit_insn (gen_neon_vbfma<bt>_lanev8bf (operands[0], operands[1], operands[2], op_highpart, operands[4]));
+      }
+    DONE;
+  }
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 8f4a705f43efdb6baf03b39cee589cf728620687..97f08abec0a089b5cd95840da12ae22f7c960b28 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -501,4 +501,7 @@
   UNSPEC_VCMLA90
   UNSPEC_VCMLA180
   UNSPEC_VCMLA270
+  UNSPEC_BFMMLA
+  UNSPEC_BFMAB
+  UNSPEC_BFMAT
 ])
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7602db9597a955b2a303f2dc55b9ff80f81b3b6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" }  */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+
+#include "arm_neon.h"
+
+/*
+**test_vbfmlalbq_f32:
+**	...
+**	vfmab.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlalbq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_vbfmlaltq_f32:
+**	...
+**	vfmat.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlaltq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_vbfmlalbq_lane_f32:
+**	...
+**	vfmab.bf16	q0, q1, d4[0]
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_vbfmlaltq_lane_f32:
+**	...
+**	vfmat.bf16	q0, q1, d4[2]
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_vbfmlalbq_laneq_f32:
+**	...
+**	vfmab.bf16	q0, q1, d5[1]
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 5);
+}
+
+/*
+**test_vbfmlaltq_laneq_f32:
+**	...
+**	vfmat.bf16	q0, q1, d5[3]
+**	bx	lr
+*/
+float32x4_t
+test_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..226ed7e1d8e4747d73b0518c809aaf0e3c5bc78d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+/* Test lane index limits for vbfmlalbq_lane_f32  */
+float32x4_t
+test_vbfmlalbq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfmab_lanev8bf (r, a, b, -1); /* { dg-error {lane -1 out of range 0 - 3} } */
+}
+
+float32x4_t
+test_vbfmlalbq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfmab_lanev8bf (r, a, b, 4); /* { dg-error {lane 4 out of range 0 - 3} } */
+}
+
+/* Test lane index limits for vbfmlaltq_lane_f32  */
+float32x4_t
+test_vbfmlaltq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfmat_lanev8bf (r, a, b, -1); /* { dg-error {lane -1 out of range 0 - 3} } */
+}
+
+float32x4_t
+test_vbfmlaltq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfmat_lanev8bf (r, a, b, 4); /* { dg-error {lane 4 out of range 0 - 3} } */
+}

We want to be testing the ACLE intrinsics here rather than the __builtin_neon* builtins directly. The builtins are an implementation detail that the user should not rely on.

Ok with these changes.

Thanks,
Kyrill

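The two points raised about the laneq expander (assert the lane with IN_RANGE, then pick the low or high half of the 128-bit operand) can be illustrated outside the compiler. What follows is only a sketch in portable C: IN_RANGE_SKETCH is written in the style of GCC's IN_RANGE macro but is not the compiler's own definition, and struct lane_split and split_laneq_index are hypothetical names invented for this illustration.

```c
#include <assert.h>

/* Range check in the style of GCC's IN_RANGE: a single unsigned comparison
   covers both bounds, because a VALUE below LOWER wraps around to a very
   large unsigned number.  This is an illustrative stand-in, not the
   definition used inside the compiler.  */
#define IN_RANGE_SKETCH(VALUE, LOWER, UPPER) \
  ((unsigned long long) (VALUE) - (unsigned long long) (LOWER) \
   <= (unsigned long long) (UPPER) - (unsigned long long) (LOWER))

/* Hypothetical helper type: which 64-bit half of the 128-bit operand is
   used (0 = low, 1 = high) and the lane index within that half.  */
struct lane_split
{
  int high_half;
  int lane;
};

/* Mirror the decision the laneq expander makes: lanes 0-3 stay in the low
   half, lanes 4-7 become lanes 0-3 of the high half.  */
static struct lane_split
split_laneq_index (int lane)
{
  struct lane_split s;
  assert (IN_RANGE_SKETCH (lane, 0, 7));
  s.high_half = lane >= 4;
  s.lane = s.high_half ? lane - 4 : lane;
  return s;
}
```

With this model, a laneq index of 5 maps to lane 1 of the high half and an index of 7 maps to lane 3 of the high half, which agrees with the d5[1] and d5[3] operands expected by the laneq tests in bf16_ma_1.c.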
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d8118a7111a359464f1508e92ac6183ea1f4eeed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+
+#include <arm_neon.h>
+
+/*test_vbfmmlaq_f32:
+**	...
+**	vmmla.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vbfmmlaq_f32 (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}

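For readers who want a mental model of what vbfmmlaq_f32 accumulates, here is a scalar sketch in portable C. It is an assumption-laden illustration rather than anything taken from the patch: the element layout (first operand as a 2x4 row-major matrix, second operand as two groups of four forming the columns of a 4x2 matrix, accumulator as a 2x2 row-major matrix) follows my reading of the Arm architecture description, the helper names bf16_to_f32 and ref_bfmmla are hypothetical, and the instruction's own rounding and denormal handling are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 bit pattern to float.  bfloat16 is the upper 16 bits
   of an IEEE-754 binary32 value, so widening is a 16-bit left shift.  */
static float
bf16_to_f32 (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical scalar reference: r (2x2, row-major) is incremented by the
   product of a (2x4, row-major) and the 4x2 matrix whose column j is
   b[4*j .. 4*j+3].  Plain float arithmetic is used throughout, which is an
   approximation of the hardware behaviour.  */
static void
ref_bfmmla (float r[4], const uint16_t a[8], const uint16_t b[8])
{
  assert (r && a && b);
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
        float sum = r[2 * i + j];
        for (int k = 0; k < 4; k++)
          sum += bf16_to_f32 (a[4 * i + k]) * bf16_to_f32 (b[4 * j + k]);
        r[2 * i + j] = sum;
      }
}
```

Each of the four float32 results is a four-way dot product added to the existing accumulator value, which is why the intrinsic takes two bfloat16x8_t inputs and returns a float32x4_t.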
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index a66961d0c513323844dd069b05cdfccc3e432cfc..1974967b171c28b95b21dc27837d7fe69f2d9f64 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -19426,6 +19426,59 @@ vcvtq_high_bf16_f32 (bfloat16x8_t inactive, float32x4_t __a)
   return __builtin_neon_vbfcvtv4sf_highv8bf (inactive, __a);
 }
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vfmabv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vfmatv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+                    const int __index)
+{
+  return __builtin_neon_vfmab_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+                    const int __index)
+{
+  return __builtin_neon_vfmat_lanev8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+                     const int __index)
+{
+  return __builtin_neon_vfmab_laneqv8bf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+                     const int __index)
+{
+  return __builtin_neon_vfmat_laneqv8bf (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 48c06c43a1744da7e143f6070ac945e8dd7225b6..38c8bb0b0ebe2c3cc59da629c7630c389ddd8317 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -391,3 +391,12 @@ VAR2 (UNOP, vbfcvt, v4bf, v8bf)
 VAR1 (UNOP, vbfcvt_high, v8bf)
 VAR2 (UNOP, vbfcvtv4sf, v4bf, v8bf)
 VAR1 (BINOP, vbfcvtv4sf_high, v8bf)
+
+VAR1 (TERNOP, vmmla, v8bf)
+
+VAR1 (TERNOP, vfmab, v8bf)
+VAR1 (TERNOP, vfmat, v8bf)
+VAR1 (MAC_LANE, vfmab_lane, v8bf)
+VAR1 (MAC_LANE, vfmat_lane, v8bf)
+VAR1 (MAC_LANE, vfmab_laneq, v8bf)
+VAR1 (MAC_LANE, vfmat_laneq, v8bf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5f4e3d1235813ab81c176505f9a98d702359f7ec..831400192280d892835055174d9daab22ab08c92 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -106,6 +106,9 @@
 ;; Quad-width vector modes plus 64-bit elements.
 (define_mode_iterator VQX [V16QI V8HI V8HF V8BF V4SI V4SF V2DI])
+;; Quad-width vector modes plus 64-bit elements and V8BF.
+(define_mode_iterator VQXBF [V16QI V8HI V8HF (V8BF "TARGET_BF16_SIMD") V4SI V4SF V2DI])
+
 ;; Quad-width vector modes without floating-point elements.
 (define_mode_iterator VQI [V16QI V8HI V4SI])
@@ -493,6 +496,8 @@
 (define_int_iterator MATMUL [UNSPEC_MATMUL_S UNSPEC_MATMUL_U UNSPEC_MATMUL_US])
+(define_int_iterator BF_MA [UNSPEC_BFMAB UNSPEC_BFMAT])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -1209,3 +1214,6 @@
 ])
 (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
+
+;; An iterator for VFMA<bt>
+(define_int_attr bt [(UNSPEC_BFMAB "b") (UNSPEC_BFMAT "t")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index f5286d9c4b1a309f6ebe864c86596aaceb05c05b..75cc31a0d144724e8e51cb7f05a27e71a77eed25 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3924,7 +3924,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vget_high<mode>"
   [(match_operand:<V_HALF> 0 "s_register_operand")
-   (match_operand:VQX 1 "s_register_operand")]
+   (match_operand:VQXBF 1 "s_register_operand")]
   "TARGET_NEON"
 {
   emit_move_insn (operands[0],
@@ -6737,3 +6737,64 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_BF16_FP"
   ""
 )
+
+(define_insn "neon_vmmlav8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "vmmla.bf16\\t%q0, %q2, %q3"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "neon_vfma<bt>v8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "w")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  "vfma<bt>.bf16\\t%q0, %q2, %q3"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "neon_vfma<bt>_lanev8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V4BF 3 "register_operand" "x")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  "vfma<bt>.bf16\\t%q0, %q2, %P3[%c4]"
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
+
+(define_expand "neon_vfma<bt>_laneqv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "x")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MA)))]
+  "TARGET_BF16_SIMD"
+  {
+    int lane = INTVAL (operands[4]);
+    gcc_assert (IN_RANGE(lane, 0, 7));
+    if (lane < 4)
+      {
+        emit_insn (gen_neon_vfma<bt>_lanev8bf (operands[0], operands[1], operands[2], operands[3], operands[4]));
+      }
+    else
+      {
+        rtx op_highpart = gen_reg_rtx (V4BFmode);
+        emit_insn (gen_neon_vget_highv8bf (op_highpart, operands[3]));
+        operands[4] = GEN_INT (lane - 4);
+        emit_insn (gen_neon_vfma<bt>_lanev8bf (operands[0], operands[1], operands[2], op_highpart, operands[4]));
+      }
+    DONE;
+  }
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b36ae512a6ebcf231b46a24e127c62e22e71a34f..f0b1f465de4b63d624510783576700519044717d 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -508,4 +508,7 @@
   UNSPEC_MATMUL_US
   UNSPEC_BFCVT
   UNSPEC_BFCVT_HIGH
+  UNSPEC_BFMMLA
+  UNSPEC_BFMAB
+  UNSPEC_BFMAT
 ])
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d7a944923cc889bc5f8eaeaa6a4de7672bacb8c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+
+#include "arm_neon.h"
+
+/*
+**test_vfmabq_f32:
+**	...
+**	vfmab.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vfmabq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_vfmatq_f32:
+**	...
+**	vfmat.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vfmatq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_vfmabq_lane_f32:
+**	...
+**	vfmab.bf16	q0, q1, d4[0]
+**	bx	lr
+*/
+float32x4_t
+test_vfmabq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_vfmatq_lane_f32:
+**	...
+**	vfmat.bf16	q0, q1, d4[2]
+**	bx	lr
+*/
+float32x4_t
+test_vfmatq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_vfmabq_laneq_f32:
+**	...
+**	vfmab.bf16	q0, q1, d5[1]
+**	bx	lr
+*/
+float32x4_t
+test_vfmabq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 5);
+}
+
+/*
+**test_vfmatq_laneq_f32:
+**	...
+**	vfmat.bf16	q0, q1, d5[3]
+**	bx	lr
+*/
+float32x4_t
+test_vfmatq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5a7a2a71791968045b413fc6c1d7daade5cf30f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+/* Test lane index limits for vfmabq_lane_f32 */
+float32x4_t
+test_vfmabq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
+  return vbfmlalbq_lane_f32 (r, a, b, -1);
+}
+
+float32x4_t
+test_vfmabq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
+  return vbfmlalbq_lane_f32 (r, a, b, 4);
+}
+
+/* Test lane index limits for vfmatq_lane_f32 */
+float32x4_t
+test_vfmatq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 3" "" { target *-*-* } 0 } */
+  return vbfmlaltq_lane_f32 (r, a, b, -2);
+}
+
+float32x4_t
+test_vfmatq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane 5 out of range 0 - 3" "" { target *-*-* } 0 } */
+  return vbfmlaltq_lane_f32 (r, a, b, 5);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..0b74e19203bbdbf8668f6c214843870338d27655
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+
+#include <arm_neon.h>
+
+/*test_vfmmlaq_f32:
+**	...
+**	vmmla.bf16	q0, q1, q2
+**	bx	lr
+*/
+float32x4_t
+test_vmmlaq_f32 (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}
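
As a companion to the vfmab/vfmat patterns in the patch, the following portable C sketch models what the "bottom" and "top" widening multiply-accumulate intrinsics are understood to compute. It rests on assumptions that are not stated in the patch itself: that "bottom" selects the even-numbered bfloat16 elements and "top" the odd-numbered ones, and that the lane forms multiply by a single selected element of the second operand. The names bf16_to_f32, ref_bfmlal and ref_bfmlal_lane are hypothetical, and the rounding and fused behaviour of the real instructions is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 bit pattern to float by placing it in the upper
   16 bits of a binary32 value.  */
static float
bf16_to_f32 (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);
  return f;
}

/* Hypothetical model of vbfmlalbq_f32 (top == 0) and vbfmlaltq_f32
   (top == 1): each float32 lane i accumulates the product of element
   2*i + top from both bfloat16 inputs.  */
static void
ref_bfmlal (float r[4], const uint16_t a[8], const uint16_t b[8], int top)
{
  assert (top == 0 || top == 1);
  for (int i = 0; i < 4; i++)
    r[i] += bf16_to_f32 (a[2 * i + top]) * bf16_to_f32 (b[2 * i + top]);
}

/* Hypothetical model of the _lane and _laneq forms: the second factor is
   always b[lane], where b has 4 elements for _lane and 8 for _laneq.  */
static void
ref_bfmlal_lane (float r[4], const uint16_t a[8], const uint16_t *b,
                 int lane, int top)
{
  assert (top == 0 || top == 1);
  float scalar = bf16_to_f32 (b[lane]);
  for (int i = 0; i < 4; i++)
    r[i] += bf16_to_f32 (a[2 * i + top]) * scalar;
}
```

A reference model of this kind can be handy when writing execution tests for the intrinsics, as opposed to the assembler-output tests included in the patch, since expected values can be computed on the host and compared against results from the target.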
