Hi Saurabh,
> On 6 Nov 2024, at 11:03, [email protected] wrote:
>
>
> The AArch64 FEAT_FP8 extension introduces instructions for conversion
> and scaling.
>
> This patch introduces the following intrinsics:
> 1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm.
> 2. vcvt{q}_mf8_f16_fpm.
> 3. vcvt_{high}_mf8_f32_fpm.
> 4. vscale{q}_{f16|f32|f64}.
>
> We introduced three new aarch64_builtin_signatures enum variants:
> 1. binary_fpm.
> 2. ternary_fpm.
> 3. unary_fpm.
>
> We added support for these variants for declaring types and for expanding to
> RTL.
>
> We added new simd_types for integers (s32, s32q, and s64q) and for
> fp8 (f8, and f8q).
>
> Also changed the faminmax intrinsic instruction pattern so that it works
> better with the new fscale pattern.
>
> Because we added support for fp8 intrinsics here, we modified the check
> in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not
> defined.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc
> (enum class): New variants to support new signatures.
> (aarch64_fntype): Handle new signatures.
> (aarch64_expand_pragma_builtin): Handle new signatures.
> * config/aarch64/aarch64-c.cc
> (aarch64_update_cpp_builtins): New flag for FP8.
> * config/aarch64/aarch64-simd-pragma-builtins.def
> (ENTRY_BINARY_FPM): Macro to declare unary fpm intrinsics.
> (ENTRY_TERNARY_FPM): Macro to declare ternary fpm intrinsics.
> (ENTRY_UNARY_FPM): Macro to declare unary fpm intrinsics.
> (ENTRY_VHSDF_VHSDI): Macro to declare binary intrinsics.
> * config/aarch64/aarch64-simd.md
> (@aarch64_<faminmax_uns_op><mode>): Renamed.
> (@aarch64_<faminmax_uns_op><VHSDF:mode><VHSDF:mode>): Renamed.
> (@aarch64_<fpm_uns_name><V8HFBF:mode><VB:mode>): Unary fpm
> pattern.
> (@aarch64_<fpm_uns_name><V8HFBF:mode><V16QI_ONLY:mode>): Unary
> fpm pattern.
> (@aarch64_<fpm_uns_name><VB:mode><VCVTFPM:mode><VH_SF:mode>):
> Binary fpm pattern.
> (@aarch64_<fpm_uns_name><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>):
> Ternary fpm pattern.
> (@aarch64_<fpm_uns_op><VHSDF:mode><VHSDI:mode>): Scale fpm
> pattern.
> * config/aarch64/iterators.md: New attributes and iterators.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature
> macro doesn't exist.
> * gcc.target/aarch64/simd/scale_fpm.c: New test.
> * gcc.target/aarch64/simd/vcvt_fpm.c: New test.
>
> ---
>
> I could not find a way to compress declarations in
> aarch64-simd-pragma-builtins.def for convert instructions as there was
> no pattern apart from the repetion for vcvt1/vcvt2 types. Let me know
> if those declrations can be expressed more concisely.
>
> In the scale instructions, I am not doing any casting from float to int
> modes in the second operand. Let me know if that's a problem.
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 132 ++++++++++--
> gcc/config/aarch64/aarch64-c.cc | 2 +
> .../aarch64/aarch64-simd-pragma-builtins.def | 56 +++++
> gcc/config/aarch64/aarch64-simd.md | 72 ++++++-
> gcc/config/aarch64/iterators.md | 99 +++++++++
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 10 -
> .../gcc.target/aarch64/simd/scale_fpm.c | 60 ++++++
> .../gcc.target/aarch64/simd/vcvt_fpm.c | 197 ++++++++++++++++++
> 8 files changed, 603 insertions(+), 25 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/scale_fpm.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vcvt_fpm.c
>
> <0001-aarch64-Add-support-for-fp8-convert-and-scale.patch>
diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index cfe95bd4c31..87bbfb0e586 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -9982,13 +9982,13 @@
)
;; faminmax
-(define_insn "@aarch64_<faminmax_uns_op><mode>"
+(define_insn "@aarch64_<faminmax_uns_op><VHSDF:mode><VHSDF:mode>"
[(set (match_operand:VHSDF 0 "register_operand" "=w")
(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
(match_operand:VHSDF 2 "register_operand" "w")]
FAMINMAX_UNS))]
"TARGET_FAMINMAX"
- "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+ "<faminmax_uns_op>\t%0.<Vtype>, %1.<VHSDF:Vtype>, %2.<VHSDF:Vtype>"
)
(define_insn "*aarch64_faminmax_fused"
@@ -9999,3 +9999,71 @@
"TARGET_FAMINMAX"
"<faminmax_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
)
+
+;; fpm unary instructions.
+(define_insn "@aarch64_<fpm_uns_name><V8HFBF:mode><VB:mode>"
+ [(set (match_operand:V8HFBF 0 "register_operand" "=w")
+ (unspec:V8HFBF
+ [(match_operand:VB 1 "register_operand" "w")
+ (reg:DI FPM_REGNUM)]
+ FPM_UNARY_UNS))]
+ "TARGET_FP8"
+ "<fpm_uns_op>\t%0.<V8HFBF:Vtype>, %1.<VB:Vtype>"
+)
+
+;; fpm unary instructions, where the input is lowered from V16QI to
+;; V8QI.
+(define_insn "@aarch64_<fpm_uns_name><V8HFBF:mode><V16QI_ONLY:mode>"
+ [(set (match_operand:V8HFBF 0 "register_operand" "=w")
+ (unspec:V8HFBF
+ [(match_operand:V16QI_ONLY 1 "register_operand" "w")
+ (reg:DI FPM_REGNUM)]
+ FPM_UNARY_LOW_UNS))]
+ "TARGET_FP8"
+ {
+ operands[1] = force_lowpart_subreg (V8QImode,
+ operands[1],
+ recog_data.operand[1]->mode);
I don’t think this is needed? This code is only executed in the final assembly
output stage and you already explicitly print operand 1 with a “.8b” suffix so
changing the mode here doesn’t matter.
+ return "<fpm_uns_op>\t%0.<V8HFBF:Vtype>, %1.8b";
+ }
+)
+;; fpm ternary instructions.
+(define_insn
+
"@aarch64_<fpm_uns_name><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>"
+ [(set (match_operand:V16QI_ONLY 0 "register_operand" "=w")
+ (unspec:V16QI_ONLY
+ [(match_operand:V8QI_ONLY 1 "register_operand" "w")
+ (match_operand:V4SF_ONLY 2 "register_operand" "w")
+ (match_operand:V4SF_ONLY 3 "register_operand" "w")
+ (reg:DI FPM_REGNUM)]
+ FPM_TERNARY_VCVT_UNS))]
+ "TARGET_FP8"
+ {
+ operands[1] = force_reg (V16QImode, operands[1]);
+ return "<fpm_uns_op>\t%1.16b, %2.<V4SF_ONLY:Vtype>, %3.<V4SF_ONLY:Vtype>";
+ }
+)
Same here. But more worryingly the destination operand 0 is not being printed
out anywhere here. Was there supposed to be a tie of one of the input operands
to operand 0 in this pattern?
I haven’t looked deeply into what exactly these instructions do, but please
double check the operands here.
Thanks,
Kyrill