On Wed, Mar 4, 2026 at 12:12 AM
<[email protected]> wrote:
>
> From: Abhishek Kaushik <[email protected]>
>
> The FMA fold in match.pd currently matches only a bare (negate @0).
> When the negated operand is wrapped in a type conversion
> (e.g. (convert (negate @0))), the simplification to IFN_FNMA does not
> trigger.
>
> This prevents folding of patterns such as:
>
> *c = *c - (v8u)(*a * *b);
>
> when the multiply operands undergo vector type conversions before being
> passed to FMA. In such cases the expression lowers to neg + mla instead
> of a single msb on AArch64 SVE, because the canonicalization step
> cannot see through the casts.
>
> Extend the match pattern to allow optional conversions on the negated
> operand and the second multiplicand:
>
> (fmas:c (convert? (negate @0)) (convert? @1) @2)
>
> and explicitly rebuild the converted operands in the IFN_FNMA
> replacement. This enables recognition of the subtraction-of-product form
> even when vector element type casts are present.
>
> With this change, AArch64 SVE code generation is able to select msb
> instead of emitting a separate neg followed by mla.
>
> This patch was bootstrapped and regression-tested on aarch64-linux-gnu.
>
> gcc/
> PR target/123897
> * match.pd: Allow optional conversions in FMA-to-FNMA
> canonicalization and reconstruct converted operands in
> the replacement.
>
> gcc/testsuite/
> PR target/123897
> * gcc.target/aarch64/sve/fnma_match.c: New test.
> * gcc.target/aarch64/sve/pr123897.c: Adjust to scan for
> FNMA in the tree dump.
> ---
> gcc/match.pd | 4 +--
> .../gcc.target/aarch64/sve/fnma_match.c | 28 +++++++++++++++++++
> .../gcc.target/aarch64/sve/pr123897.c | 3 +-
> 3 files changed, 32 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f16fd4e081..4cce9463f8f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -10255,8 +10255,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (if (canonicalize_math_after_vectorization_p ())
> (for fmas (FMA)
> (simplify
> - (fmas:c (negate @0) @1 @2)
> - (IFN_FNMA @0 @1 @2))
> + (fmas:c (convert? (negate @0)) (convert? @1) @2)
> + (IFN_FNMA (convert @0) (convert @1) @2))
I think you need to check that the types are nop conversions rather
than matching just any convert.
So using nop_convert here would be better than adding an explicit
tree_nop_conversion_p check.
Can you check whether using nop_convert would work?
Thanks,
Andrew
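
Untested, but I'd expect something roughly along these lines, using
view_convert in the result since the conversions are then known to be
nops (variant numbering on nop_convert may need adjusting):

    (simplify
     (fmas:c (nop_convert1? (negate @0)) (nop_convert2? @1) @2)
     (IFN_FNMA (view_convert @0) (view_convert @1) @2))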
> (simplify
> (fmas @0 @1 (negate @2))
> (IFN_FMS @0 @1 @2))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> new file mode 100644
> index 00000000000..08607b172e2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv9-a -msve-vector-bits=256" } */
> +
> +typedef __attribute__((__vector_size__(sizeof(int)*8))) signed int v8i;
> +typedef __attribute__((__vector_size__(sizeof(int)*8))) unsigned int v8u;
> +
> +void g(v8i *a,v8i *b,v8u *c)
> +{
> + *c = *c - (v8u)(*a * *b);
> +}
> +
> +void h(v8u *a,v8u *b,v8i *c)
> +{
> + *c = *c - (v8i)(*a * *b);
> +}
> +
> +void x(v8i *a,v8i *b,v8i *c)
> +{
> + *c = *c - (*a * *b);
> +}
> +
> +void y(v8u *a,v8u *b,v8u *c)
> +{
> + *c = *c - (*a * *b);
> +}
> +
> +/* { dg-final { scan-assembler-times "\\tmsb\\t" 4 } } */
> +/* { dg-final { scan-assembler-not "\\tneg\\t" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> index d74efabb7f8..45bc52522a9 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
> @@ -13,4 +13,5 @@ void g(v8i *a,v8i *b,v8u *c)
> *c = *c - (v8u)(*a * *b);
> }
>
> -/* { dg-final { scan-tree-dump-times "\.FMA" 2 "widening_mul" } } */
> +/* { dg-final { scan-tree-dump-times "\.FMA" 1 "widening_mul" } } */
> +/* { dg-final { scan-tree-dump-times "\.FNMA" 1 "widening_mul" } } */
> --
> 2.43.0
>