Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

Kyrylo Tkachov Mon, 30 Sep 2024 23:55:42 -0700

Hi Soumya

> On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
> 
> This patch uses the FSCALE instruction provided by SVE to implement the
> standard ldexp family of functions.
> 
> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
> following code:
> 
> float
> test_ldexpf (float x, int i)
> {
>        return __builtin_ldexpf (x, i);
> }
> 
> double
> test_ldexp (double x, int i)
> {
>        return __builtin_ldexp(x, i);
> }
> 
> GCC Output:
> 
> test_ldexpf:
>        b ldexpf
> 
> test_ldexp:
>        b ldexp
> 
> Since SVE has support for an FSCALE instruction, we can use this to process
> scalar floats by moving them to a vector register and performing an fscale
> call, similar to how LLVM tackles an ldexp builtin as well.
> 
> New Output:
> 
> test_ldexpf:
>        fmov s31, w0
>        ptrue p7.b, all
>        fscale z0.s, p7/m, z0.s, z31.s
>        ret
> 
> test_ldexp:
>        sxtw x0, w0
>        ptrue p7.b, all
>        fmov d31, x0
>        fscale z0.d, p7/m, z0.d, z31.d
>        ret
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Soumya AR <soum...@nvidia.com>
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve.md
> (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
> floating modes and expand to the existing pattern for FSCALE.
> (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
> operands as well as scalar floating modes.
> 
> * config/aarch64/iterators.md:
> SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
> as SF and DF.
> VPRED: Extended the attribute to handle GPF modes.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sve/fscale.c: New test.

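For context on what the new expansion has to preserve: ldexp (x, i) computes
x * 2^i, which is the per-lane operation FSCALE performs. A minimal reference
sketch in C follows. The helper name ref_ldexpf is hypothetical and is not part
of the patch; it ignores overflow, underflow and errno handling and is only
meant to illustrate the arithmetic.

```c
/* Hypothetical reference helper (not from the patch): returns x * 2^i by
   repeated multiplication.  Each step is exact while the value stays in
   the normal range; overflow/underflow behaviour is not modelled.  */
static float
ref_ldexpf (float x, int i)
{
  for (; i > 0; i--)
    x *= 2.0f;   /* scale up by one power of two */
  for (; i < 0; i++)
    x *= 0.5f;   /* scale down by one power of two */
  return x;
}
```

For in-range inputs, a helper like this could be used to cross-check the value
returned by __builtin_ldexpf in a runtime test.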

This patch fixes the bugzilla report at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
So it should be referenced in the ChangeLog entries like so:

        PR target/111733
        * config/aarch64/aarch64-sve.md <rest of ChangeLog>

That way the commit hooks will pick it up and update the bug tracker
accordingly.

> 
> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>

+(define_expand "ldexp<mode>3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+       (unspec:GPF
+         [(match_operand:GPF 1 "register_operand" "w")
+          (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
+         UNSPEC_COND_FSCALE))]
+  "TARGET_SVE"
+  {
+    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
+    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
+    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue, operands[1], operands[2], strictness));
+    DONE;
+  }

Lines should not exceed 80 columns; the emit_insn call should be wrapped.
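One possible wrapping of the emit_insn call, shown as a sketch without the
diff markers. It assumes the usual GNU style of aligning continuation lines
under the first argument; the exact whitespace (tabs versus spaces) is an
assumption and should follow the surrounding file.

```
    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue, operands[1],
                                              operands[2], strictness));
```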

The patch looks good to me otherwise.
Thanks,
Kyrill
