> On 1 Oct 2024, at 1:18 PM, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
> Hi Soumya,
>
> Nice patch!
>
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Tuesday, October 1, 2024 7:55 AM
>> To: Soumya AR <soum...@nvidia.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>
>> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE
>> instruction
>>
>> Hi Soumya
>>
>>> On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> This patch uses the FSCALE instruction provided by SVE to implement the
>>> standard ldexp family of functions.
>>>
>>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for
>>> the following code:
>>>
>>> float
>>> test_ldexpf (float x, int i)
>>> {
>>>   return __builtin_ldexpf (x, i);
>>> }
>>>
>>> double
>>> test_ldexp (double x, int i)
>>> {
>>>   return __builtin_ldexp (x, i);
>>> }
>>>
>>> GCC Output:
>>>
>>> test_ldexpf:
>>>         b ldexpf
>>>
>>> test_ldexp:
>>>         b ldexp
>>>
>>> Since SVE has support for an FSCALE instruction, we can use this to
>>> process scalar floats by moving them to a vector register and performing
>>> an fscale call, similar to how LLVM handles the ldexp builtin as well.
>>>
>>> New Output:
>>>
>>> test_ldexpf:
>>>         fmov    s31, w0
>>>         ptrue   p7.b, all
>>>         fscale  z0.s, p7/m, z0.s, z31.s
>>>         ret
>>>
>>> test_ldexp:
>>>         sxtw    x0, w0
>>>         ptrue   p7.b, all
>>>         fmov    d31, x0
>>>         fscale  z0.d, p7/m, z0.d, z31.d
>>>         ret
>>>
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>> regression. OK for mainline?
>>>
>>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>
>>>     * config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new pattern
>>>     to match ldexp calls with scalar floating modes and expand to the
>>>     existing pattern for FSCALE.
>>>     (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
>>>     operands as well as scalar floating modes.
>>>     * config/aarch64/iterators.md (SVE_FULL_F_SCALAR): Added an iterator
>>>     to match all FP SVE modes as well as SF and DF.
>>>     (VPRED): Extended the attribute to handle GPF modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * gcc.target/aarch64/sve/fscale.c: New test.
>>
>> This patch fixes the bugzilla report at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
>> So it should be referenced in the ChangeLog entries like so:
>>
>>     PR target/111733
>>     * config/aarch64/aarch64-sve.md <rest of ChangeLog>
>>
>> That way the commit hooks will pick it up and update the bug tracker
>> accordingly.
>>
>>>
>>> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
>>
>> +(define_expand "ldexp<mode>3"
>> +  [(set (match_operand:GPF 0 "register_operand" "=w")
>> +       (unspec:GPF
>> +         [(match_operand:GPF 1 "register_operand" "w")
>> +          (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
>> +         UNSPEC_COND_FSCALE))]
>> +  "TARGET_SVE"
>> +  {
>> +    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
>> +    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
>> +    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue,
>> +                                             operands[1], operands[2],
>> +                                             strictness));
>> +    DONE;
>> +  }
>>
>> Lines should not exceed 80 columns; this should be wrapped around.
>
> And nit: perhaps slightly more idiomatic to the other patterns in SVE is
> this:
>
> (define_expand "ldexp<mode>3"
>   [(set (match_operand:GPF 0 "register_operand")
>        (unspec:GPF
>          [(match_dup 3)
>           (const_int SVE_RELAXED_GP)
>           (match_operand:GPF 1 "register_operand")
>           (match_operand:<V_INT_EQUIV> 2 "register_operand")]
>          UNSPEC_COND_FSCALE))]
>   "TARGET_SVE"
>   {
>     operands[3] = aarch64_ptrue_reg (<VPRED>mode);
>   }
> )
>
> It removes the dependency on the exact name of the pattern.
> Also note the dropping of the constraints; expand patterns don't use
> the constraints, only the predicates are checked.
>
> Cheers,
> Tamar
Thanks for this suggestion! This makes a lot of sense. Edited the patch with
this change. Also referenced the PR as suggested by Kyrill earlier.

Thanks,
Soumya

>>
>> The patch looks good to me otherwise.
>> Thanks,
>> Kyrill
0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch