Hi Soumya > On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote: > > External email: Use caution opening links or attachments > > > This patch uses the FSCALE instruction provided by SVE to implement the > standard ldexp family of functions. > > Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the > following code: > > float > test_ldexpf (float x, int i) > { > return __builtin_ldexpf (x, i); > } > > double > test_ldexp (double x, int i) > { > return __builtin_ldexp(x, i); > } > > GCC Output: > > test_ldexpf: > b ldexpf > > test_ldexp: > b ldexp > > Since SVE has support for an FSCALE instruction, we can use this to process > scalar floats by moving them to a vector register and performing an fscale > call, > similar to how LLVM tackles an ldexp builtin as well. > > New Output: > > test_ldexpf: > fmov s31, w0 > ptrue p7.b, all > fscale z0.s, p7/m, z0.s, z31.s > ret > > test_ldexp: > sxtw x0, w0 > ptrue p7.b, all > fmov d31, x0 > fscale z0.d, p7/m, z0.d, z31.d > ret > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > OK for mainline? > > Signed-off-by: Soumya AR <soum...@nvidia.com> > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve.md > (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar > floating modes and expand to the existing pattern for FSCALE. > (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE > operands as well as scalar floating modes. > > * config/aarch64/iterators.md: > SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well > as SF and DF. > VPRED: Extended the attribute to handle GPF modes. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/fscale.c: New test.
This patch fixes the bugzilla report at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733 So it should be referenced in the ChangeLog entries like so: PR target/111733 * config/aarch64/aarch64-sve.md <rest of ChangeLog> That way the commit hooks will pick it up and updated the bug tracker accordingly > > <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch> +(define_expand "ldexp<mode>3" + [(set (match_operand:GPF 0 "register_operand" "=w") + (unspec:GPF + [(match_operand:GPF 1 "register_operand" "w") + (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")] + UNSPEC_COND_FSCALE))] + "TARGET_SVE" + { + rtx ptrue = aarch64_ptrue_reg (<VPRED>mode); + rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode); + emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue, operands[1], operands[2], strictness)); + DONE; + } Lines should not exceed 80 columns, this should be wrapped around The patch looks good to me otherwise. Thanks, Kyrill