Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

Soumya AR Thu, 03 Oct 2024 00:59:02 -0700


> On 1 Oct 2024, at 1:18 PM, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Hi Soumya,
>
> Nice patch!
>
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Tuesday, October 1, 2024 7:55 AM
>> To: Soumya AR <soum...@nvidia.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>
>> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE 
>> instruction
>>
>> Hi Soumya
>>
>>> On 30 Sep 2024, at 18:26, Soumya AR <soum...@nvidia.com> wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> This patch uses the FSCALE instruction provided by SVE to implement the
>>> standard ldexp family of functions.
>>>
>>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>>> following code:
>>>
>>> float
>>> test_ldexpf (float x, int i)
>>> {
>>>       return __builtin_ldexpf (x, i);
>>> }
>>>
>>> double
>>> test_ldexp (double x, int i)
>>> {
>>>       return __builtin_ldexp(x, i);
>>> }
>>>
>>> GCC Output:
>>>
>>> test_ldexpf:
>>>       b ldexpf
>>>
>>> test_ldexp:
>>>       b ldexp
>>>
>>> Since SVE has support for an FSCALE instruction, we can use this to process
>>> scalar floats by moving them to a vector register and performing an fscale 
>>> call,
>>> similar to how LLVM tackles an ldexp builtin as well.
>>>
>>> New Output:
>>>
>>> test_ldexpf:
>>>       fmov s31, w0
>>>       ptrue p7.b, all
>>>       fscale z0.s, p7/m, z0.s, z31.s
>>>       ret
>>>
>>> test_ldexp:
>>>       sxtw x0, w0
>>>       ptrue p7.b, all
>>>       fmov d31, x0
>>>       fscale z0.d, p7/m, z0.d, z31.d
>>>       ret
>>>
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>>
>>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64-sve.md
>>> (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
>>> floating modes and expand to the existing pattern for FSCALE.
>>> (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
>>> operands as well as scalar floating modes.
>>>
>>> * config/aarch64/iterators.md:
>>> SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
>>> as SF and DF.
>>> VPRED: Extended the attribute to handle GPF modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/aarch64/sve/fscale.c: New test.
>>
>> This patch fixes the bugzilla report at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
>> So it should be referenced in the ChangeLog entries like so:
>>
>>      PR target/111733
>>      * config/aarch64/aarch64-sve.md <rest of ChangeLog>
>>
>> That way the commit hooks will pick it up and updated the bug tracker 
>> accordingly
>>
>>>
>>> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
>>
>> +(define_expand "ldexp<mode>3"
>> +  [(set (match_operand:GPF 0 "register_operand" "=w")
>> +     (unspec:GPF
>> +       [(match_operand:GPF 1 "register_operand" "w")
>> +        (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
>> +       UNSPEC_COND_FSCALE))]
>> +  "TARGET_SVE"
>> +  {
>> +    rtx ptrue = aarch64_ptrue_reg (<VPRED>mode);
>> +    rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
>> +    emit_insn (gen_aarch64_pred_fscale<mode> (operands[0], ptrue,
>> operands[1], operands[2], strictness));
>> +    DONE;
>> +  }
>>
>> Lines should not exceed 80 columns, this should be wrapped around
>
> And Nit: perhaps slightly more idiomatic to the other patterns in SVE is this:
>
> (define_expand "ldexp<mode>3"
>  [(set (match_operand:GPF 0 "register_operand")
>        (unspec:GPF
>          [(match_dup 3)
>           (const_int SVE_RELAXED_GP)
>           (match_operand:GPF 1 "register_operand")
>           (match_operand:<V_INT_EQUIV> 2 "register_operand")]
>          UNSPEC_COND_FSCALE))]
>  "TARGET_SVE"
>  {
>    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
>  }
> )
>
> It removes the dependency on the exact name of the pattern.
> Also note the dropping of the constraints, expand patterns don't use
> the constraints, only the predicates are checked.
>
> Cheers,
> Tamar


Thanks for this suggestion! This makes a lot of sense. Edited the patch with
this change.

Also referenced the PR as suggested by Kyrill earlier.

Thanks,
Soumya

>>
>> The patch looks good to me otherwise.
>> Thanks,
>> Kyrill

0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
Description: 0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch

Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

Reply via email to