Hi Richard,

> On 7 Nov 2024, at 3:19 PM, Richard Sandiford <[email protected]> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Soumya AR <[email protected]> writes:
>> Changes since v1:
>>
>> This revision makes use of the extended definition of aarch64_ptrue_reg to
>> generate predicate registers with the appropriate set bits.
>>
>> Earlier, there was a suggestion to add support for half floats as well. I
>> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
>> For example, in the following case, the call does not lower to fscale:
>>
>> _Float16 test_ldexpf16 (_Float16 x, int i) {
>>   return __builtin_ldexpf16 (x, i);
>> }
>>
>> Any suggestions as to why this may be?
>
> You'd need to change:
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 2d455938271..469835b1d62 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
>  /* FP scales. */
> -DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> +DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)
>
>  /* Ternary math functions. */
>  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)

Thanks for this! It works for FP16 now.
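
For reference, a minimal sketch of the semantics being lowered here: ldexp (x, i)
computes x * 2^i, which is the scaling that FSCALE performs per lane. The helper
scale_by_pow2 below is hypothetical and not part of the patch; it only
illustrates the expected result for small exponents, assuming the result stays
in the normal range.

```c
/* Hypothetical reference helper (not part of the patch): computes
   x * 2^i by repeated multiplication by 2 or 0.5.  */
float
scale_by_pow2 (float x, int i)
{
  for (; i > 0; i--)
    x *= 2.0f;
  for (; i < 0; i++)
    x *= 0.5f;
  return x;
}

/* Same shape as the test functions in the patch.  */
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}
```

Both functions should agree for inputs such as (1.5f, 3) and (8.0f, -2).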
>
> A couple of comments below, but otherwise it looks good:
>
>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> index 0bc98315bb6..7f708ea14f9 100644
>> --- a/gcc/config/aarch64/iterators.md
>> +++ b/gcc/config/aarch64/iterators.md
>> @@ -449,6 +449,9 @@
>> ;; All fully-packed SVE floating-point vector modes.
>> (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>>
>> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
>> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])
>
> The comment is out of date. How about:
>
> ;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
> (define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])
Edited.
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> new file mode 100644
>> index 00000000000..251b4ef9188
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +float
>> +test_ldexpf (float x, int i)
>> +{
>> + return __builtin_ldexpf (x, i);
>> +}
>> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +
>> +double
>> +test_ldexp (double x, int i)
>> +{
>> + return __builtin_ldexp (x, i);
>> +}
>> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
>
> It would be good to check the ptrues as well, to make sure that we only
> enable one lane.
>
Makes sense, I’ve changed the test case to use check-function-bodies instead so
we can check for ptrues as well.
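
As a rough sketch of the shape such a test could take (the attached patch is
the authoritative version): the expected-assembly lines below are an assumption
about the generated code, namely that a single 32-bit lane is enabled with a
byte-granular ptrue of length vl4 and that the same predicate register feeds
the fscale.

```c
/* { dg-do compile } */
/* { dg-additional-options "-Ofast" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

/* The body pattern below is an assumed expectation, not copied from
   the patch: "\1" refers back to the predicate captured on the ptrue
   line, so the test only passes if fscale uses that same predicate.  */

/*
** test_ldexpf:
**	...
**	ptrue	(p[0-7])\.b, vl4
**	...
**	fscale	z[0-9]+\.s, \1/m, z[0-9]+\.s, z[0-9]+\.s
**	ret
*/
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}
```

Unlike a bare scan-assembler-times on fscale, matching the ptrue in the same
function body ties the instruction to a predicate with only one active lane.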
Best,
Soumya
> Thanks,
> Richard
0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
