Soumya AR <[email protected]> writes:
> Changes since v1:
>
> This revision makes use of the extended definition of aarch64_ptrue_reg to
> generate predicate registers with the appropriate set bits.
>
> Earlier, there was a suggestion to add support for half floats as well. I
> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
> For example, in the following case, the call does not lower to fscale:
>
> _Float16 test_ldexpf16 (_Float16 x, int i) {
> return __builtin_ldexpf16 (x, i);
> }
>
> Any suggestions as to why this may be?
You'd need to change:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d455938271..469835b1d62 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST,
vec_fmaddsub, ternary)
DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
/* FP scales. */
-DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
+DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)
/* Ternary math functions. */
DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
A couple of comments below, but otherwise it looks good:
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0bc98315bb6..7f708ea14f9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -449,6 +449,9 @@
> ;; All fully-packed SVE floating-point vector modes.
> (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>
> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])
The comment is out of date. How about:
;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
(define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> new file mode 100644
> index 00000000000..251b4ef9188
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +float
> +test_ldexpf (float x, int i)
> +{
> + return __builtin_ldexpf (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +
> +double
> +test_ldexp (double x, int i)
> +{
> + return __builtin_ldexp (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
It would be good to check the ptrues as well, to make sure that we only
enable one lane.
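As a rough sketch of what such a check might look like (untested, and
the exact ptrue form is an assumption on my part: I'm guessing at a
byte-granular predicate covering one element, i.e. vl4 for float and
vl8 for double, so the patterns would need adjusting to whatever the
compiler really emits):

```c
/* { dg-do compile } */
/* { dg-additional-options "-Ofast" } */

float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

/* ASSUMPTION: a single 4-byte lane is enabled via a .b ptrue with vl4.  */
/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.b, vl4\n} 1 } } */

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* ASSUMPTION: a single 8-byte lane is enabled via a .b ptrue with vl8.  */
/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.b, vl8\n} 1 } } */
```

These would sit alongside the existing fscale scans rather than replace
them.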
Thanks,
Richard