> On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener <rguent...@suse.de> writes:
>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>>
>>>> I suppose the idea is that -abs(x) might be easier to optimize with other
>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
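The follow-on optimization mentioned here can be checked numerically; a minimal Python sketch of the identity (not GCC code; `math.copysign` standing in for the builtin, with a known-negative sign source):

```python
import math

# With a negative sign source s, copysign(x, s) == -abs(x),
# so a - copysign(x, s) folds to a + abs(x).
def lhs(a, x):
    return a - math.copysign(x, -1.0)

def rhs(a, x):
    return a + abs(x)

for a in (0.0, 1.5, -2.25):
    for x in (0.0, -0.0, 3.0, -4.5):
        assert lhs(a, x) == rhs(a, x)
```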
>>>>
>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>>>> canonical than copysign.
>>>>
>>>>> Should I try removing this?
>>>>
>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
>>>>
>>>
>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>>> canonical and allows a target to expand this sequence efficiently. Such
>>> sequences are common in scientific code working with gradients.
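The equivalence driving the transform can be exercised with a small Python sketch (illustration only, not GCC code); it checks signed zeros and infinities bit-for-bit on the sign:

```python
import math

def neg_abs(x):
    return -abs(x)            # fneg (fabs (x))

def copysign_minus_one(x):
    return math.copysign(x, -1.0)   # copysign (x, -1)

for x in (0.0, -0.0, 1.0, -1.0, 2.5, -3.75, math.inf, -math.inf):
    a, b = neg_abs(x), copysign_minus_one(x)
    # Compare the sign explicitly so the sign of zero is checked too.
    assert math.copysign(1.0, a) == math.copysign(1.0, b)
    assert a == b
```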
>>>
>>> Various optimizations in match.pd only happened on COPYSIGN but not
>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN. COPYSIGN, however,
>>> is restricted to only
>>
>> That's not true:
>>
>> (define_operator_list COPYSIGN
>> BUILT_IN_COPYSIGNF
>> BUILT_IN_COPYSIGN
>> BUILT_IN_COPYSIGNL
>> IFN_COPYSIGN)
>>
>> but they miss the extended float builtin variants like
>> __builtin_copysignf16. Also see below
>>
>>> the C99 builtins and so doesn't work for vectors.
>>>
>>> The patch expands these optimizations to work on COPYSIGN_ALL.
>>>
>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>>> which I remove since this is a less efficient form. The testsuite is also
>>> updated in light of this.
>>>
>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>>
>>> Ok for master?
>>>
>>> Thanks,
>>> Tamar
>>>
>>> gcc/ChangeLog:
>>>
>>> PR tree-optimization/109154
>>> * match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>> expand existing copysign optimizations.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR tree-optimization/109154
>>> * gcc.dg/fold-copysign-1.c: Updated.
>>> * gcc.dg/pr55152-2.c: Updated.
>>> * gcc.dg/tree-ssa/abs-4.c: Updated.
>>> * gcc.dg/tree-ssa/backprop-6.c: Updated.
>>> * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>> * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>> * gcc.target/aarch64/fneg-abs_1.c: New test.
>>> * gcc.target/aarch64/fneg-abs_2.c: New test.
>>> * gcc.target/aarch64/fneg-abs_3.c: New test.
>>> * gcc.target/aarch64/fneg-abs_4.c: New test.
>>> * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>> * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>> * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>> * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>>>
>>> --- inline copy of patch ---
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index
>>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
>>> 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>
>>> /* cos(copysign(x, y)) -> cos(x). Similarly for cosh. */
>>> (for coss (COS COSH)
>>> - copysigns (COPYSIGN)
>>> - (simplify
>>> - (coss (copysigns @0 @1))
>>> - (coss @0)))
>>> + (for copysigns (COPYSIGN_ALL)
>>
>> So this ends up generating, for example, the match
>> (cosf (copysignl ...)), which doesn't make much sense.
>>
>> The lock-step iteration did
>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
>> which is leaner but misses the case of
>> (cosf (ifn_copysign ..)) - that's probably what you are
>> after with this change.
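The two iteration schemes can be pictured with a Python analogy (hypothetical operator names; `zip` and `itertools.product` standing in for genmatch's lock-step and nested `for`s):

```python
from itertools import product

coss = ['cosf', 'ifn_cos']
copysigns = ['copysignf', 'ifn_copysign']

# Lock-step (the original two-variable `for`): pairs by position only.
lockstep = list(zip(coss, copysigns))
assert lockstep == [('cosf', 'copysignf'), ('ifn_cos', 'ifn_copysign')]

# Nested `for`s: the full cross product, including the mixed case
# (cosf, ifn_copysign) that lock-step misses -- at the cost of also
# generating combinations that make no sense.
cross = list(product(coss, copysigns))
assert ('cosf', 'ifn_copysign') in cross
assert len(cross) == 4
```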
>>
>> That said, there isn't a nice solution (without altering the match.pd
>> IL). There's the explicit solution, spelling out all combinations.
>>
>> So if we want to go with your pragmatic solution, changing this
>> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
>> for iteration to a cross-product for iteration is.
>>
>> Changing just this pattern to
>>
>> (for coss (COS COSH)
>> (for copysigns (COPYSIGN)
>> (simplify
>> (coss (copysigns @0 @1))
>> (coss @0))))
>>
>> increases the total number of gimple-match-x.cc lines from
>> 234988 to 235324.
>
> I guess the difference between this and the later suggestions is that
> this one allows builtin copysign to be paired with ifn cos, which would
> be potentially useful in other situations. (It isn't here because
> ifn_cos is rarely provided.) How much of the growth is due to that,
> and how much of it is from nonsensical combinations like
> (builtin_cosf (builtin_copysignl ...))?
>
> If it's mostly from nonsensical combinations then would it be possible
> to make genmatch drop them?
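One possible filtering rule can be sketched in Python (a toy illustration of the idea, not anything genmatch implements: pair operators only when their type suffixes agree, treating internal functions as polymorphic):

```python
from itertools import product

def suffix(op):
    # Builtin type suffix: 'f', 'l', or '' for double;
    # internal functions ('ifn_*') are polymorphic.
    if op.startswith('ifn_'):
        return None
    return op[-1] if op[-1] in 'fl' else ''

def compatible(a, b):
    sa, sb = suffix(a), suffix(b)
    return sa is None or sb is None or sa == sb

coss = ['cosf', 'cos', 'cosl', 'ifn_cos']
copysigns = ['copysignf', 'copysign', 'copysignl', 'ifn_copysign']
kept = [(c, s) for c, s in product(coss, copysigns) if compatible(c, s)]
assert ('cosf', 'copysignl') not in kept      # nonsensical, dropped
assert ('cosf', 'ifn_copysign') in kept        # useful mixed case, kept
```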
>
>> The alternative is to do
>>
>> (for coss (COS COSH)
>> copysigns (COPYSIGN)
>> (simplify
>> (coss (copysigns @0 @1))
>> (coss @0))
>> (simplify
>> (coss (IFN_COPYSIGN @0 @1))
>> (coss @0)))
>>
>> which properly will diagnose a duplicate pattern. There are
>> currently no operator lists with just builtins defined (that
>> could be fixed, see gencfn-macros.cc); supposing we'd have
>> COS_C we could do
>>
>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>> copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
>> IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
>> IFN_COPYSIGN)
>> (simplify
>> (coss (copysigns @0 @1))
>> (coss @0)))
>>
>> which of course still looks ugly ;) (some syntax extension like
>> allowing IFN_COPYSIGN*8 to be specified would be nice here and
>> easy enough to do)
>>
>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
>> re-do it to only split the fors, keeping COPYSIGN and provide
>> some statistics on the gimple-match-* size? I think this might
>> be the pragmatic solution for now.
>>
>> Richard - can you think of a clever way to express the desired
>> iteration? How do RTL macro iterations address cases like this?
>
> I don't think .md files have an equivalent construct, unfortunately.
> (I also regret some of the choices I made for .md iterators, but that's
> another story.)
>
> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> with the "..." meaning "fill to match the longest operator list
> in the loop".
Hm, I'll think about this. It would be useful to have a function like

  internal_fn ifn_for (combined_fn);

so we can indirectly match all builtins with a switch on the ifn code.

Richard
> Thanks,
> Richard
>
>> Richard.
>>
>>> + (simplify
>>> + (coss (copysigns @0 @1))
>>> + (coss @0))))
>>>
>>> /* pow(copysign(x, y), z) -> pow(x, z) if z is an even integer. */
>>> (for pows (POW)
>>> - copysigns (COPYSIGN)
>>> - (simplify
>>> - (pows (copysigns @0 @2) REAL_CST@1)
>>> - (with { HOST_WIDE_INT n; }
>>> - (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>>> - (pows @0 @1)))))
>>> + (for copysigns (COPYSIGN_ALL)
>>> + (simplify
>>> + (pows (copysigns @0 @2) REAL_CST@1)
>>> + (with { HOST_WIDE_INT n; }
>>> + (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>>> + (pows @0 @1))))))
>>> /* Likewise for powi. */
>>> (for pows (POWI)
>>> - copysigns (COPYSIGN)
>>> - (simplify
>>> - (pows (copysigns @0 @2) INTEGER_CST@1)
>>> - (if ((wi::to_wide (@1) & 1) == 0)
>>> - (pows @0 @1))))
>>> + (for copysigns (COPYSIGN_ALL)
>>> + (simplify
>>> + (pows (copysigns @0 @2) INTEGER_CST@1)
>>> + (if ((wi::to_wide (@1) & 1) == 0)
>>> + (pows @0 @1)))))
>>>
>>> (for hypots (HYPOT)
>>> - copysigns (COPYSIGN)
>>> - /* hypot(copysign(x, y), z) -> hypot(x, z). */
>>> - (simplify
>>> - (hypots (copysigns @0 @1) @2)
>>> - (hypots @0 @2))
>>> - /* hypot(x, copysign(y, z)) -> hypot(x, y). */
>>> - (simplify
>>> - (hypots @0 (copysigns @1 @2))
>>> - (hypots @0 @1)))
>>> + (for copysigns (COPYSIGN)
>>> + /* hypot(copysign(x, y), z) -> hypot(x, z). */
>>> + (simplify
>>> + (hypots (copysigns @0 @1) @2)
>>> + (hypots @0 @2))
>>> + /* hypot(x, copysign(y, z)) -> hypot(x, y). */
>>> + (simplify
>>> + (hypots @0 (copysigns @1 @2))
>>> + (hypots @0 @1))))
>>>
>>> -/* copysign(x, CST) -> [-]abs (x). */
>>> -(for copysigns (COPYSIGN_ALL)
>>> - (simplify
>>> - (copysigns @0 REAL_CST@1)
>>> - (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
>>> - (negate (abs @0))
>>> - (abs @0))))
>>> +/* Transform fneg (fabs (X)) -> copysign (X, -1). */
>>> +
>>> +(simplify
>>> + (negate (abs @0))
>>> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>>>
>>> /* copysign(copysign(x, y), z) -> copysign(x, z). */
>>> (for copysigns (COPYSIGN_ALL)
>>> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c
>>> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>>> index
>>> f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
>>> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>>> @@ -12,5 +12,5 @@ double bar (double x)
>>> return __builtin_copysign (x, minuszero);
>>> }
>>>
>>> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
>>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
>>> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
>>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
>>> diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c
>>> b/gcc/testsuite/gcc.dg/pr55152-2.c
>>> index
>>> 54db0f2062da105a829d6690ac8ed9891fe2b588..605f202ed6bc7aa8fe921457b02ff0b88cc63ce6
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/pr55152-2.c
>>> +++ b/gcc/testsuite/gcc.dg/pr55152-2.c
>>> @@ -10,4 +10,5 @@ int f(int a)
>>> return (a<-a)?a:-a;
>>> }
>>>
>>> -/* { dg-final { scan-tree-dump-times "ABS_EXPR" 2 "optimized" } } */
>>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 1 "optimized" } } */
>>> +/* { dg-final { scan-tree-dump-times "ABS_EXPR" 1 "optimized" } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>>> index
>>> 6197519faf7b55aed7bc162cd0a14dd2145210ca..e1b825f37f69ac3c4666b3a52d733368805ad31d
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>>> @@ -9,5 +9,6 @@ long double abs_ld(long double x) { return
>>> __builtin_signbit(x) ? x : -x; }
>>>
>>> /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP<x> */
>>> /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
>>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized"} } */
>>> -/* { dg-final { scan-tree-dump-times "= -" 3 "optimized"} } */
>>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized"} } */
>>> +/* { dg-final { scan-tree-dump-times "= -" 1 "optimized"} } */
>>> +/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized"} } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>>> index
>>> 31f05716f1498dc709cac95fa20fb5796642c77e..c3a138642d6ff7be984e91fa1343cb2718db7ae1
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>>> @@ -26,5 +26,6 @@ TEST_FUNCTION (float, f)
>>> TEST_FUNCTION (double, )
>>> TEST_FUNCTION (long double, l)
>>>
>>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } }
>>> */
>>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3
>>> "backprop" } } */
>>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" } }
>>> */
>>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2
>>> "backprop" } } */
>>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1
>>> "backprop" } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>>> index
>>> de52c5f7c8062958353d91f5031193defc9f3f91..e5d565c4b9832c00106588ef411fbd8c292a5cad
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>>> @@ -10,4 +10,5 @@ float f1(float x)
>>> float t = __builtin_copysignf (1.0f, -x);
>>> return x * t;
>>> }
>>> -/* { dg-final { scan-tree-dump-times "ABS" 2 "optimized"} } */
>>> +/* { dg-final { scan-tree-dump-times "ABS" 1 "optimized"} } */
>>> +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized"} } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>>> index
>>> a41f1baf25669a4fd301a586a49ba5e3c5b966ab..a22896b21c8b5a4d5d8e28bd8ae0db896e63ade0
>>> 100644
>>> --- a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>>> @@ -34,4 +34,5 @@ float i1(float x)
>>> {
>>> return x * (x <= 0.f ? 1.f : -1.f);
>>> }
>>> -/* { dg-final { scan-tree-dump-times "ABS" 8 "gimple"} } */
>>> +/* { dg-final { scan-tree-dump-times "ABS" 4 "gimple"} } */
>>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 4 "gimple"} } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..f823013c3ddf6b3a266c3abfcbf2642fc2a75fa6
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>>> @@ -0,0 +1,39 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#pragma GCC target "+nosve"
>>> +
>>> +#include <arm_neon.h>
>>> +
>>> +/*
>>> +** t1:
>>> +** orr v[0-9]+.2s, #128, lsl #24
>>> +** ret
>>> +*/
>>> +float32x2_t t1 (float32x2_t a)
>>> +{
>>> + return vneg_f32 (vabs_f32 (a));
>>> +}
>>> +
>>> +/*
>>> +** t2:
>>> +** orr v[0-9]+.4s, #128, lsl #24
>>> +** ret
>>> +*/
>>> +float32x4_t t2 (float32x4_t a)
>>> +{
>>> + return vnegq_f32 (vabsq_f32 (a));
>>> +}
>>> +
>>> +/*
>>> +** t3:
>>> +** adrp x0, .LC[0-9]+
>>> +** ldr q[0-9]+, \[x0, #:lo12:.LC0\]
>>> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>>> +** ret
>>> +*/
>>> +float64x2_t t3 (float64x2_t a)
>>> +{
>>> + return vnegq_f64 (vabsq_f64 (a));
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..141121176b309e4b2aa413dc55271a6e3c93d5e1
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>>> @@ -0,0 +1,31 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#pragma GCC target "+nosve"
>>> +
>>> +#include <arm_neon.h>
>>> +#include <math.h>
>>> +
>>> +/*
>>> +** f1:
>>> +** movi v[0-9]+.2s, 0x80, lsl 24
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float32_t f1 (float32_t a)
>>> +{
>>> + return -fabsf (a);
>>> +}
>>> +
>>> +/*
>>> +** f2:
>>> +** mov x0, -9223372036854775808
>>> +** fmov d[0-9]+, x0
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float64_t f2 (float64_t a)
>>> +{
>>> + return -fabs (a);
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..b4652173a95d104ddfa70c497f0627a61ea89d3b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>>> @@ -0,0 +1,36 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#pragma GCC target "+nosve"
>>> +
>>> +#include <arm_neon.h>
>>> +#include <math.h>
>>> +
>>> +/*
>>> +** f1:
>>> +** ...
>>> +** ldr q[0-9]+, \[x0\]
>>> +** orr v[0-9]+.4s, #128, lsl #24
>>> +** str q[0-9]+, \[x0\], 16
>>> +** ...
>>> +*/
>>> +void f1 (float32_t *a, int n)
>>> +{
>>> + for (int i = 0; i < (n & -8); i++)
>>> + a[i] = -fabsf (a[i]);
>>> +}
>>> +
>>> +/*
>>> +** f2:
>>> +** ...
>>> +** ldr q[0-9]+, \[x0\]
>>> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>>> +** str q[0-9]+, \[x0\], 16
>>> +** ...
>>> +*/
>>> +void f2 (float64_t *a, int n)
>>> +{
>>> + for (int i = 0; i < (n & -8); i++)
>>> + a[i] = -fabs (a[i]);
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..10879dea74462d34b26160eeb0bd54ead063166b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>>> @@ -0,0 +1,39 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#pragma GCC target "+nosve"
>>> +
>>> +#include <string.h>
>>> +
>>> +/*
>>> +** negabs:
>>> +** mov x0, -9223372036854775808
>>> +** fmov d[0-9]+, x0
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +double negabs (double x)
>>> +{
>>> + unsigned long long y;
>>> + memcpy (&y, &x, sizeof(double));
>>> + y = y | (1UL << 63);
>>> + memcpy (&x, &y, sizeof(double));
>>> + return x;
>>> +}
>>> +
>>> +/*
>>> +** negabsf:
>>> +** movi v[0-9]+.2s, 0x80, lsl 24
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float negabsf (float x)
>>> +{
>>> + unsigned int y;
>>> + memcpy (&y, &x, sizeof(float));
>>> + y = y | (1U << 31);
>>> + memcpy (&x, &y, sizeof(float));
>>> + return x;
>>> +}
>>> +
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..0c7664e6de77a497682952653ffd417453854d52
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>>> @@ -0,0 +1,37 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#include <arm_neon.h>
>>> +
>>> +/*
>>> +** t1:
>>> +** orr v[0-9]+.2s, #128, lsl #24
>>> +** ret
>>> +*/
>>> +float32x2_t t1 (float32x2_t a)
>>> +{
>>> + return vneg_f32 (vabs_f32 (a));
>>> +}
>>> +
>>> +/*
>>> +** t2:
>>> +** orr v[0-9]+.4s, #128, lsl #24
>>> +** ret
>>> +*/
>>> +float32x4_t t2 (float32x4_t a)
>>> +{
>>> + return vnegq_f32 (vabsq_f32 (a));
>>> +}
>>> +
>>> +/*
>>> +** t3:
>>> +** adrp x0, .LC[0-9]+
>>> +** ldr q[0-9]+, \[x0, #:lo12:.LC0\]
>>> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>>> +** ret
>>> +*/
>>> +float64x2_t t3 (float64x2_t a)
>>> +{
>>> + return vnegq_f64 (vabsq_f64 (a));
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..a60cd31b9294af2dac69eed1c93f899bd5c78fca
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>>> @@ -0,0 +1,29 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#include <arm_neon.h>
>>> +#include <math.h>
>>> +
>>> +/*
>>> +** f1:
>>> +** movi v[0-9]+.2s, 0x80, lsl 24
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float32_t f1 (float32_t a)
>>> +{
>>> + return -fabsf (a);
>>> +}
>>> +
>>> +/*
>>> +** f2:
>>> +** mov x0, -9223372036854775808
>>> +** fmov d[0-9]+, x0
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float64_t f2 (float64_t a)
>>> +{
>>> + return -fabs (a);
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..1bf34328d8841de8e6b0a5458562a9f00e31c275
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>>> @@ -0,0 +1,34 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#include <arm_neon.h>
>>> +#include <math.h>
>>> +
>>> +/*
>>> +** f1:
>>> +** ...
>>> +** ld1w z[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
>>> +** orr z[0-9]+.s, z[0-9]+.s, #0x80000000
>>> +** st1w z[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
>>> +** ...
>>> +*/
>>> +void f1 (float32_t *a, int n)
>>> +{
>>> + for (int i = 0; i < (n & -8); i++)
>>> + a[i] = -fabsf (a[i]);
>>> +}
>>> +
>>> +/*
>>> +** f2:
>>> +** ...
>>> +** ld1d z[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
>>> +** orr z[0-9]+.d, z[0-9]+.d, #0x8000000000000000
>>> +** st1d z[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
>>> +** ...
>>> +*/
>>> +void f2 (float64_t *a, int n)
>>> +{
>>> + for (int i = 0; i < (n & -8); i++)
>>> + a[i] = -fabs (a[i]);
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>>> new file mode 100644
>>> index
>>> 0000000000000000000000000000000000000000..21f2a8da2a5d44e3d01f6604ca7be87e3744d494
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>>> @@ -0,0 +1,37 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3" } */
>>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>>> +
>>> +#include <string.h>
>>> +
>>> +/*
>>> +** negabs:
>>> +** mov x0, -9223372036854775808
>>> +** fmov d[0-9]+, x0
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +double negabs (double x)
>>> +{
>>> + unsigned long long y;
>>> + memcpy (&y, &x, sizeof(double));
>>> + y = y | (1UL << 63);
>>> + memcpy (&x, &y, sizeof(double));
>>> + return x;
>>> +}
>>> +
>>> +/*
>>> +** negabsf:
>>> +** movi v[0-9]+.2s, 0x80, lsl 24
>>> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>>> +** ret
>>> +*/
>>> +float negabsf (float x)
>>> +{
>>> + unsigned int y;
>>> + memcpy (&y, &x, sizeof(float));
>>> + y = y | (1U << 31);
>>> + memcpy (&x, &y, sizeof(float));
>>> + return x;
>>> +}
>>> +
>>>