Richard Biener <rguent...@suse.de> writes:
>> On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <rguent...@suse.de> writes:
>>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>>>
>>>>> I suppose the idea is that -abs(x) might be easier to optimize with other
>>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>>>>>
>>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>>>>> canonical than copysign.
>>>>>
>>>>>> Should I try removing this?
>>>>>
>>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
>>>>
>>>> This patch transforms fneg (fabs (x)) into copysign (x, -1), which is more
>>>> canonical and allows a target to expand this sequence efficiently.  Such
>>>> sequences are common in scientific code working with gradients.
>>>>
>>>> Various optimizations in match.pd only happened on COPYSIGN but not
>>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN however
>>>> is restricted to only
>>>
>>> That's not true:
>>>
>>> (define_operator_list COPYSIGN
>>>     BUILT_IN_COPYSIGNF
>>>     BUILT_IN_COPYSIGN
>>>     BUILT_IN_COPYSIGNL
>>>     IFN_COPYSIGN)
>>>
>>> but they miss the extended float builtin variants like
>>> __builtin_copysignf16.  Also see below.
>>>
>>>> the C99 builtins and so doesn't work for vectors.
>>>>
>>>> The patch expands these optimizations to work on COPYSIGN_ALL.
>>>>
>>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x)),
>>>> which I remove since this is a less efficient form.  The testsuite is also
>>>> updated in light of this.
>>>>
>>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>>>
>>>> Ok for master?
>>>>
>>>> Thanks,
>>>> Tamar
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	PR tree-optimization/109154
>>>> 	* match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>>> 	expand existing copysign optimizations.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	PR tree-optimization/109154
>>>> 	* gcc.dg/fold-copysign-1.c: Updated.
>>>> 	* gcc.dg/pr55152-2.c: Updated.
>>>> 	* gcc.dg/tree-ssa/abs-4.c: Updated.
>>>> 	* gcc.dg/tree-ssa/backprop-6.c: Updated.
>>>> 	* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>>> 	* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>>> 	* gcc.target/aarch64/fneg-abs_1.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_2.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_3.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_4.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>>>>
>>>> --- inline copy of patch ---
>>>>
>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
>>>> --- a/gcc/match.pd
>>>> +++ b/gcc/match.pd
>>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>>
>>>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>>>  (for coss (COS COSH)
>>>> -     copysigns (COPYSIGN)
>>>> - (simplify
>>>> -  (coss (copysigns @0 @1))
>>>> -  (coss @0)))
>>>> + (for copysigns (COPYSIGN_ALL)
>>>
>>> So this ends up generating for example the match
>>> (cosf (copysignl ...)), which doesn't make much sense.
>>>
>>> The lock-step iteration did
>>> (cosf (copysignf ..)) ...
>>> (ifn_cos (ifn_copysign ...)),
>>> which is leaner but misses the case of
>>> (cosf (ifn_copysign ..)) - that's probably what you are
>>> after with this change.
>>>
>>> That said, there isn't a nice solution (without altering the match.pd
>>> IL).  There's the explicit solution, spelling out all combinations.
>>>
>>> So if we want to go with your pragmatic solution, changing this
>>> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
>>> for iteration to a cross-product for iteration is.
>>>
>>> Changing just this pattern to
>>>
>>> (for coss (COS COSH)
>>>  (for copysigns (COPYSIGN)
>>>   (simplify
>>>    (coss (copysigns @0 @1))
>>>    (coss @0))))
>>>
>>> increases the total number of gimple-match-x.cc lines from
>>> 234988 to 235324.
>>
>> I guess the difference between this and the later suggestions is that
>> this one allows builtin copysign to be paired with ifn cos, which would
>> be potentially useful in other situations.  (It isn't here because
>> ifn_cos is rarely provided.)  How much of the growth is due to that,
>> and how much of it is from nonsensical combinations like
>> (builtin_cosf (builtin_copysignl ...))?
>>
>> If it's mostly from nonsensical combinations then would it be possible
>> to make genmatch drop them?
>>
>>> The alternative is to do
>>>
>>> (for coss (COS COSH)
>>>      copysigns (COPYSIGN)
>>>  (simplify
>>>   (coss (copysigns @0 @1))
>>>   (coss @0))
>>>  (simplify
>>>   (coss (IFN_COPYSIGN @0 @1))
>>>   (coss @0)))
>>>
>>> which properly will diagnose a duplicate pattern.  There are
>>> currently no operator lists with just builtins defined (that
>>> could be fixed, see gencfn-macros.cc); supposing we'd have
>>> COS_C we could do
>>>
>>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>>>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
>>>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
>>>                 IFN_COPYSIGN IFN_COPYSIGN)
>>>  (simplify
>>>   (coss (copysigns @0 @1))
>>>   (coss @0)))
>>>
>>> which of course still looks ugly ;)  (Some syntax extension like
>>> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
>>> enough to do.)
>>>
>>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
>>> re-do it to only split the fors, keeping COPYSIGN, and provide
>>> some statistics on the gimple-match-* size?  I think this might
>>> be the pragmatic solution for now.
>>>
>>> Richard - can you think of a clever way to express the desired
>>> iteration?  How do RTL macro iterations address cases like this?
>>
>> I don't think .md files have an equivalent construct, unfortunately.
>> (I also regret some of the choices I made for .md iterators, but that's
>> another story.)
>>
>> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
>> with the "..." meaning "fill to match the longest operator list
>> in the loop".
>
> Hm, I'll think about this.  It would be useful to have a function like
>
>   internal_fn ifn_for (combined_fn);
>
> so we can indirectly match all builtins with a switch on the ifn code.
There's:

  extern internal_fn associated_internal_fn (combined_fn, tree);
  extern internal_fn associated_internal_fn (tree);
  extern internal_fn replacement_internal_fn (gcall *);

where the first one requires the return type and the second one
operates on CALL_EXPRs.

Thanks,
Richard
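For illustration, here is a minimal sketch (not part of the original
thread; the helper name copysign_call_p is made up) of how the first
overload could drive the kind of switch suggested above, so that every
builtin copysign variant -- including the extended float ones like
__builtin_copysignf16 -- is matched through a single IFN_COPYSIGN case:

  /* Hypothetical helper, assuming the usual GCC internal headers
     (tree.h, internal-fn.h) are in scope.  Return true if FN with
     return type TYPE is some form of copysign: a C99 builtin, an
     extended float variant, or IFN_COPYSIGN itself.
     associated_internal_fn returns IFN_LAST for calls that have no
     associated internal function, so anything else falls through to
     the default case.  */

  static bool
  copysign_call_p (combined_fn fn, tree type)
  {
    switch (associated_internal_fn (fn, type))
      {
      case IFN_COPYSIGN:
        return true;
      default:
        return false;
      }
  }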