Richard Biener <rguent...@suse.de> writes:
>> On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <rguent...@suse.de> writes:
>>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>>>
>>>>> I suppose the idea is that -abs(x) might be easier to optimize with other
>>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>>>>>
>>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>>>>> canonical than copysign.
>>>>>
>>>>>> Should I try removing this?
>>>>>
>>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
>>>>
>>>> This patch transforms fneg (fabs (x)) into copysign (x, -1), which is more
>>>> canonical and allows a target to expand this sequence efficiently.  Such
>>>> sequences are common in scientific code working with gradients.
>>>>
>>>> Various optimizations in match.pd only happened on COPYSIGN but not
>>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN however
>>>> is restricted to only
>>>
>>> That's not true:
>>>
>>> (define_operator_list COPYSIGN
>>>     BUILT_IN_COPYSIGNF
>>>     BUILT_IN_COPYSIGN
>>>     BUILT_IN_COPYSIGNL
>>>     IFN_COPYSIGN)
>>>
>>> but they miss the extended float builtin variants like
>>> __builtin_copysignf16.  Also see below.
>>>
>>>> the C99 builtins and so doesn't work for vectors.
>>>>
>>>> The patch expands these optimizations to work on COPYSIGN_ALL.
>>>>
>>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x)),
>>>> which I remove since this is a less efficient form.  The testsuite is also
>>>> updated in light of this.
>>>>
>>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>>>>
>>>> Ok for master?
>>>>
>>>> Thanks,
>>>> Tamar
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	PR tree-optimization/109154
>>>> 	* match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>>> 	expand existing copysign optimizations.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	PR tree-optimization/109154
>>>> 	* gcc.dg/fold-copysign-1.c: Updated.
>>>> 	* gcc.dg/pr55152-2.c: Updated.
>>>> 	* gcc.dg/tree-ssa/abs-4.c: Updated.
>>>> 	* gcc.dg/tree-ssa/backprop-6.c: Updated.
>>>> 	* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>>> 	* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>>> 	* gcc.target/aarch64/fneg-abs_1.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_2.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_3.c: New test.
>>>> 	* gcc.target/aarch64/fneg-abs_4.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>>> 	* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>>>>
>>>> --- inline copy of patch ---
>>>>
>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
>>>> --- a/gcc/match.pd
>>>> +++ b/gcc/match.pd
>>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>>
>>>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>>>  (for coss (COS COSH)
>>>> -     copysigns (COPYSIGN)
>>>> - (simplify
>>>> -  (coss (copysigns @0 @1))
>>>> -  (coss @0)))
>>>> + (for copysigns (COPYSIGN_ALL)
>>>
>>> So this ends up generating for example the match
>>> (cosf (copysignl ...)), which doesn't make much sense.
>>>
>>> The lock-step iteration did
>>> (cosf (copysignf ..)) ...
>>> (ifn_cos (ifn_copysign ...)),
>>> which is leaner but misses the case of
>>> (cosf (ifn_copysign ..)) - that's probably what you are
>>> after with this change.
>>>
>>> That said, there isn't a nice solution (without altering the match.pd
>>> IL).  There's the explicit solution, spelling out all combinations.
>>>
>>> So if we want to go with your pragmatic solution, changing this
>>> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
>>> for iteration to a cross-product for iteration is.
>>>
>>> Changing just this pattern to
>>>
>>> (for coss (COS COSH)
>>>  (for copysigns (COPYSIGN)
>>>   (simplify
>>>    (coss (copysigns @0 @1))
>>>    (coss @0))))
>>>
>>> increases the total number of gimple-match-x.cc lines from
>>> 234988 to 235324.
>>
>> I guess the difference between this and the later suggestions is that
>> this one allows builtin copysign to be paired with ifn cos, which would
>> be potentially useful in other situations.  (It isn't here because
>> ifn_cos is rarely provided.)  How much of the growth is due to that,
>> and how much of it is from nonsensical combinations like
>> (builtin_cosf (builtin_copysignl ...))?
>>
>> If it's mostly from nonsensical combinations then would it be possible
>> to make genmatch drop them?
>>
>>> The alternative is to do
>>>
>>> (for coss (COS COSH)
>>>      copysigns (COPYSIGN)
>>>  (simplify
>>>   (coss (copysigns @0 @1))
>>>   (coss @0))
>>>  (simplify
>>>   (coss (IFN_COPYSIGN @0 @1))
>>>   (coss @0)))
>>>
>>> which properly will diagnose a duplicate pattern.  There are
>>> currently no operator lists with just builtins defined (that
>>> could be fixed, see gencfn-macros.cc); supposing we'd have
>>> COS_C we could do
>>>
>>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>>>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
>>>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
>>>                 IFN_COPYSIGN IFN_COPYSIGN)
>>>  (simplify
>>>   (coss (copysigns @0 @1))
>>>   (coss @0)))
>>>
>>> which of course still looks ugly ;)  (Some syntax extension like
>>> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
>>> enough to do.)
>>>
>>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
>>> re-do it to only split the fors, keeping COPYSIGN, and provide
>>> some statistics on the gimple-match-* size?  I think this might
>>> be the pragmatic solution for now.
>>>
>>> Richard - can you think of a clever way to express the desired
>>> iteration?  How do RTL macro iterations address cases like this?
>>
>> I don't think .md files have an equivalent construct, unfortunately.
>> (I also regret some of the choices I made for .md iterators, but that's
>> another story.)
>>
>> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
>> with the "..." meaning "fill to match the longest operator list
>> in the loop".
>
> Hm, I'll think about this.  It would be useful to have a function like
>
>   internal_fn ifn_for (combined_fn);
>
> so we can indirectly match all builtins with a switch on the ifn code.
There's:

  extern internal_fn associated_internal_fn (combined_fn, tree);
  extern internal_fn associated_internal_fn (tree);
  extern internal_fn replacement_internal_fn (gcall *);

where the first one requires the return type and the second one
operates on CALL_EXPRs.

Thanks,
Richard
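For illustration, here is a minimal sketch (not part of the original
thread; the helper name copysign_call_p is made up) of how the first
overload could drive the kind of switch suggested above, so that every
builtin copysign variant -- including the extended float ones like
__builtin_copysignf16 -- is matched through a single IFN_COPYSIGN case:

  /* Hypothetical helper, assuming the usual GCC internal headers
     (tree.h, internal-fn.h) are in scope.  Return true if FN with
     return type TYPE is some form of copysign: a C99 builtin, an
     extended float variant, or IFN_COPYSIGN itself.
     associated_internal_fn returns IFN_LAST for calls that have no
     associated internal function, so anything else falls through to
     the default case.  */

  static bool
  copysign_call_p (combined_fn fn, tree type)
  {
    switch (associated_internal_fn (fn, type))
      {
      case IFN_COPYSIGN:
        return true;
      default:
        return false;
      }
  }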