On Sat, 7 Oct 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
> >> Richard Biener <rguent...@suse.de> writes:
> >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> >>> 
> >>>>> I suppose the idea is that -abs(x) might be easier to optimize with
> >>>>> other patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> >>>>> 
> >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is
> >>>>> less canonical than copysign.
> >>>>> 
> >>>>>> Should I try removing this?
> >>>>> 
> >>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
> >>>>> 
> >>>> 
> >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1), which is more
> >>>> canonical and allows a target to expand this sequence efficiently.  Such
> >>>> sequences are common in scientific code working with gradients.
> >>>> 
> >>>> Various optimizations in match.pd only happened on COPYSIGN but not
> >>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN, however,
> >>>> is restricted to only
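> >>>> As a rough illustration (the exact form in the patch below may differ),
> >>>> the new canonicalization is along the lines of:
> >>>> 
> >>>>   /* fneg (fabs (X)) -> copysign (X, -1).  */
> >>>>   (simplify
> >>>>    (negate (abs @0))
> >>>>    (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
> >>>> 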
> >>> 
> >>> That's not true:
> >>> 
> >>> (define_operator_list COPYSIGN
> >>>    BUILT_IN_COPYSIGNF
> >>>    BUILT_IN_COPYSIGN
> >>>    BUILT_IN_COPYSIGNL
> >>>    IFN_COPYSIGN)
> >>> 
> >>> but they miss the extended float builtin variants like
> >>> __builtin_copysignf16.  Also see below.
> >>> 
> >>>> the C99 builtins and so doesn't work for vectors.
> >>>> 
> >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> >>>> 
> >>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x)),
> >>>> which I remove since this is a less efficient form.  The testsuite is also
> >>>> updated in light of this.
> >>>> 
> >>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >>>> 
> >>>> Ok for master?
> >>>> 
> >>>> Thanks,
> >>>> Tamar
> >>>> 
> >>>> gcc/ChangeLog:
> >>>> 
> >>>>    PR tree-optimization/109154
> >>>>    * match.pd: Add new neg+abs rule, remove inverse copysign rule and
> >>>>    expand existing copysign optimizations.
> >>>> 
> >>>> gcc/testsuite/ChangeLog:
> >>>> 
> >>>>    PR tree-optimization/109154
> >>>>    * gcc.dg/fold-copysign-1.c: Updated.
> >>>>    * gcc.dg/pr55152-2.c: Updated.
> >>>>    * gcc.dg/tree-ssa/abs-4.c: Updated.
> >>>>    * gcc.dg/tree-ssa/backprop-6.c: Updated.
> >>>>    * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> >>>>    * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> >>>>    * gcc.target/aarch64/fneg-abs_1.c: New test.
> >>>>    * gcc.target/aarch64/fneg-abs_2.c: New test.
> >>>>    * gcc.target/aarch64/fneg-abs_3.c: New test.
> >>>>    * gcc.target/aarch64/fneg-abs_4.c: New test.
> >>>>    * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> >>>>    * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> >>>>    * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> >>>>    * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> >>>> 
> >>>> --- inline copy of patch ---
> >>>> 
> >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> >>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
> >>>> --- a/gcc/match.pd
> >>>> +++ b/gcc/match.pd
> >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>>> 
> >>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> >>>> (for coss (COS COSH)
> >>>> -     copysigns (COPYSIGN)
> >>>> - (simplify
> >>>> -  (coss (copysigns @0 @1))
> >>>> -   (coss @0)))
> >>>> + (for copysigns (COPYSIGN_ALL)
> >>> 
> >>> So this ends up generating, for example, the match
> >>> (cosf (copysignl ...)), which doesn't make much sense.
> >>> 
> >>> The lock-step iteration did
> >>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> >>> which is leaner but misses the case of
> >>> (cosf (ifn_copysign ..)) - that's probably what you are
> >>> after with this change.
> >>> 
> >>> That said, there isn't a nice solution (without altering the match.pd
> >>> IL).  There's the explicit solution, spelling out all combinations.
> >>> 
> >>> So if we want to go with your pragmatic solution, changing this
> >>> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
> >>> for iteration to a cross-product for iteration is.
> >>> 
> >>> Changing just this pattern to
> >>> 
> >>> (for coss (COS COSH)
> >>> (for copysigns (COPYSIGN)
> >>>  (simplify
> >>>   (coss (copysigns @0 @1))
> >>>   (coss @0))))
> >>> 
> >>> increases the total number of gimple-match-x.cc lines from
> >>> 234988 to 235324.
> >> 
> >> I guess the difference between this and the later suggestions is that
> >> this one allows builtin copysign to be paired with ifn cos, which would
> >> be potentially useful in other situations.  (It isn't here because
> >> ifn_cos is rarely provided.)  How much of the growth is due to that,
> >> and how much of it is from nonsensical combinations like
> >> (builtin_cosf (builtin_copysignl ...))?
> >> 
> >> If it's mostly from nonsensical combinations then would it be possible
> >> to make genmatch drop them?
> >> 
> >>> The alternative is to do
> >>> 
> >>> (for coss (COS COSH)
> >>>     copysigns (COPYSIGN)
> >>> (simplify
> >>>  (coss (copysigns @0 @1))
> >>>   (coss @0))
> >>> (simplify
> >>>  (coss (IFN_COPYSIGN @0 @1))
> >>>   (coss @0)))
> >>> 
> >>> which will properly diagnose a duplicate pattern.  There are
> >>> currently no operator lists with just builtins defined (that
> >>> could be fixed, see gencfn-macros.cc); supposing we had
> >>> COS_C we could do
> >>> 
> >>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> >>>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
> >>>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
> >>>                 IFN_COPYSIGN IFN_COPYSIGN)
> >>> (simplify
> >>>  (coss (copysigns @0 @1))
> >>>   (coss @0)))
> >>> 
> >>> which of course still looks ugly ;) (some syntax extension, like
> >>> allowing IFN_COPYSIGN*8 to be specified, would be nice here and easy
> >>> enough to do)
> >>> 
> >>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> >>> re-do it to only split the fors, keeping COPYSIGN, and provide
> >>> some statistics on the gimple-match-* size?  I think this might
> >>> be the pragmatic solution for now.
> >>> 
> >>> Richard - can you think of a clever way to express the desired
> >>> iteration?  How do RTL macro iterations address cases like this?
> >> 
> >> I don't think .md files have an equivalent construct, unfortunately.
> >> (I also regret some of the choices I made for .md iterators, but that's
> >> another story.)
> >> 
> >> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> >> with the "..." meaning "fill to match the longest operator list
> >> in the loop".
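> >> 
> >> E.g. the spelled-out list above could then be written as (hypothetical
> >> syntax, not something genmatch accepts today):
> >> 
> >> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> >>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN...)
> >>  (simplify
> >>   (coss (copysigns @0 @1))
> >>   (coss @0)))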
> >
> > Hm, I'll think about this.  It would be useful to have a function like
> >
> > internal_fn ifn_for (combined_fn);
> >
> > so we can indirectly match all builtins with a switch on the ifn code.
> 
> There's:
> 
> extern internal_fn associated_internal_fn (combined_fn, tree);
> extern internal_fn associated_internal_fn (tree);
> extern internal_fn replacement_internal_fn (gcall *);
> 
> where the first one requires the return type, and the second one
> operates on CALL_EXPRs.
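> 
> A rough sketch of how the first of those could be used (illustrative only,
> with fn a combined_fn and type the call's return type):
> 
>   switch (associated_internal_fn (fn, type))
>     {
>     case IFN_COPYSIGN:
>       /* Covers the copysign builtins as well as IFN_COPYSIGN itself.  */
>       ...;
>       break;
>     default:
>       break;
>     }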

Hmm, for full generality the way we code-generate would need to change
quite a bit.  Instead I've come up with the following quite limited
approach.  You can write

(for coss (COS COSH)
 (simplify
  (coss (ANY_COPYSIGN @0 @1))
  (coss @0)))

with it.  For each internal function the following patch adds an
ANY_<name> identifier.  Its use is somewhat limited: you cannot
use it as the outermost operation in the match part, and you cannot
use it in the replacement part at all.  The nice thing is that there's
no "iteration" at all: ANY_COPYSIGN doesn't cause any pattern
duplication; instead we match it via CASE_CFN_<name>, so it will
happily match mis-matched (typewise) calls (but those shouldn't
be there...).
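
For reference, in the generated matcher CASE_CFN_COPYSIGN expands to a run
of case labels roughly like (the exact set comes from case-cfn-macros.h):

  case CFN_BUILT_IN_COPYSIGNF:
  case CFN_BUILT_IN_COPYSIGN:
  case CFN_BUILT_IN_COPYSIGNL:
  case CFN_COPYSIGN:

so a single label covers all of those call codes.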

The patch doesn't add any checks in the parser for these use
restrictions, but you should at least get compile failures for
misuses.

It should match quite a few of the copysign cases.  I suspect it's
of no use for most other operators, so maybe less general handling,
introducing only ANY_COPYSIGN specifically, would be better.
At least I cannot think of any other functions that are matched
but disappear in the resulting replacement.

Richard.

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 03d325efdf6..f7d3f51c013 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -524,10 +524,14 @@ class fn_id : public id_base
 {
 public:
   fn_id (enum built_in_function fn_, const char *id_)
-      : id_base (id_base::FN, id_), fn (fn_) {}
+      : id_base (id_base::FN, id_), fn (fn_), case_macro (nullptr) {}
   fn_id (enum internal_fn fn_, const char *id_)
-      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
+      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)),
+    case_macro (nullptr) {}
+  fn_id (const char *case_macro_, const char *id_)
+      : id_base (id_base::FN, id_), fn (-1U), case_macro (case_macro_) {}
   unsigned int fn;
+  const char *case_macro;
 };
 
 class simplify;
@@ -3262,6 +3266,10 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
              if (user_id *u = dyn_cast <user_id *> (e->operation))
                for (auto id : u->substitutes)
                  fprintf_indent (f, indent, "case %s:\n", id->id);
+             else if (is_a <fn_id *> (e->operation)
+                      && as_a <fn_id *> (e->operation)->case_macro)
+               fprintf_indent (f, indent, "%s:\n",
+                               as_a <fn_id *> (e->operation)->case_macro);
              else
                fprintf_indent (f, indent, "case %s:\n", e->operation->id);
              /* We need to be defensive against bogus prototypes allowing
@@ -3337,9 +3345,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
       for (unsigned j = 0; j < generic_fns.length (); ++j)
        {
          expr *e = as_a <expr *>(generic_fns[j]->op);
-         gcc_assert (e->operation->kind == id_base::FN);
+         fn_id *oper = as_a <fn_id *> (e->operation);
 
-         fprintf_indent (f, indent, "case %s:\n", e->operation->id);
+         if (oper->case_macro)
+           fprintf_indent (f, indent, "%s:\n", oper->case_macro);
+         else
+           fprintf_indent (f, indent, "case %s:\n", e->operation->id);
          fprintf_indent (f, indent, "  if (call_expr_nargs (%s) == %d)\n"
                                     "    {\n", kid_opname, e->ops.length ());
          generic_fns[j]->gen (f, indent + 6, false, depth);
@@ -5496,7 +5507,8 @@ main (int argc, char **argv)
 #include "builtins.def"
 
 #define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
-  add_function (IFN_##CODE, "CFN_" #CODE);
+  add_function (IFN_##CODE, "CFN_" #CODE); \
+  add_function ("CASE_CFN_" # CODE, "ANY_" # CODE);
 #include "internal-fn.def"
 
   /* Parse ahead!  */
