On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> > Also use rs6000_cost only for speed.
> 
> More directly: use something completely different for !speed, namely,
> code size.

Yes, that might be better.

> > -      if (CONST_INT_P (XEXP (x, 1))
> > -     && satisfies_constraint_I (XEXP (x, 1)))
> > +      if (!speed)
> > +   /* A little more than one insn so that nothing is tempted to
> > +      turn a shift left into a multiply.  */
> > +   *total = COSTS_N_INSNS (1) + 1;
> 
> Please don't.  We have a lot of code elsewhere to handle this directly,
> already.  Also, this is just wrong for size.  Five shifts is *not*
> better than four muls.  If that is the only way to get good results,
> than unfortunately we probably have to; but do not do this without any
> proof.

Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
result for !speed rs6000_rtx_costs is the entire basis for the !speed
changes.  I don't have any proof that this is correct.

> >      case FMA:
> > -      if (mode == SFmode)
> > +      if (!speed)
> > +   *total = COSTS_N_INSNS (1) + 1;
> 
> Not here, either.
> 
> >      case DIV:
> >      case MOD:
> >        if (FLOAT_MODE_P (mode))
> >     {
> > -     *total = mode == DFmode ? rs6000_cost->ddiv
> > -                             : rs6000_cost->sdiv;
> > +     if (!speed)
> > +       *total = COSTS_N_INSNS (1) + 2;
> 
> And why + 2 even?
> 
> > -     if (GET_MODE (XEXP (x, 1)) == DImode)
> > +     if (!speed)
> > +       *total = COSTS_N_INSNS (1) + 2;
> > +     else if (GET_MODE (XEXP (x, 1)) == DImode)
> >         *total = rs6000_cost->divdi;
> >       else
> >         *total = rs6000_cost->divsi;
> 
> (more)

OK, I can remove all the !speed changes.  To be honest, I didn't look
anywhere near as much at code size changes as I worried about
performance.  And about not regressing any fiddly testcase we have.

> > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >        return false;
> >  
> >      case AND:
> > +      *total = COSTS_N_INSNS (1);
> >        right = XEXP (x, 1);
> >        if (CONST_INT_P (right))
> >     {
> > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >            || left_code == LSHIFTRT)
> >           && rs6000_is_valid_shift_mask (right, left, mode))
> >         {
> > -         *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > -         if (!CONST_INT_P (XEXP (left, 1)))
> > -           *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > speed);
> > -         *total += COSTS_N_INSNS (1);
> > +         rtx reg_op = XEXP (left, 0);
> > +         if (!REG_P (reg_op))
> > +           *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > +         reg_op = XEXP (left, 1);
> > +         if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > +           *total += rtx_cost (reg_op, mode, left_code, 1, speed);
> >           return true;
> >         }
> >     }
> > -
> > -      *total = COSTS_N_INSNS (1);
> >        return false;
> 
> This doesn't improve anything?  It just makes it different from all
> surrounding code?

So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
regs, like it doesn't for const_int.

> > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code,
> >        if (outer_code == TRUNCATE
> >       && GET_CODE (XEXP (x, 0)) == MULT)
> >     {
> > -     if (mode == DImode)
> > +     if (!speed)
> > +       *total = COSTS_N_INSNS (1) + 1;
> 
> (more)
> 
> > +    case SET:
> > +      /* The default cost of a SET is the number of general purpose
> > +    regs being set multiplied by COSTS_N_INSNS (1).  That only
> > +    works where the incremental cost of the operation and
> > +    operands is zero, when the operation performed can be done in
> > +    one instruction.  For other cases where we add COSTS_N_INSNS
> > +    for some operation (see point 5 above), COSTS_N_INSNS (1)
> > +    should be subtracted from the total cost.  */
> 
> What does "incremental cost" mean there?  If what increases?
> 
> > +      {
> > +   rtx_code src_code = GET_CODE (SET_SRC (x));
> > +   if (src_code == CONST_INT
> > +       || src_code == CONST_DOUBLE
> > +       || src_code == CONST_WIDE_INT)
> > +     return false;
> > +   int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +                   + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> 
> This should use set_src_cost, if anything.  But that will recurse then,
> currently?  Ugh.
> 
> Using rtx_cost for SET_DEST is problematic, too.
> 
> What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> that, hrm.
> 
> rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> (ab)used for that, sigh.
> 
> Many targets have something for it already, but all quite different from
> this.

Right, you are starting to understand just how difficult it is to do
anything at all to rs6000_rtx_costs.

> > +   if (set_cost >= COSTS_N_INSNS (1))
> > +     *total += set_cost - COSTS_N_INSNS (1);
> 
> I don't understand this part at all, for example.  Why not just
>   *total += set_cost - COSTS_N_INSNS (1);
> ?  If set_cost is lower than one insn's cost, don't we have a problem
> already?

The set_cost I calculate here from src and dest can easily be zero.
(set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
dest cost of zero and a src cost of zero.  That can't change without
breaking places where rtx_costs is called to compare pieces of RTL.
Here though we happen to be looking at a SET, so have an entire
instruction.  The value returned should be comparable to our
instruction costs.  That's tricky to do, and this change is just a
hack.  Without the hack I saw some testcases regress.

I don't like this hack any more than you do reviewing it!

> 
> Generic things.  Please split this patch up when sending it again, it
> does too many different things, and many of those are not obvious.
> 
> All such changes that aren't completely obvious (like the previous ones
> were) should have some measurement.  We are in stage1, and we will
> notice (non-trivial) degradations, but if we can expect degradations
> (like for this patch), it needs benchmarking.

Pat did benchmark these changes..  I was somewhat surprised to see
a small improvement in spec results.

> Since you add !speed all over the place, maybe we should just have a
> separate function that does !speed?  It looks like quite a few things
> will simplify.

Revised patch as follows.

        * config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
        when insn operation cost handled on recursive call.  Tidy
        break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6af8a9a31cb..26c2f443502 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21397,7 +21397,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
        *total = rs6000_cost->fp;
       else
        *total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
@@ -21457,6 +21457,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
       return false;
 
     case AND:
+      *total = COSTS_N_INSNS (1);
       right = XEXP (x, 1);
       if (CONST_INT_P (right))
        {
@@ -21469,15 +21470,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
               || left_code == LSHIFTRT)
              && rs6000_is_valid_shift_mask (right, left, mode))
            {
-             *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-             if (!CONST_INT_P (XEXP (left, 1)))
-               *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
speed);
-             *total += COSTS_N_INSNS (1);
+             rtx reg_op = XEXP (left, 0);
+             if (!REG_P (reg_op))
+               *total += rtx_cost (reg_op, mode, left_code, 0, speed);
+             reg_op = XEXP (left, 1);
+             if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+               *total += rtx_cost (reg_op, mode, left_code, 1, speed);
              return true;
            }
        }
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
@@ -21575,7 +21576,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
          *total = rs6000_cost->fp;
          return false;
        }
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21613,13 +21614,40 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
          *total = 0;
          return true;
        }
-      break;
+      return false;
+
+    case SET:
+      /* The default cost of a SET is the number of general purpose
+        regs being set multiplied by COSTS_N_INSNS (1).  Here we
+        happen to be looking at a SET, so have an instruction rather
+        than just a piece of RTL and want to return a cost comparable
+        to rs6000 instruction costing.  That's a little complicated
+        because in some cases the cost of SET operands is non-zero,
+        see point 5 above and cost of PLUS for example, and in
+        others it is zero, for example for (set (reg) (reg)).
+        But (set (reg) (reg)) actually costs the same as 
+        (set (reg) (plus (reg) (reg))).  Hack around this by
+        subtracting COSTS_N_INSNS (1) from the operand cost in cases
+        were we add COSTS_N_INSNS (1) for some operation.  Don't do
+        so for constants that might cost more than zero because they
+        don't fit in one instruction.  FIXME: rtx_costs should not be
+        looking at entire instructions.  */
+      {
+       rtx_code src_code = GET_CODE (SET_SRC (x));
+       if (src_code == CONST_INT
+           || src_code == CONST_DOUBLE
+           || src_code == CONST_WIDE_INT)
+         return false;
+       int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+                       + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+       if (set_cost >= COSTS_N_INSNS (1))
+         *total += set_cost - COSTS_N_INSNS (1);
+       return true;
+      }
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */


-- 
Alan Modra
Australia Development Lab, IBM

Reply via email to