On Wed, Jul 03, 2019 at 12:55:41PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 03, 2019 at 12:50:37PM -0400, Michael Meissner wrote:
> > On Tue, Jul 02, 2019 at 07:09:20PM -0500, Segher Boessenkool wrote:
> > > We'll need to update our insn_cost for prefixed, sure, it currently does
> > >   int n = get_attr_length (insn) / 4;
> > > to figure out how many machine instructions a pattern is, and then uses
> > > "n" differently for the different types of insn.  We'll need to refine
> > > this a bit for prefixed instructions.
> > 
> > Yes, I have some plans with this regard.  In particular, I will be
> > introducing a "num_insns" RTL attribute, that if set is the number of
> > instructions that will be emitted.
> 
> I don't think this is a good idea.  You can set "cost" directly, if that
> is the only thing you need this for?

The trouble is that the cost is currently a factor based on the insn type plus
the cost from the cost structure.  It would be hard to set it to a single value
in the insns without some complex means of setting the machine-dependent costs.
If the numeric RTL attributes could take their value from a function, it would
be simpler, but that isn't currently supported.

Here is my current version of rs6000_insn_cost.  At the moment, I'm only
setting the "num_insns" in a few places, so the default would normally kick in.

/* How many real instructions are generated for this insn?  This is slightly
   different from the length attribute, in that the length attribute counts the
   number of bytes.  With prefixed instructions, we don't want to count a
   prefixed instruction (length 12 bytes including possible NOP) as taking 3
   instructions, but just one.  */

static int
rs6000_num_insns (rtx_insn *insn)
{
  /* If the insn provides an override, use it.  */
  int num = get_attr_num_insns (insn);

  if (!num)
    {
      /* Try to figure it out based on the length and whether there are
         prefixed instructions.  While prefixed instructions are only 8 bytes,
         we have to use 12 as the size of the first prefixed instruction in
         case the instruction needs to be aligned.  Back to back prefixed
         instructions would only take 20 bytes, since it is guaranteed that one
         of the prefixed instructions does not need the alignment.  */
      int length = get_attr_length (insn);

      if (length >= 12 && TARGET_PREFIXED_ADDR
          && get_attr_prefixed (insn) == PREFIXED_YES)
        {
          /* Single prefixed instruction.  */
          if (length == 12)
            return 1;

          /* A normal instruction and a prefixed instruction (16) or two back
             to back prefixed instructions (20).  */
          if (length == 16 || length == 20)
            return 2;

          /* Guess for larger instruction sizes.  */
          num = 2 + (length - 20) / 4;
        }
      else 
        num = length / 4;
    }

  return num;
}
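
To make the length arithmetic above easy to check outside of GCC, here is a
minimal standalone sketch of the same heuristic.  It is not the GCC code: the
names num_insns_from_length and check_worked_examples are made up for this
illustration, and a plain bool stands in for the TARGET_PREFIXED_ADDR and
get_attr_prefixed tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fallback path of rs6000_num_insns.
   LENGTH is the length attribute in bytes; PREFIXED is true if the insn
   contains a prefixed instruction (an assumption replacing the real
   attribute and target-flag checks).  */
int
num_insns_from_length (int length, bool prefixed)
{
  if (length >= 12 && prefixed)
    {
      /* One prefixed instruction: 8 bytes plus a possible 4-byte NOP.  */
      if (length == 12)
        return 1;

      /* Normal + prefixed (16), or two back-to-back prefixed (20).  */
      if (length == 16 || length == 20)
        return 2;

      /* Guess for anything larger.  */
      return 2 + (length - 20) / 4;
    }

  /* No prefixed instruction: every machine instruction is 4 bytes.  */
  return length / 4;
}

/* The cases called out in the comments of the original function.  */
void
check_worked_examples (void)
{
  assert (num_insns_from_length (12, false) == 3);
  assert (num_insns_from_length (12, true) == 1);
  assert (num_insns_from_length (16, true) == 2);
  assert (num_insns_from_length (20, true) == 2);
}
```

The point of the comparison is the 12-byte case: dividing the length by 4
would count it as 3 instructions, while the prefixed path counts it as 1.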

static int
rs6000_insn_cost (rtx_insn *insn, bool speed)
{
  int cost;

  if (recog_memoized (insn) < 0)
    return 0;

  if (!speed)
    return get_attr_length (insn);

  cost = get_attr_cost (insn);
  if (cost > 0)
    return cost;

  int n = rs6000_num_insns (insn);
  enum attr_type type = get_attr_type (insn);

  switch (type)
    {
    case TYPE_LOAD:
    case TYPE_FPLOAD:
    case TYPE_VECLOAD:
      cost = COSTS_N_INSNS (n + 1);
      break;

    case TYPE_MUL:
      switch (get_attr_size (insn))
        {
        case SIZE_8:
          cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi_const9;
          break;
        case SIZE_16:
          cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi_const;
          break;
        case SIZE_32:
          cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi;
          break;
        case SIZE_64:
          cost = COSTS_N_INSNS (n - 1) + rs6000_cost->muldi;
          break;
        default:
          gcc_unreachable ();
        }
      break;

// ...
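
As a rough illustration of why a single constant "cost" attribute is awkward
here, the sketch below mimics the load and multiply cases from the switch
above with a cut-down cost table.  Everything in it other than the
COSTS_N_INSNS definition (which matches the one in GCC's rtl.h) is assumed for
the example: the sk_ names, the two-field struct, and the per-CPU numbers are
hypothetical and are not taken from rs6000.c.

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h, repeated so the sketch stands alone.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical two-entry subset of the insn types.  */
enum sk_type { SK_LOAD, SK_MUL };

/* Hypothetical cut-down version of a per-processor cost structure.  */
struct sk_costs
{
  int mulsi;	/* Cost of a 32-bit multiply.  */
  int muldi;	/* Cost of a 64-bit multiply.  */
};

/* Made-up numbers for two imaginary CPUs, for illustration only.  */
const struct sk_costs sk_cpu_a = { COSTS_N_INSNS (2), COSTS_N_INSNS (2) };
const struct sk_costs sk_cpu_b = { COSTS_N_INSNS (5), COSTS_N_INSNS (5) };

/* N is the number of machine instructions; MUL_COST is the entry picked
   from the per-CPU table for a multiply (ignored for loads).  */
int
sk_insn_cost (enum sk_type type, int n, int mul_cost)
{
  switch (type)
    {
    case SK_LOAD:
      /* Loads are costed as one instruction more than they emit.  */
      return COSTS_N_INSNS (n + 1);

    case SK_MUL:
      /* One of the N instructions is the multiply itself.  */
      return COSTS_N_INSNS (n - 1) + mul_cost;
    }

  /* Not reached with the two types modelled in this sketch.  */
  return 0;
}
```

With these made-up tables, sk_insn_cost (SK_MUL, 1, sk_cpu_a.mulsi) and
sk_insn_cost (SK_MUL, 1, sk_cpu_b.mulsi) give different answers for the same
pattern, which is the part a fixed value in the insn cannot express.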

> > What I was talking about is I've found some insns that don't set the length,
> > and are split.  Using the insn cost mechanism will mean that these
> > instructions will be thought of being cheaper than they actually are.
> 
> Yes.  Please split RTL insns as early as possible.  It also matters for
> scheduling, and it prevents exponential explosion of the number of
> patterns you need.  Only sometimes do you need to split late, usually
> because RA can put some registers in memory and you want to handle that
> optimally, or things depend on what exact register you were allocated
> (cr0 vs. crN for example, but could be GPR vs. VSR).

Generally most of the places I've been modifying with splits need to be handled
after register allocation.

> And again, you can set cost directly; length alone is not usually enough
> for determining the cost of split patterns.  But you do need length for
> accurate costs with -Os, hrm.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
