Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Segher Boessenkool
On Wed, Jul 03, 2019 at 02:36:11PM -0400, Michael Meissner wrote:
> On Wed, Jul 03, 2019 at 12:55:41PM -0500, Segher Boessenkool wrote:
> > I don't think this is a good idea.  You can set "cost" directly, if that
> > is the only thing you need this for?
> 
> The trouble is the cost is currently a factor based on type + the cost from 
> the
> cost structure.  It really would be hard to set it to a single value in the
> insns without having to have complex means for setting the machine dependent
> costs.  If the numeric RTL attributes could set the value from a function, it
> would be simpler, but that isn't currently supported.

(set (attr "cost") (symbol_ref "any C expression you want here"))

It is supported.  The syntax is a bit weird, sure :-)


It may well be that some helper attribute can help here, but just
num_insns isn't it.


Segher


Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Segher Boessenkool
On Wed, Jul 03, 2019 at 01:06:27PM -0400, Michael Meissner wrote:
> On Mon, Jul 01, 2019 at 04:27:05PM -0500, Segher Boessenkool wrote:
> > > @@ -7385,8 +7385,8 @@ (define_insn "*mov_softfloat"
> > >   *,  *, *, *")
> > >  
> > > (set_attr "length"
> > > - "4,  4, 4, 4, 4, 4,
> > > - 4,  4, 8, 4")])
> > > + "*,  *, *, *, *, *,
> > > + *,  *, 8, *")])
> > 
> > [ That last line should start with a tab as well. ]
> 
> Ok.
> 
> > The entry before the 8 is split as well.  Maybe that should be "4", to
> > stand out?  I don't know what works better; your choice.
> 
> Though the "G" constraint specifically says for SFmode it is a single
> instruction.

Sure.  But that doesn't mean much, does it?  You cannot see the actual
split insns here, and that the constraint is for
  "Constant that can be copied into GPR with two insns for DF/DD
   and one for SF/SD."
is pretty weak :-)

The default is one insn, that's what * does.  You can still use 4 though
when that is beneficial, to make things a tiny bit clearer maybe.

> > > @@ -7696,8 +7696,8 @@ (define_insn "*mov_softfloat64"
> > >   *,   *,  *")
> > >  
> > > (set_attr "length"
> > > -"4,   4,  4,  4,  4,  8,
> > > - 12,  16, 4")])
> > > +"*,   *,  *,  *,  *,  8,
> > > + 12,  16, *")])
> > 
> > Same for the last entry here.
> 
> Well technically that alternative will never fire (destination is "*h" and
> source is "0"), and a nop is emitted.

If it can never fire we should remove it.  But it *can* fire, or I cannot
prove it cannot anyway.  Can you?

This would be a lovely cleanup if we could make it :-/

> I do wish we could never ever load
> floating point into SPR registers.  I've tried in the reload days to eliminate
> it, but there was always some abort if it got eliminated.

It should never load anything into CTR, LR, or VRSAVE, period.  :-)

Maybe we should use those as fixed registers, not allocatable as they
are now.


Segher


Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Michael Meissner
On Wed, Jul 03, 2019 at 12:55:41PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 03, 2019 at 12:50:37PM -0400, Michael Meissner wrote:
> > On Tue, Jul 02, 2019 at 07:09:20PM -0500, Segher Boessenkool wrote:
> > > We'll need to update our insn_cost for prefixed, sure, it currently does
> > >   int n = get_attr_length (insn) / 4;
> > > to figure out how many machine instructions a pattern is, and then uses
> > > "n" differently for the different types of insn.  We'll need to refine
> > > this a bit for prefixed instructions.
> > 
> > Yes, I have some plans with this regard.  In particular, I will be 
> > introducing
> > a "num_insns" RTL attribute, that if set is the number of instructions that
> > will be emitted.
> 
> I don't think this is a good idea.  You can set "cost" directly, if that
> is the only thing you need this for?

The trouble is the cost is currently a factor based on type + the cost from the
cost structure.  It really would be hard to set it to a single value in the
insns without having to have complex means for setting the machine dependent
costs.  If the numeric RTL attributes could set the value from a function, it
would be simpler, but that isn't currently supported.

Here is my current version of rs6000_insn_cost.  At the moment, I'm only
setting the "num_insns" in a few places, so the default would normally kick in.

/* How many real instructions are generated for this insn?  This is slightly
   different from the length attribute, in that the length attribute counts the
   number of bytes.  With prefixed instructions, we don't want to count a
   prefixed instruction (length 12 bytes including possible NOP) as taking 3
   instructions, but just one.  */

static int
rs6000_num_insns (rtx_insn *insn)
{
  /* If the insn provides an override, use it.  */
  int num = get_attr_num_insns (insn);

  if (!num)
{
  /* Try to figure it out based on the length and whether there are
 prefixed instructions.  While prefixed instructions are only 8 bytes,
 we have to use 12 as the size of the first prefixed instruction in
 case the instruction needs to be aligned.  Back to back prefixed
 instructions would only take 20 bytes, since it is guaranteed that one
 of the prefixed instructions does not need the alignment.  */
  int length = get_attr_length (insn);

  if (length >= 12 && TARGET_PREFIXED_ADDR
  && get_attr_prefixed (insn) == PREFIXED_YES)
{
  /* Single prefixed instruction.  */
  if (length == 12)
return 1;

  /* A normal instruction and a prefixed instruction (16) or two back
 to back prefixed instructions (20).  */
  if (length == 16 || length == 20)
return 2;

  /* Guess for larger instruction sizes.  */
  num = 2 + (length - 20) / 4;
}
  else 
num = length / 4;
}

  return num;
}

rs6000_insn_cost (rtx_insn *insn, bool speed)
{
  int cost;

  if (recog_memoized (insn) < 0)
return 0;

  if (!speed)
return get_attr_length (insn);

  cost = get_attr_cost (insn);
  if (cost > 0)
return cost;

  int n = rs6000_num_insns (insn);
  enum attr_type type = get_attr_type (insn);

  switch (type)
{
case TYPE_LOAD:
case TYPE_FPLOAD:
case TYPE_VECLOAD:
  cost = COSTS_N_INSNS (n + 1);
  break;

case TYPE_MUL:
  switch (get_attr_size (insn))
{
case SIZE_8:
  cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi_const9;
  break;
case SIZE_16:
  cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi_const;
  break;
case SIZE_32:
  cost = COSTS_N_INSNS (n - 1) + rs6000_cost->mulsi;
  break;
case SIZE_64:
  cost = COSTS_N_INSNS (n - 1) + rs6000_cost->muldi;
  break;
default:
  gcc_unreachable ();
}
  break;

// ...

> > What I was talking about is I've found some insns that don't set the length,
> > and are split.  Using the insn cost mechanism will mean that these 
> > instructions
> > will be thought of being cheaper than they actually are.
> 
> Yes.  Please split RTL insns as early as possible.  It also matters for
> scheduling, and it prevents exponential explosion of the number of
> patterns you need.  Only sometimes do you need to split late, usually
> because RA can put some registers in memory and you want to handle that
> optimally, or things depend on what exact register you were allocated
> (cr0 vs. crN for example, but could be GPR vs. VSR).

Generally most of the places I've been modifying with splits need to be handled
after register allocation.

> And again, you can set cost directly; length alone is not usually enough
> for determining the cost of split patterns.  But you do need length for
> accurate costs with -Os, hrm.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 

Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Segher Boessenkool
On Wed, Jul 03, 2019 at 12:50:37PM -0400, Michael Meissner wrote:
> On Tue, Jul 02, 2019 at 07:09:20PM -0500, Segher Boessenkool wrote:
> > We'll need to update our insn_cost for prefixed, sure, it currently does
> >   int n = get_attr_length (insn) / 4;
> > to figure out how many machine instructions a pattern is, and then uses
> > "n" differently for the different types of insn.  We'll need to refine
> > this a bit for prefixed instructions.
> 
> Yes, I have some plans with this regard.  In particular, I will be introducing
> a "num_insns" RTL attribute, that if set is the number of instructions that
> will be emitted.

I don't think this is a good idea.  You can set "cost" directly, if that
is the only thing you need this for?

> What I was talking about is I've found some insns that don't set the length,
> and are split.  Using the insn cost mechanism will mean that these 
> instructions
> will be thought of being cheaper than they actually are.

Yes.  Please split RTL insns as early as possible.  It also matters for
scheduling, and it prevents exponential explosion of the number of
patterns you need.  Only sometimes do you need to split late, usually
because RA can put some registers in memory and you want to handle that
optimally, or things depend on what exact register you were allocated
(cr0 vs. crN for example, but could be GPR vs. VSR).

And again, you can set cost directly; length alone is not usually enough
for determining the cost of split patterns.  But you do need length for
accurate costs with -Os, hrm.


Segher


Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Michael Meissner
On Mon, Jul 01, 2019 at 04:27:05PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> Sorry I missed this patch :-(
> 
> On Thu, Jun 27, 2019 at 04:18:00PM -0400, Michael Meissner wrote:
> > As we discussed off-line earlier, I changed all of the "4" lengths to be 
> > "*",
> > even for instruction alternatives that would not be subject to being 
> > changed to
> > be a prefixed instruction (i.e. the sign extend instruction "extsw"'s 
> > length is
> > set to "*", even though it would not be a prefixed instruction).
> 
> "*" means "use the default for this attribute", which often is nicer to
> see than "4".  For example, "8" stands out more in a sea of "*"s.
> 
> Usually "*" is the insns defined as "normal" alternatives, and "8" or
> "12" etc. are split.
> 
> > @@ -7231,7 +7231,7 @@ (define_insn "*movcc_internal1"
> >(const_string "mtjmpr")
> >(const_string "load")
> >(const_string "store")])
> > -   (set_attr "length" "4,4,12,4,4,8,4,4,4,4,4,4")])
> > +   (set_attr "length" "*,*,12,*,*,8,*,*,*,*,*,*")])
> 
> In this case, the "12" and "8" are actually defined as one insn in the
> template, with some "\;".  Luckily there aren't many of those.
> 
> > @@ -7385,8 +7385,8 @@ (define_insn "*mov_softfloat"
> >   *,  *, *, *")
> >  
> > (set_attr "length"
> > -   "4,  4, 4, 4, 4, 4,
> > - 4,  4, 8, 4")])
> > +   "*,  *, *, *, *, *,
> > + *,  *, 8, *")])
> 
> [ That last line should start with a tab as well. ]

Ok.

> The entry before the 8 is split as well.  Maybe that should be "4", to
> stand out?  I don't know what works better; your choice.

Though the "G" constraint specifically says for SFmode it is a single
instruction.

> > @@ -7696,8 +7696,8 @@ (define_insn "*mov_softfloat64"
> >   *,   *,  *")
> >  
> > (set_attr "length"
> > -"4,   4,  4,  4,  4,  8,
> > - 12,  16, 4")])
> > +"*,   *,  *,  *,  *,  8,
> > + 12,  16, *")])
> 
> Same for the last entry here.

Well technically that alternative will never fire (destination is "*h" and
source is "0"), and a nop is emitted.  I do wish we could never ever load
floating point into SPR registers.  I've tried in the reload days to eliminate
it, but there was always some abort if it got eliminated.

> 
> > @@ -8760,10 +8760,10 @@ (define_insn "*movdi_internal32"
> >vecsimple")
> > (set_attr "size" "64")
> > (set_attr "length"
> > - "8, 8, 8, 4, 4, 4,
> > -  16,4, 4, 4, 4, 4,
> > -  4, 4, 4, 4, 4, 8,
> > -  4")
> > + "8, 8, 8, *, *, *,
> > +  16,*, *, *, *, *,
> > +  *, *, *, *, *, 8,
> > +  *")
> 
> And the last here.

Well it will be split into a single VSPLTISW instruction.

> 
> > @@ -8853,11 +8853,11 @@ (define_insn "*movdi_internal64"
> >  mftgpr,mffgpr")
> > (set_attr "size" "64")
> > (set_attr "length"
> > -   "4, 4, 4, 4, 4,  20,
> > -4, 4, 4, 4, 4,  4,
> > -4, 4, 4, 4, 4,  4,
> > -4, 8, 4, 4, 4,  4,
> > -4, 4")
> > +   "*, *, *, *, *,  20,
> > +*, *, *, *, *,  *,
> > +*, *, *, *, *,  *,
> > +*, 8, *, *, *,  *,
> > +*, *")
> 
> And two of the entries here.

Though the second split becomes a single VSPLTISW instruction once again.

> > @@ -1150,9 +1150,9 @@ (define_insn "vsx_mov_64bit"
> >  store, load,  store, *, vecsimple, 
> > vecsimple,
> >  vecsimple, *, *, vecstore,  vecload")
> > (set_attr "length"
> > -   "4, 4, 4, 8, 4, 8,
> > -8, 8, 8, 8, 4, 4,
> > -4, 20,8, 4, 4")
> > +   "*, *, *, 8, *, 8,
> > +8, 8, 8, 8, *, *,
> > +*, 20,8, *, *")
> 
> No idea which ones are split here :-)  None of the * and all other would
> be nice, bu

Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-03 Thread Michael Meissner
On Tue, Jul 02, 2019 at 07:09:20PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 02, 2019 at 07:36:21PM -0400, Michael Meissner wrote:
> > On Mon, Jul 01, 2019 at 04:27:05PM -0500, Segher Boessenkool wrote:
> > > The entry before the 8 is split as well.  Maybe that should be "4", to
> > > stand out?  I don't know what works better; your choice.
> > 
> > I'll look into it.  Note, the length is used in two places.  One at the end 
> > to
> > generate the appropriate branches, but the other is in rs6000_insn_cost 
> > inside
> > rs6000.c.
> 
> We'll need to update our insn_cost for prefixed, sure, it currently does
>   int n = get_attr_length (insn) / 4;
> to figure out how many machine instructions a pattern is, and then uses
> "n" differently for the different types of insn.  We'll need to refine
> this a bit for prefixed instructions.

Yes, I have some plans with this regard.  In particular, I will be introducing
a "num_insns" RTL attribute, that if set is the number of instructions that
will be emitted.

If "num_insns" is not set, then it will use the length and look at the
"prefixed" attribute.

> > This occurs before the final passes, so it is important that even
> > though the insn will be split, that the length is still set.  However, 
> > things
> > are rather inconsistant, in that sometimes the length field is accurate, and
> > sometimes not.
> 
> It *has* to be correct; it is allowed to be pessimistic though.  This
> is used to determine if a B-form branch can reach, for example.  You get
> ICEs if it isn't correct.  Only on very few testcases, of course :-(

To be clear.  Yes it has to be correct when the label handling is done.

What I was talking about is I've found some insns that don't set the length,
and are split.  Using the insn cost mechanism will mean that these instructions
will be thought of being cheaper than they actually are.

> > I'm finding that the rs6000_insn_cost issue really muddies things up,
> > particularly if a vector load/store insn is done in gpr registers, where it 
> > can
> > be 4 insns.
> 
> Yeah, it doesn't handle vectors specially *at all* currently, not even
> for FP in vectors.  It is pretty much just a switch over the "type", and
> for everything vector it gets to
> default:
>   cost = COSTS_N_INSNS (n);
> (with the above "n") which is a pretty coarse approximation to the truth ;-)
> 
> You could try #undef'ing TARGET_INSN_COST (in rs6000.c) for now, and hope
> that rs6000_rtx_costs does better for what you need right now.  In the end
> it will have to be fixed properly, insn_cost is quite important.
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-02 Thread Segher Boessenkool
On Tue, Jul 02, 2019 at 07:36:21PM -0400, Michael Meissner wrote:
> On Mon, Jul 01, 2019 at 04:27:05PM -0500, Segher Boessenkool wrote:
> > The entry before the 8 is split as well.  Maybe that should be "4", to
> > stand out?  I don't know what works better; your choice.
> 
> I'll look into it.  Note, the length is used in two places.  One at the end to
> generate the appropriate branches, but the other is in rs6000_insn_cost inside
> rs6000.c.

We'll need to update our insn_cost for prefixed, sure, it currently does
  int n = get_attr_length (insn) / 4;
to figure out how many machine instructions a pattern is, and then uses
"n" differently for the different types of insn.  We'll need to refine
this a bit for prefixed instructions.

> This occurs before the final passes, so it is important that even
> though the insn will be split, that the length is still set.  However, things
> are rather inconsistant, in that sometimes the length field is accurate, and
> sometimes not.

It *has* to be correct; it is allowed to be pessimistic though.  This
is used to determine if a B-form branch can reach, for example.  You get
ICEs if it isn't correct.  Only on very few testcases, of course :-(

> I'm finding that the rs6000_insn_cost issue really muddies things up,
> particularly if a vector load/store insn is done in gpr registers, where it 
> can
> be 4 insns.

Yeah, it doesn't handle vectors specially *at all* currently, not even
for FP in vectors.  It is pretty much just a switch over the "type", and
for everything vector it gets to
default:
  cost = COSTS_N_INSNS (n);
(with the above "n") which is a pretty coarse approximation to the truth ;-)

You could try #undef'ing TARGET_INSN_COST (in rs6000.c) for now, and hope
that rs6000_rtx_costs does better for what you need right now.  In the end
it will have to be fixed properly, insn_cost is quite important.


Segher


Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-02 Thread Michael Meissner
On Mon, Jul 01, 2019 at 04:27:05PM -0500, Segher Boessenkool wrote:
> The entry before the 8 is split as well.  Maybe that should be "4", to
> stand out?  I don't know what works better; your choice.

I'll look into it.  Note, the length is used in two places.  One at the end to
generate the appropriate branches, but the other is in rs6000_insn_cost inside
rs6000.c.  This occurs before the final passes, so it is important that even
though the insn will be split, that the length is still set.  However, things
are rather inconsistant, in that sometimes the length field is accurate, and
sometimes not.

I'm finding that the rs6000_insn_cost issue really muddies things up,
particularly if a vector load/store insn is done in gpr registers, where it can
be 4 insns.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Prepare for prefixed instructions on PowerPC

2019-07-01 Thread Segher Boessenkool
Hi Mike,

Sorry I missed this patch :-(

On Thu, Jun 27, 2019 at 04:18:00PM -0400, Michael Meissner wrote:
> As we discussed off-line earlier, I changed all of the "4" lengths to be "*",
> even for instruction alternatives that would not be subject to being changed 
> to
> be a prefixed instruction (i.e. the sign extend instruction "extsw"'s length 
> is
> set to "*", even though it would not be a prefixed instruction).

"*" means "use the default for this attribute", which often is nicer to
see than "4".  For example, "8" stands out more in a sea of "*"s.

Usually "*" is the insns defined as "normal" alternatives, and "8" or
"12" etc. are split.

> @@ -7231,7 +7231,7 @@ (define_insn "*movcc_internal1"
>(const_string "mtjmpr")
>(const_string "load")
>(const_string "store")])
> -   (set_attr "length" "4,4,12,4,4,8,4,4,4,4,4,4")])
> +   (set_attr "length" "*,*,12,*,*,8,*,*,*,*,*,*")])

In this case, the "12" and "8" are actually defined as one insn in the
template, with some "\;".  Luckily there aren't many of those.

> @@ -7385,8 +7385,8 @@ (define_insn "*mov_softfloat"
>   *,  *, *, *")
>  
> (set_attr "length"
> - "4,  4, 4, 4, 4, 4,
> - 4,  4, 8, 4")])
> + "*,  *, *, *, *, *,
> + *,  *, 8, *")])

[ That last line should start with a tab as well. ]

The entry before the 8 is split as well.  Maybe that should be "4", to
stand out?  I don't know what works better; your choice.

> @@ -7696,8 +7696,8 @@ (define_insn "*mov_softfloat64"
>   *,   *,  *")
>  
> (set_attr "length"
> -"4,   4,  4,  4,  4,  8,
> - 12,  16, 4")])
> +"*,   *,  *,  *,  *,  8,
> + 12,  16, *")])

Same for the last entry here.

> @@ -8760,10 +8760,10 @@ (define_insn "*movdi_internal32"
>vecsimple")
> (set_attr "size" "64")
> (set_attr "length"
> - "8, 8, 8, 4, 4, 4,
> -  16,4, 4, 4, 4, 4,
> -  4, 4, 4, 4, 4, 8,
> -  4")
> + "8, 8, 8, *, *, *,
> +  16,*, *, *, *, *,
> +  *, *, *, *, *, 8,
> +  *")

And the last here.

> @@ -8853,11 +8853,11 @@ (define_insn "*movdi_internal64"
>  mftgpr,mffgpr")
> (set_attr "size" "64")
> (set_attr "length"
> -   "4, 4, 4, 4, 4,  20,
> -4, 4, 4, 4, 4,  4,
> -4, 4, 4, 4, 4,  4,
> -4, 8, 4, 4, 4,  4,
> -4, 4")
> +   "*, *, *, *, *,  20,
> +*, *, *, *, *,  *,
> +*, *, *, *, *,  *,
> +*, 8, *, *, *,  *,
> +*, *")

And two of the entries here.

> @@ -1150,9 +1150,9 @@ (define_insn "vsx_mov_64bit"
>  store, load,  store, *, vecsimple, 
> vecsimple,
>  vecsimple, *, *, vecstore,  vecload")
> (set_attr "length"
> -   "4, 4, 4, 8, 4, 8,
> -8, 8, 8, 8, 4, 4,
> -4, 20,8, 4, 4")
> +   "*, *, *, 8, *, 8,
> +8, 8, 8, 8, *, *,
> +*, 20,8, *, *")

No idea which ones are split here :-)  None of the * and all other would
be nice, but who knows :-)


Okay for trunk, maybe with some "4" if you agree that has value.  Thanks!
And again, sorry I missed this patch.


Segher