David Edelsohn wrote:
> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote:
> > This is PR68973 part 2, the failure of a boost test, caused by a
> > splitter emitting an invalid move in reload_vsx_from_gprsf:
> >   emit_move_insn (op0_di, op2);
> >
> > op0 can be any vsx reg, but the mtvsrd destination constraint in
> > movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
> > we have that constraint so left movdi_internal64 alone and used a
> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
> > by reload_vsx_from_gpr<mode>.  When looking at those, I noticed we're
> > restricted to fprs there too so fixed that as well.  (We can't use %L
> > in asm operands that must accept vsx.)
> 
> Michael, please review the "wj" constraint in these patterns.
> 
> Alan, the explanation says that it uses a special p8_mtvsrd similar to
> p8_mtvsrd_[12], but does not explain why the two similar patterns are
> removed.  The incorrect use of %L implies those patterns, but the
> change is to p8_xxpermdi_<mode> that is not mentioned in the
> ChangeLog.
> 
> I also would appreciate Uli's comments on this direction because of
> his reload expertise.

For the most part, this patch doesn't really change anything in the
interaction with reload as far as I can see.  The changes introduced
by the patch do make sense to me.  In particular, replacing the two
patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts
of a TFmode register pair with a single pattern p8_mtvsrd that just
works on any DFmode register (leaving the split into high/low to the
caller if necessary) seems to simplify things.

> > +  /* You might think that we could use op0 as one temp and a DF clobber
> > +     as the other, but you'd be wrong.  These secondary_reload patterns
> > +     don't check that the clobber is different to the destination, which
> > +     is probably a reload bug.  */

It's not a bug, it's deliberate behavior.  The reload registers allocated
for secondary reload clobbers may overlap the destination, since in many
cases you simply move the input to the reload register, and then the
reload register to the destination (where the latter move can be a no-op
if it is possible to allocate the reload register and the destination
into the same physical register).  If you need two separate intermediate
values, you need to allocate a secondary reload register of a larger
mode (as is already done in the pattern).

> >    /* Also use the destination register to hold the unconverted DImode 
> > value.
> >       This is conceptually a separate value from OP0, so we use gen_rtx_REG
> >       rather than simplify_gen_subreg.  */
> > -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
> > +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));

While this was not introduced by this patch, I'm a little bit concerned
about the hard-coded use of REGNO here, which will crash if op0 at this
point happens to be a SUBREG instead of a REG.  This is extremely unlikely
at this point in reload, but not 100% impossible, e.g. if op0 for some
reason is one of the "magic" registers like the stack pointer.

That's why it is in general better to use simplify_gen_subreg or one of
gen_highpart/gen_lowpart, which will handle SUBREG correctly as well.
I'm not sure why it matters whether this is "conceptually a separate
value" as the comment argues ...

> >    /* Move SF value to upper 32-bits for xscvspdpn.  */
> >    emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> > -  emit_move_insn (op0_di, op2);
> > -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
> > +  emit_insn (gen_p8_mtvsrd (op0_df, op2));
> > +  emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf));

The sequence of modes used for op0 here is a bit weird.  First,
op0 is loaded as DFmode by mtvsrd, then it is silently re-
interpreted as V4SFmode when used as input to xscvspdpn, which
gets a DFmode output that is again silently re-interpreted as
SFmode.

This isn't really wrong as such, just maybe a bit confusing.
Maybe instead have p8_mtvsrd use DImode as output (instead of
DFmode), which shouldn't be any harder to use in the
reload_vsx_from_gpr<mode> splitter, and keep using the
vsx_xscvspdpn_directmove pattern?

[ This of course reinforces the question why we have p8_mtvsrd
in the first place, instead of just allowing this use directly
in movdi_internal64 itself.  ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Reply via email to