Re: [RS6000] reload_vsx_from_gprsf splitter

David Edelsohn Thu, 11 Feb 2016 16:21:53 -0800

On Thu, Feb 11, 2016 at 10:38 AM, Ulrich Weigand <uweig...@de.ibm.com> wrote:
> David Edelsohn wrote:
>> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote:
>> > This is PR68973 part 2, the failure of a boost test, caused by a
>> > splitter emitting an invalid move in reload_vsx_from_gprsf:
>> >   emit_move_insn (op0_di, op2);
>> >
>> > op0 can be any vsx reg, but the mtvsrd destination constraint in
>> > movdi_internal64 is "wj", which only allows fp regs.  I'm not sure why
>> > we have that constraint so left movdi_internal64 alone and used a
>> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used
>> > by reload_vsx_from_gpr<mode>.  When looking at those, I noticed we're
>> > restricted to fprs there too so fixed that as well.  (We can't use %L
>> > in asm operands that must accept vsx.)
>>
>> Michael, please review the "wj" constraint in these patterns.
>>
>> Alan, the explanation says that it uses a special p8_mtvsrd similar to
>> p8_mtvsrd_[12], but does not explain why the two similar patterns are
>> removed.  The incorrect use of %L implies those patterns, but the
>> change is to p8_xxpermdi_<mode> that is not mentioned in the
>> ChangeLog.
>>
>> I also would appreciate Uli's comments on this direction because of
>> his reload expertise.
>
> For the most part, this patch doesn't really change anything in the
> interaction with reload as far as I can see.  The changes introduced
> by the patch do make sense to me.  In particular, replacing the two
> patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts
> of a TFmode register pair with a single pattern p8_mtvsrd that just
> works on any DFmode register (leaving the split into high/low to the
> caller if necessary) seems to simplify things.
>
>> > +  /* You might think that we could use op0 as one temp and a DF clobber
>> > +     as the other, but you'd be wrong.  These secondary_reload patterns
>> > +     don't check that the clobber is different to the destination, which
>> > +     is probably a reload bug.  */
>
> It's not a bug, it's deliberate behavior.  The reload registers allocated
> for secondary reload clobbers may overlap the destination, since in many
> cases you simply move the input to the reload register, and then the
> reload register to the destination (where the latter move can be a no-op
> if it is possible to allocate the reload register and the destination
> into the same physical register).  If you need two separate intermediate
> values, you need to allocate a secondary reload register of a larger
> mode (as is already done in the pattern).
>
>> >    /* Also use the destination register to hold the unconverted DImode 
>> > value.
>> >       This is conceptually a separate value from OP0, so we use gen_rtx_REG
>> >       rather than simplify_gen_subreg.  */
>> > -  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
>> > +  rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0));
>> > +  rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0));
>
> While this was not introduced by this patch, I'm a little bit concerned
> about the hard-coded use of REGNO here, which will crash if op0 at this
> point happens to be a SUBREG instead of a REG.  This is extremely unlikely
> at this point in reload, but not 100% impossible, e.g. if op0 for some
> reason is one of the "magic" registers like the stack pointer.
>
> That's why it is in general better to use simplify_gen_subreg or one of
> gen_highpart/gen_lowpart, which will handle SUBREG correctly as well.
> I'm not sure why it matters whether this is "conceptually a separate
> value" as the comment argues ...
>
>> >    /* Move SF value to upper 32-bits for xscvspdpn.  */
>> >    emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> > -  emit_move_insn (op0_di, op2);
>> > -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
>> > +  emit_insn (gen_p8_mtvsrd (op0_df, op2));
>> > +  emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf));
>
> The sequence of modes used for op0 here is a bit weird.  First,
> op0 is loaded as DFmode by mtvsrd, then it is silently re-
> interpreted as V4SFmode when used as input to xscvspdpn, which
> gets a DFmode output that is again silently re-interpreted as
> SFmode.
>
> This isn't really wrong as such, just maybe a bit confusing.
> Maybe instead have p8_mtvsrd use DImode as output (instead of
> DFmode), which shouldn't be any harder to use in the
> reload_vsx_from_gpr<mode> splitter, and keep using the
> vsx_xscvspdpn_directmove pattern?
>
> [ This of course reinforces the question why we have p8_mtvsrd
> in the first place, instead of just allowing this use directly
> in movdi_internal64 itself.  ]


Good question: is p8_mtvsrd really necessary if the movdi_internal64
is updated to use the correct constraints?

The patch definitely is going in the right direction.  Can we remove
more unnecessary bits?

Thanks, David

Re: [RS6000] reload_vsx_from_gprsf splitter

Reply via email to