On Thu, Feb 11, 2016 at 10:38 AM, Ulrich Weigand <uweig...@de.ibm.com> wrote: > David Edelsohn wrote: >> On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote: >> > This is PR68973 part 2, the failure of a boost test, caused by a >> > splitter emitting an invalid move in reload_vsx_from_gprsf: >> > emit_move_insn (op0_di, op2); >> > >> > op0 can be any vsx reg, but the mtvsrd destination constraint in >> > movdi_internal64 is "wj", which only allows fp regs. I'm not sure why >> > we have that constraint so left movdi_internal64 alone and used a >> > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used >> > by reload_vsx_from_gpr<mode>. When looking at those, I noticed we're >> > restricted to fprs there too so fixed that as well. (We can't use %L >> > in asm operands that must accept vsx.) >> >> Michael, please review the "wj" constraint in these patterns. >> >> Alan, the explanation says that it uses a special p8_mtvsrd similar to >> p8_mtvsrd_[12], but does not explain why the two similar patterns are >> removed. The incorrect use of %L implies those patterns, but the >> change is to p8_xxpermdi_<mode> that is not mentioned in the >> ChangeLog. >> >> I also would appreciate Uli's comments on this direction because of >> his reload expertise. > > For the most part, this patch doesn't really change anything in the > interaction with reload as far as I can see. The changes introduced > by the patch do make sense to me. In particular, replacing the two > patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts > of a TFmode register pair with a single pattern p8_mtvsrd that just > works on any DFmode register (leaving the split into high/low to the > caller if necessary) seems to simplify things. > >> > + /* You might think that we could use op0 as one temp and a DF clobber >> > + as the other, but you'd be wrong. These secondary_reload patterns >> > + don't check that the clobber is different to the destination, which >> > + is probably a reload bug. */ > > It's not a bug, it's deliberate behavior. The reload registers allocated > for secondary reload clobbers may overlap the destination, since in many > cases you simply move the input to the reload register, and then the > reload register to the destination (where the latter move can be a no-op > if it is possible to allocate the reload register and the destination > into the same physical register). If you need two separate intermediate > values, you need to allocate a secondary reload register of a larger > mode (as is already done in the pattern). > >> > /* Also use the destination register to hold the unconverted DImode >> > value. >> > This is conceptually a separate value from OP0, so we use gen_rtx_REG >> > rather than simplify_gen_subreg. */ >> > - rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); >> > + rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0)); >> > + rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0)); > > While this was not introduced by this patch, I'm a little bit concerned > about the hard-coded use of REGNO here, which will crash if op0 at this > point happens to be a SUBREG instead of a REG. This is extremely unlikely > at this point in reload, but not 100% impossible, e.g. if op0 for some > reason is one of the "magic" registers like the stack pointer. > > That's why it is in general better to use simplify_gen_subreg or one of > gen_highpart/gen_lowpart, which will handle SUBREG correctly as well. > I'm not sure why it matters whether this is "conceptually a separate > value" as the comment argues ... > >> > /* Move SF value to upper 32-bits for xscvspdpn. */ >> > emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); >> > - emit_move_insn (op0_di, op2); >> > - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di)); >> > + emit_insn (gen_p8_mtvsrd (op0_df, op2)); >> > + emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf)); > > The sequence of modes used for op0 here is a bit weird. First, > op0 is loaded as DFmode by mtvsrd, then it is silently re- > interpreted as V4SFmode when used as input to xscvspdpn, which > gets a DFmode output that is again silently re-interpreted as > SFmode. > > This isn't really wrong as such, just maybe a bit confusing. > Maybe instead have p8_mtvsrd use DImode as output (instead of > DFmode), which shouldn't be any harder to use in the > reload_vsx_from_gpr<mode> splitter, and keep using the > vsx_xscvspdpn_directmove pattern? > > [ This of course reinforces the question why we have p8_mtvsrd > in the first place, instead of just allowing this use directly > in movdi_internal64 itself. ]
Good question: is p8_mtvsrd really necessary if the movdi_internal64 is updated to use the correct constraints? The patch definitely is going in the right direction. Can we remove more unnecessary bits? Thanks, David