David Edelsohn wrote: > On Thu, Feb 11, 2016 at 6:04 AM, Alan Modra <amo...@gmail.com> wrote: > > This is PR68973 part 2, the failure of a boost test, caused by a > > splitter emitting an invalid move in reload_vsx_from_gprsf: > > emit_move_insn (op0_di, op2); > > > > op0 can be any vsx reg, but the mtvsrd destination constraint in > > movdi_internal64 is "wj", which only allows fp regs. I'm not sure why > > we have that constraint so left movdi_internal64 alone and used a > > special p8_mtvsrd instead, similar to p8_mtvsrd_1 and p8_mtvsrd_2 used > > by reload_vsx_from_gpr<mode>. When looking at those, I noticed we're > > restricted to fprs there too so fixed that as well. (We can't use %L > > in asm operands that must accept vsx.) > > Michael, please review the "wj" constraint in these patterns. > > Alan, the explanation says that it uses a special p8_mtvsrd similar to > p8_mtvsrd_[12], but does not explain why the two similar patterns are > removed. The incorrect use of %L implies those patterns, but the > change is to p8_xxpermdi_<mode> that is not mentioned in the > ChangeLog. > > I also would appreciate Uli's comments on this direction because of > his reload expertise.
For the most part, this patch doesn't really change anything in the interaction with reload as far as I can see. The changes introduced by the patch do make sense to me. In particular, replacing the two patterns p8_mtvsrd_1 and p8_mtvsrd_2 used to fill high and low parts of a TFmode register pair with a single pattern p8_mtvsrd that just works on any DFmode register (leaving the split into high/low to the caller if necessary) seems to simplify things. > > + /* You might think that we could use op0 as one temp and a DF clobber > > + as the other, but you'd be wrong. These secondary_reload patterns > > + don't check that the clobber is different to the destination, which > > + is probably a reload bug. */ It's not a bug, it's deliberate behavior. The reload registers allocated for secondary reload clobbers may overlap the destination, since in many cases you simply move the input to the reload register, and then the reload register to the destination (where the latter move can be a no-op if it is possible to allocate the reload register and the destination into the same physical register). If you need two separate intermediate values, you need to allocate a secondary reload register of a larger mode (as is already done in the pattern). > > /* Also use the destination register to hold the unconverted DImode > > value. > > This is conceptually a separate value from OP0, so we use gen_rtx_REG > > rather than simplify_gen_subreg. */ > > - rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); > > + rtx op0_df = gen_rtx_REG (DFmode, REGNO (op0)); > > + rtx op0_v4sf = gen_rtx_REG (V4SFmode, REGNO (op0)); While this was not introduced by this patch, I'm a little bit concerned about the hard-coded use of REGNO here, which will crash if op0 at this point happens to be a SUBREG instead of a REG. This is extremely unlikely at this point in reload, but not 100% impossible, e.g. if op0 for some reason is one of the "magic" registers like the stack pointer. That's why it is in general better to use simplify_gen_subreg or one of gen_highpart/gen_lowpart, which will handle SUBREG correctly as well. I'm not sure why it matters whether this is "conceptually a separate value" as the comment argues ... > > /* Move SF value to upper 32-bits for xscvspdpn. */ > > emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); > > - emit_move_insn (op0_di, op2); > > - emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di)); > > + emit_insn (gen_p8_mtvsrd (op0_df, op2)); > > + emit_insn (gen_vsx_xscvspdpn (op0_df, op0_v4sf)); The sequence of modes used for op0 here is a bit weird. First, op0 is loaded as DFmode by mtvsrd, then it is silently re- interpreted as V4SFmode when used as input to xscvspdpn, which gets a DFmode output that is again silently re-interpreted as SFmode. This isn't really wrong as such, just maybe a bit confusing. Maybe instead have p8_mtvsrd use DImode as output (instead of DFmode), which shouldn't be any harder to use in the reload_vsx_from_gpr<mode> splitter, and keep using the vsx_xscvspdpn_directmove pattern? [ This of course reinforces the question why we have p8_mtvsrd in the first place, instead of just allowing this use directly in movdi_internal64 itself. ] Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com