Hi Mike, On Wed, Nov 08, 2017 at 08:14:31AM -0500, Michael Meissner wrote: > PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the > GPRs, but it does have vector byte swap half-word, word, double-word > operations > in the VSX registers. The enclosed patch enables generation of the byte > revseral instructions for register-register operations. It still prefers to > generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse > (ST{H,W,D}BRX) instructions over the register sequence. > > For 16-bit and 32-bit byte swaps, it typically does the tradational operation > in GPR registers, but it will generate XXBR{H,W} if the values are in vector > registers.
You can do the byteswaps with just VMX, too, with some rotate instructions (vrlh 8 for HI, vrlh 8 followed by vrlw 16 for SI, etc.) For a future date, I suppose. > For 64-bit swaps, it no longer generates the 9 instruction sequence in favor > of > XXBRD. I did some timing runs on a prototype power9 system, and it was > slightly faster to do direct move to the vecter unit, XXBRD, and direct move > back to a GPR than the traditional sequence. > > I did bootstraps on little endian Power8 and Power9 systems (with the default > cpu set to power8 and power9 respectively). There were no regressions. Can I > check this patch into the trunk? > > [gcc] > 2017-11-08 Michael Meissner <meiss...@linux.vnet.ibm.com> > > * config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems, > enable generating XXBR{H,W} if the value is in a vector > register. Join the last two lines? And you only mean XXBRH here. > (bswapsi2_reg): Likewise. And XXBRW here. > (bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64 > instead of doing the GPR sequence used on previoius machines. Typo ("previous"). > (bswapdi2_xxbrd): Likewise. Not "Likewise"; just "New define_insn." Should this somehow be joined with p9_xxbrd_<mode>? Or maybe you should generate that, instead. > (bswapdi2_reg splitters): Use int_reg_operand instead of > gpc_reg_operand to not match when XXBRD is generated. Please see if you can merge the define_splits to their corresponding define_insns (making define_insn_and_split). Shorter, no mismatch possible between the two anymore, and you get a name for the patterns too :-) > @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2" > emit_insn (gen_bswapdi2_load (dest, src)); > else if (MEM_P (dest)) > emit_insn (gen_bswapdi2_store (dest, src)); > + else if (TARGET_P9_VECTOR) > + emit_insn (gen_bswapdi2_xxbrd (dest, src)); > else > emit_insn (gen_bswapdi2_reg (dest, src)); > DONE; Pity that you cannot easily do similar for HI and SI. > @@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg" > (clobber (match_scratch:DI 3 "=&r"))] > "TARGET_POWERPC64 && TARGET_LDBRX" > "#" > - [(set_attr "length" "36")]) > + [(set_attr "length" "36") > + (set_attr "type" "*")]) Explicitly setting "type" (or any other attr) to the default value is pretty useless; just leave it out. Segher