Richard Henderson wrote:

> The generic support for vector permutation will allow for automatic
> lowering to V*QImode, so all we need to add to support for these targets
> is the single V16QI pattern that represents the base permutation insn.
>
> I'm not touching any of the other ways that the permutation insn
> could be generated.  After the generic support is added, I'll leave
> it to the port maintainers to determine what they want to keep.  I
> suspect in many cases using the generic __builtin_shuffle plus some
> casting in the target-specific header files would be sufficient,
> eliminating several dozen builtins.
Sorry I didn't get to this earlier, I got side-tracked by a number of
independent regressions on SPU ...

Unfortunately, the semantics of vec_perm do not exactly match those of
the SPU Shuffle Bytes instruction.  vec_perm assumes the selector elements
apply modulo 32, but shufb uses values >= 128 for special purposes.
See the ISA:

  Value in Register RC
  (Expressed in Binary)    Result Byte
  10xxxxxx                 0x00
  110xxxxx                 0xFF
  111xxxxx                 0x80
  Otherwise                The byte of the concatenated register
                           addressed by the rightmost 5 bits of
                           register RC

To implement the vec_perm semantics fully, we therefore need to reduce
the selector modulo 32 explicitly before using shufb.

Tested on spu-elf, fixes various vshuf test cases.
Committed to mainline.

Bye,
Ulrich


ChangeLog:

        * config/spu/spu.md ("vec_permv16qi"): Reduce selector modulo 32
        before using the shufb instruction.

Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md       (revision 180240)
--- gcc/config/spu/spu.md       (working copy)
*************** selb\t%0,%4,%0,%3"
*** 4395,4410 ****
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
  (define_expand "vec_permv16qi"
!   [(set (match_operand:V16QI 0 "spu_reg_operand" "")
          (unspec:V16QI
            [(match_operand:V16QI 1 "spu_reg_operand" "")
             (match_operand:V16QI 2 "spu_reg_operand" "")
!            (match_operand:V16QI 3 "spu_reg_operand" "")]
            UNSPEC_SHUFB))]
    ""
    {
!     operands[3] = gen_lowpart (TImode, operands[3]);
    })
  
  (define_insn "nop"
--- 4395,4416 ----
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
+ ; The semantics of vec_permv16qi are nearly identical to those of the SPU
+ ; shufb instruction, except that we need to reduce the selector modulo 32.
  (define_expand "vec_permv16qi"
!   [(set (match_dup 4) (and:V16QI (match_operand:V16QI 3 "spu_reg_operand" "")
!                                  (match_dup 6)))
!    (set (match_operand:V16QI 0 "spu_reg_operand" "")
          (unspec:V16QI
            [(match_operand:V16QI 1 "spu_reg_operand" "")
             (match_operand:V16QI 2 "spu_reg_operand" "")
!            (match_dup 5)]
            UNSPEC_SHUFB))]
    ""
    {
!     operands[4] = gen_reg_rtx (V16QImode);
!     operands[5] = gen_lowpart (TImode, operands[4]);
!     operands[6] = spu_const (V16QImode, 31);
    })
  
  (define_insn "nop"

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
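
For illustration only, here is a scalar C sketch of the mismatch described
above.  It is not part of the patch and not SPU code: the names model_shufb
and model_vec_perm are hypothetical, and the byte numbering (bytes 0-15 from
the first input, 16-31 from the second) is an assumption taken from the ISA
table's "concatenated register" wording.  It models shufb as given in the
table, then obtains the modulo-32 vec_perm behaviour by masking each selector
byte with 31 first, which mirrors the AND inserted by the expander.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of shufb as described by the ISA table quoted above.
   a and b stand for the two input registers, sel for register RC.  */
static void model_shufb(uint8_t out[16], const uint8_t a[16],
                        const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t cat[32];
    memcpy(cat, a, 16);        /* bytes 0..15 of the concatenation  */
    memcpy(cat + 16, b, 16);   /* bytes 16..31 of the concatenation */

    for (int i = 0; i < 16; i++) {
        uint8_t s = sel[i];
        if ((s & 0xC0) == 0x80)
            out[i] = 0x00;             /* 10xxxxxx */
        else if ((s & 0xE0) == 0xC0)
            out[i] = 0xFF;             /* 110xxxxx */
        else if ((s & 0xE0) == 0xE0)
            out[i] = 0x80;             /* 111xxxxx */
        else
            out[i] = cat[s & 0x1F];    /* rightmost 5 bits select a byte */
    }
}

/* vec_perm semantics: every selector byte applies modulo 32, so mask
   with 31 before handing the selector to the shufb model.  */
static void model_vec_perm(uint8_t out[16], const uint8_t a[16],
                           const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t masked[16];
    for (int i = 0; i < 16; i++)
        masked[i] = sel[i] & 31;
    model_shufb(out, a, b, masked);
}
```

With a selector byte of 0x85, for example, model_shufb produces the constant
0x00, while model_vec_perm picks byte 5 of the concatenated inputs.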