Richard Henderson wrote:

> The generic support for vector permutation will allow for automatic
> lowering to V*QImode, so all we need to add to support for these targets
> is the single V16QI pattern that represents the base permutation insn.
> 
> I'm not touching any of the other ways that the permutation insn 
> could be generated.  After the generic support is added, I'll leave
> it to the port maintainers to determine what they want to keep.  I
> suspect in many cases using the generic __builtin_shuffle plus some
> casting in the target-specific header files would be sufficient,
> eliminating several dozen builtins.


Sorry I didn't get to this earlier, I got side-tracked by a number
of independent regressions on SPU ...

Unfortunately, the semantics of vec_perm do not exactly match those of the
SPU Shuffle Bytes (shufb) instruction.  vec_perm interprets each selector
element modulo 32, but shufb uses values >= 128 for special purposes.
See the ISA:

  Value in Register RC
  (Expressed in Binary)  Result Byte

  10xxxxxx               0x00
  110xxxxx               0xFF
  111xxxxx               0x80
  Otherwise              The byte of the concatenated register addressed by
                         the rightmost 5 bits of register RC


To implement the vec_perm semantics fully, we therefore need to reduce the
selector modulo 32 explicitly before using shufb.
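
For illustration only, here is a rough scalar model in C of the difference
described above.  It is a sketch, not part of the patch: the function names
model_shufb and model_vec_perm are hypothetical, and the model assumes the
concatenated register is RA followed by RB with byte 0 leftmost.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of the shufb byte-selection rule from the
   ISA table: selector bytes with the top bit set produce constants,
   everything else indexes the 32-byte concatenation of a and b.  */
static void model_shufb(uint8_t out[16], const uint8_t a[16],
                        const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t cat[32];
    memcpy(cat, a, 16);       /* bytes 0..15 come from the first input */
    memcpy(cat + 16, b, 16);  /* bytes 16..31 come from the second input */

    for (int i = 0; i < 16; i++) {
        uint8_t s = sel[i];
        if ((s & 0xC0) == 0x80)
            out[i] = 0x00;            /* 10xxxxxx */
        else if ((s & 0xE0) == 0xC0)
            out[i] = 0xFF;            /* 110xxxxx */
        else if ((s & 0xE0) == 0xE0)
            out[i] = 0x80;            /* 111xxxxx */
        else
            out[i] = cat[s & 0x1F];   /* rightmost 5 bits select the byte */
    }
}

/* Hypothetical model of the vec_perm semantics: reduce every selector
   byte modulo 32 first (AND with 31), then apply the shufb rule.  After
   masking, the special-purpose encodings can no longer occur.  */
static void model_vec_perm(uint8_t out[16], const uint8_t a[16],
                           const uint8_t b[16], const uint8_t sel[16])
{
    uint8_t masked[16];
    for (int i = 0; i < 16; i++)
        masked[i] = sel[i] & 31;
    model_shufb(out, a, b, masked);
}
```

With a selector byte such as 0xC3, the first function yields 0xFF while the
second yields byte 3 of the first input, which is the discrepancy the
explicit mask removes.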

Tested on spu-elf, fixes various vshuf test cases.
Committed to mainline.

Bye,
Ulrich


ChangeLog:

        * config/spu/spu.md ("vec_permv16qi"): Reduce selector modulo 32
        before using the shufb instruction.

Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md       (revision 180240)
--- gcc/config/spu/spu.md       (working copy)
*************** selb\t%0,%4,%0,%3"
*** 4395,4410 ****
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
  (define_expand "vec_permv16qi"
!   [(set (match_operand:V16QI 0 "spu_reg_operand" "")
        (unspec:V16QI
          [(match_operand:V16QI 1 "spu_reg_operand" "")
           (match_operand:V16QI 2 "spu_reg_operand" "")
!          (match_operand:V16QI 3 "spu_reg_operand" "")]
          UNSPEC_SHUFB))]
    ""
    {
!     operands[3] = gen_lowpart (TImode, operands[3]);
    })
  
  (define_insn "nop"
--- 4395,4416 ----
    "shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
+ ; The semantics of vec_permv16qi are nearly identical to those of the SPU
+ ; shufb instruction, except that we need to reduce the selector modulo 32.
  (define_expand "vec_permv16qi"
!   [(set (match_dup 4) (and:V16QI (match_operand:V16QI 3 "spu_reg_operand" "")
!                                  (match_dup 6)))
!    (set (match_operand:V16QI 0 "spu_reg_operand" "")
        (unspec:V16QI
          [(match_operand:V16QI 1 "spu_reg_operand" "")
           (match_operand:V16QI 2 "spu_reg_operand" "")
!          (match_dup 5)]
          UNSPEC_SHUFB))]
    ""
    {
!     operands[4] = gen_reg_rtx (V16QImode);
!     operands[5] = gen_lowpart (TImode, operands[4]);
!     operands[6] = spu_const (V16QImode, 31);
    })
  
  (define_insn "nop"

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
