Hi Ke Wen,

On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote:
> on 2021/8/6 9:10 PM, Bill Schmidt wrote:
> > On 8/4/21 9:06 PM, Kewen.Lin wrote:
> >> The existing vec_unpacku_{hi,lo} supports emulated unsigned
> >> unpacking for short and char but misses the support for int.
> >> This patch adds the support for vec_unpacku_{hi,lo}_v4si.

> 	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
> 	(vec_unpacku_hi_v8hi): Likewise.
> 	(vec_unpacku_lo_v16qi): Likewise.
> 	(vec_unpacku_lo_v8hi): Likewise.
> 	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
> 	(vec_unpacku_lo_<VP_small_lc>): Likewise.

> -(define_expand "vec_unpacku_hi_v16qi"
> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKHUB))]
> -  "TARGET_ALTIVEC"
> -{
> -  rtx vzero = gen_reg_rtx (V8HImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> -
> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
> -
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> -  DONE;
> -})

So I wonder if all this still generates good code.
The unspecs cannot be optimised properly; the RTL can (in principle,
anyway: it is possible that it hides more opportunities to use unpack
etc. insns than it helps over unspec).  This needs to be tested, and
the usual idioms need testcases; is that what you add here?
(/me reads on...)

> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
> +  else
> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));

Ah, so it is *not* using unspecs?  Excellent.

Okay for trunk.  Thank you!


Segher
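
For readers following along, here is a minimal scalar sketch of why merging with a zero vector gives an unsigned unpack, assuming big-endian element order and 32-bit source elements. It is only an illustration, not the patch's code: the function names (merge_high_words, words_to_doublewords, unpacku_hi_model) are hypothetical, and plain arrays stand in for vector registers.

```c
#include <stdint.h>

/* Model of a "merge high" on 32-bit elements in big-endian element
   order: interleave the first two words of a and b as a0 b0 a1 b1.  */
static void
merge_high_words (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* View four 32-bit words (big-endian element order) as two 64-bit
   elements: the first word of each pair is the most significant half.  */
static void
words_to_doublewords (const uint32_t w[4], uint64_t d[2])
{
  d[0] = ((uint64_t) w[0] << 32) | w[1];
  d[1] = ((uint64_t) w[2] << 32) | w[3];
}

/* Hypothetical model of an unsigned "unpack high": merging a zero
   vector in front of each source word zero-extends that word.  */
void
unpacku_hi_model (const uint32_t src[4], uint64_t dst[2])
{
  const uint32_t zero[4] = { 0, 0, 0, 0 };
  uint32_t merged[4];

  merge_high_words (zero, src, merged);
  words_to_doublewords (merged, dst);
}
```

In this model the zero words land in the most significant half of each doubleword, so a source word with its top bit set stays positive in the wider element, which is the difference from a signed unpack.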