Hi Ke Wen,

On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote:
> on 2021/8/6 9:10 PM, Bill Schmidt wrote:
> > On 8/4/21 9:06 PM, Kewen.Lin wrote:
> >> The existing vec_unpacku_{hi,lo} supports emulated unsigned
> >> unpacking for short and char but misses the support for int.
> >> This patch adds the support for vec_unpacku_{hi,lo}_v4si.

>       * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
>       (vec_unpacku_hi_v8hi): Likewise.
>       (vec_unpacku_lo_v16qi): Likewise.
>       (vec_unpacku_lo_v8hi): Likewise.
>       (vec_unpacku_hi_<VP_small_lc>): New define_expand.
>       (vec_unpacku_lo_<VP_small_lc>): Likewise.

> -(define_expand "vec_unpacku_hi_v16qi"
> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKHUB))]
> -  "TARGET_ALTIVEC"      
> -{  
> -  rtx vzero = gen_reg_rtx (V8HImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> -   
> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
> -   
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> -  DONE;
> -})

So I wonder if all this still generates good code.  The unspecs cannot
be optimised properly; the RTL can (in principle, anyway: it is possible
that it hides more opportunities to use unpack etc. insns than it gains
over the unspec).  This needs to be tested, and the usual idioms
need testcases; is that what you add here?  (/me reads on...)

> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
> +  else
> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));

Ah, so it is *not* using unspecs?  Excellent.
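
For anyone following along, here is a minimal C sketch of why a merge with
a zero vector gives an unsigned unpack.  It is only a model, not GCC code:
the helper names (merge_high_bytes, be_halfword, unpacku_hi_halfword,
unpacku_hi_matches_zero_extend) are hypothetical, it covers only the
big-endian branch, and it works on byte images of the vectors rather than
on real vector registers.

```c
#include <stdint.h>

/* Model of a byte-wise "merge high" in big-endian element order:
   interleave the first 8 bytes of a and b as a0 b0 a1 b1 ...  */
static void merge_high_bytes(const uint8_t a[16], const uint8_t b[16],
                             uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Read halfword element i from a 16-byte vector image, big-endian.  */
static uint16_t be_halfword(const uint8_t v[16], int i)
{
    return (uint16_t)((v[2 * i] << 8) | v[2 * i + 1]);
}

/* Halfword i (0..7) of the modelled unsigned unpack-high of src:
   merge a zero vector (first operand) with src (second operand).  */
uint16_t unpacku_hi_halfword(const uint8_t src[16], int i)
{
    uint8_t zero[16] = {0};
    uint8_t res[16];

    merge_high_bytes(zero, src, res);
    return be_halfword(res, i);
}

/* Return 1 if every result halfword equals the zero-extended source
   byte, 0 otherwise.  */
int unpacku_hi_matches_zero_extend(const uint8_t src[16])
{
    for (int i = 0; i < 8; i++)
        if (unpacku_hi_halfword(src, i) != (uint16_t)src[i])
            return 0;
    return 1;
}
```

Each zero byte lands in the high half of a result element, so the element
is the source byte zero-extended.  The little-endian branch swaps the
operand order and uses the "low" merge for the same reason.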

Okay for trunk.  Thank you!


Segher
