https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao.liu from comment #12)
> > >
> > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > enough for both VPINSRW insns.
> >
> > With the new alternative in your attached patch (the vpblendw one), RA could
> > reuse the zero register; w/o that, xmm0/xmm1 need to be explicitly cleared
> > for the upper bits.
> > vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
>
> True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> redundant initializations.

On the other hand, if we do not use expand_vector_set (which treats the zero
register as both input and output) but instead
emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as the dest, the
redundant initialization can be optimized off by fwprop1:

        pextrw  $0, %xmm1, %eax
        pextrw  $0, %xmm0, %edx
        vpxor   %xmm1, %xmm1, %xmm1
        vpinsrw $0, %edx, %xmm1, %xmm0
        vpinsrw $0, %eax, %xmm1, %xmm1
        vcvtph2ps       %xmm1, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0
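
To illustrate why the destination operand matters here, below is a rough toy model in plain C. It is not GCC code and does not show how fwprop1 or expand_vector_set are implemented; the type v8hi_model and all function names are hypothetical, made up for this sketch. It only counts how many zeroings (standing in for vpxor) are needed when an insert overwrites its zero input versus when it writes to a fresh destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an XMM register as eight 16-bit lanes. */
typedef struct { uint16_t lane[8]; } v8hi_model;

/* Number of zeroings performed so far; each one stands in for a vpxor. */
static int zeroings = 0;

static v8hi_model make_zero(void)
{
    v8hi_model z;
    memset(&z, 0, sizeof z);
    zeroings++;
    return z;
}

/* In-place form: the zero register is both input and output, so the
   zero value is destroyed by the insert. */
static void insert_in_place(v8hi_model *reg, uint16_t value)
{
    reg->lane[0] = value;
}

/* Fresh-destination form: the source is only read, the result goes to
   a new value, so the same zero source can be reused. */
static v8hi_model insert_fresh(const v8hi_model *src, uint16_t value)
{
    v8hi_model dst = *src;
    dst.lane[0] = value;
    return dst;
}

/* Two inserts, each clobbering its own zeroed register. */
static int count_zeroings_in_place(uint16_t a, uint16_t b,
                                   v8hi_model *out_a, v8hi_model *out_b)
{
    zeroings = 0;
    *out_a = make_zero();
    insert_in_place(out_a, a);
    *out_b = make_zero();
    insert_in_place(out_b, b);
    return zeroings;
}

/* Two inserts sharing one zeroed source. */
static int count_zeroings_fresh(uint16_t a, uint16_t b,
                                v8hi_model *out_a, v8hi_model *out_b)
{
    zeroings = 0;
    v8hi_model zero = make_zero();
    *out_a = insert_fresh(&zero, a);
    *out_b = insert_fresh(&zero, b);
    return zeroings;
}
```

In this model the in-place variant needs one zeroing per insert, while the fresh-destination variant shares a single zero source, which is analogous to the single vpxor feeding both vpinsrw insns in the listing above.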