https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to Hongtao.liu from comment #14)
> > (In reply to Uroš Bizjak from comment #13)
> > > (In reply to Hongtao.liu from comment #12)
> > > > >
> > > > > Just noticed that for some reason two VPXORs are emitted. One
> > > > > should be enough for both VPINSRW insns.
> > > >
> > > > With new alternative in your attached match(vpblendw one), RA could
> > > > reuse zero register, w/o that, xmm0/xmm1 need to be explicitly clear
> > > > for the upper bits.
> > > >         vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> > >
> > > True, but I'd expect some post-reload(?) pass to propagate zeros and
> > > remove redundant initializations.
> >
> > On the other hand, if not use expand_vector_set (which treats zero register
> > as both input and output), but emit_insn(gen_sse4_1_pinsrph(...)) with a new
> > pseudo register as dest, the redundant initialization could be optimized off
> > by fwprop1.
> >
> >         pextrw  $0, %xmm1, %eax
> >         pextrw  $0, %xmm0, %edx
> >         vpxor   %xmm1, %xmm1, %xmm1
> >         vpinsrw $0, %edx, %xmm1, %xmm0
> >         vpinsrw $0, %eax, %xmm1, %xmm1
> >         vcvtph2ps       %xmm1, %xmm1
> >         vcvtph2ps       %xmm0, %xmm0
> >         vaddss  %xmm1, %xmm0, %xmm0
> >         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
> >         vcvtps2ph       $4, %xmm0, %xmm0
>
> Then we will lose optimization in expand vector set:
>
>     case E_V8HFmode:
>       if (TARGET_AVX2)
>         {
>           mmode = SImode;
>           gen_blendm = gen_sse4_1_pblendph;
>           blendm_const = true;
>         }
>       else
>         use_vec_merge = true;
>       break;
>
> Maybe we should simply copy "target" to a new pseudo here:
>
>     do_vec_merge:
>       tmp = gen_rtx_VEC_DUPLICATE (mode, val);
>       tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
>                                GEN_INT (HOST_WIDE_INT_1U << elt));
>       emit_insn (gen_rtx_SET (target, tmp));
>
> OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target"
> to a new pseudo at the beginning of the expand_vector_set?

ix86_expand_vector_set is mainly used by vec_set_optab, which takes target as
both input and output, so it seems we can't create a new target for that.
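
For illustration only, here is how I read the vec_merge/vec_duplicate pattern emitted at do_vec_merge, modelled in plain C. This is a sketch and not GCC code: vec_set_model and NELT are hypothetical names, and the lanes are modelled as raw 16-bit values rather than real HFmode elements. It only shows why "target" is an input as well as an output: every lane except "elt" is read back from the old target.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 8  /* V8HF/V8HI: eight 16-bit lanes (assumed for this sketch) */

/* Hypothetical model of
     (set target (vec_merge (vec_duplicate val) target (1 << elt)))
   Lanes whose mask bit is set come from the duplicated scalar; all
   other lanes come from the previous contents of target.  */
static void
vec_set_model (uint16_t target[NELT], uint16_t val, unsigned elt)
{
  assert (elt < NELT);
  unsigned mask = 1u << elt;
  for (unsigned i = 0; i < NELT; i++)
    target[i] = ((mask >> i) & 1) ? val : target[i];
}
```

With a zeroed target this corresponds to the vpxor + vpinsrw sequence quoted above; since the old lanes feed the result, writing to a fresh pseudo would presumably need an explicit copy of target first.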