https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Uroš Bizjak from comment #18)
> > (In reply to Uroš Bizjak from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > >
> > > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly
> > > > takes target as both input and output, it seems we can't create a new
> > > > target for that.
> > >
> > > OK, let's try to optimize it with gen_pinsr, as you proposed.
> > >
> > > (It looks that the add-on patch from Comment #6 will generate VPBLEND in
> > > this case, too.)
> >
> > We should manually generate vinsertps from truncsfhf2, too. There is no
> > point to call ix86_expand_vector_set if we already know the instruction. It
> > will use vec_set<VI4F_128:mode>_0 insn pattern, which has quite some
> > alternatives.
>
> For AVX2, your attached patch will optimize
>
>         vpxor        %xmm2, %xmm2, %xmm2
> -       vpbroadcastw %xmm1, %xmm1
> -       vpbroadcastw %xmm0, %xmm0
>         vpblendw     $1, %xmm0, %xmm2, %xmm0
>         vpblendw     $1, %xmm1, %xmm2, %xmm2
>         vcvtph2ps    %xmm2, %xmm2
>
> Since upper bits of xmm1/xmm0 is not selected by vpblendw.

True, the blending of only element 0 does not need broadcast. I will prepare a
formal patch submission once your changes are committed.
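
To illustrate why the broadcast is redundant, here is a small scalar model in
plain C. It is only a sketch, not GCC code: blendw, broadcastw and
blend_lane0_ignores_broadcast are hypothetical helper names, and the model
assumes the documented per-lane behaviour of VPBLENDW on one 128-bit register
(bit i of the immediate selects 16-bit lane i from the second source,
otherwise from the first).

```c
#include <stdint.h>
#include <string.h>

#define LANES 8  /* eight 16-bit lanes in one 128-bit XMM register */

/* Scalar model of "vpblendw $imm8, src2, src1, dst" (AT&T operand order):
   lane i of dst comes from src2 if bit i of imm8 is set, else from src1.
   The result is built in a temporary so dst may alias a source. */
static void blendw(uint16_t dst[LANES], const uint16_t src1[LANES],
                   const uint16_t src2[LANES], unsigned imm8)
{
    uint16_t tmp[LANES];
    for (int i = 0; i < LANES; i++)
        tmp[i] = ((imm8 >> i) & 1) ? src2[i] : src1[i];
    memcpy(dst, tmp, sizeof tmp);
}

/* Scalar model of "vpbroadcastw src, dst": copy lane 0 into every lane. */
static void broadcastw(uint16_t dst[LANES], const uint16_t src[LANES])
{
    uint16_t v = src[0];
    for (int i = 0; i < LANES; i++)
        dst[i] = v;
}

/* Returns 1 if blending lane 0 into a zeroed register gives the same
   result with and without a preceding broadcast of the input. */
static int blend_lane0_ignores_broadcast(const uint16_t in[LANES])
{
    uint16_t zero[LANES] = {0};   /* models vpxor %xmm2, %xmm2, %xmm2 */
    uint16_t bcast[LANES], with[LANES], without[LANES];

    broadcastw(bcast, in);
    blendw(with, zero, bcast, 1);     /* old sequence: broadcast, then blend */
    blendw(without, zero, in, 1);     /* new sequence: blend only */
    return memcmp(with, without, sizeof with) == 0;
}
```

With immediate $1 only lane 0 is taken from the input and lanes 1..7 come
from the zeroed register, so whatever the broadcast wrote to the upper lanes
is never observed.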