https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Uroš Bizjak from comment #18)
> > (In reply to Uroš Bizjak from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > 
> > > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > > > target as both input and output, it seems we can't create a new target for
> > > > that.
> > > 
> > > OK, let's try to optimize it with gen_pinsr, as you proposed.
> > > 
> > > (It looks like the add-on patch from Comment #6 will generate VPBLEND in
> > > this case, too.)
> > 
> > We should manually generate vinsertps from truncsfhf2, too. There is no
> > point in calling ix86_expand_vector_set if we already know the instruction. It
> > will use vec_set<VI4F_128:mode>_0 insn pattern, which has quite some
> > alternatives.
> 
> For AVX2, your attached patch will optimize
> 
>         vpxor   %xmm2, %xmm2, %xmm2
> -       vpbroadcastw    %xmm1, %xmm1
> -       vpbroadcastw    %xmm0, %xmm0
>         vpblendw        $1, %xmm0, %xmm2, %xmm0
>         vpblendw        $1, %xmm1, %xmm2, %xmm2
>         vcvtph2ps       %xmm2, %xmm2
> 
> Since the upper bits of xmm1/xmm0 are not selected by vpblendw.

True, blending only element 0 does not need the broadcast. I will prepare a
formal patch submission once your changes are committed.
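
To illustrate why the broadcast is redundant, here is a minimal scalar sketch
of the lane selection. It is not GCC code: vec8x16, blendw, broadcastw and
broadcast_is_redundant are hypothetical names made up for this example, and
the struct only models the eight 16-bit lanes of one XMM register.

```c
#include <stdint.h>
#include <string.h>

/* Eight 16-bit lanes, standing in for one 128-bit XMM register
   (hypothetical type, for illustration only). */
typedef struct { uint16_t w[8]; } vec8x16;

/* Scalar model of the 128-bit vpblendw lane selection: lane i is taken
   from b when bit i of imm8 is set, otherwise from a. */
static vec8x16 blendw(vec8x16 a, vec8x16 b, unsigned imm8)
{
    vec8x16 r;
    for (int i = 0; i < 8; i++)
        r.w[i] = ((imm8 >> i) & 1u) ? b.w[i] : a.w[i];
    return r;
}

/* Scalar model of vpbroadcastw: copy lane 0 into every lane. */
static vec8x16 broadcastw(vec8x16 v)
{
    vec8x16 r;
    for (int i = 0; i < 8; i++)
        r.w[i] = v.w[0];
    return r;
}

/* Returns 1 when blending lane 0 of src into a zero vector gives the
   same result with and without a preceding broadcast. */
static int broadcast_is_redundant(vec8x16 src)
{
    vec8x16 zero;
    memset(&zero, 0, sizeof zero);

    vec8x16 with_bcast    = blendw(zero, broadcastw(src), 1);
    vec8x16 without_bcast = blendw(zero, src, 1);
    return memcmp(&with_bcast, &without_bcast, sizeof with_bcast) == 0;
}
```

With an immediate of 1 only lane 0 is read from the second source, so whatever
the broadcast writes to lanes 1..7 never reaches the result.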
