https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to Hongtao.liu from comment #14)
> > (In reply to Uroš Bizjak from comment #13)
> > > (In reply to Hongtao.liu from comment #12)
> > > > > 
> > > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > > enough for both VPINSRW insns.
> > > > 
> > > > With the new alternative in your attached patch (the vpblendw one), RA
> > > > could reuse the zero register; w/o that, xmm0/xmm1 need to be explicitly
> > > > cleared for the upper bits.
> > > > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> > > 
> > > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > > redundant initializations.
> > 
> > On the other hand, if we do not use expand_vector_set (which treats the zero
> > register as both input and output), but instead
> > emit_insn (gen_sse4_1_pinsrph (...)) with a new pseudo register as dest, the
> > redundant initialization could be optimized off by fwprop1.
> > 
> >         pextrw  $0, %xmm1, %eax
> >         pextrw  $0, %xmm0, %edx
> >         vpxor   %xmm1, %xmm1, %xmm1
> >         vpinsrw $0, %edx, %xmm1, %xmm0
> >         vpinsrw $0, %eax, %xmm1, %xmm1
> >         vcvtph2ps       %xmm1, %xmm1
> >         vcvtph2ps       %xmm0, %xmm0
> >         vaddss  %xmm1, %xmm0, %xmm0
> >         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
> >         vcvtps2ph       $4, %xmm0, %xmm0
> 
> Then we will lose optimization in expand vector set:
> 
>     case E_V8HFmode:
>       if (TARGET_AVX2)
>         {
>           mmode = SImode;
>           gen_blendm = gen_sse4_1_pblendph;
>           blendm_const = true;
>         }
>       else
>         use_vec_merge = true;
>       break;
> 
> Maybe we should simply copy "target" to a new pseudo here:
> 
> do_vec_merge:
>       tmp = gen_rtx_VEC_DUPLICATE (mode, val);
>       tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
>                              GEN_INT (HOST_WIDE_INT_1U << elt));
>       emit_insn (gen_rtx_SET (target, tmp));
> 
> OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target"
> to a new pseudo at the beginning of the expand_vector_set?

ix86_expand_vector_set is mainly used by vec_set_optab, which takes target as
both input and output, so it seems we can't create a new target for that.
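
To make the constraint concrete, here is a rough standalone C model of what the
do_vec_merge path quoted above computes. This is only a sketch: NELT,
model_vec_merge and model_vec_set are made-up names for illustration, not GCC
code, and 16-bit lanes stand in for the V8HF elements. The first function takes
a separate destination (the "new pseudo" idea); the second mirrors the vec_set
contract, where target is read and written in place.

```c
#include <stdint.h>

#define NELT 8	/* Assumed lane count, matching V8HF.  */

/* Hypothetical model of
   (set dest (vec_merge (vec_duplicate val) src (1 << elt))):
   lane ELT comes from VAL, every other lane is copied from SRC.  */
static void
model_vec_merge (uint16_t dest[NELT], const uint16_t src[NELT],
		 uint16_t val, unsigned elt)
{
  unsigned mask = 1u << elt;

  for (unsigned i = 0; i < NELT; i++)
    dest[i] = ((mask >> i) & 1) ? val : src[i];
}

/* Hypothetical model of the vec_set contract: TARGET is both the
   input vector and the output vector.  */
static void
model_vec_set (uint16_t target[NELT], uint16_t val, unsigned elt)
{
  model_vec_merge (target, target, val, elt);
}
```

With the in-place form, a zeroed register that gets one lane inserted stops
being zero afterwards, so a second insert needs its own zeroed input; with a
separate destination, the same zeroed source can feed both inserts.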
