https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---

(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao.liu from comment #12)
> > > 
> > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > enough for both VPINSRW insns.
> > 
> > With the new alternative in your attached match (the vpblendw one), RA could
> > reuse the zero register; w/o that, xmm0/xmm1 need to be explicitly cleared
> > for the upper bits.
> > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> 
> True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> redundant initializations.

On the other hand, if we do not use expand_vector_set (which treats the zero
register as both input and output), but instead call
emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as the
destination, the redundant initialization can be optimized away by fwprop1:

        pextrw  $0, %xmm1, %eax
        pextrw  $0, %xmm0, %edx
        vpxor   %xmm1, %xmm1, %xmm1
        vpinsrw $0, %edx, %xmm1, %xmm0
        vpinsrw $0, %eax, %xmm1, %xmm1
        vcvtph2ps       %xmm1, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0
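
The difference between the two approaches can be illustrated with a toy model
in plain C. This is only a sketch of the data flow, not GCC code: the type
v8hi and the helpers insert_fresh, insert_in_place and upper_lanes_zero are
hypothetical names introduced for illustration. It assumes that an insert
with a fresh destination only reads the zero vector, whereas an in-place set
overwrites it.

```c
#include <stdint.h>

/* Toy model of an 8 x 16-bit vector register.
   Hypothetical type for illustration, not a GCC internal. */
typedef struct { uint16_t lane[8]; } v8hi;

/* Models an insert with a fresh destination, like
   "vpinsrw $idx, val, src, dst": src is only read, so a single
   zero vector can feed any number of inserts. */
static v8hi insert_fresh(v8hi src, uint16_t val, int idx)
{
    v8hi dst = src;            /* copy the source lanes */
    dst.lane[idx & 7] = val;   /* replace one lane in the copy */
    return dst;
}

/* Models a set where the register is both input and output:
   the caller's vector is modified, so a zero vector passed here
   is no longer zero afterwards and would need to be cleared again. */
static void insert_in_place(v8hi *reg, uint16_t val, int idx)
{
    reg->lane[idx & 7] = val;
}

/* Returns 1 if every lane above lane 0 is zero, 0 otherwise. */
static int upper_lanes_zero(const v8hi *v)
{
    for (int i = 1; i < 8; i++)
        if (v->lane[i] != 0)
            return 0;
    return 1;
}
```

In this model, two calls to insert_fresh can share one zeroed v8hi, which
corresponds to the single vpxor feeding both vpinsrw insns in the listing
above. With insert_in_place, each value needs its own freshly zeroed vector.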
