[Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c

ubizjak at gmail dot com via Gcc-bugs Fri, 26 Nov 2021 08:00:45 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811


--- Comment #15 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Hongtao.liu from comment #12)
> > > > 
> > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > enough for both VPINSRW insns.
> > > 
> > > With the new alternative in your attached patch (the vpblendw one), RA could
> > > reuse the zero register; w/o that, xmm0/xmm1 need to be explicitly cleared
> > > for the upper bits.
> > > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> > 
> > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > redundant initializations.
> 
> On the other hand, if we do not use expand_vector_set (which treats the zero
> register as both input and output), but emit_insn(gen_sse4_1_pinsrph(...)) with
> a new pseudo register as dest, the redundant initialization could be optimized
> away by fwprop1.
> 
>         pextrw  $0, %xmm1, %eax
>         pextrw  $0, %xmm0, %edx
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpinsrw $0, %edx, %xmm1, %xmm0
>         vpinsrw $0, %eax, %xmm1, %xmm1
>         vcvtph2ps       %xmm1, %xmm1
>         vcvtph2ps       %xmm0, %xmm0
>         vaddss  %xmm1, %xmm0, %xmm0
>         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
>         vcvtps2ph       $4, %xmm0, %xmm0

Then we would lose the optimization in expand_vector_set:

    case E_V8HFmode:
      if (TARGET_AVX2)
        {
          mmode = SImode;
          gen_blendm = gen_sse4_1_pblendph;
          blendm_const = true;
        }
      else
        use_vec_merge = true;
      break;

Maybe we should simply copy "target" to a new pseudo here:

do_vec_merge:
      tmp = gen_rtx_VEC_DUPLICATE (mode, val);
      tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
                               GEN_INT (HOST_WIDE_INT_1U << elt));
      emit_insn (gen_rtx_SET (target, tmp));

OTOH, if recycling "target" inhibits fwprop, we should perhaps copy "target" to
a new pseudo at the beginning of expand_vector_set?
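
For readers less familiar with the RTL in do_vec_merge, here is a minimal
sketch in plain C of what (vec_merge (vec_duplicate val) target (1 << elt))
computes for a V8HF-sized vector. It is only a model of the semantics, not GCC
code; the names model_vec_set and NELT are hypothetical and the lanes are
treated as raw 16-bit patterns.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 8  /* V8HF has eight 16-bit lanes (assumed for this model) */

/* Hypothetical model of
     (vec_merge (vec_duplicate val) target (1 << elt))
   Lane i of the result is taken from the duplicated value when bit i of
   the mask is set, and from target otherwise.  */
static void
model_vec_set (uint16_t result[NELT], const uint16_t target[NELT],
               uint16_t val, unsigned elt)
{
  assert (elt < NELT);
  unsigned mask = 1u << elt;

  for (unsigned i = 0; i < NELT; i++)
    result[i] = (mask & (1u << i)) ? val : target[i];
}
```

Passing the same array as both result and target mirrors the
emit_insn (gen_rtx_SET (target, tmp)) form above, where "target" is both input
and output; passing a separate result array corresponds to the "new pseudo as
dest" variant discussed in this thread.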
