https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762

--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #17)
> > compiles to:
> > 
> >         movq    %xmm1, %xmm1    # 8     [c=4 l=4]  *vec_concatv4sf_0
> >         movq    %xmm0, %xmm0    # 9     [c=4 l=4]  *vec_concatv4sf_0
> >         movq    %xmm2, %xmm2    # 12    [c=4 l=4]  *vec_concatv4sf_0
> >         mulps   %xmm1, %xmm0    # 10    [c=16 l=3]  *mulv4sf3/0
> >         movq    %xmm0, %xmm0    # 13    [c=4 l=4]  *vec_concatv4sf_0
> 
> so this one is obviously redundant - I suppose at the RTL level we have
> no chance of noticing this.  I hope for integer vector operations we
> avoid these ops?  I think this will make epilog vectorization with V2SFmode
> a bad idea, we'd need to appropriately disqualify this in the costing
> hooks.

Yes, the redundant MOVQ is emitted only in front of trapping V2SFmode
operations. All integer operations, as well as V2SF logic and swizzling
operations, are still implemented directly with "emulated" instructions,
without the extra MOVQ.
> 
> I wonder if combine could for example combine a v2sf load with the
> upper half zeroing for the next use?  Likewise for arithmetics.

The patch already does that. Since a V2SF load (MOVQ from memory) zeroes the
upper half of the register, no additional MOVQ is emitted for it. To
illustrate, the testcase:

--cut here--
typedef float __attribute__((vector_size(8))) v2sf;

v2sf m;

v2sf test (v2sf a)
{
  return a - m;
}
--cut here--

compiles to:

        movq    m(%rip), %xmm1  # 6     [c=4 l=8]  *vec_concatv4sf_0
        movq    %xmm0, %xmm0    # 7     [c=4 l=4]  *vec_concatv4sf_0
        subps   %xmm1, %xmm0    # 8     [c=12 l=3]  *subv4sf3/0

As far as arithmetic is concerned, perhaps some back-walking RTL optimization
pass could figure out that the preceding trapping V2SFmode operation guarantees
zeros in the upper half and remove the clearing insn. However, MOVQ xmm,xmm is
an extremely fast instruction, with a latency of 1 cycle and a reciprocal
throughput of 0.33 cycles, so I guess it is not much of a concern.
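A toy version of such a pass for a single basic block might look like the sketch below. It is a forward walk rather than the back-walk speculated about above, and the insn kinds, struct and function names are hypothetical; this is not GCC's RTL or any existing GCC pass. It tracks which XMM registers are known to have a zero upper half and drops a MOVQ clear whose target is already known clean, such as the `movq %xmm0, %xmm0` (insn 13) after the `mulps` in the sequence quoted from comment #17.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_XMM_REGS 16

/* Hypothetical, heavily simplified insn kinds; real RTL is far richer. */
enum insn_kind {
    LOAD_V2SF,   /* movq mem, %xmmN: the load zeroes the upper 64 bits */
    CLEAR_UPPER, /* movq %xmmN, %xmmN: explicit clearing insn */
    V2SF_ARITH,  /* addps/subps/mulps dst, src (not divps: 0/0 is NaN) */
    OTHER_WRITE  /* any other write to %xmmN: upper half unknown */
};

struct insn {
    enum insn_kind kind;
    int dst;      /* destination register number */
    int src;      /* second input, used by V2SF_ARITH only */
    bool deleted; /* set when the insn is found to be redundant */
};

/* Forward walk over one basic block, tracking which registers are known
   to hold zeros in their upper half, and marking CLEAR_UPPER insns on
   such registers as deleted.  Returns the number of insns deleted. */
static size_t remove_redundant_clears(struct insn *insns, size_t n)
{
    bool upper_zero[NUM_XMM_REGS] = { false }; /* unknown at block entry */
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        struct insn *in = &insns[i];
        switch (in->kind) {
        case LOAD_V2SF:
            upper_zero[in->dst] = true;
            break;
        case CLEAR_UPPER:
            if (upper_zero[in->dst]) {
                in->deleted = true;
                removed++;
            }
            upper_zero[in->dst] = true;
            break;
        case V2SF_ARITH:
            /* 0 + 0, 0 - 0 and 0 * 0 give zero (+0.0 under the default
               rounding mode) without raising exceptions, so zero upper
               halves on both inputs leave a zero upper half. */
            upper_zero[in->dst] = upper_zero[in->dst] && upper_zero[in->src];
            break;
        case OTHER_WRITE:
            upper_zero[in->dst] = false;
            break;
        }
    }
    return removed;
}
```

Encoding the quoted sequence (insns 8, 9, 12, 10, 13) as CLEAR_UPPER on registers 1, 0 and 2, then V2SF_ARITH with dst 0 and src 1, then CLEAR_UPPER on register 0, marks only the last insn as deleted; the `subps` testcase sequence has nothing to delete. A real pass would also have to cope with control flow and every way an insn can write a register, all to save one cheap MOVQ.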
