https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708

            Bug ID: 70708
           Summary: Suboptimal code generated when using _mm_set_sd (X64)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kobalicek.petr at gmail dot com
  Target Milestone: ---

The x86-64 ABI already passes floating-point arguments and return values in XMM
registers. Compare the following two snippets:

  double MyMinV1(double a, double b) {
    return a < b ? a : b;
  }

  double MyMinV2(double a, double b) {
    __m128d x = _mm_set_sd(a);
    __m128d y = _mm_set_sd(b);
    return _mm_cvtsd_f64(_mm_min_sd(x, y));
  }

And the code generated:

  MyMinV1(double, double):
    minsd   xmm0, xmm1
    ret

  MyMinV2(double, double):
    movsd   QWORD PTR [rsp-24], xmm0
    movsd   QWORD PTR [rsp-16], xmm1
    movsd   xmm0, QWORD PTR [rsp-24]
    movsd   xmm1, QWORD PTR [rsp-16]
    minsd   xmm0, xmm1
    ret

The problem is obvious: the _mm_set_sd() intrinsic always generates a movsd
through memory, even though the value is already in an XMM register in the
right position. I also checked Clang, and it generates optimal code for both
functions.

You can reproduce the test case here:

https://gcc.godbolt.org/#compilers:!((compiler:g6,options:'-O2+-Wall+',source:'%23include+%3Cxmmintrin.h%3E%0A%0Adouble+MyMinV1(double+a,+double+b)+%7B%0A++return+a+%3C+b+%3F+a+:+b%3B%0A%7D%0A%0Adouble+MyMinV2(double+a,+double+b)+%7B%0A++__m128d+x+%3D+_mm_set_sd(a)%3B%0A++__m128d+y+%3D+_mm_set_sd(b)%3B%0A++return+_mm_cvtsd_f64(_mm_min_sd(x,+y))%3B%0A%7D%0A')),filterAsm:(commentOnly:!t,directives:!t,intel:!t,labels:!t),version:3

It looks like all GCC versions are affected.
