https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68211

            Bug ID: 68211
           Summary: Free __m128d subreg of double
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

Hello,

we still seem to be missing some way of passing a double to intrinsics that
take a __m128d argument (see below for an example) without any overhead when we
do not care about the high part.

Converting __m128d to __m256d has an intrinsic (_mm256_castpd128_pd256),
although its implementation is not optimal (see PR 50829). But Intel apparently
"forgot" to add a similar one for double to __m128d.

Say I want to use the new AVX512 _mm_sqrt_round_sd to compute the square root
of a double rounded towards +infinity. Using -mavx512f, I can try:

#include <x86intrin.h>

double sqrt_up(double x){
  __m128d y = { x, 0 };
  return _mm_cvtsd_f64(
      _mm_sqrt_round_sd(y, y, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));
}

which generates

        vmovsd  %xmm0, -16(%rsp)
        vmovsd  -16(%rsp), %xmm0
        vsqrtsd {ru-sae}, %xmm0, %xmm0, %xmm0

I get the exact same code with

  double d = d;
  __m128d y = { x, d };

or

  __m128d y = y;
  y[0] = x;

I can shorten it to

        vmovddup        %xmm0, %xmm0
        vsqrtsd {ru-sae}, %xmm0, %xmm0, %xmm0

using

  __m128d y = { x, x };

I am forced to use inline asm

  __m128d y;
  asm("":"=x"(y):"0"(x));

to get what I wanted, i.e. only vsqrtsd without any extra instruction. But that
makes the code non-portable, and I might as well write the vsqrtsd instruction
myself in the asm. It probably also has similar drawbacks to the unspec in PR
50829.
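
For reference, here is a minimal sketch of the asm workaround wrapped in a
helper, so that it can be tried without AVX512 hardware. It assumes an x86-64
target with SSE2 and substitutes the plain _mm_sqrt_sd for _mm_sqrt_round_sd.
The names low_from_double and sqrt_low are hypothetical, chosen only for this
illustration. The fallback branch uses _mm_set_sd, which zeroes the high part
and so may cost an extra instruction.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_set_sd, _mm_sqrt_sd, _mm_cvtsd_f64 */

/* Hypothetical helper: view a double as the low element of a __m128d. */
static inline __m128d low_from_double(double x) {
#if defined(__GNUC__) && !defined(__clang__)
  /* Empty asm that ties y to the register holding x. The asm body emits no
     instruction and the high element of y is left unspecified. */
  __m128d y;
  __asm__("" : "=x"(y) : "0"(x));
  return y;
#else
  /* Fallback for other compilers: well defined, but zeroes the high part. */
  return _mm_set_sd(x);
#endif
}

/* Square root of the low element only; the high element is ignored. */
double sqrt_low(double x) {
  __m128d y = low_from_double(x);
  return _mm_cvtsd_f64(_mm_sqrt_sd(y, y));
}
```

Calling sqrt_low(2.25) returns 1.5. Whether the compiler really reduces this
to a single square root instruction depends on the compiler version and
options, so the generated assembly is worth checking.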
