https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68211
Bug ID: 68211
Summary: Free __m128d subreg of double
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

Hello,

we still seem to be missing some way of passing a double to intrinsics that
take a __m128d argument (see below for an example) without any overhead when
we do not care about the high part. __m128d to __m256d has an intrinsic,
although its implementation is not optimal (see PR 50829). But Intel
apparently "forgot" to add a similar one for double to __m128d.

Say I want to use the new AVX512 _mm_sqrt_round_sd to compute the square root
of a double rounded towards +infinity. Using -mavx512f, I can try:

    #include <x86intrin.h>
    double sqrt_up(double x){
      __m128d y = { x, 0 };
      return _mm_cvtsd_f64(_mm_sqrt_round_sd(y, y, _MM_FROUND_TO_POS_INF|_MM_FROUND_NO_EXC));
    }

which generates

    vmovsd  %xmm0, -16(%rsp)
    vmovsd  -16(%rsp), %xmm0
    vsqrtsd {ru-sae}, %xmm0, %xmm0, %xmm0

I get the exact same code with

    double d = d;
    __m128d y = { x, d };

or

    __m128d y = y;
    y[0] = x;

I can shorten it to

    vmovddup %xmm0, %xmm0
    vsqrtsd  {ru-sae}, %xmm0, %xmm0, %xmm0

using

    __m128d y = { x, x };

I am forced to use inline asm

    __m128d y;
    asm("":"=x"(y):"0"(x));

to get what I wanted, i.e. only vsqrtsd without any extra instruction. But
that makes the code non-portable, and I might as well write the vsqrtsd
instruction myself in the asm. It probably also has similar drawbacks to the
unspec in PR 50829.
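
For reference, the inline-asm workaround can be wrapped in a small helper so the non-portable part stays in one place. This is only a sketch under assumptions: the helper name cast_double_to_m128d is made up here (it is not an existing intrinsic), it relies on GCC's extended asm on x86_64, and it uses the plain SSE2 _mm_sqrt_sd in place of _mm_sqrt_round_sd so that it builds without -mavx512f. The high half of the returned vector is whatever happened to be in the register.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_sqrt_sd, _mm_cvtsd_f64 */

/* Hypothetical helper, not a real intrinsic: reinterpret the SSE register
   holding x as a __m128d. The empty asm ties the output to the input
   register, so the low element is x and the high element is unspecified.
   GCC-specific; other compilers may reject the mismatched operand types. */
static inline __m128d cast_double_to_m128d(double x)
{
    __m128d y;
    __asm__("" : "=x"(y) : "0"(x));
    return y;
}

/* Scalar square root through the vector intrinsic. Only the low element
   of the result is read, so the unspecified high element does not matter. */
static inline double sqrt_low(double x)
{
    __m128d y = cast_double_to_m128d(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(y, y));
}
```

With -mavx512f, swapping _mm_sqrt_sd(y, y) for the _mm_sqrt_round_sd call from sqrt_up above should give the single-vsqrtsd code described in the report, with the same portability caveats.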