[Bug target/54349] _mm_cvtsi128_si64 unnecessary stores value at stack

solar-gcc at openwall dot com Fri, 26 Feb 2016 16:00:06 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349


--- Comment #11 from Alexander Peslyak <solar-gcc at openwall dot com> ---
It turns out that gcc 4.6.x through 4.8.x emitting "movd" instead of "movq" is
actually a deliberate hack to support binutils older than 2.17 ("movq" support
was committed in 2005 and released in 2006) and (presumably) non-GNU assemblers:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43215

Also related, on "vmovd":

https://sourceware.org/ml/binutils/2008-05/msg00257.html

Per H.J. Lu, this is because of an error in AMD's spec for x86-64.
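
To make the "movd" vs. "movq" distinction concrete, here is a minimal sketch of
extracting the low 64 bits with an explicit "movq" via GNU-style inline asm.
The helper name low64_via_movq() is hypothetical and is not the macro from the
earlier comment; it assumes x86-64 and an assembler that accepts "movq" between
an XMM register and a general-purpose register (per the above, binutils 2.17 or
newer):

```c
#include <emmintrin.h>  /* SSE2: __m128i */
#include <stdint.h>

/*
 * Hypothetical helper, not taken from gcc or from this bug report.
 * Assumes x86-64 and GNU-style inline asm (gcc or clang).
 */
static inline int64_t low64_via_movq(__m128i x)
{
    int64_t r;
    /* AT&T operand order: source %1 (an XMM register) comes first,
     * destination %0 (a 64-bit general-purpose register) second. */
    __asm__("movq %1, %0" : "=r" (r) : "x" (x));
    return r;
}
```

An assembler that predates "movq" support for this register combination would
presumably need "movd" in the template instead, which is what the hack in gcc
4.6.x through 4.8.x works around.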

More detail on this cursed intrinsic: gcc got the _mm_cvtsi128_si64x() form
(with 'x') before it got Intel's _mm_cvtsi128_si64() name (without 'x').  (With
the inline asm workaround above, this does not matter, as the macro provides
the form without 'x' on older gcc as well.)  Older MSVC and Open64 had bugs in
the intrinsic (without 'x'):

http://www.thesalmons.org/john/random123/releases/1.08/docs/sse_8h_source.html#l00108

That page refers to https://bugs.open64.net/show_bug.cgi?id=873 for the Open64
bug.  I had looked at it before, but unfortunately their bug tracker currently
refuses https connections and returns 404 for that path over http.  I have no
detail on what the MSVC bug was.  Apparently, these bugs could result in
incorrect computation at runtime (the comment at the URL above mentions failed
assertions).  Using _mm_extract_epi64(x, 0) is a workaround (requires SSE4.1+,
sometimes slower).
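
A minimal sketch of the _mm_extract_epi64() workaround follows.  The wrapper
name low64_via_extract() is hypothetical.  The target attribute is an
assumption that lets this one function use SSE4.1 without building the whole
file with -msse4.1; it needs a reasonably recent gcc or clang, and the CPU must
still support SSE4.1 at runtime:

```c
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi64 */
#include <stdint.h>

/*
 * Hypothetical wrapper: returns the low 64 bits of x without going
 * through _mm_cvtsi128_si64().  x86-64 only.
 */
static inline __attribute__((target("sse4.1")))
int64_t low64_via_extract(__m128i x)
{
    /* The lane index must be a compile-time constant; 0 selects the
     * low 64-bit element. */
    return (int64_t)_mm_extract_epi64(x, 0);
}
```

With older compilers that lack per-function target support, the same call
would presumably have to be compiled with -msse4.1 for the whole translation
unit.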
