The following code generates wrong results when optimization is turned on:

#include <stdio.h>
#include <xmmintrin.h>

void pv(const char *s, __m128 v) {
  float *p = (float*)&v;
  printf("%s=[%g %g %g %g]\n", s,p[0],p[1],p[2],p[3]);
}
#define P(x) pv(#x,x)

static void plop(__m128 *Y) {
  __m128 zero = _mm_setzero_ps();
  __m128 foo = _mm_movehl_ps(zero, *Y);
  __m128 bar = _mm_movehl_ps(*Y, zero);
  P(*Y);P(foo);P(bar);
}

int main() {
  __m128 y=_mm_set_ps(-3,2,1,9);
  plop(&y);
  return 0;
}

Here are some outputs:

> gcc-3.4 -O3 -Wall -W -msse -o toto toto.c && ./toto
*Y=[9 1 2 -3]
foo=[0 0 9 1]
bar=[9 1 0 0]

> gcc-4.0 -g -O0 -Wall -W -msse -o toto toto.c && ./toto
*Y=[9 1 2 -3]
foo=[2 -3 0 0]
bar=[0 0 2 -3]
(this one is correct)

> gcc-4.0 -O3 -Wall -W -msse -o toto toto.c && ./toto
*Y=[9 1 2 -3]
foo=[9 1 0 0]
bar=[0 0 2 -3]
(same output with gcc-4.1 from cvs)

Tested with:
gcc-3.4 (GCC) 3.4.4 20050314 (prerelease) (Debian 3.4.3-12)
gcc-4.0 (GCC) 4.0.0 20050410 (prerelease) (Debian 4.0-0pre10)
gcc (GCC) 4.0.0 20050418 (prerelease)
gcc (GCC) 4.1.0 20050421 (experimental)

--
           Summary: invalid code generation for _mm_movehl_ps SSE intrinsic
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: julien dot pommier at insa-toulouse dot fr
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21149