Consider the following C program using SSE intrinsics:

//----------
#include <stdio.h>
#include <xmmintrin.h>

int main(int argc, const char **argv) {
    __m128 v;

    v = _mm_setr_ps( 1.0f, 2.0f, 3.0f, 4.0f );
    v = _mm_rsqrt_ss( v );
    v = _mm_add_ss( v, _mm_movehl_ps( v, v ));
    v = _mm_add_ss( v, _mm_shuffle_ps( v, v, _MM_SHUFFLE( 0, 0, 0, 1 )));
    printf( "%e %e %e %e\n", ((float *)&v)[0], ((float *)&v)[1],
            ((float *)&v)[2], ((float *)&v)[3] );
    return 0;
}
//----------

Compiling and running this gives different results depending on whether
-fregmove is specified:

regmove% gcc41 -Wall -O -fno-regmove -march=pentium4m -o test main.c
regmove% ./test
5.999756e+00 2.000000e+00 3.000000e+00 4.000000e+00

regmove% gcc41 -Wall -O -fregmove -march=pentium4m -o test main.c
regmove% ./test
7.999756e+00 4.000000e+00 3.000000e+00 4.000000e+00

The first case (-fno-regmove) is the correct one. Looking at the assembly
output for both cases, the problem is an "addss %xmm1, %xmm0" that is
changed to "addss %xmm0, %xmm1". This is incorrect. The addss instruction
is not commutative: it adds only the lowest element and leaves the upper
three elements of the destination register unchanged, so swapping the
operands changes which register supplies the upper elements (unlike addps,
which sums over the entire vector). That is why the second element of the
result changes from 2 to 4 in the output above.

The same problem occurs when _mm_add_ss in the code above is replaced by
_mm_mul_ss (mulss instruction), but not with _mm_sub_ss, for instance
(obviously, since subtraction is not commutative to begin with), so I
suppose this can be fixed by handling addss and mulss the same way as
subss. I suppose other instructions could be affected too.

-- 
           Summary: "-O -fregmove" handles SSE scalar instructions incorrectly
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tijl at ulyssis dot org
 GCC build triplet: i386-portbld-freebsd6.1
  GCC host triplet: i386-portbld-freebsd6.1
GCC target triplet: i386-portbld-freebsd6.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869