Consider the following C program using SSE intrinsics:

//----------
#include <stdio.h>
#include <xmmintrin.h>

int main(int argc, const char **argv) {
        __m128 v;
        v = _mm_setr_ps( 1.0f, 2.0f, 3.0f, 4.0f );

        v = _mm_rsqrt_ss( v );
        v = _mm_add_ss( v, _mm_movehl_ps( v, v ));
        v = _mm_add_ss( v, _mm_shuffle_ps( v, v, _MM_SHUFFLE( 0, 0, 0, 1 )));

        printf( "%e %e %e %e\n", ((float *)&v)[0], ((float *)&v)[1],
                ((float *)&v)[2], ((float *)&v)[3] );
        return 0;
}
//----------

Compiling and running this gives different results depending on whether
-fregmove is specified or not.

regmove% gcc41 -Wall -O -fno-regmove -march=pentium4m -o test main.c
regmove% ./test
5.999756e+00 2.000000e+00 3.000000e+00 4.000000e+00
regmove% gcc41 -Wall -O -fregmove -march=pentium4m -o test main.c
regmove% ./test
7.999756e+00 4.000000e+00 3.000000e+00 4.000000e+00

The first case (-fno-regmove) is the correct one.

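Comparing the assembly output for the two cases shows that the problem is an
"addss %xmm1, %xmm0" that is changed to "addss %xmm0, %xmm1". This swap is
incorrect. In AT&T syntax, "addss %xmm1, %xmm0" adds the low element of %xmm1
to the low element of %xmm0 and leaves the upper three elements of %xmm0
unchanged, so the upper elements of the result come from the destination
operand. addss is therefore not commutative, unlike addps, which adds all four
elements pairwise.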

The same problem occurs when _mm_add_ss in the code above is replaced by
_mm_mul_ss (the mulss instruction), but not with _mm_sub_ss, for instance,
which is obviously not commutative and is therefore not swapped. I suppose this
can be fixed by handling addss and mulss the same way as subss.

I suppose other scalar instructions that take their upper elements from the
destination operand in the same way could be affected too.


-- 
           Summary: "-O -fregmove" handles SSE scalar instructions
                    incorrectly
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tijl at ulyssis dot org
 GCC build triplet: i386-portbld-freebsd6.1
  GCC host triplet: i386-portbld-freebsd6.1
GCC target triplet: i386-portbld-freebsd6.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869
