When compiling the following reduced code, both GCC 4.0.3 and 4.1.0 clutter the assembly code with some strange moves through SSE registers.

typedef union { long long l; double d; } db_number;

double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}

"gcc-4.0 -S -O3 -march=pentium3" will generate:

        fstpl   -16(%ebp)
        movlps  -16(%ebp), %xmm0
        je      .L2
        ...
        movlps  -16(%ebp), %xmm1
        movaps  %xmm1, %xmm0
.L2:
        movlps  %xmm0, -16(%ebp)
        fldl    -16(%ebp)

GCC has decided that the content of "th" should live in %xmm0 (even though "th" is a double variable and the target is an SSE 1 processor, which has no double-precision SSE arithmetic!) instead of in -16(%ebp), where the rest of the code expects it to be. As a consequence, the compiler has to emit extra moves to cope with this.

In comparison, below is what GCC 3.4 generates. This seems saner and optimal to me.

        fstp    %st(1)
        je      .L2
        ...
        fldl    -16(%ebp)
.L2:

With GCC 3.4, the value is either left untouched at the top of the floating-point stack, or loaded once if it was modified. In both cases, it is directly available once the execution reaches .L2. No SSE register is involved and there is no load/store/load sequence through a single stack location.

This was tested with Debian packages for GCC 3.4, 4.0, and 4.1:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.3 (Debian 4.0.3-1)

--
           Summary: GCC4 moves the result of a conditional block through inadequate registers
           Product: gcc
           Version: 4.0.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guillaume dot melquiond at ens-lyon dot fr
  GCC host triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26778