When compiling the following reduced code, both GCC 4.0.3 and 4.1.0 clutter the
assembly code with some strange moves through SSE registers.

typedef union {
  long long l;
  double d;
} db_number;

double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}

"gcc-4.0 -S -O3 -march=pentium3" will generate:

        fstpl   -16(%ebp)
        movlps  -16(%ebp), %xmm0
        je      .L2
        ...
        movlps  -16(%ebp), %xmm1
        movaps  %xmm1, %xmm0
  .L2:
        movlps  %xmm0, -16(%ebp)
        fldl    -16(%ebp)

GCC has decided that the content of "th" should live in %xmm0 (although "th" is
a double variable and the target is an SSE1 processor, which has no
double-precision SSE arithmetic) instead of in -16(%ebp), where the rest of the
code expects it to be. As a consequence, the compiler has to pessimize the code
to cope with this. For comparison, below is what GCC 3.4 generates. This seems
saner and optimal to me.

        fstp    %st(1)
        je      .L2
        ...
        fldl    -16(%ebp)
  .L2:

With GCC 3.4, the value is either left untouched at the top of the
floating-point stack, or loaded once if it was modified. In both cases, it is
directly available once execution reaches .L2. No SSE register is involved, and
there is no load/store/load sequence through a single stack location.
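
For anyone reproducing this, the two paths described above (value left
untouched, value modified inside the conditional block) can be exercised with a
small driver. This is only a sketch and not part of the original testcase:
run_examples() and its input values are hypothetical, and it assumes an IEEE 754
double that shares its byte order with long long, as on x86. It checks the
results the function returns, not the code GCC generates.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

typedef union {
  long long l;
  double d;
} db_number;

/* The function from the report, unchanged. */
double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}

/* Hypothetical driver: one input per path through the conditional. */
void run_examples(void) {
  /* 2 + 3 is exact, so the block is skipped and th stays 5. */
  double exact[3] = { 1.0, 2.0, 3.0 };
  /* 1 + 1e-30 rounds to 1, so the block runs and bumps th by one ulp. */
  double inexact[3] = { 0.0, 1.0, 1e-30 };

  assert(test(exact) == 6.0);
  assert(test(inexact) == 1.0 + DBL_EPSILON);
  printf("untouched: %.17g, modified: %.17g\n", test(exact), test(inexact));
}
```

Compiling this file with the same "-S -O3 -march=pentium3" options should show
whether the moves through %xmm0 still appear around the conditional block.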

This was tested with Debian packages for GCC 3.4, 4.0, and 4.1:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr
--disable-werror --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.3 (Debian 4.0.3-1)


-- 
           Summary: GCC4 moves the result of a conditional block through
                    inadequate registers
           Product: gcc
           Version: 4.0.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guillaume dot melquiond at ens-lyon dot fr
  GCC host triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26778
