I discovered that a simple benchmark ("SCIMARK2 Montecarlo") runs three times
slower when compiled with gcc 4.3 than with 4.1 or 3.4.
The code is compiled and run on Intel Core 2 machines running RHEL4, RHEL5 or
Fedora 10.
Details for Fedora 10 follow.
The compilers used are those shipped with the Fedora distribution.
-bash-3.2$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-cpu=generic --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

-bash-3.2$ gcc34 -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-languages=c,c++,f77 --disable-libgcj
--host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)


I've extracted the code into a self-contained source file, downloadable with
wget http://innocent.home.cern.ch/innocent/fullMC.c
The results are:
-bash-3.2$ g++ -O3 fullMC.c ; time ./a.out 

real    0m1.731s
user    0m1.730s
sys     0m0.001s
-bash-3.2$ g++34 -O3 fullMC.c ; time ./a.out 

real    0m0.547s
user    0m0.546s
sys     0m0.001s


In my opinion the culprit is the use of a conditional jump (js) instead of a
cmov instruction for the "if (k < 0) k += m1;" statement.

This is the disassembly emitted by 4.3:

  int I = R->i;
 400510:       8b 4f 48                mov    0x48(%rdi),%ecx
   int J = R->j;
 400513:       8b 77 4c                mov    0x4c(%rdi),%esi
   int *m = R->m;

   k = m[I] - m[J];
 400516:       48 63 c1                movslq %ecx,%rax
 400519:       48 63 d6                movslq %esi,%rdx
 40051c:       8b 04 87                mov    (%rdi,%rax,4),%eax
   if (k < 0) k += m1;
 40051f:       41 89 c0                mov    %eax,%r8d
 400522:       44 2b 04 97             sub    (%rdi,%rdx,4),%r8d
 400526:       78 58                   js     400580 <Random_nextDouble+0x70>
   R->m[J] = k;


and this is the disassembly emitted by 3.4:

   int I = R->i;
 400660:       8b 47 48                mov    0x48(%rdi),%eax
   int J = R->j;
 400663:       8b 57 4c                mov    0x4c(%rdi),%edx
   int *m = R->m;

   k = m[I] - m[J];
 400666:       48 63 c8                movslq %eax,%rcx
 400669:       48 63 f2                movslq %edx,%rsi
 40066c:       44 8b 04 8f             mov    (%rdi,%rcx,4),%r8d
 400670:       44 2b 04 b7             sub    (%rdi,%rsi,4),%r8d
   if (k < 0) k += m1;
 400674:       41 8d 88 ff ff ff 7f    lea    0x7fffffff(%r8),%ecx
 40067b:       41 83 f8 ff             cmp    $0xffffffffffffffff,%r8d
 40067f:       44 0f 4e c1             cmovle %ecx,%r8d
   R->m[J] = k;
-------------------------------------
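
For reference, the statement in question can be isolated as in the sketch
below. This is not the code from fullMC.c: the function names wrap_branch and
wrap_mask are hypothetical, and the value of M1 is an assumption read off the
0x7fffffff displacement in the lea instruction of the 3.4 listing. The first
function is the form used in the source, which a compiler may turn into either
a jump or a cmov. The second is a hand-written branchless variant that gives
the same result. Neither form is guaranteed to produce a particular
instruction sequence.

```c
/* Assumed value of m1, taken from the lea 0x7fffffff(%r8),%ecx instruction. */
#define M1 0x7fffffff

/* Form used in the source: the compiler chooses between js and cmov. */
static int wrap_branch(int k)
{
    if (k < 0)
        k += M1;
    return k;
}

/* Branchless variant (hypothetical): mask is all ones when k is negative,
 * zero otherwise, so M1 is added only for negative k. */
static int wrap_mask(int k)
{
    int mask = -(k < 0);
    return k + (M1 & mask);
}
```

Both functions return k unchanged when k >= 0 and k + M1 when k < 0, so either
could be used to check whether the slowdown follows the jump or the
surrounding code.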

gcc 4.1 (specs below, from RHEL5) produces the same instructions as 3.4.

 gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic
--host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)


-- 
           Summary: Optimization regression in simple conditional code (js
                    instead of cmov) 4.3 vs 4.1 and 3.4
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: vincenzo dot innocente at cern dot ch
 GCC build triplet: x86_64-redhat-linux
  GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38922
