I discovered that a simple benchmark ("SCIMARK2 Montecarlo") runs three times slower when compiled with gcc 4.3 than with gcc 4.1 or 3.4.
The code is compiled and run on Intel Core 2 machines running RHEL4, RHEL5 or Fedora 10. Details for Fedora 10 follow; the compilers used are from the Fedora distribution.

-bash-3.2$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

-bash-3.2$ gcc34 -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,f77 --disable-libgcj --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)

I have extracted the code into a self-contained source file, downloadable with:

wget http://innocent.home.cern.ch/innocent/fullMC.c

The results are:

-bash-3.2$ g++ -O3 fullMC.c ; time ./a.out
real    0m1.731s
user    0m1.730s
sys     0m0.001s
-bash-3.2$ g++34 -O3 fullMC.c ; time ./a.out
real    0m0.547s
user    0m0.546s
sys     0m0.001s

In my opinion the culprit is the use of a conditional jump (js) instead of a cmov instruction for "if (k < 0) k += m1;". The sign of k is essentially random, so the jump is likely to be mispredicted often, whereas cmov has no branch to mispredict.

This is the disassembly emitted by 4.3:

int I = R->i;
  400510: 8b 4f 48              mov    0x48(%rdi),%ecx
int J = R->j;
  400513: 8b 77 4c              mov    0x4c(%rdi),%esi
int *m = R->m;
k = m[I] - m[J];
  400516: 48 63 c1              movslq %ecx,%rax
  400519: 48 63 d6              movslq %esi,%rdx
  40051c: 8b 04 87              mov    (%rdi,%rax,4),%eax
if (k < 0) k += m1;
  40051f: 41 89 c0              mov    %eax,%r8d
  400522: 44 2b 04 97           sub    (%rdi,%rdx,4),%r8d
  400526: 78 58                 js     400580 <Random_nextDouble+0x70>
R->m[J] = k;

and this is the one for 3.4:

int I = R->i;
  400660: 8b 47 48              mov    0x48(%rdi),%eax
int J = R->j;
  400663: 8b 57 4c              mov    0x4c(%rdi),%edx
int *m = R->m;
k = m[I] - m[J];
  400666: 48 63 c8              movslq %eax,%rcx
  400669: 48 63 f2              movslq %edx,%rsi
  40066c: 44 8b 04 8f           mov    (%rdi,%rcx,4),%r8d
  400670: 44 2b 04 b7           sub    (%rdi,%rsi,4),%r8d
if (k < 0) k += m1;
  400674: 41 8d 88 ff ff ff 7f  lea    0x7fffffff(%r8),%ecx
  40067b: 41 83 f8 ff           cmp    $0xffffffffffffffff,%r8d
  40067f: 44 0f 4e c1           cmovle %ecx,%r8d
R->m[J] = k;

-------------------------------------

gcc 4.1 (specs below, from RHEL5) produces the same instructions as 3.4:

gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)

--
           Summary: Optimization regression in simple conditional code (js instead of cmov) 4.3 vs 4.1 and 3.4
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: vincenzo dot innocente at cern dot ch
 GCC build triplet: x86_64-redhat-linux
  GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38922
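To illustrate the jump-versus-cmov point in this report, here is a minimal C sketch of the "k = m[I] - m[J]; if (k < 0) k += m1;" step next to a branch-free rewrite of the same arithmetic. It assumes m1 = 2^31 - 1 (inferred from the 0x7fffffff constant in the lea instruction of the 3.4 output) and that the table entries lie in [0, m1), so neither the subtraction nor the correction can overflow. The names sub_mod_branch, sub_mod_mask and M1 are hypothetical and not taken from the benchmark source. This is only an illustration, not a workaround for the compiler: no gcc version is guaranteed to emit cmov, or to avoid a branch, for either form.

```c
/* Assumed value of m1: 2^31 - 1, matching the 0x7fffffff in the 3.4 lea. */
#define M1 0x7fffffff

/* Branchy form, as in the benchmark source.
   Inputs are assumed to lie in [0, M1), so a - b cannot overflow. */
static int sub_mod_branch(int a, int b)
{
    int k = a - b;
    if (k < 0)          /* gcc 4.3 turned this test into a js */
        k += M1;
    return k;
}

/* Hypothetical branch-free variant of the same computation:
   (k < 0) is 0 or 1, so -(k < 0) is 0 or -1 (all bits set),
   and the mask adds either 0 or M1 without a conditional jump. */
static int sub_mod_mask(int a, int b)
{
    int k = a - b;
    k += M1 & -(k < 0);
    return k;
}
```

Compiling this with gcc -O3 -S and looking for js versus cmov (or a setcc/neg/and sequence) in the generated assembly is one way to check what a particular gcc version does with each form.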