With these compile options:

-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

With this compiler:

euler-44% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.3.0 20071026 (experimental) [trunk revision 129664] (GCC)

With the following routine compiled with gcc-4.2.2 you get

(time (direct-fft-recursive-4 a table))
    366 ms real time
    366 ms cpu time (366 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

while with today's mainline you get

(time (direct-fft-recursive-4 a table))
    448 ms real time
    448 ms cpu time (448 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

I've isolated that one routine and I'll add it at the end of an attachment; unfortunately there are a lot of declarations and global data that are difficult to winnow.

There is really only one main loop in the routine, the one that begins at ___L19_direct_2d_fft_2d_recursive_2d_4. This loop was scheduled in 102 cycles (sched2) on 4.2.2 and in 134 cycles in mainline.

-- 
           Summary: 33% performance slowdown from 4.2.2 in floating-point code
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lucier at math dot purdue dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928