The testcase from PR target/27827 shows another problem, this time with -ffast-math: the runtime performance of -ffast-math code drops by ~30%.
The problem can be traced to the tree reassociation pass. Performance jumps back when the "flag_unsafe_math_optimizations" checks are disabled by replacing every occurrence of flag_unsafe_math_optimizations in tree-ssa-reassoc.c with (flag_unsafe_math_optimizations && 0).

To see the problem, add -funsafe-math-optimizations to MMFLAGS in the Makefile of the target/27827 example:

    MM4FLAGS = $(GMMFLAGS) -funsafe-math-optimizations

Current mainline gcc produces code with the following results:

-O -mfpmath=387

ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.260     1663.04

-O -msse2 -mfpmath=sse

ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.229     1890.47

gcc with the reassoc pass disabled for floating-point values:

-O -mfpmath=387

ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.162     2664.87

-O -msse2 -mfpmath=sse

ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.164     2633.15

--
           Summary: reassociation pass produces ~30% slower matrix multiplication code
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at kss-loka dot si
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
OtherBugsDependingOnThis: 27827

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855
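For context: floating-point addition is not associative, which is why the reassociation pass only rewrites floating-point expressions when flag_unsafe_math_optimizations is set. The sketch below is a standalone C illustration, not part of the PR testcase; the function names sum_left, sum_right and show_sums are hypothetical. It assumes the file is compiled without -ffast-math or -funsafe-math-optimizations, so GCC keeps the evaluation order as written.

```c
#include <stdio.h>

/* Hypothetical illustration, not from the PR testcase: the same three
   operands summed in two different orders. Build WITHOUT -ffast-math
   so the compiler preserves the parenthesized order. */

/* Evaluates (a + b) + c, the order written in the source. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c), an order a reassociating optimizer may pick. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* Prints both results for the given operands. */
void show_sums(double a, double b, double c)
{
    printf("(a + b) + c = %g\n", sum_left(a, b, c));
    printf("a + (b + c) = %g\n", sum_right(a, b, c));
}
```

For example, show_sums(1e20, -1e20, 1.0) should print 1 for (a + b) + c and 0 for a + (b + c): when 1.0 is added to -1e20 first, it is far below the spacing of representable values near 1e20 (in both double and x87 extended precision) and is rounded away.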