------- Comment #4 from jb at gcc dot gnu dot org 2006-11-04 22:16 -------

For the C version with 1D arrays, the benchmark results, with gfortran results for comparison, are:
Complex version:
  -O3 -funroll-loops -mfpmath=sse -msse2    1.32
  above + -ffast-math                       0.38
  gfortran -O2                              0.32

Real version:
  C                                         0.07 s
  C + -ffast-math                           same thing
  gfortran -O2 -g                           0.07

So it seems the culprit is some optimization enabled by -ffast-math that makes a huge difference for C99 complex arithmetic. However, compiling matmul in libgfortran with -ffast-math almost certainly won't fly.

So ideally we should find exactly which flag enables this performance improvement, and see if we can enable only that without bringing in all the -ffast-math baggage. Otherwise, if this is an optimization that could also be enabled without -ffast-math, we should bug the optimizer guys.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549
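
For anyone who wants to narrow down the flag, here is a minimal sketch of the kind of kernel being timed. It is not the actual benchmark source from this report: the row-major layout and the names cmatmul and time_cmatmul are my own assumptions for illustration.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: C = A * B for n x n complex matrices stored
 * row-major in 1D arrays. The inner statement is the C99 complex
 * multiply-add whose generated code depends on the math flags. */
void cmatmul(size_t n, const double complex *a,
             const double complex *b, double complex *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double complex sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: CPU seconds spent in one cmatmul call. */
double time_cmatmul(size_t n, const double complex *a,
                    const double complex *b, double complex *c)
{
    clock_t t0 = clock();
    cmatmul(n, a, b, c);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

One way to use it: build the same file once with the baseline flags (-O3 -funroll-loops -mfpmath=sse -msse2), once with -ffast-math added, and then once per individual sub-option of -ffast-math, comparing timings each time. The GCC manual documents -ffast-math as a bundle that includes -funsafe-math-optimizations, -ffinite-math-only and -fcx-limited-range, among others. The last one relaxes the range handling for complex multiplication and division, so it looks like a plausible candidate, but that is a guess to be checked against measurements, not something established in this report.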