https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvdelisle at gcc dot gnu.org

--- Comment #16 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
For what it's worth:

$ gfc pr51119.f90 -lblas -fno-external-blas -Ofast -march=native
$ ./a.out
 Time, MATMUL:    21.2483196       21.254449646000001        1.5055670945599979
 Time, dgemm:    33.2441711       33.243087289000002       .96260614189671445

This is on a laptop not taking any advantage of a tuned BLAS.

If I replace -Ofast with -O2 I get:

$ ./a.out
 Time, MATMUL:    43.6199570       43.625358022999997       0.73351833543988521
 Time, dgemm:    33.2262650       33.226961453000001       0.96307331759072967

-O3 brings MATMUL performance back in line with -Ofast. It seems odd to me
that -O2 does not do well.

Regardless, the internal MATMUL is doing better than BLAS on this platform,
but 1.5 GFLOPS is pretty lame either way.
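As a quick sanity check on the figures above, the sketch below multiplies each elapsed time by the rate printed next to it and compares the MATMUL timings across optimization levels. It assumes the second column is elapsed seconds and the third is GFLOPS, which the closing remark suggests but the output itself does not label. The dictionary keys and variable names are made up for illustration and do not come from pr51119.f90.

```python
# Figures copied from the comment above: (elapsed seconds, reported rate).
# Assumption: the rate column is GFLOPS; the program output does not label it.
runs = {
    "MATMUL -Ofast": (21.254449646000001, 1.5055670945599979),
    "dgemm  -Ofast": (33.243087289000002, 0.96260614189671445),
    "MATMUL -O2":    (43.625358022999997, 0.73351833543988521),
    "dgemm  -O2":    (33.226961453000001, 0.96307331759072967),
}

# elapsed * rate gives the amount of work each run implies (GFLOP under the
# assumption above). If the columns are consistent, all runs should agree.
implied_work = {label: t * rate for label, (t, rate) in runs.items()}
for label, work in implied_work.items():
    print(f"{label}: implied work = {work:.3f}")

# How much slower MATMUL is at -O2 than at -Ofast, and how much slower
# dgemm is than MATMUL at -Ofast.
matmul_o2_slowdown = runs["MATMUL -O2"][0] / runs["MATMUL -Ofast"][0]
dgemm_vs_matmul = runs["dgemm  -Ofast"][0] / runs["MATMUL -Ofast"][0]
print(f"MATMUL -O2 / -Ofast time ratio: {matmul_o2_slowdown:.2f}")
print(f"dgemm / MATMUL time ratio at -Ofast: {dgemm_vs_matmul:.2f}")
```

All four runs imply the same amount of work, so the rate column looks like a fixed operation count divided by elapsed time. On these numbers MATMUL takes roughly twice as long at -O2 as at -Ofast, while dgemm is unaffected, as expected for a routine that comes from the linked library rather than from the compiled test program.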