https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741
--- Comment #32 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Thomas Koenig from comment #31)
> If the middle end is not up to this, should we be looking at doing loop
> blocking in the Fortran front end, at least for the Matmul intrinsic?

I think this makes sense; fixing this issue in the middle end seems to be a
project on a different timescale. Ideally, matmul would expand to something
that generates good code even at, e.g., -O2 -march=native (which would require
both blocking and unrolling). At that point, the inlined code would be faster
than the runtime library... for all sizes.
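
For illustration only, here is roughly what "blocking and unrolling" could mean for a matrix product, written as a C sketch. This is not what gfortran emits or what libgfortran does: the function name matmul_blocked, the tile size BLOCK, the unroll factor of 4, and the square row-major layout (Fortran arrays are column-major) are all assumptions made to keep the example short.

```c
#include <stddef.h>

/* Hypothetical tile size; a real expansion would pick this per target. */
#define BLOCK 32

/* Smaller of two sizes, used to clamp a tile to the matrix edge. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/*
 * Accumulate c += a * b for n x n row-major matrices.
 * The three outer loops walk BLOCK x BLOCK tiles (blocking); the
 * innermost loop over j is unrolled by hand by a factor of 4, with a
 * remainder loop for tile widths that are not a multiple of 4.
 * The caller is expected to zero c first if a plain product is wanted.
 */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t imax = min_size(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t kmax = min_size(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t jmax = min_size(jj + BLOCK, n);
                for (size_t i = ii; i < imax; i++) {
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        size_t j = jj;
                        /* Unrolled body: four updates per iteration. */
                        for (; j + 4 <= jmax; j += 4) {
                            c[i * n + j]     += aik * b[k * n + j];
                            c[i * n + j + 1] += aik * b[k * n + j + 1];
                            c[i * n + j + 2] += aik * b[k * n + j + 2];
                            c[i * n + j + 3] += aik * b[k * n + j + 3];
                        }
                        /* Remainder of the tile row. */
                        for (; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The tiling keeps the working set of a, b and c small enough to be reused from cache, while the unrolled inner loop gives the compiler straight-line code to schedule and vectorize without relying on higher optimization levels. Whether these particular choices would pay off for small matrices is exactly the kind of thing that would need measuring.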