https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #28 from Andrew Roberts <andrewm.roberts at sky dot com> ---
Adding -mno-avx2 into the mix was a marginal win, but only just showing out of
the noise:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -mno-avx2 -O3 matrix.c -o matrix
mult took 121397 clocks
mult took 124373 clocks
mult took 125345 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -O3 matrix.c -o matrix
mult took 123262 clocks
mult took 128193 clocks
mult took 125891 clocks

Using -Ofast instead of -O3

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -Ofast matrix.c -o matrix
mult took 125163 clocks
mult took 123799 clocks
mult took 122808 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -mno-avx2 -Ofast matrix.c -o matrix
mult took 130189 clocks
mult took 122726 clocks
mult took 123686 clocks
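
To judge how far the -mno-avx2 difference stands out of the noise, the twelve
timings above can be summarized per configuration. The following is only a
rough sketch, not part of the original measurements: the labels in `runs` and
the helper `summarize` are made-up names for illustration, and three samples
per configuration is too few for a real significance test.

```python
from statistics import mean

# Clock counts copied from the runs above.
# The dictionary keys are shorthand labels, not taken from the original comment.
runs = {
    "O3 -mno-avx2": [121397, 124373, 125345],
    "O3": [123262, 128193, 125891],
    "Ofast": [125163, 123799, 122808],
    "Ofast -mno-avx2": [130189, 122726, 123686],
}


def summarize(samples):
    """Return mean, best run, and run-to-run spread (max - min) for one configuration."""
    return {
        "mean": mean(samples),
        "min": min(samples),
        "spread": max(samples) - min(samples),
    }


summary = {label: summarize(samples) for label, samples in runs.items()}

for label, stats in summary.items():
    print(f"{label:16s} mean={stats['mean']:.0f} "
          f"min={stats['min']} spread={stats['spread']}")
```

At -O3 the two means differ by roughly 2000 clocks, while each configuration
varies by 4000 to 5000 clocks across its three runs; at -Ofast the -mno-avx2
mean is the higher of the two.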