https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #29 from Andrew Roberts <andrewm.roberts at sky dot com> ---
And rerunning all the tests for matrix.c on Ryzen using:
-march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -O3

The winners were:
mult took 118145 clocks -march=broadwell -mtune=broadwell
mult took 118912 clocks -march=core-avx2 -mtune=core-avx2

Top -mtune=znver1:
mult took 121845 clocks -march=core-avx2 -mtune=znver1
mult took 129241 clocks -march=znver1 -mtune=znver1

And the bottom of the list no longer has a cluster of -mtune=btverX, bdverX,
znver1.

Worst cases:
mult took 253400 clocks -march=x86-64 -mtune=haswell
mult took 254006 clocks -march=bonnell -mtune=westmere
mult took 254624 clocks -march=bonnell -mtune=silvermont
mult took 258577 clocks -march=bonnell -mtune=nehalem
mult took 260612 clocks -march=bonnell -mtune=corei7
mult took 277789 clocks -march=nocona -mtune=nano-x4

---------

And rerunning all the tests for matrix.c on Ryzen using:
-march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -mno-avx2
-Ofast

The winners were:
mult took 116405 clocks -march=broadwell -mtune=broadwell
mult took 117314 clocks -march=ivybridge -mtune=haswell
mult took 117551 clocks -march=broadwell -mtune=bdver2

Top znver1:
mult took 119951 clocks -march=knl -mtune=znver1
mult took 120442 clocks -march=znver1 -mtune=znver1

Worst cases:
mult took 239640 clocks -march=nehalem -mtune=bdver3
mult took 240623 clocks -march=athlon64-sse3 -mtune=silvermont
mult took 241143 clocks -march=eden-x2 -mtune=nano-2000
mult took 241547 clocks -march=core2 -mtune=intel
mult took 241870 clocks -march=nehalem -mtune=bdver2
mult took 248251 clocks -march=nocona -mtune=intel

The difference between broadwell and znver1 is within the margin of error, I
would suggest, with these options.
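
To put a number on the broadwell vs. znver1 gap in the second run (-mno-avx2 -Ofast), the result lines can be parsed and compared. The following is only a sketch, not part of the original test harness: the helper names `parse_results` and `slowdown_percent` are hypothetical, and the input is just the five fastest lines quoted in this comment. Whether the resulting gap really falls within run-to-run noise cannot be checked from here, because no repeat timings are reported.

```python
import re

# Result lines copied from the second run in this comment (-mno-avx2 -Ofast).
RESULTS = """\
mult took 116405 clocks -march=broadwell -mtune=broadwell
mult took 117314 clocks -march=ivybridge -mtune=haswell
mult took 117551 clocks -march=broadwell -mtune=bdver2
mult took 119951 clocks -march=knl -mtune=znver1
mult took 120442 clocks -march=znver1 -mtune=znver1
"""

# One result per line: clock count, then the -march and -mtune values used.
LINE_RE = re.compile(r"mult took (\d+) clocks -march=(\S+) -mtune=(\S+)")


def parse_results(text):
    """Return (clocks, march, mtune) tuples sorted fastest first."""
    rows = [(int(m.group(1)), m.group(2), m.group(3))
            for m in LINE_RE.finditer(text)]
    return sorted(rows)


def slowdown_percent(rows, march, mtune):
    """Percent by which the given march/mtune pair trails the fastest row."""
    best = rows[0][0]
    for clocks, a, t in rows:
        if a == march and t == mtune:
            return 100.0 * (clocks - best) / best
    raise KeyError((march, mtune))


if __name__ == "__main__":
    rows = parse_results(RESULTS)
    gap = slowdown_percent(rows, "znver1", "znver1")
    print(f"znver1/znver1 is {gap:.1f}% slower than {rows[0][1]}/{rows[0][2]}")
```

For these figures the gap is (120442 - 116405) / 116405, roughly 3.5%. The same calculation on the -O3 run gives (129241 - 118145) / 118145, roughly 9.4%, which is consistent with the remark applying only "with these options".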