https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #31 from Andrew Roberts <andrewm.roberts at sky dot com> ---
And for mt19937ar with -mno-avx2:

/usr/local/gcc/bin/gcc -march=$amarch -mtune=$amtune -mno-avx2 -O3 -o mt19937ar mt19937ar.c

Top 2:
mt19937ar took 358493 clocks -march=silvermont -mtune=bdver1
mt19937ar took 359933 clocks -march=corei7 -mtune=btver2

Top znver1:
mt19937ar took 363177 clocks -march=znver1 -mtune=k8-sse3
mt19937ar took 373751 clocks -march=slm -mtune=znver1
mt19937ar took 379094 clocks -march=znver1 -mtune=znver1

Worst cases:
mt19937ar took 683339 clocks -march=bdver3 -mtune=btver1
mt19937ar took 687566 clocks -march=btver2 -mtune=haswell
mt19937ar took 695629 clocks -march=athlon64-sse3 -mtune=sandybridge
mt19937ar took 697349 clocks -march=k8-sse3 -mtune=knl
mt19937ar took 697831 clocks -march=knl -mtune=core2
mt19937ar took 798283 clocks -march=opteron -mtune=athlon64-sse3

Running just for: -march=znver1 -mtune=znver1 -Ofast
mt19937ar took 445136 clocks
mt19937ar took 449784 clocks
mt19937ar took 460105 clocks

Running just for: -march=znver1 -mtune=znver1 -mno-avx2 -Ofast
mt19937ar took 416937 clocks
mt19937ar took 389458 clocks
mt19937ar took 389154 clocks

So -mno-avx2 gives a 13-14% gain, depending on how you look at it.
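
For anyone wanting to check the 13-14% figure, here is a rough sketch that recomputes it from the six -Ofast timings quoted above. It is not part of the original measurements. The helper name `gain_percent` is hypothetical, and it assumes "gain" means how much longer the plain -Ofast build takes relative to the -mno-avx2 build. Comparing the means and comparing the best runs are just two plausible readings of "depending on how you look at it".

```python
# Clock counts copied from the two znver1 runs quoted above.
ofast = [445136, 449784, 460105]          # -march=znver1 -mtune=znver1 -Ofast
ofast_no_avx2 = [416937, 389458, 389154]  # same flags plus -mno-avx2


def gain_percent(slow, fast):
    """Hypothetical helper: how much longer `slow` takes than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Assumption: compare the average of the three runs in each set.
mean_gain = gain_percent(mean(ofast), mean(ofast_no_avx2))

# Assumption: compare the fastest run in each set.
best_gain = gain_percent(min(ofast), min(ofast_no_avx2))

print(f"gain on mean of 3 runs: {mean_gain:.1f}%")
print(f"gain on best run:       {best_gain:.1f}%")
```

With these numbers the mean-based figure comes out at roughly 13% and the best-run figure at roughly 14%, which matches the range stated above. With only three runs per configuration, and one visible outlier (416937), the spread between the two readings is about what one would expect.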