On Zen 2 and Zen 3 with -flto the speedup seems to be about 12% for both -O2 and -Ofast -march=native, which is very nice! Zen 1 for some reason sees less improvement, about 6%. With PGO it is 3.8%.
Overall it seems a win, but there are a few noteworthy issues. I also see a 6.69% regression on x64 with -Ofast -march=native -flto
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0
and perhaps 3-5% on sphinx
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.280.0
For non-SPEC benchmarks there is a regression on nbench
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=26.645.1
There are also large changes in tsvc
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
These may be noise since the kernels are tiny, but, for example, x293 reproduces on both Kaby Lake and Zen as a regression of about 80-90%, which may be easy to track down (the kernel is included in the testsuite). The same regression is not seen on Zen 3, so it may be ISA-specific.
Finally, there seem to be relatively large code size savings on the polyhedron benchmarks today (8% on capacita). Thanks a lot!