https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641
Bug ID: 113641
Summary: 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

During the development of GCC 13, the run time of 510.parest_r regressed on x86_64 when the benchmark is built with profile-guided optimization at plain -O2: binaries built with master are slower than those built with GCC 12. The difference is not big but fairly clear cut, about 7.6% on Zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0&plot.1=892.457.0&plot.2=694.457.0&

and about 7.2% on Zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0&plot.1=932.457.0&plot.2=687.457.0&

The graphs above show runs built with both LTO and PGO, but LTO is not necessary to see the regression.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386: Only enable small loop unrolling in backend [PR 107692]).

On Intel CascadeLake, parest_r is also about 4% slower when compiled with this revision than with the previous one.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)