https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007
--- Comment #28 from Maxim Kuvyrkov <mkuvyrkov at gcc dot gnu.org> ---
(In reply to Ilya Leoshkevich from comment #27)
> With
>
> -DSPEC_CPU -DNDEBUG -DPERL_CORE -O3 -save-temps=obj
> -fopt-info-vec-optimized -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64
> -fgnu89-inline
>
> on gcc113 I can see 2% slowdown:
>
> r277511 (without this fix): 880.09s
> r277515 (with this fix): 897.85s
>
> The function that degraded the most is indeed S_regmatch:
>
> $ perf diff perf-9760321.data perf-44b2b4c.data
> 32.24% exe [.] S_regmatch
>
> 8.92% exe [.] S_find_byclass.isra.0
>
> 6.80% +0.28% libc-2.19.so [.] 0x000000000007dec0
>
> 5.20% exe [.] S_regtry
>
> However, the "shape" of S_regmatch did not change, that is, when all
> offsets and register numbers are replaced with "x" in the objdump
> output, the old and the new versions are identical. This hints at some
> microarchitectural effect - aliasing in the branch predictor maybe?
>
> From my perspective, this happens too often, so I use the following test
> to rule this out: just add a nop at the beginning of the problematic
> function. This changes all the offsets and makes aliasing situation
> completely different. And indeed, by adding a single nop to S_regmatch,
> I get wildly different results (for now this is just 1 repeat, I will
> run best-of-3 overnight):
>
> r277511 (without this fix): 929.1s
> r277515 (with this fix): 931.48s

Hi Ilya,

Thanks for the analysis. Doesn't seem like we can do anything useful about
this regression.

[For completeness, I see the same 5% slowdown with "-O3 -funroll-loops" as with
plain -O3 on Cortex-A57.]
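
For readers who want to repeat the "shape" comparison described in comment #27, here is a minimal sketch in Python. The script actually used there is not shown, so everything below is an assumption: the helper names normalize_line and shape are hypothetical, the input is assumed to be tab-separated GNU objdump -d output, and registers are assumed to be numbered (e.g. x19 on AArch64). It masks every number in the operands, including immediates, so it is somewhat coarser than replacing only offsets and register numbers.

```python
import re

# Hypothetical helpers; the script actually used in comment #27 is not shown.
BRANCH_TARGET = re.compile(r"\b[0-9a-f]+ <[^>]*>")  # e.g. "4005cc <f+0xc>"
HEX_NUMBER = re.compile(r"0x[0-9a-fA-F]+")          # e.g. "#0x10"
DEC_NUMBER = re.compile(r"\d+")                     # e.g. the "19" in "x19"


def normalize_line(line):
    """Return one instruction with its numbers masked, or None for non-code lines."""
    # Assumed objdump -d layout: "address:<TAB>raw bytes<TAB>instruction..."
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None  # labels, blank lines, headers, byte-only continuation lines
    parts = " ".join(fields[2:]).split(None, 1)
    if not parts:
        return None
    mnemonic = parts[0]
    operands = parts[1] if len(parts) > 1 else ""
    # Mask branch targets first, since their bare hex addresses may contain letters.
    operands = BRANCH_TARGET.sub("x", operands)
    operands = HEX_NUMBER.sub("x", operands)
    operands = DEC_NUMBER.sub("x", operands)
    return f"{mnemonic} {operands}".strip()


def shape(disassembly):
    """Reduce a disassembly listing to its masked instruction sequence."""
    lines = (normalize_line(l) for l in disassembly.splitlines())
    return [l for l in lines if l is not None]
```

Calling shape() on the objdump -d text of S_regmatch from each build and comparing the two lists with == reports whether the instruction sequences differ in anything other than numbers. Equal lists mean only that the code has the same shape under this masking; they do not say what causes the timing difference.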