> Wilco Dijkstra wrote:
> > Vladimir Makarov wrote:
> > > On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
> > > SPECFP is ~0.2% faster.
> > Thanks for reporting this. It is important for me as I have no aarch64
> > machine for benchmarking.
> >
> > Perlbmk performance degradation is too big and I'll definitely look at
> > this problem.
>
> Looking at the diffs in regexec.c which has the hot function regmatch(),
> nothing obvious stands out that could cause a serious regression.
> I did notice this around line 2300:
>
> .L802:
>         ldr     x1, [x23, 48]
>         adrp    x5, PL_savestack_ix
>         ldr     w0, [x23]
>         str     x5, [sp, 104]
>         str     x1, [x24, #:lo12:PL_regcc]
>         ldr     w27, [x1, 4]
>         bl      regcppush
> -       ldr     x5, [sp, 104]
>         str     w0, [sp, 112]
>         ldr     x0, [x23, 32]
> +       adrp    x5, PL_savestack_ix
>         ldr     w28, [x5, #:lo12:PL_savestack_ix]
> +       str     x5, [sp, 104]
>         bl      regmatch
>         ldr     x5, [sp, 104]
>         mov     w19, w0
>         ldr     w1, [sp, 112]
>         ldr     w0, [x5, #:lo12:PL_savestack_ix]
>
> So it rematerializes one instance, but fails to rematerialize the second use.
> An extra store is inserted, and the first adrp and store are not removed as
> dead.
A simple example that reproduces the issue (-mcpu=cortex-a57 -O2
-fomit-frame-pointer -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22
-ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28
-ffixed-x29 -ffixed-x30). It looks like an odd interaction between
-fcaller-saves and rematerialization.

void g(void);
int x;
int f3b(int y)
{
  y += x;
  g();
  y += x;
  g();
  y += x;
  return y;
}

f3b:
        adrp    x2, x                   --> DEAD
        sub     sp, sp, #16
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp]                --> DEAD
        add     w0, w0, w1
        str     w0, [sp]                --> reuse of stackslot!!!
        bl      g
        adrp    x2, x
        ldr     w0, [sp]
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp, 8]
        add     w0, w0, w1
        str     w0, [sp]                --> REMOVE
        bl      g
        ldr     x2, [sp, 8]             --> rematerialize adrp
        ldr     w0, [sp]
        add     sp, sp, 16
        ldr     w1, [x2, #:lo12:x]
        add     w0, w0, w1
        ret

Wilco