[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #33 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 42341
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42341&action=edit
Test-case to reproduce regression with cortex-m7

I have attached an artificial test-case that is fairly representative of the
regression we are seeing in a benchmark. The test-case mimics a deterministic
finite automaton. With code-hoisting there is an additional spill of r5 near
the beginning of the function.

Looking at the loop from the attached test-case:

    for (; *a && b != 'z'; a++)
      {
        next = *a;
        if (next == ',')
          {
            a++;
            break;
          }
        switch (b)
          {
            ...
          }
      }

The for loop has the same computation a++ in two sibling basic blocks, and
that computation gets hoisted. From the PRE dump with code-hoisting:

      [23.80%] [count: INV]:
      # _25 = PHI <_151(25), _23(2)>
      # b_50 = PHI
      # a_55 = PHI
      next_29 = (int) _25;
      _44 = a_55 + 1;
      if (next_29 == 44)
        goto ; [5.00%] [count: INV]
      else
        goto ; [95.00%] [count: INV]

(a + 1) seems to get hoisted into bb26: _44 = a_55 + 1 sits just before
if (next_29 == 44), which corresponds to the if (next == ',') condition.

The issue, I think, is that there is a use of 'a' near the end of the function:

    *s = a;

which possibly increases register pressure and forces the compiler to spill
r5. Commenting out the assignment removes the spill.

Looking at register allocation with code-hoisting, it seems r2 is used to hold
the hoisted value (a + 1):

    r0 = s
    r1 = tab
    r3 = a
    r4 = b
    r5 = *a
    r2 = r3 + 1 (holding the hoisted value)

Without code-hoisting, it seems only r3 is assigned to 'a':

    r0 = s
    r1 = tab
    r2 = b
    r3 = a
    r4 = *a

This is evident from the asm differences for the early-exit code path:

    if (next == ',')
      {
        a++;
        break;
      }
    :
    *s = a;
    return b;

Without code-hoisting:

    .L2:
            cmp     r4, #44
            beq     .L4

    .L4:
            adds    r3, r3, #1
            ldr     r4, [sp], #4
            str     r3, [r0]
            mov     r0, r2
            bx      lr

With code-hoisting:

    .L2:
            cmp     r5, #44
            add     r2, r3, #1
            beq     .L3

    .L3:
            str     r2, [r0]
            mov     r0, r4
            pop     {r4, r5}
            bx      lr

Without code-hoisting the compiler reuses r3 to store a + 1, while with
code-hoisting it uses the extra register r2 to hold the value of the hoisted
expression a + 1.

Would it be a good idea to somehow "limit" the distance (in terms of number of
basic blocks, maybe?) between the definition of a hoisted variable and its
furthest use during PRE? If the distance exceeds a certain threshold, PRE
should choose not to hoist that expression. The threshold could be a param
that can be set by backends.

Does this analysis look reasonable?

Thanks,
Prathamesh
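To make the effect described in comment #33 concrete, here is a minimal source-level sketch. It is not the attached test-case: the function names scan_plain and scan_hoisted, the helper step, the signatures and the transition table are all assumptions invented for illustration. The second function does by hand roughly what the PRE dump shows, computing a + 1 once before the comma test instead of in the two sibling blocks.

```c
#include <stddef.h>

/* Hypothetical transition function standing in for the elided
   switch (b) body; the states and inputs are made up.  */
static char step(char b, int next)
{
    switch (b) {
    case 'a': return next == 'x' ? 'b' : 'a';
    case 'b': return next == 'y' ? 'z' : 'a';
    default:  return b;
    }
}

/* Loop shape as quoted in the comment: a++ appears both in the
   early-exit block and in the loop increment.  */
char scan_plain(const char *a, const char **s)
{
    char b = 'a';
    for (; *a && b != 'z'; a++) {
        int next = *a;
        if (next == ',') {
            a++;
            break;
        }
        b = step(b, next);
    }
    *s = a;
    return b;
}

/* Hand-hoisted variant: a + 1 is computed once, before the comma
   test, and is therefore live across the whole state transition.  */
char scan_hoisted(const char *a, const char **s)
{
    char b = 'a';
    while (*a && b != 'z') {
        int next = *a;
        const char *a_next = a + 1;   /* the hoisted expression */
        if (next == ',') {
            a = a_next;
            break;
        }
        b = step(b, next);
        a = a_next;
    }
    *s = a;
    return b;
}
```

Both functions compute the same result. The only difference is that a_next is a separate value whose live range spans the transition code; whether it ends up sharing a register with a is the register allocator's decision, and in the asm quoted above it did not.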
[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #32 from Thomas Preud'homme ---
(In reply to rguent...@suse.de from comment #31)
> On Wed, 4 Oct 2017, prathamesh3492 at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
> >
> > prathamesh3492 at gcc dot gnu.org changed:
> >
> >            What    |Removed                     |Added
> >                  CC|                            |prathamesh3492 at gcc dot gnu.org
> >
> > --- Comment #30 from prathamesh3492 at gcc dot gnu.org ---
> > Hi Richard,
> > I tried your patch in comment #9 with the fix in comment #13 but since
> > tree-ssa-pre.c appears to be refactored, the fix doesn't apply anymore and
> > ICE resurfaces. Could you guide me what fix I should apply to reproduce the
> > regression ? IIUC the issue here is that code-hoisting is increasing
> > register pressure thus causing the extra spill ? And GIMPLE does not seem
> > to have cost model for register allocation.
> >
> > Are you planning to take a look at this PR soon ? If not I would like to
> > give a try and would be grateful for suggestions on how to approach this
> > bug.
> > Thanks!
>
> Neither am I planning to look at this soon nor do I have a good idea
> how to approach this bug.
>
> My ideas were to compute register pressure & update it during elimination
> and thus avoid adding uses that increase pressure over some point. While
> that might mitigate the issue it isn't in any way applying a cost model
> to individual inserts. [nor is computing/updating register pressure easy]

Hi,

Looking at the testcases I attached to this ticket, I'm regrettably not so
sure they are representative of the issue we were facing, which resulted from
too much register pressure. With so few variables, this is probably hitting
some other bug. I'll try to come up with a better reduced testcase.

Best regards.
[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #31 from rguenther at suse dot de ---
On Wed, 4 Oct 2017, prathamesh3492 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
>
> prathamesh3492 at gcc dot gnu.org changed:
>
>            What    |Removed                     |Added
>                  CC|                            |prathamesh3492 at gcc dot gnu.org
>
> --- Comment #30 from prathamesh3492 at gcc dot gnu.org ---
> Hi Richard,
> I tried your patch in comment #9 with the fix in comment #13 but since
> tree-ssa-pre.c appears to be refactored, the fix doesn't apply anymore and
> ICE resurfaces. Could you guide me what fix I should apply to reproduce the
> regression ? IIUC the issue here is that code-hoisting is increasing register
> pressure thus causing the extra spill ? And GIMPLE does not seem to have cost
> model for register allocation.
>
> Are you planning to take a look at this PR soon ? If not I would like to give
> a try and would be grateful for suggestions on how to approach this bug.
> Thanks!

Neither am I planning to look at this soon nor do I have a good idea
how to approach this bug.

My ideas were to compute register pressure & update it during elimination
and thus avoid adding uses that increase pressure over some point. While
that might mitigate the issue it isn't in any way applying a cost model
to individual inserts. [nor is computing/updating register pressure easy]
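The idea in comment #31 of tracking register pressure and rejecting inserts that push it past some point can be sketched with a toy model. This is not GCC code and uses no GCC API: struct live_range, pressure_at, max_pressure and insert_is_affordable are hypothetical names, statements are assumed to be numbered linearly, and a value is assumed to be live from its definition up to, but not including, its last use. Real liveness on a CFG is considerably harder, as the comment itself notes.

```c
#include <stddef.h>

/* Toy live range: the value is live at every position pos with
   def <= pos < last_use.  Positions are linear statement numbers,
   which is an assumption of this sketch.  */
struct live_range {
    int def;
    int last_use;
};

/* Number of values live at position pos.  */
static int pressure_at(const struct live_range *r, size_t n, int pos)
{
    int live = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].def <= pos && pos < r[i].last_use)
            live++;
    return live;
}

/* Highest pressure seen over the positions first..last inclusive.  */
int max_pressure(const struct live_range *r, size_t n, int first, int last)
{
    int max = 0;
    for (int pos = first; pos <= last; pos++) {
        int p = pressure_at(r, n, pos);
        if (p > max)
            max = p;
    }
    return max;
}

/* Return 1 if adding one more value with live range `extra` keeps the
   pressure at or below `limit` everywhere that value is live, else 0.
   A hoisted expression would correspond to such an extra range.  */
int insert_is_affordable(const struct live_range *r, size_t n,
                         struct live_range extra, int limit)
{
    for (int pos = extra.def; pos < extra.last_use; pos++)
        if (pressure_at(r, n, pos) + 1 > limit)
            return 0;
    return 1;
}
```

In this model, hoisting an expression moves its definition earlier and so lengthens its range; a pass could call something like insert_is_affordable with a per-target limit before committing to the insert. As the comment points out, such a check would only cap pressure and would not amount to a cost model for individual inserts.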
[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

prathamesh3492 at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |prathamesh3492 at gcc dot gnu.org

--- Comment #30 from prathamesh3492 at gcc dot gnu.org ---
Hi Richard,
I tried your patch in comment #9 with the fix in comment #13, but since
tree-ssa-pre.c appears to have been refactored, the fix doesn't apply anymore
and the ICE resurfaces. Could you tell me which fix I should apply to
reproduce the regression? IIUC the issue here is that code-hoisting is
increasing register pressure, thus causing the extra spill? And GIMPLE does
not seem to have a cost model for register allocation.

Are you planning to take a look at this PR soon? If not, I would like to give
it a try and would be grateful for suggestions on how to approach this bug.
Thanks!
[Bug tree-optimization/80155] [7/8 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

Jeffrey A. Law changed:

           What    |Removed                     |Added
   Target Milestone|7.0                         |8.0
            Summary|[7 regression] Performance  |[7/8 regression]
                   |regression with code        |Performance regression with
                   |hoisting enabled            |code hoisting enabled

--- Comment #29 from Jeffrey A. Law ---
Based on c#27, pushing out to gcc-8.