[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #11 from Jakub Jelinek ---
Fixed.
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #10 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:d55ce33a34a8e33d17285228b32cf1e564241a70

commit r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
Author: Jakub Jelinek
Date:   Tue Mar 16 18:46:20 2021 +0100

    i386: Avoid mutual recursion between two peephole2s [PR99600]

    As the testcase shows, the compiler hangs and eats all memory when
    compiling it. This is because in
    r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54 I have changed the
    ix86_avoid_lea_for_addr splitting from a splitter into a peephole2
    (because during splitting passes we don't have guaranteed df, while
    during peephole2 we do).

    The problem is we have another peephole2 that works in the opposite
    way: when seeing a split lea (in particular ASHIFT followed by PLUS) it
    attempts to turn it back into a lea. In the past, they were fighting
    against each other, but as they were in different passes, simply the
    last one won. So, split after reload split the lea into shift left and
    plus, peephole2 reverted that (but, note, not perfectly: the peephole2
    doesn't understand that something can be placed into lea disp; to be
    fixed for GCC12) and then another split pass split the lea apart again.

    But my changes and the way peephole2 works mean that we endlessly
    iterate over those two: the first peephole2 splits the lea, the second
    one reverts it, the first peephole2 splits the new lea back into 2 new
    insns and so forth forever.

    So, we need to break the cycle somehow. This patch does that by not
    emitting an ASHIFT insn from ix86_split_lea_for_addr but emitting a
    corresponding MULT by constant instead, and splitting that later back
    into ASHIFT.

    2021-03-16  Jakub Jelinek

            PR target/99600
            * config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT
            rather than ASHIFT.
            * config/i386/i386.md (mult by 1248 into ashift): New splitter.

            * gcc.target/i386/pr99600.c: New test.
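The cycle described in the commit message, and how emitting MULT breaks it, can be illustrated with a toy model. This is only a sketch under stated assumptions: the tuple encoding of insns, the function names `split_lea`, `run_peephole2` and `split_mult`, and the rewrite budget are all hypothetical and are not GCC code. The real peephole2 pass has no such budget, which is why the compiler ran out of memory.

```python
def split_lea(insn, use_mult):
    """Toy stand-in for ix86_split_lea_for_addr: lea -> scale op + add."""
    _, dst, scale, disp = insn
    if use_mult:
        # Fixed behaviour: emit MULT by the scale, which the combining rule ignores.
        first = ("mult", dst, scale)
    else:
        # Old behaviour: emit ASHIFT, which the combining rule matches again.
        first = ("ashift", dst, scale.bit_length() - 1)
    return [first, ("add", dst, disp)]


def run_peephole2(insns, use_mult, max_rewrites=50):
    """Apply both toy peepholes, re-examining every replacement.

    Returns (insns, rewrites, finished). The budget exists only so that the
    cyclic case stops in this model.
    """
    insns = list(insns)
    rewrites = 0
    i = 0
    while i < len(insns):
        if rewrites >= max_rewrites:
            return insns, rewrites, False
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if cur[0] == "lea":
            # First peephole2: split the lea.
            insns[i:i + 1] = split_lea(cur, use_mult)
            rewrites += 1
            continue
        if (cur[0] == "ashift" and nxt is not None and nxt[0] == "add"
                and nxt[1] == cur[1] and 1 <= cur[2] <= 3):
            # Second peephole2: ASHIFT followed by PLUS becomes a lea again.
            insns[i:i + 2] = [("lea", cur[1], 1 << cur[2], nxt[2])]
            rewrites += 1
            continue
        i += 1
    return insns, rewrites, True


def split_mult(insns):
    """Later splitter in the toy model: MULT by 1, 2, 4 or 8 becomes ASHIFT."""
    out = []
    for insn in insns:
        if insn[0] == "mult" and insn[2] in (1, 2, 4, 8):
            out.append(("ashift", insn[1], insn[2].bit_length() - 1))
        else:
            out.append(insn)
    return out
```

Calling `run_peephole2([("lea", "ax", 4, 4)], use_mult=False)` exhausts the budget, while `use_mult=True` finishes after a single rewrite, and `split_mult` then yields the shift and add that were wanted in the first place.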
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #9 from Arnd Bergmann ---
I have now built gcc with and without the patch from attachment 50390 to find more broken kernel configurations and verify that they are all fixed. So far, all the broken configurations are fixed by the patch; I'll leave it running overnight to see if anything comes up.

Thanks a lot for coming up with a patch so quickly!
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #8 from Jakub Jelinek ---
Created attachment 50390
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50390&action=edit
gcc11-pr99600.patch

Untested fix. I'm certainly not proud of it, but I don't see any easy, clean and inexpensive fix.
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #7 from Jakub Jelinek ---
Or emit a noop move insn (or something else that will be optimized away soon, e.g. during DCE) in between the ASHIFT and the following insn in ix86_split_lea_for_addr.
A problem with remembering the INSN_UID of the ASHIFT insn is where we'd reset it before processing the next function, though peephole2 is scheduled just once, so the reset could be in lots of different spots.
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|jakub at redhat dot com     |uros at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek ---
Unfortunately, running ix86_avoid_lea_for_addr on insns that haven't been added to the insn stream yet and aren't updated with df isn't that easy; *lea_outperforms* wants to walk forwards and backwards from there etc. and uses df.

So, one option could be to disable the

(define_peephole2
  [(match_scratch:W 5 "r")
   (parallel [(set (match_operand 0 "register_operand")
                   (ashift (match_operand 1 "register_operand")
                           (match_operand 2 "const_int_operand")))
              (clobber (reg:CC FLAGS_REG))])
   (parallel [(set (match_operand 3 "register_operand")
                   (plus (match_dup 0)
                         (match_operand 4 "x86_64_general_operand")))
              (clobber (reg:CC FLAGS_REG))])]
  "IN_RANGE (INTVAL (operands[2]), 1, 3)
   /* Validate MODE for lea. */
   && ((!TARGET_PARTIAL_REG_STALL
...

peephole2 altogether for TARGET_AVOID_LEA_FOR_ADDR && optimize_function_for_speed_p (cfun).

Another might be to somehow mark the instructions created by the

(define_peephole2
  [(set (match_operand:SWI48 0 "register_operand")
        (match_operand:SWI48 1 "address_no_seg_operand"))]
  "ix86_hardreg_mov_ok (operands[0], operands[1])
   && peep2_regno_dead_p (0, FLAGS_REG)
   && ix86_avoid_lea_for_addr (peep2_next_insn (0), operands)"
  [(const_int 0)]

peephole2 and in the other peephole2 punt if one or both of the insns are marked that way. That marking could be some hash set (but what would delete it at the end of the peephole2 pass?), some reg note or whatever on the insns, or perhaps just remembering the INSN_UID of the first and last insn in the sequence before DONE?
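The marking idea from this comment can be sketched in a toy model as follows. This is one of the options that was considered, not the fix that was committed. Everything here is an assumption made for illustration: the dict encoding of insns, the `make_insn` and `run_with_marking` names, the `from_split` set of UIDs and the rewrite budget are hypothetical and do not correspond to GCC internals.

```python
import itertools

_next_uid = itertools.count(1)


def make_insn(op, *args):
    """Create a toy insn with a unique id, loosely playing the role of INSN_UID."""
    return {"uid": next(_next_uid), "op": op, "args": args}


def run_with_marking(insns, budget=50):
    """Run both toy peepholes, punting in the combiner on marked insns.

    Returns (insns as tuples, number of rewrites).
    """
    insns = list(insns)
    from_split = set()  # UIDs of insns emitted by the lea-splitting peephole
    rewrites = 0
    i = 0
    while i < len(insns) and rewrites < budget:
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if cur["op"] == "lea":
            dst, scale, disp = cur["args"]
            new = [make_insn("ashift", dst, scale.bit_length() - 1),
                   make_insn("add", dst, disp)]
            # Remember what the splitter produced so the combiner can punt.
            from_split.update(n["uid"] for n in new)
            insns[i:i + 1] = new
            rewrites += 1
            continue
        if (cur["op"] == "ashift" and nxt is not None and nxt["op"] == "add"
                and nxt["args"][0] == cur["args"][0]
                and cur["uid"] not in from_split
                and nxt["uid"] not in from_split):
            dst, shift = cur["args"]
            insns[i:i + 2] = [make_insn("lea", dst, 1 << shift, nxt["args"][1])]
            rewrites += 1
            continue
        i += 1
    return [(x["op"],) + x["args"] for x in insns], rewrites
```

Because `from_split` is local to the function, the question of who deletes the set at the end of the pass does not arise in this model; in the compiler that lifetime question is exactly the difficulty raised above.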
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #5 from Jakub Jelinek ---
So, just to document what GCC 10 does:

(insn 38 37 15 3 (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
        (plus:DI (mult:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (const_int 4 [0x4]))
            (const_int 4 [0x4]))) "pr99600.c":8:25 182 {*leadi}
     (nil))

after RA before split2 (like in GCC 11). split2 makes:

(insn 44 43 45 3 (parallel [
            (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (ashift:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) "pr99600.c":8:25 592 {*ashldi3_1}
     (nil))
(insn 45 44 15 3 (parallel [
            (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (plus:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr99600.c":8:25 186 {*adddi_1}
     (nil))

out of that because lea is expensive on atom. Then peephole2 triggers and undoes that using the 2nd pattern mentioned in there (but apparently not perfectly):

(insn 56 55 57 3 (set (reg:DI 1 dx)
        (const_int 4 [0x4])) "pr99600.c":8:25 -1
     (nil))
(insn 57 56 15 3 (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
        (plus:DI (reg:DI 1 dx)
            (mult:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (const_int 4 [0x4])))) "pr99600.c":8:25 -1
     (nil))

and finally split3 applies the lea split up again:

(insn 56 55 66 3 (set (reg:DI 1 dx)
        (const_int 4 [0x4])) "pr99600.c":8:25 66 {*movdi_internal}
     (nil))
(insn 66 56 67 3 (parallel [
            (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (ashift:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) "pr99600.c":8:25 592 {*ashldi3_1}
     (nil))
(insn 67 66 15 3 (parallel [
            (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                (plus:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
                    (reg:DI 1 dx)))
            (clobber (reg:CC 17 flags))
        ]) "pr99600.c":8:25 186 {*adddi_1}
     (nil))

But because each of those do-it, undo-it, do-it-again operations happens in a separate pass, the compiler does not hang.
This means that I think the best fix is to FAIL in the second peephole2 if the constructed address for lea is undesirable. And maybe, for GCC12, optimize that peephole2 so that it doesn't force into registers something that could be an immediate.
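The GCC 10 sequence documented in this comment (split2, then peephole2, then split3, each run exactly once) can be replayed on a toy encoding of the insns. This is a sketch only: the tuple encoding, the `split_pass`, `peephole2_pass` and `gcc10_pipeline` names, and the fixed scratch register `dx` are assumptions for illustration, and the toy combiner always forces the addend into the scratch register, mirroring the imperfection visible in the dumps.

```python
def split_pass(insns):
    """One split pass: every lea becomes shift + add (lea is expensive on Atom)."""
    out = []
    for insn in insns:
        if insn[0] == "lea":
            _, dst, scale, addend = insn
            out.append(("ashift", dst, scale.bit_length() - 1))
            out.append(("add", dst, addend))
        else:
            out.append(insn)
    return out


def peephole2_pass(insns, scratch="dx"):
    """One peephole2 pass: shift + add becomes mov-to-scratch + lea.

    The output is not rescanned, because in this model the lea splitting
    lives in a different pass.
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (cur[0] == "ashift" and nxt is not None and nxt[0] == "add"
                and nxt[1] == cur[1] and 1 <= cur[2] <= 3):
            # The addend is forced into a register even if it is a constant.
            out.append(("mov", scratch, nxt[2]))
            out.append(("lea", cur[1], 1 << cur[2], scratch))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


def gcc10_pipeline(insns):
    """split2 -> peephole2 -> split3, each applied once, so it always ends."""
    after_split2 = split_pass(insns)
    after_peephole2 = peephole2_pass(after_split2)
    after_split3 = split_pass(after_peephole2)
    return after_split2, after_peephole2, after_split3
```

Starting from `[("lea", "ax", 4, 4)]`, the three stages correspond to insns 44/45, insns 56/57 and insns 56/66/67 in the dumps: the result is still shift plus add, but with an extra move of the constant into `dx`.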
[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Priority|P3                            |P1
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
            Summary|[11 regression] out of        |[11 regression] out of
                   |memory for simple test case   |memory for simple test case
                   |(x86 -march=atom)             |(x86 -march=atom) since
                   |                              |r11-7274
   Target Milestone|---                           |11.0

--- Comment #4 from Jakub Jelinek ---
Therefore, this most likely started with my r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54, which changed a splitter into the first peephole2. I will try to see which of those two actually won, and will need to adjust the other peephole2 to punt.