[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #11 from Jakub Jelinek  ---
Fixed.

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:d55ce33a34a8e33d17285228b32cf1e564241a70

commit r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
Author: Jakub Jelinek 
Date:   Tue Mar 16 18:46:20 2021 +0100

i386: Avoid mutual recursion between two peephole2s [PR99600]

As the testcase shows, the compiler hangs and eats all memory when compiling
it.  This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54
I changed the ix86_avoid_lea_for_addr splitting from a splitter
into a peephole2 (because during the splitting passes df is not guaranteed
to be available, while during peephole2 it is).
The problem is that we have another peephole2 that works in the opposite
direction: when it sees a split lea (in particular an ASHIFT followed by a
PLUS), it attempts to turn it back into a lea.
In the past they were fighting against each other, but as they were in
different passes, the last one simply won.  So, the split pass after reload
split the lea into a shift left and a plus, peephole2 reverted that (but, note,
not perfectly: the peephole2 doesn't understand that something can be placed
into the lea disp; to be fixed for GCC 12) and then another split pass split
the lea apart again.
But with my change, and given the way peephole2 works, we endlessly iterate
over those two: the first peephole2 splits the lea, the second one reverts
it, the first peephole2 splits the new lea back into 2 new insns and so
forth forever.
So, we need to break the cycle somehow.  This patch does that by not emitting
an ASHIFT insn from ix86_split_lea_for_addr but emitting a corresponding
MULT by constant instead, and splitting that later back into ASHIFT.
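
The cycle and the way the patch breaks it can be illustrated with a toy
model.  This is only a sketch, not GCC code: insns are plain strings, the
function names (split_lea, recombine_lea, run_peephole2, lower_mult) are
made up for the illustration, and the driver simply rescans after every
replacement and gives up after a fixed number of steps so that the example
terminates.

```python
def split_lea(insns, i, emit_mult):
    """Toy version of the first peephole2: split a lea into two insns."""
    if insns[i] == "lea":
        return 1, ["mult" if emit_mult else "ashift", "plus"]
    return None


def recombine_lea(insns, i):
    """Toy version of the second peephole2: ASHIFT + PLUS back into a lea."""
    if insns[i:i + 2] == ["ashift", "plus"]:
        return 2, ["lea"]
    return None


def run_peephole2(insns, emit_mult, max_steps=20):
    """Apply both rules until nothing matches or max_steps is reached.

    Returns (insns, rewrites performed, converged?).
    """
    insns = list(insns)
    for step in range(max_steps):
        for i in range(len(insns)):
            match = split_lea(insns, i, emit_mult) or recombine_lea(insns, i)
            if match:
                width, replacement = match
                insns[i:i + width] = replacement
                break
        else:
            # No rule matched anywhere: the stream is stable.
            return insns, step, True
    return insns, max_steps, False


def lower_mult(insns):
    """Toy version of the later splitter that turns MULT back into ASHIFT."""
    return ["ashift" if op == "mult" else op for op in insns]
```

With emit_mult=False the two rules keep undoing each other until the step
cap is hit; with emit_mult=True the second rule never sees an ASHIFT, so
the loop stabilises after one rewrite, and the MULT is only turned into an
ASHIFT afterwards, outside the loop.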

2021-03-16  Jakub Jelinek  

PR target/99600
* config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT
rather than ASHIFT.
* config/i386/i386.md (mult by 1248 into ashift): New splitter.

* gcc.target/i386/pr99600.c: New test.

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread arnd at linaro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #9 from Arnd Bergmann  ---
I have now built gcc with and without the patch from attachment 50390 to find
more broken kernel configurations and verify that they are all fixed. So far,
all the broken configurations are fixed by the patch; I'll leave it running
overnight to see if something comes up.

Thanks a lot for coming up with a patch so quickly!

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #8 from Jakub Jelinek  ---
Created attachment 50390
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50390&action=edit
gcc11-pr99600.patch

Untested fix.  I'm certainly not proud of it, but I don't see any easy, clean
and inexpensive fix.

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #7 from Jakub Jelinek  ---
Or emit a noop move insn (or something else that will be optimized away soon,
e.g. during DCE) in between the ASHIFT and following insn in
ix86_split_lea_for_addr.
A problem with remembering the INSN_UID of the ASHIFT insn is where we'd reset
it before processing the next function, though peephole2 is scheduled just
once, so it could be in lots of different spots.

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|jakub at redhat dot com     |uros at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Unfortunately, running ix86_avoid_lea_for_addr on insns that haven't been
added to the insn stream yet and aren't updated with df isn't that easy:
*lea_outperforms* wants to walk forwards and backwards from there etc. and
uses df.
So, one option could be to disable the
(define_peephole2
  [(match_scratch:W 5 "r")
   (parallel [(set (match_operand 0 "register_operand")
                   (ashift (match_operand 1 "register_operand")
                           (match_operand 2 "const_int_operand")))
              (clobber (reg:CC FLAGS_REG))])
   (parallel [(set (match_operand 3 "register_operand")
                   (plus (match_dup 0)
                         (match_operand 4 "x86_64_general_operand")))
              (clobber (reg:CC FLAGS_REG))])]
  "IN_RANGE (INTVAL (operands[2]), 1, 3)
   /* Validate MODE for lea.  */
   && ((!TARGET_PARTIAL_REG_STALL
...
altogether for TARGET_AVOID_LEA_FOR_ADDR && optimize_function_for_speed_p
(cfun).
Another might be to somehow mark the instructions created by the
(define_peephole2
  [(set (match_operand:SWI48 0 "register_operand")
        (match_operand:SWI48 1 "address_no_seg_operand"))]
  "ix86_hardreg_mov_ok (operands[0], operands[1])
   && peep2_regno_dead_p (0, FLAGS_REG)
   && ix86_avoid_lea_for_addr (peep2_next_insn (0), operands)"
  [(const_int 0)]
peephole2 and in the other peephole2 punt if one or both of the insns are
marked that way.
That marking could be some hash set (but what would delete it at the end of
the peephole2 pass?), some reg note or whatever on the insns, or perhaps just
remembering INSN_UID for the first and last insn in the sequence before DONE?
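
To make the marking idea concrete, here is a toy sketch.  It is not GCC
code and not what was eventually committed: insns are (uid, opcode) pairs,
the set from_split stands in for the hypothetical hash set of INSN_UIDs,
the function name run_with_marking is invented, and the toy splitting rule
fires on every lea instead of consulting ix86_avoid_lea_for_addr.

```python
import itertools


def run_with_marking(ops, max_steps=20):
    """Toy peephole2 loop where the recombining rule punts on marked insns.

    Returns (opcodes, rewrites performed, converged?).
    """
    next_uid = itertools.count(1)
    insns = [(next(next_uid), op) for op in ops]
    from_split = set()  # UIDs of insns created by the lea-splitting rule

    for step in range(max_steps):
        for i, (uid, op) in enumerate(insns):
            if op == "lea":
                # First rule: split the lea and remember the new insns.
                new = [(next(next_uid), "ashift"), (next(next_uid), "plus")]
                from_split.update(u for u, _ in new)
                insns[i:i + 1] = new
                break
            pair = insns[i:i + 2]
            if [o for _, o in pair] == ["ashift", "plus"]:
                if any(u in from_split for u, _ in pair):
                    continue  # punt: this pair came from the first rule
                # Second rule: recombine an unmarked pair into a lea.
                insns[i:i + 2] = [(next(next_uid), "lea")]
                break
        else:
            return [op for _, op in insns], step, True
    return [op for _, op in insns], max_steps, False
```

In this sketch the set is local to one run of the loop, which sidesteps the
open question above of where a real implementation would clear it.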

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

--- Comment #5 from Jakub Jelinek  ---
So, just to document what GCC 10 does:
(insn 38 37 15 3 (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(plus:DI (mult:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(const_int 4 [0x4]))
(const_int 4 [0x4]))) "pr99600.c":8:25 182 {*leadi}
 (nil))
after RA before split2 (like in GCC 11).
split2 makes:
(insn 44 43 45 3 (parallel [
(set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(ashift:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(const_int 2 [0x2])))
(clobber (reg:CC 17 flags))
]) "pr99600.c":8:25 592 {*ashldi3_1}
 (nil))
(insn 45 44 15 3 (parallel [
(set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(plus:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr99600.c":8:25 186 {*adddi_1}
 (nil))
out of that because lea is expensive on atom.
Then peephole2 triggers and undoes that using the 2nd pattern mentioned in
there (but apparently not perfectly):
(insn 56 55 57 3 (set (reg:DI 1 dx)
(const_int 4 [0x4])) "pr99600.c":8:25 -1
 (nil))
(insn 57 56 15 3 (set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(plus:DI (reg:DI 1 dx)
(mult:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(const_int 4 [0x4])))) "pr99600.c":8:25 -1
 (nil))
and finally split3 applies the lea split up again:
(insn 56 55 66 3 (set (reg:DI 1 dx)
(const_int 4 [0x4])) "pr99600.c":8:25 66 {*movdi_internal}
 (nil))
(insn 66 56 67 3 (parallel [
(set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(ashift:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(const_int 2 [0x2])))
(clobber (reg:CC 17 flags))
]) "pr99600.c":8:25 592 {*ashldi3_1}
 (nil))
(insn 67 66 15 3 (parallel [
(set (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(plus:DI (reg:DI 0 ax [orig:84 iftmp.1_3 ] [84])
(reg:DI 1 dx)))
(clobber (reg:CC 17 flags))
]) "pr99600.c":8:25 186 {*adddi_1}
 (nil))
But because each of those do-it, undo-it, do-it-again operations happens in a
separate pass, the compiler does not hang.

This means that I think the best fix is to FAIL in the second peephole2 if the
constructed address for lea is undesirable.
And maybe, for GCC12, optimize that peephole2 so that it doesn't force into
registers something that could be an immediate.
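
For reference, the forms in the dumps above all compute the same value;
they differ only in which insns are used (and in that the shift and add
clobber the flags register, while the lea does not).  A small sketch of
that arithmetic, assuming 64-bit wraparound for DImode; the helper names
and the SCALE_TO_SHIFT table are made up for the illustration, with the
shift counts 1..3 matching the IN_RANGE check in the peephole2 condition:

```python
MASK64 = (1 << 64) - 1  # DImode values wrap at 64 bits

# lea scale factor -> equivalent shift count
SCALE_TO_SHIFT = {2: 1, 4: 2, 8: 3}


def lea_value(index, scale, addend):
    """Value computed by a lea of the form index * scale + addend."""
    return (index * scale + addend) & MASK64


def shift_add_value(index, scale, addend):
    """Value computed by the split form: shift left, then add."""
    shifted = (index << SCALE_TO_SHIFT[scale]) & MASK64
    return (shifted + addend) & MASK64
```

For the testcase's lea (scale 4, addend 4), lea_value(5, 4, 4) and
shift_add_value(5, 4, 4) both give 24, whether the addend is the immediate
4 or has first been loaded into dx.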

[Bug target/99600] [11 regression] out of memory for simple test case (x86 -march=atom) since r11-7274

2021-03-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99600

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
            Summary|[11 regression] out of      |[11 regression] out of
                   |memory for simple test case |memory for simple test case
                   |(x86 -march=atom)           |(x86 -march=atom) since
                   |                            |r11-7274
   Target Milestone|---                         |11.0

--- Comment #4 from Jakub Jelinek  ---
Therefore, most likely started with my
r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54 that changed a splitter into
the first peephole2.
Will try to see which of those two actually won and will need to adjust the
other peephole2 to punt.