[Bug target/40730] redundant memory load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730 --- Comment #15 from Andrew Pinski --- (In reply to Andrew Pinski from comment #14) > Fixed for GCC 12 by r12-897-gde56f95afaaa22 (and r11-408-g84935c9822183c). The first redundant load was fixed by r11-408-g84935c9822183c. The extra store was fixed was fixed by r12-897-gde56f95afaaa22 . But it is still fixed fully.
[Bug target/40730] redundant memory load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |12.0 See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=33315 Known to fail||10.1.0, 11.1.0, 5.1.0 Known to work||12.1.0, 13.1.0 Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #14 from Andrew Pinski --- Fixed for GCC 12 by r12-897-gde56f95afaaa22 (and r11-408-g84935c9822183c).
[Bug target/40730] redundant memory load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730 Steven Bosscher changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|steven at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #13 from Steven Bosscher --- Not working on this...
[Bug target/40730] redundant memory load
--- Comment #11 from stevenb dot gcc at gmail dot com 2010-01-11 08:22 --- Subject: Re: redundant memory load On Mon, Jan 11, 2010 at 7:47 AM, carrot at google dot com gcc-bugzi...@gcc.gnu.org wrote: iterate: push{lr} ldr r3, [r1] .L6: str r3, [r0] sub r2, r3, #0 bne .L5 b .L3 .L4: ldr r3, [r3, #8] b .L6 .L5: ldr r1, [r3, #4] cmp r1, #0 beq .L4 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} Carrot, could you please double-check that this is still correct code? Yes, it is correct. There are still 13 instructions, I think it is related to unoptimized basic block order. Yes, I would have expected the block starting with .L4 to be *after* the block starting with .L5, something like so: iterate: push{lr} ldr r3, [r1] .L6: str r3, [r0] sub r2, r3, #0 beq .L3 .L5: ldr r1, [r3, #4] cmp r1, #0 bne .L3 ldr r3, [r3, #8] b .L6 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} Does that look correct? And if so, could you see if there is an open bug report about this; or otherwise file a new PR and add me to the CC-list? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #12 from carrot at google dot com 2010-01-11 08:55 --- (In reply to comment #11) Yes, I would have expected the block starting with .L4 to be *after* the block starting with .L5, something like so: iterate: push{lr} ldr r3, [r1] .L6: str r3, [r0] sub r2, r3, #0 beq .L3 .L5: ldr r1, [r3, #4] cmp r1, #0 bne .L3 ldr r3, [r3, #8] b .L6 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} Does that look correct? And if so, could you see if there is an open bug report about this; or otherwise file a new PR and add me to the CC-list? It is correct. The basic block ordering issue (-Os) has been observed several times. Following are related PRs: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41004 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41396 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #10 from carrot at google dot com 2010-01-11 06:47 --- (In reply to comment #9) With GCC: (GNU) 4.5.0 20100108 (experimental) [trunk revision 155731] and my patch for bug 20070 applied, I get the following code: iterate: push{lr} ldr r3, [r1] .L6: str r3, [r0] sub r2, r3, #0 bne .L5 b .L3 .L4: ldr r3, [r3, #8] b .L6 .L5: ldr r1, [r3, #4] cmp r1, #0 beq .L4 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} Carrot, could you please double-check that this is still correct code? Yes, it is correct. There are still 13 instructions, I think it is related to unoptimized basic block order. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #9 from steven at gcc dot gnu dot org 2010-01-08 09:34 --- With GCC: (GNU) 4.5.0 20100108 (experimental) [trunk revision 155731] and my patch for bug 20070 applied, I get the following code: iterate: push{lr} ldr r3, [r1] .L6: str r3, [r0] sub r2, r3, #0 bne .L5 b .L3 .L4: ldr r3, [r3, #8] b .L6 .L5: ldr r1, [r3, #4] cmp r1, #0 beq .L4 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} Carrot, could you please double-check that this is still correct code? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #7 from carrot at google dot com 2009-07-15 08:07 --- (In reply to comment #6) Carrot, can you please try this test case with my patch crossjump_abstract.diff from Bug 20070 applied? I tried your patch. It did remove the redundant memory load. Following is the output push{lr} ldr r3, [r1] .L6: str r3, [r0] mov r2, r3 // M cmp r3, #0 bne .L5 b .L3 .L4: ldr r3, [r3, #8] b .L6 .L5: ldr r1, [r3, #4] cmp r1, #0 beq .L4 .L3: str r2, [r0, #12] @ sp needed for prologue pop {pc} In pass ifcvt it noticed the difference of two stores is the different pseudo register number and there is no conflict between the two pseudo registers, so it rename one of them to the same as another and do basic block cross jump on them earlier. Then pass iterate.c.161r.cse2 detected the redundant load and remove it. But it introduced another redundant move instruction marked as M. At the place r2 is used, r3 still contain the same result as r2, so we can also use r3 there. I think this is another problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #8 from steven at gcc dot gnu dot org 2009-07-15 09:47 --- That redundant move has to be a separate issue, indeed. I would expect the register allocator to coalesce those registers. I hadn't expected this. I thought the result would be just the removal of the redundant load, but the code that comes out is bigger (14 instructions instead of 13) and has a completely different structure. I'll see if I can understand what is going on. Thus, mine. -- steven at gcc dot gnu dot org changed: What|Removed |Added BugsThisDependsOn||20070 AssignedTo|unassigned at gcc dot gnu |steven at gcc dot gnu dot |dot org |org Status|NEW |ASSIGNED Last reconfirmed|2009-07-13 10:10:07 |2009-07-15 09:47:09 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #4 from carrot at google dot com 2009-07-14 09:14 --- In TREE level, the two stores are different statements. Only after register allocation, the two stores get same register and make the load redundant. try_crossjump_bb tries to find same instruction sequence in all predecessors of a basic block bb, and move that code sequence to head of bb. It is triggered by this function, and the store is moved just before the load. I tried -fgcse-las but it couldn't do the work. (In reply to comment #2) -fgcse-las should do the trick. Note that PRE would do this kind of optimization on the tree-level, but it is disabled with -Os (so is gcse). bb 2: D.1614_2 = p2_1(D)-front; p1_3(D)-head = D.1614_2; goto bb 4; bb 3: D.1616_8 = D.1615_4-next; p1_3(D)-head = D.1616_8; bb 4: D.1615_4 = p1_3(D)-head; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #5 from steven at gcc dot gnu dot org 2009-07-14 09:18 --- As you said, try_crossjump_bb tries to find the same instruction sequence in *all* predecessors of a basic block bb. Meaning that the load must have been redundant even before cross jumping occurred. If you are right, and this redundancy is not exposed until after register allocation, then this may be another case that postreload-gcse should handle (but probably does not). There are a couple of bugs about the need for a more powerful postreload-gcse. We should perhaps group them (in a meta-bug for example) and make a plan for a fix... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #6 from steven at gcc dot gnu dot org 2009-07-14 09:20 --- Carrot, can you please try this test case with my patch crossjump_abstract.diff from Bug 20070 applied? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #1 from carrot at google dot com 2009-07-13 08:58 --- Created an attachment (id=18183) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18183action=view) test case -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-13 09:50 --- -fgcse-las should do the trick. Note that PRE would do this kind of optimization on the tree-level, but it is disabled with -Os (so is gcse). bb 2: D.1614_2 = p2_1(D)-front; p1_3(D)-head = D.1614_2; goto bb 4; bb 3: D.1616_8 = D.1615_4-next; p1_3(D)-head = D.1616_8; bb 4: D.1615_4 = p1_3(D)-head; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730
[Bug target/40730] redundant memory load
--- Comment #3 from steven at gcc dot gnu dot org 2009-07-13 10:10 --- And no, it is *not* OK to remove this kind of redundant code in DCE. The load may be redundant, but it is not dead. It is not clear to me why cleanup_cfg would move that insn. Perhaps you can show what is going on with the RTL dumps (use -fdump-rtl-all-slim for the most readable results). -- steven at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-07-13 10:10:07 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730