[Bug target/40730] redundant memory load

2023-05-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730

--- Comment #15 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #14)
> Fixed for GCC 12 by r12-897-gde56f95afaaa22 (and r11-408-g84935c9822183c).

The first redundant load was fixed by r11-408-g84935c9822183c.

The extra store was fixed was fixed by r12-897-gde56f95afaaa22 .

But it is still fixed fully.

[Bug target/40730] redundant memory load

2023-05-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=33315
  Known to fail||10.1.0, 11.1.0, 5.1.0
  Known to work||12.1.0, 13.1.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #14 from Andrew Pinski  ---
Fixed for GCC 12 by r12-897-gde56f95afaaa22 (and r11-408-g84935c9822183c).

[Bug target/40730] redundant memory load

2019-03-05 Thread steven at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730

Steven Bosscher  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|steven at gcc dot gnu.org  |unassigned at gcc dot 
gnu.org

--- Comment #13 from Steven Bosscher  ---
Not working on this...

[Bug target/40730] redundant memory load

2010-01-11 Thread stevenb dot gcc at gmail dot com


--- Comment #11 from stevenb dot gcc at gmail dot com  2010-01-11 08:22 
---
Subject: Re:  redundant memory load

On Mon, Jan 11, 2010 at 7:47 AM, carrot at google dot com
gcc-bugzi...@gcc.gnu.org wrote:
 iterate:
 push{lr}
 ldr r3, [r1]
 .L6:
 str r3, [r0]
 sub r2, r3, #0
 bne .L5
 b   .L3
 .L4:
 ldr r3, [r3, #8]
 b   .L6
 .L5:
 ldr r1, [r3, #4]
 cmp r1, #0
 beq .L4
 .L3:
 str r2, [r0, #12]
 @ sp needed for prologue
 pop {pc}

 Carrot, could you please double-check that this is still correct code?


 Yes, it is correct.
 There are still 13 instructions, I think it is related to unoptimized basic
 block order.

Yes, I would have expected the block starting with .L4 to be *after*
the block starting with .L5, something like so:

iterate:
push{lr}
ldr r3, [r1]
.L6:
str r3, [r0]
sub r2, r3, #0
beq .L3
.L5:
ldr r1, [r3, #4]
cmp r1, #0
bne .L3
ldr r3, [r3, #8]
b   .L6
.L3:
str r2, [r0, #12]
@ sp needed for prologue
pop {pc}

Does that look correct?  And if so, could you see if there is an open
bug report about this; or otherwise file a new PR and add me to the
CC-list?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2010-01-11 Thread carrot at google dot com


--- Comment #12 from carrot at google dot com  2010-01-11 08:55 ---
(In reply to comment #11)
 Yes, I would have expected the block starting with .L4 to be *after*
 the block starting with .L5, something like so:
 
 iterate:
 push{lr}
 ldr r3, [r1]
 .L6:
 str r3, [r0]
 sub r2, r3, #0
 beq .L3
 .L5:
 ldr r1, [r3, #4]
 cmp r1, #0
 bne .L3
 ldr r3, [r3, #8]
 b   .L6
 .L3:
 str r2, [r0, #12]
 @ sp needed for prologue
 pop {pc}
 
 Does that look correct?  And if so, could you see if there is an open
 bug report about this; or otherwise file a new PR and add me to the
 CC-list?
 

It is correct. The basic block ordering issue (-Os) has been observed several
times. Following are related PRs:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41004
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41396


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2010-01-10 Thread carrot at google dot com


--- Comment #10 from carrot at google dot com  2010-01-11 06:47 ---
(In reply to comment #9)
 With GCC: (GNU) 4.5.0 20100108 (experimental) [trunk revision 155731] and my
 patch for bug 20070 applied, I get the following code:
 
 iterate:
 push{lr}
 ldr r3, [r1]
 .L6:
 str r3, [r0]
 sub r2, r3, #0
 bne .L5
 b   .L3
 .L4:
 ldr r3, [r3, #8]
 b   .L6
 .L5:
 ldr r1, [r3, #4]
 cmp r1, #0
 beq .L4
 .L3:
 str r2, [r0, #12]
 @ sp needed for prologue
 pop {pc}
 
 Carrot, could you please double-check that this is still correct code?
 

Yes, it is correct.
There are still 13 instructions, I think it is related to unoptimized basic
block order.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2010-01-08 Thread steven at gcc dot gnu dot org


--- Comment #9 from steven at gcc dot gnu dot org  2010-01-08 09:34 ---
With GCC: (GNU) 4.5.0 20100108 (experimental) [trunk revision 155731] and my
patch for bug 20070 applied, I get the following code:

iterate:
push{lr}
ldr r3, [r1]
.L6:
str r3, [r0]
sub r2, r3, #0
bne .L5
b   .L3
.L4:
ldr r3, [r3, #8]
b   .L6
.L5:
ldr r1, [r3, #4]
cmp r1, #0
beq .L4
.L3:
str r2, [r0, #12]
@ sp needed for prologue
pop {pc}

Carrot, could you please double-check that this is still correct code?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-15 Thread carrot at google dot com


--- Comment #7 from carrot at google dot com  2009-07-15 08:07 ---
(In reply to comment #6)
 Carrot, can you please try this test case with my patch
 crossjump_abstract.diff from Bug 20070 applied?
 

I tried your patch. It did remove the redundant memory load. Following is the
output

push{lr}
ldr r3, [r1]
.L6:
str r3, [r0]
mov r2, r3  // M
cmp r3, #0
bne .L5
b   .L3
.L4:
ldr r3, [r3, #8]
b   .L6
.L5:
ldr r1, [r3, #4]
cmp r1, #0
beq .L4
.L3:
str r2, [r0, #12]
@ sp needed for prologue
pop {pc}

In pass ifcvt it noticed the difference of two stores is the different pseudo
register number and there is no conflict between the two pseudo registers, so
it rename one of them to the same as another and do basic block cross jump on
them earlier. Then pass iterate.c.161r.cse2 detected the redundant load and
remove it.

But it introduced another redundant move instruction marked as M. At the place
r2 is used, r3 still contain the same result as r2, so we can also use r3
there. I think this is another problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-15 Thread steven at gcc dot gnu dot org


--- Comment #8 from steven at gcc dot gnu dot org  2009-07-15 09:47 ---
That redundant move has to be a separate issue, indeed.  I would expect the
register allocator to coalesce those registers.

I hadn't expected this.  I thought the result would be just the removal of the
redundant load, but the code that comes out is bigger (14 instructions instead
of 13) and has a completely different structure.

I'll see if I can understand what is going on. Thus, mine.


-- 

steven at gcc dot gnu dot org changed:

   What|Removed |Added

  BugsThisDependsOn||20070
 AssignedTo|unassigned at gcc dot gnu   |steven at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED
   Last reconfirmed|2009-07-13 10:10:07 |2009-07-15 09:47:09
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-14 Thread carrot at google dot com


--- Comment #4 from carrot at google dot com  2009-07-14 09:14 ---
In TREE level, the two stores are different statements. Only after register
allocation, the two stores get same register and make the load redundant.

try_crossjump_bb tries to find same instruction sequence in all predecessors of
a basic block bb, and move that code sequence to head of bb. It is triggered by
this function, and the store is moved just before the load.

I tried -fgcse-las but it couldn't do the work.

(In reply to comment #2)
 -fgcse-las should do the trick.  Note that PRE would do this kind of
 optimization on the tree-level, but it is disabled with -Os (so is gcse).
 
 bb 2:
   D.1614_2 = p2_1(D)-front;
   p1_3(D)-head = D.1614_2;
   goto bb 4;
 
 bb 3:
   D.1616_8 = D.1615_4-next;
   p1_3(D)-head = D.1616_8;
 
 bb 4:
   D.1615_4 = p1_3(D)-head;
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-14 Thread steven at gcc dot gnu dot org


--- Comment #5 from steven at gcc dot gnu dot org  2009-07-14 09:18 ---
As you said, try_crossjump_bb tries to find the same instruction sequence in
*all* predecessors of a basic block bb. Meaning that the load must have been
redundant even before cross jumping occurred.

If you are right, and this redundancy is not exposed until after register
allocation, then this may be another case that postreload-gcse should handle
(but probably does not).

There are a couple of bugs about the need for a more powerful postreload-gcse.
We should perhaps group them (in a meta-bug for example) and make a plan for a
fix...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-14 Thread steven at gcc dot gnu dot org


--- Comment #6 from steven at gcc dot gnu dot org  2009-07-14 09:20 ---
Carrot, can you please try this test case with my patch
crossjump_abstract.diff from Bug 20070 applied?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-13 Thread carrot at google dot com


--- Comment #1 from carrot at google dot com  2009-07-13 08:58 ---
Created an attachment (id=18183)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18183action=view)
test case


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-13 Thread rguenth at gcc dot gnu dot org


--- Comment #2 from rguenth at gcc dot gnu dot org  2009-07-13 09:50 ---
-fgcse-las should do the trick.  Note that PRE would do this kind of
optimization on the tree-level, but it is disabled with -Os (so is gcse).

bb 2:
  D.1614_2 = p2_1(D)-front;
  p1_3(D)-head = D.1614_2;
  goto bb 4;

bb 3:
  D.1616_8 = D.1615_4-next;
  p1_3(D)-head = D.1616_8;

bb 4:
  D.1615_4 = p1_3(D)-head;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730



[Bug target/40730] redundant memory load

2009-07-13 Thread steven at gcc dot gnu dot org


--- Comment #3 from steven at gcc dot gnu dot org  2009-07-13 10:10 ---
And no, it is *not* OK to remove this kind of redundant code in DCE. The load
may be redundant, but it is not dead. 

It is not clear to me why cleanup_cfg would move that insn.  Perhaps you can
show what is going on with the RTL dumps (use -fdump-rtl-all-slim for the most
readable results).


-- 

steven at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2009-07-13 10:10:07
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40730