[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #22 from sandra at gcc dot gnu.org --- My nios2-elf test results look good now with this patch. Thanks!
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #21 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:dd9ca9d770a18ce4b16d867f49fef3293b483ff5 commit r10-7635-gdd9ca9d770a18ce4b16d867f49fef3293b483ff5 Author: Richard Biener Date: Wed Apr 8 14:04:35 2020 +0200 rtl-optimization/93946 - fix TBAA for redundant store removal in CSE It turns out RTL CSE tries to remove redundant stores but fails to do the usual validity check what such a change is TBAA neutral to later loads. This now triggers with the PR93946 testcases on nios2. 2020-04-08 Richard Biener PR rtl-optimization/93946 * cse.c (cse_insn): Record the tabled expression in src_related. Verify a redundant store removal is valid.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #20 from Richard Biener --- So for the CSE issue we go through the equivalence chain and find (gdb) p debug_rtx (p->exp) (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) 5076 if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest)) 5077src_related = dest; ... 5121 if (rtx_equal_p (src_related, dest)) 5122src_related_cost = src_related_regcost = -1; the first issue is that we're recording dest as src_related here losing the opportunity to do a validity check later on. If we fix that we can do the usual validity check, copied from DSE: diff --git a/gcc/cse.c b/gcc/cse.c index 3e8724b3fed..f07bbdbebad 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -5074,7 +5074,7 @@ cse_insn (rtx_insn *insn) to prefer it. Copy it to src_related. The code below will then give it a negative cost. */ if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest)) - src_related = dest; + src_related = p->exp; } /* Find the cheapest valid equivalent, trying all the available @@ -5332,7 +5332,16 @@ cse_insn (rtx_insn *insn) && rtx_equal_p (trial, dest) && !side_effects_p (dest) && (cfun->can_delete_dead_exceptions - || insn_nothrow_p (insn))) + || insn_nothrow_p (insn)) + /* We can only remove the later store if the earlier aliases + at least all accesses the later one. */ + && (!MEM_P (trial) + || ((MEM_ALIAS_SET (dest) == MEM_ALIAS_SET (trial) + || alias_set_subset_of (MEM_ALIAS_SET (dest), + MEM_ALIAS_SET (trial))) + && (!MEM_EXPR (trial) + || refs_same_for_tbaa_p (MEM_EXPR (trial), +MEM_EXPR (dest)) { SET_SRC (sets[i].rtl) = trial; noop_insn = true; that gives us for the testcase foo: movir2, 1 stw r2, 0(r4) stw zero, 0(r5) ldw r2, 0(r4) stw zero, 4(r5) ret which looks correct to me. I'm going to test the above on x86_64-linux.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #19 from Richard Biener --- (In reply to sandra from comment #18) > Hmmm, it looks to me like things are going wrong in the tree fre1 pass too. > I commented out the redundant zero store in the test case to see what would > happen, like > > long __attribute__((noipa)) > foo (struct bb *bv, void *ptr) > { > struct aa *a = ptr; > struct bb *b = ptr; > bv->b.u.f = 1; > a->a.u.i = 0; > /* b->b.u.f = 0; */ > return bv->b.u.f; > } > > fre1 gets this as input: > > __attribute__((noipa, noinline, noclone, no_icf)) > foo (struct bb * bv, void * ptr) > { > struct bb * b; > struct aa * a; > long int _8; > >: > bv_5(D)->b.u.f = 1; > MEM[(struct aa *)ptr_1(D)].a.u.i = 0; > _8 = bv_5(D)->b.u.f; > return _8; > > } > > > and produces this output: > > __attribute__((noipa, noinline, noclone, no_icf)) > foo (struct bb * bv, void * ptr) > { > struct bb * b; > struct aa * a; > >: > bv_5(D)->b.u.f = 1; > MEM[(struct aa *)ptr_1(D)].a.u.i = 0; > return 1; > > } > > This is with -O2. > > Again, it seems like something is not realizing that pointer parameters can > be aliased no matter what their types are. Or perhaps that is just a > symptom of some other underlying bug. :-S No, it's a feature and called type-based alias-analysis. The "redundant" store is what eventually aliases with the load, the other store to the same location can be disambiguated using TBAA. So the second store is not really redudnant - while it has no effect on the bits in the memory it alters the memory state by changing the effective type of the memory location to one compatible with the following load (or since we're dealing with unions, it changes the active union member). This is why we cannot elide the "redudnant" store. We can elide the earlier store as dead though (since there's no intermediate use).
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #18 from sandra at gcc dot gnu.org --- Hmmm, it looks to me like things are going wrong in the tree fre1 pass too. I commented out the redundant zero store in the test case to see what would happen, like long __attribute__((noipa)) foo (struct bb *bv, void *ptr) { struct aa *a = ptr; struct bb *b = ptr; bv->b.u.f = 1; a->a.u.i = 0; /* b->b.u.f = 0; */ return bv->b.u.f; } fre1 gets this as input: __attribute__((noipa, noinline, noclone, no_icf)) foo (struct bb * bv, void * ptr) { struct bb * b; struct aa * a; long int _8; : bv_5(D)->b.u.f = 1; MEM[(struct aa *)ptr_1(D)].a.u.i = 0; _8 = bv_5(D)->b.u.f; return _8; } and produces this output: __attribute__((noipa, noinline, noclone, no_icf)) foo (struct bb * bv, void * ptr) { struct bb * b; struct aa * a; : bv_5(D)->b.u.f = 1; MEM[(struct aa *)ptr_1(D)].a.u.i = 0; return 1; } This is with -O2. Again, it seems like something is not realizing that pointer parameters can be aliased no matter what their types are. Or perhaps that is just a symptom of some other underlying bug. :-S
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #17 from Richard Biener --- x86 splits large stores only after reload it seems. Also on x86 GIMPLE store-merging triggers, merging the two = 0 stores so I need -fno-store-merging to even get two stores there. Still the issue in CSE is obviously there...
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #16 from Richard Biener --- OK, now looking myself. RTL expansion creates (insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]) (reg:SI 49)) "t.c":12:13 -1 (nil)) (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) (const_int 0 [0])) "t.c":13:12 -1 (nil)) (insn 10 9 11 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ]) (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4 A32]) (const_int 0 [0])) "t.c":13:12 -1 (nil)) (insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32]) (const_int 0 [0])) "t.c":14:12 -1 (nil)) (insn 12 11 13 2 (set (reg:SI 51) (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])) "t.c":15:17 -1 (nil)) where insn 11 is the important one. Somehow on nios2 the CSE1 removes that store. deferring deletion of insn with uid = 11. and we end up with (insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]) (reg:SI 49)) "t.c":12:13 5 {movsi_internal} (expr_list:REG_DEAD (reg:SI 49) (nil))) (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) (const_int 0 [0])) "t.c":13:12 5 {movsi_internal} (nil)) (insn 10 9 12 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ]) (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4 A32]) (const_int 0 [0])) "t.c":13:12 5 {movsi_internal} (nil)) (insn 12 10 13 2 (set (reg:SI 51 [ bv_3(D)->b.u.f ]) (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32])) "t.c":15:17 5 {movsi_internal} (expr_list:REG_DEAD (reg/v/f:SI 47 [ bv ]) (nil))) where there indeed is no scheduling barrier anymore. I didn't know CSE removes stores or why this only triggers on nios2, it looks like some DF thing? Backtrace of the "DSE": #0 delete_insn (insn=0x76bc3400) at /space/rguenther/src/gcc/gcc/cfgrtl.c:135 #1 0x00b0bfa5 in delete_insn_and_edges (insn=0x76bc3400) at /space/rguenther/src/gcc/gcc/cfgrtl.c:237 #2 0x01a9d8eb in cse_insn (insn=0x76bc3400) at /space/rguenther/src/gcc/gcc/cse.c:5571 #3 0x01aa0b76 in cse_extended_basic_block (ebb_data=0x7fffdc90) at /space/rguenther/src/gcc/gcc/cse.c:6614 #4 0x01aa10a5 in cse_main (f=0x76cce310, nregs=52) at /space/rguenther/src/gcc/gcc/cse.c:6793 that's /* Similarly for no-op moves. */ else if (noop_insn) { if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn)) cse_cfg_altered = true; cse_cfg_altered |= delete_insn_and_edges (insn); /* No more processing for this set. */ sets[i].rtl = 0; so appearantly it does redundant store removal as well... /* Similarly, lots of targets don't allow no-op (set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) might be impossible for certain registers (like CC registers). */ else if (n_sets == 1 && !CALL_P (insn) && (MEM_P (trial) || REG_P (trial)) && rtx_equal_p (trial, dest) && !side_effects_p (dest) && (cfun->can_delete_dead_exceptions || insn_nothrow_p (insn))) { SET_SRC (sets[i].rtl) = trial; noop_insn = true; break; } where (gdb) p debug_rtx (insn) (insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32]) (const_int 0 [0])) "t.c":14:12 5 {movsi_internal} (expr_list:REG_DEAD (reg/v/f:SI 48 [ ptr ]) (nil))) (gdb) p debug_rtx (trial) (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32]) $4 = void (gdb) p debug_rtx (dest) (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4 A32]) $6 = void so it might be that the trigger is a target where sizeof(long long) = 2 * sizeof(long) _and_ we split stores to the larger type (I tried to pick a set of types where sizeof is the same but alias-sets are different - otherwise I'd have to cater for big vs. little-endian).
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #15 from sandra at gcc dot gnu.org --- Hmmm. I've gone over this code 2 or 3 times now, and I'm still convinced the problem is in the alias analysis, not the scheduler. I've stepped deeper into the code in the debugger, and here is the backtrace from where I see things going wrong: #0 indirect_refs_may_alias_p (ref1=0x774196f0, base1=0x773fec08, offset1=..., max_size1=..., size1=..., ref1_alias_set=1, base1_alias_set=4, ref2=0x774195d0, base2=0x773febb8, offset2=..., max_size2=..., size2=..., ref2_alias_set=1, base2_alias_set=6, tbaa_p=true) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2122 #1 0x013f8266 in refs_may_alias_p_2 (ref1=0x7fffd7d0, ref2=0x7fffd790, tbaa_p=true) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2320 #2 0x013f82bb in refs_may_alias_p_1 (ref1=0x7fffd7d0, ref2=0x7fffd790, tbaa_p=true) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2339 #3 0x00b90b06 in rtx_refs_may_alias_p (x=0x7742cac8, mem=0x7742c9a8, tbaa_p=true) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:365 #4 0x00b981de in true_dependence_1 (mem=0x7742c9a8, mem_mode=E_SImode, mem_addr=0x7742c7e0, x=0x7742cac8, x_addr=0x7742c798, mem_canonicalized=false) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3048 The code here says /* Do type-based disambiguation. */ if (base1_alias_set != base2_alias_set && !alias_sets_conflict_p (base1_alias_set, base2_alias_set)) return false; and the "false" return status gets propagated all the way back up to true_dependence. It seems to me that what is going wrong is that it is failing to consider that two pointer parameters can be aliased no matter what their declared type is, and no matter what types they are cast to -- e.g. because they point to members of the same union. Should ao_ref_base_alias_set be putting everything based on pointer parameters without restrict semantics into the same alias set, maybe? Or should there be some code in indirect_refs_may_alias_p to look for this situation before it reaches the point of type-based disambiguation where it is currently failing to DTRT?
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #14 from Richard Biener --- (In reply to sandra from comment #13) > Well, no. The problem is that the scheduler is moving > > ldw r2, 0(r4) > > ahead of > > stw zero, 0(r5) > > which is incorrect because the pointers in r4 and r5 are aliases. Ah, so the scheduler needs to call anti_dependence (WAR). > So at the point of call to true_dependence, I see: > > (gdb) frame 1 > #1 0x01d1a108 in sched_analyze_2 (deps=0x7fffdd50, > x=0x7742cac8, insn=0x77315600) > at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663 > 2663if (true_dependence (pending_mem->element (), > VOIDmode, t) > (gdb) print debug_rtx(insn) > (insn 17 10 18 2 (set (reg/i:SI 2 r2) > (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 > S4 A32])) > "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/ > pr93946-1.c":18:1 5 {movsi_internal} > (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) > (nil))) > $3 = void > (gdb) print debug_rtx(pending->insn()) > (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 > MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) > (const_int 0 [0])) > "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/ > pr93946-1.c":15:12 5 {movsi_internal} > (nil)) > $4 = void
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #13 from sandra at gcc dot gnu.org --- Well, no. The problem is that the scheduler is moving ldw r2, 0(r4) ahead of stw zero, 0(r5) which is incorrect because the pointers in r4 and r5 are aliases. So at the point of call to true_dependence, I see: (gdb) frame 1 #1 0x01d1a108 in sched_analyze_2 (deps=0x7fffdd50, x=0x7742cac8, insn=0x77315600) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663 2663if (true_dependence (pending_mem->element (), VOIDmode, t) (gdb) print debug_rtx(insn) (insn 17 10 18 2 (set (reg/i:SI 2 r2) (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4 A32])) "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":18:1 5 {movsi_internal} (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) (nil))) $3 = void (gdb) print debug_rtx(pending->insn()) (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) (const_int 0 [0])) "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 5 {movsi_internal} (nil)) $4 = void
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #12 from rguenther at suse dot de --- On March 27, 2020 9:19:33 PM GMT+01:00, "sandra at gcc dot gnu.org" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 > >--- Comment #11 from sandra at gcc dot gnu.org --- >RTL before sched2 does look sane and similar to that generated for x86 >with >-m32. > >I've been trying to step through sched2. I think that where things get >interesting is the call to true_dependence at sched-deps.c:2663. > >Breakpoint 6, true_dependence (mem=0x7742c9a8, mem_mode=E_VOIDmode, > >x=0x7742cac8) >at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056 >3056 return true_dependence_1 (mem, mem_mode, NULL_RTX, >(gdb) print debug_rtx(mem) >(mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa >*)ptr_1(D)].a.u.i+0 S4 A32]) >$19 = void >(gdb) print debug_rtx(x) >(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4 >A32]) >$20 = void > >This is making it all the way to the end of true_dependence_1, into >rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning >false, >which gets propagated back up as the result of true_dependence. IIUC, >this is >what is allowing sched2 to move the read from "x" ahead of the write to >"mem". > >Before I spend more time on this, am I on the right track here? And is >this >pointing at the problem being in refs_may_alias_p_1 rather than >somewhere along >the way e.g. in true_dependence_1? Those are two stores, right? Then true_dependence is the wrong predicate to look at it should be output_dependence instead.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #11 from sandra at gcc dot gnu.org --- RTL before sched2 does look sane and similar to that generated for x86 with -m32. I've been trying to step through sched2. I think that where things get interesting is the call to true_dependence at sched-deps.c:2663. Breakpoint 6, true_dependence (mem=0x7742c9a8, mem_mode=E_VOIDmode, x=0x7742cac8) at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056 3056 return true_dependence_1 (mem, mem_mode, NULL_RTX, (gdb) print debug_rtx(mem) (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) $19 = void (gdb) print debug_rtx(x) (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4 A32]) $20 = void This is making it all the way to the end of true_dependence_1, into rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning false, which gets propagated back up as the result of true_dependence. IIUC, this is what is allowing sched2 to move the read from "x" ahead of the write to "mem". Before I spend more time on this, am I on the right track here? And is this pointing at the problem being in refs_may_alias_p_1 rather than somewhere along the way e.g. in true_dependence_1?
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #10 from Richard Biener --- (In reply to sandra from comment #9) > Both the new test cases are failing on nios2 at -Os, -O2, and -O3. I've > done some analysis, but I'm not sure exactly where the problem lies, and > whether this is a problem in the nios2 back end or somewhere else. > > long __attribute__((noipa)) > foo (struct bb *bv, void *ptr) > { > struct aa *a = ptr; > struct bb *b = ptr; > bv->b.u.f = 1; > a->a.u.i = 0; > b->b.u.f = 0; > return bv->b.u.f; > } > > is compiling to > > foo: > movir2, 1 > stw r2, 0(r4) > ldw r2, 0(r4) > stw zero, 0(r5) > stw zero, 4(r5) > ret > > What's going on here is that load instructions have 3-cycle latency on > nios2, so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return > value 2 instructions earlier ahead of the store instruction to the same > location via the aliased pointer. :-( > > I'm not an expert on the instruction scheduler, and it seems like the target > hooks and machine description syntax are all focused on modelling pipeline > costs in order to minimize stalls, not telling the scheduler that certain > instructions cannot be correctly reordered at all. Should some other pass > be inserting optimization barriers, or something like that? I feel like I'm > missing some big-picture expertise of where this needs to be fixed, so any > suggestions to point me in the right direction would be appreciated. The instruction scheduler needs to check dependences which should correctly model the constraints here already. So you need to start looking at the RTL before sched2 to see if that's sane (compare to say x86 RTL and look for obvious signs of errors - some target expanders made errors here in the past). Then debug sched2 as to what true/anti/output_dependence tests it performs and why those tell sched2 moving is OK. As you can see the fix involved touching many passes so no doubt there can be more issues elsewhere ...
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 sandra at gcc dot gnu.org changed: What|Removed |Added CC||sandra at gcc dot gnu.org --- Comment #9 from sandra at gcc dot gnu.org --- Both the new test cases are failing on nios2 at -Os, -O2, and -O3. I've done some analysis, but I'm not sure exactly where the problem lies, and whether this is a problem in the nios2 back end or somewhere else. long __attribute__((noipa)) foo (struct bb *bv, void *ptr) { struct aa *a = ptr; struct bb *b = ptr; bv->b.u.f = 1; a->a.u.i = 0; b->b.u.f = 0; return bv->b.u.f; } is compiling to foo: movir2, 1 stw r2, 0(r4) ldw r2, 0(r4) stw zero, 0(r5) stw zero, 4(r5) ret What's going on here is that load instructions have 3-cycle latency on nios2, so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return value 2 instructions earlier ahead of the store instruction to the same location via the aliased pointer. :-( I'm not an expert on the instruction scheduler, and it seems like the target hooks and machine description syntax are all focused on modelling pipeline costs in order to minimize stalls, not telling the scheduler that certain instructions cannot be correctly reordered at all. Should some other pass be inserting optimization barriers, or something like that? I feel like I'm missing some big-picture expertise of where this needs to be fixed, so any suggestions to point me in the right direction would be appreciated.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 Richard Biener changed: What|Removed |Added Keywords||wrong-code Status|ASSIGNED|RESOLVED Known to work||10.0 Resolution|--- |FIXED Known to fail|10.0| --- Comment #8 from Richard Biener --- Fixed on trunk.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #7 from CVS Commits --- The master branch has been updated by Richard Guenther : https://gcc.gnu.org/g:3d6fd7ce6dc4b6baa11920387d62dc001980aa70 commit r10-6993-g3d6fd7ce6dc4b6baa11920387d62dc001980aa70 Author: Richard Biener Date: Tue Mar 3 11:01:09 2020 +0100 tree-optimization/93946 - fix bogus redundant store removal in FRE, DSE and DOM This fixes a common mistake in removing a store that looks redudnant but is not because it changes the dynamic type of the memory and thus makes a difference for following loads with TBAA. 2020-03-03 Richard Biener PR tree-optimization/93946 * alias.h (refs_same_for_tbaa_p): Declare. * alias.c (refs_same_for_tbaa_p): New function. * tree-ssa-alias.c (ao_ref_alias_set): For a NULL ref return zero. * tree-ssa-scopedtables.h (avail_exprs_stack::lookup_avail_expr): Add output argument giving access to the hashtable entry. * tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): Likewise. * tree-ssa-dom.c: Include alias.h. (dom_opt_dom_walker::optimize_stmt): Validate TBAA state before removing redundant store. * tree-ssa-sccvn.h (vn_reference_s::base_set): New member. (ao_ref_init_from_vn_reference): Adjust prototype. (vn_reference_lookup_pieces): Likewise. (vn_reference_insert_pieces): Likewise. * tree-ssa-sccvn.c: Track base alias set in addition to alias set everywhere. (eliminate_dom_walker::eliminate_stmt): Also check base alias set when removing redundant stores. (visit_reference_op_store): Likewise. * dse.c (record_store): Adjust valdity check for redundant store removal. * gcc.dg/torture/pr93946-1.c: New testcase. * gcc.dg/torture/pr93946-2.c: Likewise.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #6 from Richard Biener --- Created attachment 47948 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47948=edit DSE part I'm now testing the combined (and will squash if that succeeds).
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 Richard Biener changed: What|Removed |Added Attachment #47922|0 |1 is obsolete|| --- Comment #5 from Richard Biener --- Created attachment 47947 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47947=edit patch w/o RTL DSE
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #4 from Richard Biener --- OK, so the fix works but on x86_64 with -m32 RTL opts (sched2) end up wrecking things... foo: .LFB0: .cfi_startproc movl4(%esp), %eax movl8(%esp), %edx movl$1, (%eax) movl(%eax), %eax movl$0, (%edx) movl$0, 4(%edx) ret (insn:TI 7 18 10 2 (set (mem/j:SI (reg/v/f:SI 0 ax [orig:83 bv ] [83]) [1 bv_3(D)->b.u.f+0 S4 A32]) (const_int 1 [0x1])) "/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":14:13 67 {*movsi_internal} (nil)) (insn 18 7 20 2 (set (reg/v/f:SI 1 dx [orig:84 ptr ] [84]) (mem/f/c:SI (plus:SI (reg/f:SI 7 sp) (const_int 8 [0x8])) [9 ptr+0 S4 A32])) "/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67 {*movsi_internal} (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 [0x4])) [9 ptr+0 S4 A32]) (nil))) (insn:TI 20 10 21 2 (set (mem/j:SI (reg/v/f:SI 1 dx [orig:84 ptr ] [84]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32]) (const_int 0 [0])) "/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67 {*movsi_internal} (nil)) (insn:TI 21 20 16 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 1 dx [orig:84 ptr ] [84]) (const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4 A32]) (const_int 0 [0])) "/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67 {*movsi_internal} (expr_list:REG_DEAD (reg/v/f:SI 1 dx [orig:84 ptr ] [84]) (nil))) (insn 10 7 20 2 (set (reg:SI 0 ax [orig:86 bv_3(D)->b.u.f ] [86]) (mem/j:SI (reg/v/f:SI 0 ax [orig:83 bv ] [83]) [1 bv_3(D)->b.u.f+0 S4 A32])) "/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":17:17 67 {*movsi_internal} (nil)) Ah, something dropped the store to b.u.f and split the store to i. DSE. Obviously. pr93946-1.c.263r.dse1: processing cselib load against insn 9 pr93946-1.c.263r.dse1:Locally deleting insn 9 because insn 8 stores the same value and couldn't be eliminated pr93946-1.c.263r.dse1:Locally deleting insn 9 indeed I remember touching DSE back in time. It still only does /* We can only remove the later store if the earlier aliases at least all accesses the later one. */ && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem) || alias_set_subset_of (MEM_ALIAS_SET (mem), MEM_ALIAS_SET (s_info->mem not considering the base set of the other store. Now, we could allow the DSE by clearing MEM_EXPR here or adjusting MEM_ALIAS_SET. But not sure if we want that?
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 --- Comment #3 from Richard Biener --- Created attachment 47922 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47922=edit patch I am testing This is what I have now. Got to factor out that alias set. In VN consider passing ao_ref everywhere, the order of set, base-set is easy to get wrong since there's no type safety.
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 Richard Biener changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #2 from Richard Biener --- Ick. And DSEs "new" redundant store removal code is completely buggy in this regard as well... adding noipa to foo makes the testcase still fail when FRE is fixed. So fixing that as well... (and then there's still DOM as well, with -fno-tree-dse which would delete the dead store).
[Bug tree-optimization/93946] Bogus redundant store removal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-02-26 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 Known to fail||10.0, 4.8.5, 7.5.0, 9.2.0 --- Comment #1 from Richard Biener --- Mine.