Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
2. gcse.c: gcse_emit_move_after added notes, but none of them was very useful as far as I could tell, and almost all of them turned self-referencing after CPROP. So I propose we just never add notes in this case. gcse_emit_move_after also preserves existing notes. Are they problematic? Yes, they tend to be invalid after PRE because the registers used in the PRE'd expression usually are not live anymore (making the note invalid). Sometimes CPROP re-validates the notes, but it doesn't seem wise to me to rely on that. So the compiler doesn't bootstrap with the gcse.c patch you posted earlier in the thread? Still this seems too bold to me, the note datum could be a constant and should be preserved in this case. 3. cprop.c: It seems to me that the purpose here is to propagate constants. If a reg could not be propagated, then the REG_EQUAL note will not help much either. Propagating constants via REG_EQUAL notes still helps folding comparisons sometimes, so I'm proposing we only propagate those. As a bonus: less garbage RTL because a cprop_constant_p can be shared. That seems a bit radical to me, especially in try_replace_reg which is used for copy propagation as well. In cprop_jump, why is attaching a note to the jump problematic? Most of the time a note from copy-propagation was not valid because the copy-prop'd reg was not live at the point of the note. This one I think we should drop for now, or just avoid the self-referential case. There is a comment explicitly saying that the REG_EQUAL note added by try_replace_reg are part of the algorithm. Not really. It uses single_set in a few places, including delete_trivially_dead_insns and cse_extended_basic_block. so it seems like we're back to the earlier trick of using df_note_add_problem to clean up pre-existing REG_EQ* notes. Again: Not really. I also bootstrappedtested without the cse.c change. The cse.c hunk is OK then. I plan (and promise ;-) to add a REG_EQ* note verifier for GCC 4.9. Thanks (no need to promise though :-), that will be helpful. In the meantime, I don't think that we should aim for perfection in 4.8, these REG_EQ* notes and their quirks have been with us for a long time... -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 01/12/2012 15:54, Eric Botcazou ha scritto: Attached is a different fix. It splits DF_REF_IN_NOTE in two: One flag for each kind of note. This allows the dead note removal code to distinguish the source note for the EQ_USES. I needed to remove one flag to keep the df_ref_flags 16-bit, but the DF_REF_SUBREG flag looks completely unnecessary to me, and only ira.c uses it, so it wasn't to hard to scavenge a single bit. I'll defer this to Paolo. Ok, I'll review this tomorrow. Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Dec 3, 2012 at 7:23 PM, Eric Botcazou wrote: 2. gcse.c: gcse_emit_move_after added notes, but none of them was very useful as far as I could tell, and almost all of them turned self-referencing after CPROP. So I propose we just never add notes in this case. gcse_emit_move_after also preserves existing notes. Are they problematic? Yes, they tend to be invalid after PRE because the registers used in the PRE'd expression usually are not live anymore (making the note invalid). Sometimes CPROP re-validates the notes, but it doesn't seem wise to me to rely on that. So the compiler doesn't bootstrap with the gcse.c patch you posted earlier in the thread? Still this seems too bold to me, the note datum could be a constant and should be preserved in this case. You mean the patch at http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02275.html right? I haven't tried that other patch. I'll test that one. 3. cprop.c: It seems to me that the purpose here is to propagate constants. If a reg could not be propagated, then the REG_EQUAL note will not help much either. Propagating constants via REG_EQUAL notes still helps folding comparisons sometimes, so I'm proposing we only propagate those. As a bonus: less garbage RTL because a cprop_constant_p can be shared. That seems a bit radical to me, especially in try_replace_reg which is used for copy propagation as well. In cprop_jump, why is attaching a note to the jump problematic? Most of the time a note from copy-propagation was not valid because the copy-prop'd reg was not live at the point of the note. This one I think we should drop for now, or just avoid the self-referential case. There is a comment explicitly saying that the REG_EQUAL note added by try_replace_reg are part of the algorithm. I suppose so. But this was all added before RTL fwprop and way before GIMPLE optimizations. Avoiding the self-referential case is just more difficult to do, quite expensive (have to scan the SET_SRC pattern), and AFAICT doesn't bring much pay-off. I'll prepare something to avoid the self-referential case, but I think we're making our lives complicated for no good reason. Not really. It uses single_set in a few places, including delete_trivially_dead_insns and cse_extended_basic_block. so it seems like we're back to the earlier trick of using df_note_add_problem to clean up pre-existing REG_EQ* notes. Again: Not really. I also bootstrappedtested without the cse.c change. The cse.c hunk is OK then. Thanks, I'll commit it separately. I plan (and promise ;-) to add a REG_EQ* note verifier for GCC 4.9. Thanks (no need to promise though :-), that will be helpful. In the meantime, I don't think that we should aim for perfection in 4.8, these REG_EQ* notes and their quirks have been with us for a long time... Well, yes they've been with us for a long time, but my LR_RD change exposed all these problems that were simply hidden before. I think we're safe for GCC 4.8 but I don't feel comfortable about this situation... Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
You mean the patch at http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02275.html right? I haven't tried that other patch. I'll test that one. Yes, thanks. I suppose so. But this was all added before RTL fwprop and way before GIMPLE optimizations. Avoiding the self-referential case is just more difficult to do, quite expensive (have to scan the SET_SRC pattern), and AFAICT doesn't bring much pay-off. I'll prepare something to avoid the self-referential case, but I think we're making our lives complicated for no good reason. Can't you just use the same best-effort approach used for fwprop.c here? -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Dec 3, 2012 at 9:19 PM, Steven Bosscher wrote: So the compiler doesn't bootstrap with the gcse.c patch you posted earlier in the thread? Still this seems too bold to me, the note datum could be a constant and should be preserved in this case. You mean the patch at http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02275.html right? I haven't tried that other patch. I'll test that one. ... The cse.c hunk is OK then. Thanks, I'll commit it separately. This is what I'll commit in a second. A new proposed cprop.c change will follow later. Ciao! Steven * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. * cse.c (cse_main): Add the DF_NOTE problem. Index: gcse.c === --- gcse.c (revision 194084) +++ gcse.c (working copy) @@ -255,8 +255,6 @@ int flag_rerun_cse_after_global_opts; /* An obstack for our working variables. */ static struct obstack gcse_obstack; -struct reg_use {rtx reg_rtx; }; - /* Hash table of expressions. */ struct expr @@ -2491,23 +2489,27 @@ gcse_emit_move_after (rtx dest, rtx src, rtx insn) rtx new_rtx; rtx set = single_set (insn), set2; rtx note; - rtx eqv; + rtx eqv = NULL_RTX; /* This should never fail since we're creating a reg-reg copy we've verified to be valid. */ new_rtx = emit_insn_after (gen_move_insn (dest, src), insn); - /* Note the equivalence for local CSE pass. */ + /* Note the equivalence for local CSE pass. Take the note from the old + set if there was one. Otherwise record the SET_SRC from the old set + unless DEST is also an operand of the SET_SRC. */ set2 = single_set (new_rtx); if (!set2 || !rtx_equal_p (SET_DEST (set2), dest)) return new_rtx; if ((note = find_reg_equal_equiv_note (insn))) eqv = XEXP (note, 0); - else + else if (! REG_P (dest) + || ! reg_mentioned_p (dest, SET_SRC (set))) eqv = SET_SRC (set); - set_unique_reg_note (new_rtx, REG_EQUAL, copy_insn_1 (eqv)); + if (eqv != NULL_RTX) +set_unique_reg_note (new_rtx, REG_EQUAL, copy_insn_1 (eqv)); return new_rtx; } Index: cse.c === --- cse.c (revision 194084) +++ cse.c (working copy) @@ -6520,6 +6520,7 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nregs) int i, n_blocks; df_set_flags (DF_LR_RUN_DCE); + df_note_add_problem (); df_analyze (); df_set_flags (DF_DEFER_INSN_RESCAN);
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Attached is a different fix. It splits DF_REF_IN_NOTE in two: One flag for each kind of note. This allows the dead note removal code to distinguish the source note for the EQ_USES. I needed to remove one flag to keep the df_ref_flags 16-bit, but the DF_REF_SUBREG flag looks completely unnecessary to me, and only ira.c uses it, so it wasn't to hard to scavenge a single bit. I'll defer this to Paolo. The patch also includes all places I've found so far where the compiler could create self-referencing notes: 1. optabs.c: Not sure what it was trying to do, but now it just refuses to add a note if TARGET is mentioned in one of the source operands. OK. 2. gcse.c: gcse_emit_move_after added notes, but none of them was very useful as far as I could tell, and almost all of them turned self-referencing after CPROP. So I propose we just never add notes in this case. gcse_emit_move_after also preserves existing notes. Are they problematic? 3. cprop.c: It seems to me that the purpose here is to propagate constants. If a reg could not be propagated, then the REG_EQUAL note will not help much either. Propagating constants via REG_EQUAL notes still helps folding comparisons sometimes, so I'm proposing we only propagate those. As a bonus: less garbage RTL because a cprop_constant_p can be shared. That seems a bit radical to me, especially in try_replace_reg which is used for copy propagation as well. In cprop_jump, why is attaching a note to the jump problematic? 4. fwprop.c: If the SET_DEST is a REG that is mentioned in the SET_SRC, don' create a note. This one I'm not very happy with, because it doesn't handle the case that the REG is somehow simplified out of the SET_SRC, but being smarter about this would only complicate things. I for one can't think of anything better for the moment, anyway. OK. Finally, it makes sense to compute the NOTE problem for CSE. Why? It only uses REG_EQ* notes, so it seems like we're back to the earlier trick of using df_note_add_problem to clean up pre-existing REG_EQ* notes. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sat, Dec 1, 2012 at 3:54 PM, Eric Botcazou wrote: The patch also includes all places I've found so far where the compiler could create self-referencing notes: 1. optabs.c: Not sure what it was trying to do, but now it just refuses to add a note if TARGET is mentioned in one of the source operands. OK. Thanks. I'll commit this separately. 2. gcse.c: gcse_emit_move_after added notes, but none of them was very useful as far as I could tell, and almost all of them turned self-referencing after CPROP. So I propose we just never add notes in this case. gcse_emit_move_after also preserves existing notes. Are they problematic? Yes, they tend to be invalid after PRE because the registers used in the PRE'd expression usually are not live anymore (making the note invalid). Sometimes CPROP re-validates the notes, but it doesn't seem wise to me to rely on that. 3. cprop.c: It seems to me that the purpose here is to propagate constants. If a reg could not be propagated, then the REG_EQUAL note will not help much either. Propagating constants via REG_EQUAL notes still helps folding comparisons sometimes, so I'm proposing we only propagate those. As a bonus: less garbage RTL because a cprop_constant_p can be shared. That seems a bit radical to me, especially in try_replace_reg which is used for copy propagation as well. In cprop_jump, why is attaching a note to the jump problematic? Most of the time a note from copy-propagation was not valid because the copy-prop'd reg was not live at the point of the note. 4. fwprop.c: If the SET_DEST is a REG that is mentioned in the SET_SRC, don' create a note. This one I'm not very happy with, because it doesn't handle the case that the REG is somehow simplified out of the SET_SRC, but being smarter about this would only complicate things. I for one can't think of anything better for the moment, anyway. OK. I'll commit this along with the optabs.c part. Finally, it makes sense to compute the NOTE problem for CSE. Why? It only uses REG_EQ* notes, Not really. It uses single_set in a few places, including delete_trivially_dead_insns and cse_extended_basic_block. so it seems like we're back to the earlier trick of using df_note_add_problem to clean up pre-existing REG_EQ* notes. Again: Not really. I also bootstrappedtested without the cse.c change. I plan (and promise ;-) to add a REG_EQ* note verifier for GCC 4.9. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Nov 28, 2012 at 8:44 AM, Eric Botcazou ebotca...@adacore.com wrote: Again: Is this really the direction we should be going in? (I haven't any better ideas...) Do you mean for self-referencing notes? If so, definitely, these notes are either wrong or almost always redundant so we should eliminate them. Well, I'm not sure I agree that they are wrong. Consider: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r2 + 10 { REG_EQUAL r1 + 20 } S3: r3 = r1 + 10 Clearly it would be wrong to use the REG_EQUAL note on S2 to optimize S3 to: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r1 + 20 S3: r3 = r1 + 30 However, It seems to me that this would be valid if e.g. the webizer turns the above into: S0: r1 = ... S1: r2 = r1 + 10 S2: r4 = r2 + 10 { REG_EQUAL r1 + 20 } S3: r3 = r4 + 10 because now the optimization would be valid: S0: r1 = ... S1: r2 = r1 + 10 S2: r4 = r1 + 20 S3: r3 = r1 + 30 So self-referencing REG_EQUAL notes should IMHO be considered valid, and transformations using the notes should just be careful to invalidate the equivalence before processing the insn that the note appears on. cse.c doesn't appear to do this, however. On a completely different note: df-problems.c has this comment: /* Remove the notes that refer to dead registers. As we have at most one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note so we need to purge the complete EQ_USES vector when removing the note using df_notes_rescan. */ But ira.c:update_equiv_regs creates insns with a REG_EQUAL *and* a REG_EQUIV note: (gdb) update_equiv_regs () at ../../trunk/gcc/ira.c:3177 3177= gen_rtx_INSN_LIST (VOIDmode, insn, NULL_RTX); 2: debug_rtx((rtx)0x74df5e58) = (insn 638 637 639 85 (parallel [ (set (reg:SI 891) (minus:SI (reg:SI 890) (reg:SI 546 [ D.45472 ]))) (clobber (reg:CC 17 flags)) ]) ../../trunk/gcc/value-prof.c:928 283 {*subsi_1} (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/v/f:DI 204 [ e12 ]) (const_int 52 [0x34])) [4 e12_223-probability+0 S4 A32]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (minus:SI (const_int 1 [0x2710]) (reg:SI 546 [ D.45472 ])) (nil) void (gdb) Is that valid? Ciao! Steven As for the more general problem of REG_EQ* notes, perhaps it's time to devise a global infrastructure to handle them. But, as far as I recall, they always have been problematic, before or after the DF merge. /* Remove the notes that refer to dead registers. As we have at most one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note so we need to purge the complete EQ_USES vector when removing the note using df_notes_rescan. */
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Well, I'm not sure I agree that they are wrong. Consider: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r2 + 10 { REG_EQUAL r1 + 20 } S3: r3 = r1 + 10 Clearly it would be wrong to use the REG_EQUAL note on S2 to optimize S3 to: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r1 + 20 S3: r3 = r1 + 30 But the note is wrong by itself. The semantics is clear: the note means that r1 is equal to r1 + 20 right after S2, which doesn't make any sense. However, It seems to me that this would be valid if e.g. the webizer turns the above into: S0: r1 = ... S1: r2 = r1 + 10 S2: r4 = r2 + 10 { REG_EQUAL r1 + 20 } S3: r3 = r4 + 10 because now the optimization would be valid: S0: r1 = ... S1: r2 = r1 + 10 S2: r4 = r1 + 20 S3: r3 = r1 + 30 Sure, but we have no guarantee that the RTL stream will be massaged that way. So self-referencing REG_EQUAL notes should IMHO be considered valid, and transformations using the notes should just be careful to invalidate the equivalence before processing the insn that the note appears on. cse.c doesn't appear to do this, however. IMO that's a recipe for bugs. On a completely different note: df-problems.c has this comment: /* Remove the notes that refer to dead registers. As we have at most one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note so we need to purge the complete EQ_USES vector when removing the note using df_notes_rescan. */ But ira.c:update_equiv_regs creates insns with a REG_EQUAL *and* a REG_EQUIV note: (gdb) update_equiv_regs () at ../../trunk/gcc/ira.c:3177 3177= gen_rtx_INSN_LIST (VOIDmode, insn, NULL_RTX); 2: debug_rtx((rtx)0x74df5e58) = (insn 638 637 639 85 (parallel [ (set (reg:SI 891) (minus:SI (reg:SI 890) (reg:SI 546 [ D.45472 ]))) (clobber (reg:CC 17 flags)) ]) ../../trunk/gcc/value-prof.c:928 283 {*subsi_1} (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/v/f:DI 204 [ e12 ]) (const_int 52 [0x34])) [4 e12_223-probability+0 S4 A32]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (minus:SI (const_int 1 [0x2710]) (reg:SI 546 [ D.45472 ])) (nil) void (gdb) Is that valid? Yes, the comment is wrong and should have been removed in r183719: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01547.html -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Nov 28, 2012 at 11:10 PM, Eric Botcazou wrote: Well, I'm not sure I agree that they are wrong. Consider: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r2 + 10 { REG_EQUAL r1 + 20 } S3: r3 = r1 + 10 Clearly it would be wrong to use the REG_EQUAL note on S2 to optimize S3 to: S0: r1 = ... S1: r2 = r1 + 10 S2: r1 = r1 + 20 S3: r3 = r1 + 30 But the note is wrong by itself. The semantics is clear: the note means that r1 is equal to r1 + 20 right after S2, which doesn't make any sense. Or maybe the semantics are not what the compiler is actually doing. Because clearly several places in the compiler can create this kind of self-referencing note right now, and the main consumer of the notes, CSE, has apparently not had too much trouble handing them. But with the documented semantics, you're right. And, to be honest, I'm of the the fewer notes, the better camp :-) 2: debug_rtx((rtx)0x74df5e58) = (insn 638 637 639 85 (parallel [ (set (reg:SI 891) (minus:SI (reg:SI 890) (reg:SI 546 [ D.45472 ]))) (clobber (reg:CC 17 flags)) ]) ../../trunk/gcc/value-prof.c:928 283 {*subsi_1} (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/v/f:DI 204 [ e12 ]) (const_int 52 [0x34])) [4 e12_223-probability+0 S4 A32]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (minus:SI (const_int 1 [0x2710]) (reg:SI 546 [ D.45472 ])) (nil) void (gdb) Is that valid? Yes, the comment is wrong and should have been removed in r183719: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01547.html Ugh, that doesn't look like a very solid fix. Attached is a different fix. It splits DF_REF_IN_NOTE in two: One flag for each kind of note. This allows the dead note removal code to distinguish the source note for the EQ_USES. I needed to remove one flag to keep the df_ref_flags 16-bit, but the DF_REF_SUBREG flag looks completely unnecessary to me, and only ira.c uses it, so it wasn't to hard to scavenge a single bit. The patch also includes all places I've found so far where the compiler could create self-referencing notes: 1. optabs.c: Not sure what it was trying to do, but now it just refuses to add a note if TARGET is mentioned in one of the source operands. 2. gcse.c: gcse_emit_move_after added notes, but none of them was very useful as far as I could tell, and almost all of them turned self-referencing after CPROP. So I propose we just never add notes in this case. 3. cprop.c: It seems to me that the purpose here is to propagate constants. If a reg could not be propagated, then the REG_EQUAL note will not help much either. Propagating constants via REG_EQUAL notes still helps folding comparisons sometimes, so I'm proposing we only propagate those. As a bonus: less garbage RTL because a cprop_constant_p can be shared. 4. fwprop.c: If the SET_DEST is a REG that is mentioned in the SET_SRC, don' create a note. This one I'm not very happy with, because it doesn't handle the case that the REG is somehow simplified out of the SET_SRC, but being smarter about this would only complicate things. I for one can't think of anything better for the moment, anyway. Finally, it makes sense to compute the NOTE problem for CSE. Bootstraptesting in progress at the older revision I've been using to debug the darwin10 and sparc bugs. I'll test trunk head on x86_64 and powerpc later. In the mean time, let me hear what you think of this one :-) Ciao! Steven PR55006_2.diff Description: Binary data
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... Test case is unchanged so I'm omitting it here. Ciao! Steven gcc/ * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member. (analyze_iv_to_split_insn): Record it. (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL notes that refer to IVs that are being split. (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv. Twice. Use FOR_BB_INSNS_SAFE. That's fine with me, thanks. You might want to defer applying it until the reason why it isn't apparently sufficient for aermod.f90 is understood though. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 10:55 AM, Eric Botcazou ebotca...@adacore.com wrote: Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... Test case is unchanged so I'm omitting it here. Ciao! Steven gcc/ * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member. (analyze_iv_to_split_insn): Record it. (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL notes that refer to IVs that are being split. (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv. Twice. Use FOR_BB_INSNS_SAFE. That's fine with me, thanks. You might want to defer applying it until the reason why it isn't apparently sufficient for aermod.f90 is understood though. Thanks. Yes, I'm first going to try and figure out why this doesn't fix aermod for Dominique. I suspect the problem is in variable expansion in the unroller. A previous patch of mine updates REG_EQUAL notes in variable expansion, but it doesn't clean up notes that refer to maybe dead registers. I have to say, I've got a uncomfortable feeling about REG_EQUAL notes not being allowed to refer to dead registers. In the case at hand, the code basically does: S1: r1 = r2 + 0x4 // r2 is the reg from split_iv, r1 was the original IV S2: r3 = mem[r1] S3: if ... goto L1; S4: r4 = r3 // REG_EQUAL mem[r1] L1: S5: r1 = r2 + 0x8 At S4, r1 is not live in the usual meaning of register liveness, but the DEF from S1 still reaches the REG_EQUAL note. This situation is not only created by loop unrolling. At least gcse.c (PRE), loop-invariant.c, cse.c, dce.c and combine.c can create situations like the above. ifcvt.c jumps through hoops to avoid it (see e.g. PR46315, which you fixed). Most of the problems are papered over by occasional calls to df_note_add_problem from some passes, so that df_remove_dead_eq_notes cleans up any notes referencing dead registers. But if we really want to declare this kind of REG_EQUAL note reference to a dead register invalid RTL (which is something at least you, Paolo, and me agree on), and we expect passes to clean up their own mess by either updating or removing any notes they invalidate, then df_remove_dead_eq_notes shouldn't be necessary for correctness because all notes it encounters should be valid (being updated by passes). Removing df_remove_dead_eq_notes breaks bootstrap on the four targets I tried it on (x86_64, powerpc64, ia64, and hppa). That is, breaks *without* -funroll-loops, and *without* -fweb. With a hack to disable pass_initialize_regs, bootstrap still fails, but other files are miscompiled. With another hack on top to disable CSE2, bootstrap still fails, and yet other files are miscompiled. What I'm really trying to say, is that even when I fix this particular issue with aermod (which is apparently 3 issues: the one I fixed last month for x86_64, the one fixed with the patch in this thread for SPARC, and the yet-to-be-identified problem Dominique is still seeing on darwin10) then it's likely other, similar bugs will show up. Bugs that are hard to reproduce, dependent on the target's RTL, and difficult to debug as they are usually wrong-code bugs on larger test cases. We really need a more robust solution. I've added Jeff and rth to the CC, looking for opinions/thoughts/suggestions/$0.02s. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 11:34 AM, Steven Bosscher wrote: gcc/ * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member. (analyze_iv_to_split_insn): Record it. (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL notes that refer to IVs that are being split. (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv. Twice. Use FOR_BB_INSNS_SAFE. That's fine with me, thanks. You might want to defer applying it until the reason why it isn't apparently sufficient for aermod.f90 is understood though. Thanks. Yes, I'm first going to try and figure out why this doesn't fix aermod for Dominique. I suspect the problem is in variable expansion in the unroller. A previous patch of mine updates REG_EQUAL notes in variable expansion, but it doesn't clean up notes that refer to maybe dead registers. Well, that's not it. But what I said below: I have to say, I've got a uncomfortable feeling about REG_EQUAL notes not being allowed to refer to dead registers. applies even more after analyzing Dominique's bug. At the start of RTL PRE we have: 2000 {r719:DI=r719:DI+0x4;clobber flags:CC;} REG_UNUSED: flags:CC where r719:DI+0x4 is found to be partially redundant. So PRE optimizes this to: 2889 r719:DI=r1562:DI REG_EQUAL: r719:DI+0x4 ... 2913 r1562:DI=r719:DI+0x4 This self-reference in the REG_EQUAL note is the problem. The SET kills r719, but there is a USE in the PRE'ed expression that keeps r719 alive. But this use is copy-propagated away in CPROP2: 2913 r1562:DI=r1562:DI+0x4 Now that USE of r719 is a use of a dead register, rendering the REG_EQUAL invalid. From there on the problem is the same as the ones I had to fix in loop-unroll.c. First the webizer puts the USE in a different web and renames the USE to r1591: 2889 r719:DI=r1562:DI REG_EQUAL: r1591:DI+0x4 CSE2 uses this and the equivalence of r719 and r1562 to optimize the PRE expression: 2913 r1562:DI=r1591:DI+0x8 Then init-regs notices that this reg is quasi-used uninitialized: 3122 r1591:DI=0 2913 r1562:DI=r1591:DI+0x8 REG_DEAD: r1591:DI And combine finally produces: 2913 r1562:DI=0x8 I'm not sure what to name as the root cause for all of this. It all starts with PRE creating a REG_EQUAL note that references the register that's SET in the insn the note is attached to, but the register is live after the insn so from that point of view the note is not invalid. CPROP2 kills r179 by propagating it out and making the note invalid. I think CPROP2 could do that to any register, and if passes should make sure REG_EQUAL notes are updated or removed then this is a bug in RTL CPROP. Very, very messy... Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 1:00 PM, Steven Bosscher wrote: Now that USE of r719 is a use of a dead register, rendering the REG_EQUAL invalid. From there on the problem is the same as the ones I had to fix in loop-unroll.c. First the webizer puts the USE in a different web and renames the USE to r1591: 2889 r719:DI=r1562:DI REG_EQUAL: r1591:DI+0x4 With this patch, the self-referencing EQ_USE is kept together with its DEF: 2889 r719:DI=r1562:DI REG_EQUAL: r719:DI+0x4 Dominique, could you give this a try and see if it helps? (But as I said up-thread: I'm not sure this is a proper fix, or just another band-aid...) Ciao! Steven * web.c (union_eq_uses): New function to keep self-referencing notes intact. (web_main): Call it. Index: web.c === --- web.c (revision 193394) +++ web.c (working copy) @@ -148,6 +148,35 @@ union_match_dups (rtx insn, struct web_e } } +/* If INSN has a REG_EQUAL note that references a DEF of INSN, union + them. FUN is the function that does the union. */ + +static void +union_eq_uses (rtx insn, struct web_entry *def_entry, + struct web_entry *use_entry, + bool (*fun) (struct web_entry *, struct web_entry *)) +{ + struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); + df_ref *def_link = DF_INSN_INFO_DEFS (insn_info); + df_ref *eq_use_link = DF_INSN_INFO_EQ_USES (insn_info); + + if (! (def_link eq_use_link)) +return; + + while (*eq_use_link) +{ + df_ref *def_rec = def_link; + while (*def_rec) + { + if (DF_REF_REAL_REG (*eq_use_link) == DF_REF_REAL_REG (*def_rec)) + (*fun) (use_entry + DF_REF_ID (*eq_use_link), + def_entry + DF_REF_ID (*def_rec)); + def_rec++; + } + eq_use_link++; +} +} + /* For each use, all possible defs reaching it must come in the same register, union them. FUN is the function that does the union. @@ -390,6 +419,7 @@ web_main (void) if (DF_REF_REGNO (use) = FIRST_PSEUDO_REGISTER) union_defs (use, def_entry, used, use_entry, unionfind_union); } + union_eq_uses (insn, def_entry, use_entry, unionfind_union); } }
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 1:28 PM, Steven Bosscher wrote: Dominique, could you give this a try and see if it helps? (But as I said up-thread: I'm not sure this is a proper fix, or just another band-aid...) And more band-aid, but this time I'm not even sure where things go wrong. In any case, we end up with a REG_EQUAL note referencing a SUBREG of a dead register. This leaked because df_remove_dead_eq_notes uses loc_mentioned_in_p not on the note but on the first operand of the note. Index: df-problems.c === --- df-problems.c (revision 193394) +++ df-problems.c (working copy) @@ -2907,9 +2907,10 @@ df_remove_dead_eq_notes (rtx insn, bitma if (DF_REF_REGNO (use) FIRST_PSEUDO_REGISTER DF_REF_LOC (use) (DF_REF_FLAGS (use) DF_REF_IN_NOTE) -! bitmap_bit_p (live, DF_REF_REGNO (use)) -loc_mentioned_in_p (DF_REF_LOC (use), XEXP (link, 0))) +! bitmap_bit_p (live, DF_REF_REGNO (use))) { + /* Make sure that DF_SCAN is up-to-date. */ + gcc_assert (loc_mentioned_in_p (DF_REF_LOC (use), link)); deleted = true; break; }
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Dominique, could you give this a try and see if it helps? With the patch, the miscompilation of aermod.f90 is gone. Thanks, Dominique
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
And more band-aid, ... The gcc_assert triggers at bootstrap when compiling gcc/ada/ali.adb: +===GNAT BUG DETECTED==+ | 4.8.0 20121127 (experimental) [trunk revision 193848p10] (x86_64-apple-darwin10.8.0) GCC error:| | in df_remove_dead_eq_notes, at df-problems.c:2917| | Error detected around ../../work/gcc/ada/ali.adb:2682:8 | | Please submit a bug report; see http://gcc.gnu.org/bugs.html.| | Use a subject line meaningful to you and us to track the bug.| | Include the entire contents of this bug box in the report. | | Include the exact gcc or gnatmake command that you entered. | | Also include sources listed below in gnatchop format | | (concatenated together with no headers between files). | +==+ Dominique
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 27/11/2012 13:00, Steven Bosscher ha scritto: It all starts with PRE creating a REG_EQUAL note that references the register that's SET in the insn the note is attached to, but the register is live after the insn so from that point of view the note is not invalid. This note seems very very weird. For one thing, it becomes invalid on the very instruction where it is created. I would say that it should not be there. Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 3:25 PM, Dominique Dhumieres domi...@lps.ens.fr wrote: And more band-aid, ... The gcc_assert triggers at bootstrap when compiling gcc/ada/ali.adb: +===GNAT BUG DETECTED==+ | 4.8.0 20121127 (experimental) [trunk revision 193848p10] (x86_64-apple-darwin10.8.0) GCC error:| | in df_remove_dead_eq_notes, at df-problems.c:2917| | Error detected around ../../work/gcc/ada/ali.adb:2682:8 | | Please submit a bug report; see http://gcc.gnu.org/bugs.html.| | Use a subject line meaningful to you and us to track the bug.| | Include the entire contents of this bug box in the report. | | Include the exact gcc or gnatmake command that you entered. | | Also include sources listed below in gnatchop format | | (concatenated together with no headers between files). | +==+ Yes, I found that one already, too. And, oh joy, we have pseudos in REG_EQUAL notes after LRA! (Probably also after reload, btw.). In the ICE I got, a pseudo's live range got split and an inheritance move is injected, but REG_EQUAL notes were not updated or removed. Finding and removing the notes is hard in IRA and LRA because they don't use the DF caches. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
This note seems very very weird. For one thing, it becomes invalid on the very instruction where it is created. I would say that it should not be there. Agreed. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Nov 27, 2012 at 5:57 PM, Eric Botcazou ebotca...@adacore.com wrote: This note seems very very weird. For one thing, it becomes invalid on the very instruction where it is created. I would say that it should not be there. Agreed. Count me in, too. So let's avoid it: * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. Index: gcse.c === --- gcse.c (revision 193394) +++ gcse.c (working copy) @@ -258,8 +258,6 @@ int flag_rerun_cse_after_global_opts; /* An obstack for our working variables. */ static struct obstack gcse_obstack; -struct reg_use {rtx reg_rtx; }; - /* Hash table of expressions. */ struct expr @@ -2482,23 +2480,27 @@ gcse_emit_move_after (rtx dest, rtx src, rtx new_rtx; rtx set = single_set (insn), set2; rtx note; - rtx eqv; + rtx eqv = NULL_RTX; /* This should never fail since we're creating a reg-reg copy we've verified to be valid. */ new_rtx = emit_insn_after (gen_move_insn (dest, src), insn); - /* Note the equivalence for local CSE pass. */ + /* Note the equivalence for local CSE pass. Take the note from the old + set if there was one. Otherwise record the SET_SRC from the old set + unless DEST is also an operand of the SET_SRC. */ set2 = single_set (new_rtx); if (!set2 || !rtx_equal_p (SET_DEST (set2), dest)) return new_rtx; if ((note = find_reg_equal_equiv_note (insn))) eqv = XEXP (note, 0); - else + else if (! REG_P (dest) + || ! reg_mentioned_p (dest, SET_SRC (set))) eqv = SET_SRC (set); - set_unique_reg_note (new_rtx, REG_EQUAL, copy_insn_1 (eqv)); + if (eqv != NULL_RTX) +set_unique_reg_note (new_rtx, REG_EQUAL, copy_insn_1 (eqv)); return new_rtx; } And perhaps a bit in emit-rtl.c for good measure? I'll see if/where this causes breakage... Index: emit-rtl.c === --- emit-rtl.c (revision 193394) +++ emit-rtl.c (working copy) @@ -4932,6 +4932,19 @@ gen_use (rtx x) return seq; } +/* Return true if DATUM has a USE of the SET_DEST of INSN. */ +static bool +self_ref_note_p (rtx insn, rtx datum) +{ + rtx set = single_set (insn); + if (! set) +return false; + rtx dest = SET_DEST (set); + if (! REG_P (dest)) +return false; + return reg_mentioned_p (dest, datum); +} + /* Place a note of KIND on insn INSN with DATUM as the datum. If a note of this type already exists, remove it first. */ @@ -4961,6 +4974,8 @@ set_unique_reg_note (rtx insn, enum reg_ if (note) { + if (self_ref_note_p (insn, datum)) + internal_error (self-reference in REG_EQUAL or REG_EQUIV note!\n); XEXP (note, 0) = datum; df_notes_rescan (insn); return note; @@ -4982,6 +4997,8 @@ set_unique_reg_note (rtx insn, enum reg_ { case REG_EQUAL: case REG_EQUIV: + if (self_ref_note_p (insn, datum)) + internal_error (self-reference in REG_EQUAL or REG_EQUIV note!\n); df_notes_rescan (insn); break; default:
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Count me in, too. So let's avoid it: * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. OK (with PR rtl-optimization/55006) if it passes bootstrap regtest, thanks. And perhaps a bit in emit-rtl.c for good measure? I'll see if/where this causes breakage... I think this would need to be wrapped in a #ifdef ENABLE_CHECKING/#endif pair. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Nov 28, 2012 at 12:47 AM, Eric Botcazou wrote: Count me in, too. So let's avoid it: * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. OK (with PR rtl-optimization/55006) if it passes bootstrap regtest, thanks. Thanks for the quick review. And perhaps a bit in emit-rtl.c for good measure? I'll see if/where this causes breakage... I think this would need to be wrapped in a #ifdef ENABLE_CHECKING/#endif pair. Well, let me first try to make it pass some set of tests. This breaks the compiler in too many places to name right now. Here's the first of what's probably going to be a whole series of patches if I keep hitting this internal_error as often as I am now in my set of cc1-i files... I know: ChangeLog, testing and all that. That's not the point yet. This is just a brain dump, to get this saved in some places safer than the compile farm or (worse) my head ;-) Index: optabs.c === --- optabs.c(revision 193394) +++ optabs.c(working copy) @@ -170,9 +170,9 @@ optab_libfunc (optab optab, enum machine If the last insn does not set TARGET, don't do anything, but return 1. - If a previous insn sets TARGET and TARGET is one of OP0 or OP1, - don't add the REG_EQUAL note but return 0. Our caller can then try - again, ensuring that TARGET is not one of the operands. */ + If the last insn or a previous insn sets TARGET and TARGET is one of OP0 + or OP1, don't add the REG_EQUAL note but return 0. Our caller can then + try again, ensuring that TARGET is not one of the operands. */ static int add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1) @@ -192,6 +192,12 @@ add_equal_note (rtx insns, rtx target, e if (GET_CODE (target) == ZERO_EXTRACT) return 1; + /* If TARGET is in OP0 or OP1, punt. We'd end up with a note referencing + a value changing in the insn, so the note would be invalid for CSE. */ + if (reg_overlap_mentioned_p (target, op0) + || (op1 reg_overlap_mentioned_p (target, op1))) +return 0; + for (last_insn = insns; NEXT_INSN (last_insn) != NULL_RTX; last_insn = NEXT_INSN (last_insn)) @@ -207,21 +213,6 @@ add_equal_note (rtx insns, rtx target, e || ! rtx_equal_p (XEXP (SET_DEST (set), 0), target))) return 1; - /* If TARGET is in OP0 or OP1, check if anything in SEQ sets TARGET - besides the last insn. */ - if (reg_overlap_mentioned_p (target, op0) - || (op1 reg_overlap_mentioned_p (target, op1))) -{ - insn = PREV_INSN (last_insn); - while (insn != NULL_RTX) - { - if (reg_set_p (target, insn)) - return 0; - - insn = PREV_INSN (insn); - } -} - if (GET_RTX_CLASS (code) == RTX_UNARY) switch (code) {
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Nov 28, 2012 at 12:53 AM, Steven Bosscher wrote: On Wed, Nov 28, 2012 at 12:47 AM, Eric Botcazou wrote: Count me in, too. So let's avoid it: * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. OK (with PR rtl-optimization/55006) if it passes bootstrap regtest, thanks. Thanks for the quick review. And perhaps a bit in emit-rtl.c for good measure? I'll see if/where this causes breakage... I think this would need to be wrapped in a #ifdef ENABLE_CHECKING/#endif pair. Well, let me first try to make it pass some set of tests. This breaks the compiler in too many places to name right now. Here's the first of what's probably going to be a whole series of patches if I keep hitting this internal_error as often as I am now in my set of cc1-i files... I know: ChangeLog, testing and all that. That's not the point yet. This is just a brain dump, to get this saved in some places safer than the compile farm or (worse) my head ;-) And another one: Index: cprop.c === --- cprop.c (revision 193394) +++ cprop.c (working copy) @@ -776,7 +776,8 @@ try_replace_reg (rtx from, rtx to, rtx i /* If we've failed perform the replacement, have a single SET to a REG destination and don't yet have a note, add a REG_EQUAL note to not lose information. */ - if (!success note == 0 set != 0 REG_P (SET_DEST (set))) + if (!success note == 0 set != 0 REG_P (SET_DEST (set)) + ! reg_mentioned_p (SET_DEST (set), src)) note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src)); } This one's for an interesting case of note, of a kind I've never seen before. Had to share this gem of a note :-) Ciao! Steven Breakpoint 1, internal_error (gmsgid=0x11455f8 self-reference in REG_EQUAL or REG_EQUIV note!\n) at ../../trunk/gcc/diagnostic.c:1087 1087 va_start (ap, gmsgid); (gdb) up #1 0x00709dea in set_unique_reg_note (insn=0x7507b3a8, kind=REG_EQUAL, datum=0x7528c2e0) at ../../trunk/gcc/emit-rtl.c:5001 5001internal_error (self-reference in REG_EQUAL or REG_EQUIV note!\n); (gdb) #2 0x00f608c4 in try_replace_reg (from=0x75301c60, to=0x762e5460, insn=0x7507b3a8) at ../../trunk/gcc/cprop.c:780 780 note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src)); (gdb) l 775 776 /* If we've failed perform the replacement, have a single SET to 777 a REG destination and don't yet have a note, add a REG_EQUAL note 778 to not lose information. */ 779 if (!success note == 0 set != 0 REG_P (SET_DEST (set))) 780 note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src)); 781 } 782 783 if (set MEM_P (SET_DEST (set)) reg_mentioned_p (from, SET_DEST (set))) 784 { (gdb) p debug_rtx (set) (set (reg/v:SI 173 [ xsize ]) (if_then_else:SI (ge (reg:CCGC 17 flags) (const_int 0 [0])) (reg/v:SI 173 [ xsize ]) (reg:SI 241))) $1 = void (gdb) p debug_rtx(insn) (insn 741 739 678 81 (set (reg/v:SI 173 [ xsize ]) (if_then_else:SI (ge (reg:CCGC 17 flags) (const_int 0 [0])) (reg/v:SI 173 [ xsize ]) (reg:SI 241))) ../../trunk/gcc/alias.c:1940 893 {*movsicc_noc} (expr_list:REG_EQUAL (if_then_else:SI (ge (reg:CCGC 17 flags) (const_int 0 [0])) (reg/v:SI 173 [ xsize ]) (const_int -1 [0x])) (nil))) $2 = void (gdb)
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Nov 28, 2012 at 12:58 AM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Nov 28, 2012 at 12:53 AM, Steven Bosscher wrote: On Wed, Nov 28, 2012 at 12:47 AM, Eric Botcazou wrote: Count me in, too. So let's avoid it: * gcse.c (struct reg_use): Remove unused struct. (gcse_emit_move_after): Do not create REG_EQUAL notes that reference the SET_DEST of the instruction the note would be attached to. OK (with PR rtl-optimization/55006) if it passes bootstrap regtest, thanks. Thanks for the quick review. And perhaps a bit in emit-rtl.c for good measure? I'll see if/where this causes breakage... I think this would need to be wrapped in a #ifdef ENABLE_CHECKING/#endif pair. Well, let me first try to make it pass some set of tests. This breaks the compiler in too many places to name right now. Here's the first of what's probably going to be a whole series of patches if I keep hitting this internal_error as often as I am now in my set of cc1-i files... I know: ChangeLog, testing and all that. That's not the point yet. This is just a brain dump, to get this saved in some places safer than the compile farm or (worse) my head ;-) And another one: And one to end the night here. Again: Is this really the direction we should be going in? (I haven't any better ideas...) Index: fwprop.c === --- fwprop.c(revision 193394) +++ fwprop.c(working copy) @@ -1321,7 +1321,9 @@ forward_propagate_and_simplify (df_ref u separately try plugging the definition in the note and simplifying. And only install a REQ_EQUAL note when the destination is a REG, as the note would be invalid otherwise. */ - set_reg_equal = (note == NULL_RTX REG_P (SET_DEST (use_set))); + set_reg_equal = (note == NULL_RTX REG_P (SET_DEST (use_set)) + ! reg_mentioned_p (SET_DEST (use_set), +SET_SRC (use_set))); } if (GET_MODE (*loc) == VOIDmode)
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Again: Is this really the direction we should be going in? (I haven't any better ideas...) Do you mean for self-referencing notes? If so, definitely, these notes are either wrong or almost always redundant so we should eliminate them. As for the more general problem of REG_EQ* notes, perhaps it's time to devise a global infrastructure to handle them. But, as far as I recall, they always have been problematic, before or after the DF merge. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... This does not fix pr55006 on x86_64-apple-darwin10;-(while the patch in http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01607.html does). Test case is unchanged so I'm omitting it here. This test passes on x86_64-apple-darwin10 for a clean trunk. Thanks for looking at the problem. Dominique
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 26, 2012 at 3:38 PM, Dominique Dhumieres domi...@lps.ens.fr wrote: Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... This does not fix pr55006 on x86_64-apple-darwin10;-(while the patch in http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01607.html does). Does it fail on x86_64-linux also? If not as-is, can you try with -fPIC? Test case is unchanged so I'm omitting it here. This test passes on x86_64-apple-darwin10 for a clean trunk. Yes, I know. As you can probably tell from the patch, I've been working on an older revision because I can't reproduce the problem on the trunk. It's a very elusive problem. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sat, Nov 24, 2012 at 2:09 AM, Steven Bosscher wrote: On Fri, Nov 23, 2012 at 11:45 PM, Steven Bosscher wrote: Removing the note is easier but it may hurt optimization later on: CSE2 puts the note back in but this introduces a pass ordering dependency. Perhaps that's not a big problem, but I'm not sure... Hmm, actually CSE2 fails to put the note back, I guess because it's a local CSE pass only... Here's another possible fix: Just punt on all REG_EQUAL notes for the IV. It's a bit of a big hammer, but there are no code changes in my set of cc1-i files (compiled with -funroll-loops) on x86_64 and powerpc64 so apparently it's not all that important to keep the notes. Comments on this one? Ciao! Steven gcc/ * rtlanal.c (remove_reg_equal_equiv_notes_for_regno): Fix for deferred re-scanning. * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member. (analyze_iv_to_split_insn): Record it. (analyze_insns_in_loop): Remove all REG_EQUAL notes for IVs that will be split. (apply_opt_in_copies): Use FOR_BB_INSNS_SAFE. testsuite/ * gcc.dg/pr55006.c: New test. Index: rtlanal.c === --- rtlanal.c (revision 193394) +++ rtlanal.c (working copy) @@ -2015,15 +2015,18 @@ remove_reg_equal_equiv_notes (rtx insn) void remove_reg_equal_equiv_notes_for_regno (unsigned int regno) { - df_ref eq_use; + df_ref eq_use, next_eq_use; if (!df) return; /* This loop is a little tricky. We cannot just go down the chain because - it is being modified by some actions in the loop. So we just iterate - over the head. We plan to drain the list anyway. */ - while ((eq_use = DF_REG_EQ_USE_CHAIN (regno)) != NULL) + it may be modified by some actions in the loop if re-scanning is not + deferred. We also can't just iterate over the head because the chain + may be *not* modified if re-scanning *is* deferred. */ + for (eq_use = DF_REG_EQ_USE_CHAIN (regno); + eq_use != NULL; + eq_use = next_eq_use) { rtx insn = DF_REF_INSN (eq_use); rtx note = find_reg_equal_equiv_note (insn); @@ -2033,6 +2036,8 @@ remove_reg_equal_equiv_notes_for_regno ( remove_note. */ gcc_assert (note); + next_eq_use = DF_REF_NEXT_REG (eq_use); + remove_note (insn, note); } } Index: loop-unroll.c === --- loop-unroll.c (revision 193394) +++ loop-unroll.c (working copy) @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. struct iv_to_split { rtx insn;/* The insn in that the induction variable occurs. */ + rtx orig_var;/* The variable (register) for the IV before split. */ rtx base_var;/* The variable on that the values in the further iterations are based. */ rtx step;/* Step of the induction variable. */ @@ -1833,6 +1834,7 @@ analyze_iv_to_split_insn (rtx insn) /* Record the insn to split. */ ivts = XNEW (struct iv_to_split); ivts-insn = insn; + ivts-orig_var = dest; ivts-base_var = NULL_RTX; ivts-step = iv.step; ivts-next = NULL; @@ -1913,6 +1915,12 @@ analyze_insns_in_loop (struct loop *loop if (ivts) { + /* Updating REG_EQUAL notes for this IV is tricky: We cannot tell + until after unrolling whether an EQ_USE is reached by the split + IV while the IV reg is still live. See PR55006. + Be conservative, remove all such notes. */ + remove_reg_equal_equiv_notes_for_regno (REGNO (ivts-orig_var)); + slot1 = htab_find_slot (opt_info-insns_to_split, ivts, INSERT); gcc_assert (*slot1 == NULL); *slot1 = ivts; @@ -2293,9 +2301,8 @@ apply_opt_in_copies (struct opt_info *op unrolling); bb-aux = 0; orig_insn = BB_HEAD (orig_bb); - for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next) + FOR_BB_INSNS_SAFE (bb, insn, next) { - next = NEXT_INSN (insn); if (!INSN_P (insn) || (DEBUG_INSN_P (insn) TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == LABEL_DECL)) Index: testsuite/gcc.dg/pr55006.c === --- testsuite/gcc.dg/pr55006.c (revision 0) +++ testsuite/gcc.dg/pr55006.c (revision 0) @@ -0,0 +1,88 @@ +/* PR rtl-optimization/55006 */ +/* { dg-do run } */ +/* { dg-options -O3 -fomit-frame-pointer -funroll-loops } */ + +extern void abort (void) __attribute__ ((__noreturn__)); +extern void *memcpy (void *__restrict, const void *__restrict, __SIZE_TYPE__); + +static void __attribute__ ((__noinline__, __noclone__)) ga4076 (void) +{ + int D833; + float D834; + int D837; + int D840; + float D841; + int D844; + float dda[100]; +
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sun, Nov 25, 2012 at 9:44 PM, Steven Bosscher wrote: On Sat, Nov 24, 2012 at 2:09 AM, Steven Bosscher wrote: On Fri, Nov 23, 2012 at 11:45 PM, Steven Bosscher wrote: Removing the note is easier but it may hurt optimization later on: CSE2 puts the note back in but this introduces a pass ordering dependency. Perhaps that's not a big problem, but I'm not sure... Hmm, actually CSE2 fails to put the note back, I guess because it's a local CSE pass only... Here's another possible fix: Just punt on all REG_EQUAL notes for the IV. It's a bit of a big hammer, but there are no code changes in my set of cc1-i files (compiled with -funroll-loops) on x86_64 and powerpc64 so apparently it's not all that important to keep the notes. Comments on this one? Or rather this one. Same hammer, different color. It turns out that the rtlanal.c change caused problems, so I've got to use a home-brewn equivalent of remove_reg_equal_equiv_notes_for_regno... Test case is unchanged so I'm omitting it here. Ciao! Steven gcc/ * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member. (analyze_iv_to_split_insn): Record it. (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL notes that refer to IVs that are being split. (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv. Twice. Use FOR_BB_INSNS_SAFE. testsuite/ * gcc.dg/pr55006.c: New test. Index: loop-unroll.c === --- loop-unroll.c (revision 193394) +++ loop-unroll.c (working copy) @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. struct iv_to_split { rtx insn;/* The insn in that the induction variable occurs. */ + rtx orig_var;/* The variable (register) for the IV before split. */ rtx base_var;/* The variable on that the values in the further iterations are based. */ rtx step;/* Step of the induction variable. */ @@ -1833,6 +1834,7 @@ analyze_iv_to_split_insn (rtx insn) /* Record the insn to split. */ ivts = XNEW (struct iv_to_split); ivts-insn = insn; + ivts-orig_var = dest; ivts-base_var = NULL_RTX; ivts-step = iv.step; ivts-next = NULL; @@ -2253,6 +2255,32 @@ combine_var_copies_in_loop_exit (struct emit_insn_after (seq, insn); } +/* Strip away REG_EQUAL notes for IVs we're splitting. + + Updating REG_EQUAL notes for IVs we split is tricky: We + cannot tell until after unrolling, DF-rescanning, and liveness + updating, whether an EQ_USE is reached by the split IV while + the IV reg is still live. See PR55006. + + ??? We cannot use remove_reg_equal_equiv_notes_for_regno, + because RTL loop-iv requires us to defer rescanning insns and + any notes attached to them. So resort to old techniques... */ + +static void +maybe_strip_eq_note_for_split_iv (struct opt_info *opt_info, rtx insn) +{ + struct iv_to_split *ivts; + rtx note = find_reg_equal_equiv_note (insn); + if (! note) +return; + for (ivts = opt_info-iv_to_split_head; ivts; ivts = ivts-next) +if (reg_mentioned_p (ivts-orig_var, note)) + { + remove_note (insn, note); + return; + } +} + /* Apply loop optimizations in loop copies using the data which gathered during the unrolling. Structure OPT_INFO record that data. @@ -2293,9 +2321,8 @@ apply_opt_in_copies (struct opt_info *op unrolling); bb-aux = 0; orig_insn = BB_HEAD (orig_bb); - for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next) + FOR_BB_INSNS_SAFE (bb, insn, next) { - next = NEXT_INSN (insn); if (!INSN_P (insn) || (DEBUG_INSN_P (insn) TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == LABEL_DECL)) @@ -2313,6 +2340,8 @@ apply_opt_in_copies (struct opt_info *op /* Apply splitting iv optimization. */ if (opt_info-insns_to_split) { + maybe_strip_eq_note_for_split_iv (opt_info, insn); + ivts = (struct iv_to_split *) htab_find (opt_info-insns_to_split, ivts_templ); @@ -2378,6 +2407,8 @@ apply_opt_in_copies (struct opt_info *op ivts_templ.insn = orig_insn; if (opt_info-insns_to_split) { + maybe_strip_eq_note_for_split_iv (opt_info, orig_insn); + ivts = (struct iv_to_split *) htab_find (opt_info-insns_to_split, ivts_templ); if (ivts)
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Nov 23, 2012 at 11:45 PM, Steven Bosscher wrote: Removing the note is easier but it may hurt optimization later on: CSE2 puts the note back in but this introduces a pass ordering dependency. Perhaps that's not a big problem, but I'm not sure... Hmm, actually CSE2 fails to put the note back, I guess because it's a local CSE pass only... Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 19, 2012 at 12:15 AM, Dominique Dhumieres wrote: I think this should fix it. Can't test it right now, so help appreciated (Honza: hint hint! ;-) The change at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01511/remove_dead_eq_notes.diff (revision 192526) caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55006 Yes, I'll be looking into this soon. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Yes, I'll be looking into this soon. We have a related regression on SPARC: FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test whose failure mode is exactly the same as for Honza's testcase. I even have a more reduced testcase (6 lines, attached) in case you'd prefer working on it. Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and grepping for mem:SF (const_int 0 [0]) in the RTL dumps. -- Eric Botcazouprogram GA4076 REAL DDA(100) dda = (/(J1,J1=1,100)/) IDS = MAXLOC(DDA,1) if (ids.ne.100) call abort END
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 19, 2012 at 12:24 PM, Eric Botcazou ebotca...@adacore.com wrote: Yes, I'll be looking into this soon. We have a related regression on SPARC: FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test whose failure mode is exactly the same as for Honza's testcase. I even have a more reduced testcase (6 lines, attached) in case you'd prefer working on it. Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and grepping for mem:SF (const_int 0 [0]) in the RTL dumps. Thanks for this reduced test case, that's saving me a lot of work! Can you please try and see if the following C test case also fails? It looks like the memcpy is expanded to a loop that is unrolled, and then mis-optimized. I haven't found out yet why... Ciao! Steven extern void abort (void) __attribute__ ((__noreturn__)); extern void *memcpy (void *__restrict, const void *__restrict, __SIZE_TYPE__); static void __attribute__ ((__noinline__, __noclone__)) ga4076 (void) { int D833; float D834; int D837; int D840; float D841; int D844; float dda[100]; int ids; static float A0[100] = { 10e+0, 20e+0, 30e+0, 40e+0, 50e+0, 60e+0, 70e+0, 80e+0, 90e+0, 10e+1, 11e+1, 12e+1, 13e+1, 14e+1, 15e+1, 16e+1, 17e+1, 18e+1, 19e+1, 20e+1, 21e+1, 22e+1, 23e+1, 24e+1, 25e+1, 26e+1, 27e+1, 28e+1, 29e+1, 30e+1, 31e+1, 32e+1, 33e+1, 34e+1, 35e+1, 36e+1, 37e+1, 38e+1, 39e+1, 40e+1, 41e+1, 42e+1, 43e+1, 44e+1, 45e+1, 46e+1, 47e+1, 48e+1, 49e+1, 50e+1, 51e+1, 52e+1, 53e+1, 54e+1, 55e+1, 56e+1, 57e+1, 58e+1, 59e+1, 60e+1, 61e+1, 62e+1, 63e+1, 64e+1, 65e+1, 66e+1, 67e+1, 68e+1, 69e+1, 70e+1, 71e+1, 72e+1, 73e+1, 74e+1, 75e+1, 76e+1, 77e+1, 78e+1, 79e+1, 80e+1, 81e+1, 82e+1, 83e+1, 84e+1, 85e+1, 86e+1, 87e+1, 88e+1, 89e+1, 90e+1, 91e+1, 92e+1, 93e+1, 94e+1, 95e+1, 96e+1, 97e+1, 98e+1, 99e+1, 10e+2 }; memcpy (dda, A0, sizeof (dda)); float limit3; int offset2; int pos1; int S4; limit3 = -1. * __builtin_inf(); pos1 = 0; offset2 = 0; S4 = 1; D831: if (S4 100) goto L3; else goto D832; D832: D833 = S4 - 1; D834 = dda[D833]; if (D834 = limit3) goto D835; else goto D836; D835: D837 = S4 - 1; limit3 = dda[D837]; pos1 = S4 + offset2; goto L1; D836: S4 = S4 + 1; goto D831; L3: pos1 = 1; goto L2; L1: if (S4 100) goto L2; else goto D839; D839: D840 = S4 - 1; D841 = dda[D840]; if (D841 limit3) goto D842; else goto D843; D842: D844 = S4 - 1; limit3 = dda[D844]; pos1 = S4 + offset2; D843: S4 = S4 + 1; goto L1; L2: ids = pos1; if (ids != 100) abort (); } int main (void) { ga4076 (); return 0; }
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 19, 2012 at 9:34 PM, Steven Bosscher wrote: On Mon, Nov 19, 2012 at 12:24 PM, Eric Botcazou wrote: Yes, I'll be looking into this soon. We have a related regression on SPARC: FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_5.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gfortran.dg/minmaxloc_6.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test whose failure mode is exactly the same as for Honza's testcase. I even have a more reduced testcase (6 lines, attached) in case you'd prefer working on it. Reproducible with a cross to sparc-linux with -mcpu=v8 -O3 -funroll-loops and grepping for mem:SF (const_int 0 [0]) in the RTL dumps. Thanks for this reduced test case, that's saving me a lot of work! Can you please try and see if the following C test case also fails? It looks like the memcpy is expanded to a loop that is unrolled, and then mis-optimized. I haven't found out yet why... It's another case of an invalid REG_EQUAL note. Shown below is an excerpt from the C test case .loop2_done dump (after calling df_analyze to update liveness). The note on insn 172 uses (reg:SI 132) which is not live on entry to basic block 25. ;; basic block 25, loop depth 0, count 0, freq 2485, maybe hot ;; prev block 20, next block 26, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 11 [50.5%] ;; bb 25 artificial_defs: { } ;; bb 25 artificial_uses: { u-1(14){ }u-1(30){ }u-1(101){ }} ;; lr in14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 138 158 165 ;; lr use 14 [%sp] 30 [%fp] 101 [%sfp] 138 158 ;; lr def 131 133 ;; live in 14 [%sp] 101 [%sfp] 138 158 165 ;; live gen 131 133 ;; live kill (code_label 179 149 173 25 22 [1 uses]) (note 173 179 171 25 [bb 25] NOTE_INSN_BASIC_BLOCK) (insn 171 173 172 25 (set (reg/v:SI 133 [ idsD.1386 ]) (reg/v:SI 138 [ idsD.1386 ])) 40 {*movsi_insn} (expr_list:REG_UNUSED (reg/v:SI 133 [ idsD.1386 ]) (nil))) (insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ]) (reg:SF 158 [ limit3D.1388 ])) t.c:62 -1 (expr_list:REG_EQUAL (mem:SF (reg:SI 132 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil ;; lr out 14 [%sp] 30 [%fp] 31 [%i7] 101 [%sfp] 131 133 138 165 ;; live out 14 [%sp] 101 [%sfp] 131 133 138 165 The use of the dead reg is renamed in the webizer: (insn 172 171 176 25 (set (reg/v:SF 131 [ limit3D.1388 ]) (reg:SF 172 [ limit3D.1388 ])) t.c:62 66 {*movsf_insn} (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil Next, cse2 thinks it's a good idea to actually use the REG_EQUAL note, turning the EQ-use of the uninitialized reg 174 into a real use: (insn 172 171 176 17 (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn} (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (expr_list:REG_DEAD (reg:SF 158 [ limit3D.1388 ]) (nil Then init-regs concludes that reg 174 is used ininitialized: adding initialization in ga4076 of reg 174 at in block 16 for insn 172. (insn 206 171 172 16 (set (reg:SI 174 [ ivtmp.7D.1419 ]) (const_int 0 [0])) t.c:62 -1 (nil)) (insn 172 206 176 16 (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) t.c:62 66 {*movsf_insn} (expr_list:REG_DEAD (reg:SI 174 [ ivtmp.7D.1419 ]) (expr_list:REG_EQUAL (mem:SF (reg:SI 174 [ ivtmp.7D.1419 ]) [2 MEM[base: _23, offset: 0B]+0 S4 A32]) (nil And finally, combine merges the above to yield the final result: Trying 206 - 172: Successfully matched this instruction: (set (reg/v:SF 131 [ limit3D.1388 ]) (mem:SF (const_int 0 [0]) [2 MEM[base: _23, offset: 0B]+0 S4 A32])) The root cause is the bad REG_EQUAL note. I think the most robust solution is to make the webizer re-compute notes before renaming. Patch for that is attached. Ciao! Steven PR rtl-optimization/55006 * web.c (web_main): Add the DF_NOTE problem, and explain whatfor. Index: web.c === --- web.c (revision 193454) +++ web.c (working copy) @@ -335,9 +335,16 @@ web_main (void) unsigned int uses_num = 0; rtx insn; + /* Add the flags and problems we need. The DF_EQ_NOTES flag is set so + that uses of registers in REG_EQUAL and REG_EQUIV notes are included + in the web that their DEF belongs to, so that
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Thanks for this reduced test case, that's saving me a lot of work! You're welcome. Can you please try and see if the following C test case also fails? Yup, with the same compilation options, there are the same dreadful lines ld [%g0+0], %fn in the assembly file, which translates into (mem:SF (const_int 0 [0])) in the RTL dumps. And the executable does segfault at runtime. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
The root cause is the bad REG_EQUAL note. I think the most robust solution is to make the webizer re-compute notes before renaming. Patch for that is attached. Thanks for the analysis. However... Ciao! Steven PR rtl-optimization/55006 * web.c (web_main): Add the DF_NOTE problem, and explain whatfor. Index: web.c === --- web.c (revision 193454) +++ web.c (working copy) @@ -335,9 +335,16 @@ web_main (void) unsigned int uses_num = 0; rtx insn; + /* Add the flags and problems we need. The DF_EQ_NOTES flag is set so + that uses of registers in REG_EQUAL and REG_EQUIV notes are included + in the web that their DEF belongs to, so that these uses are also + properly renamed. The DF_NOTE problem is added to make sure that + all notes are up-to-date and valid: Re-computing the notes problem + also cleans up all dead REG_EQUAL notes. */ df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_chain_add_problem (DF_UD_CHAIN); + df_note_add_problem (); df_analyze (); df_set_flags (DF_DEFER_INSN_RESCAN); ... that's not very satisfactory, as web doesn't use the DF_NOTE problem, so adding it just to clean things up in the other kind of notes is weird. Can't we arrange to clean up the REG_EQUAL/REG_EQUIV notes when we use them, i.e. when DF_EQ_NOTES is set? -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 19, 2012 at 10:50 PM, Eric Botcazou wrote: The root cause is the bad REG_EQUAL note. I think the most robust solution is to make the webizer re-compute notes before renaming. Patch for that is attached. Thanks for the analysis. However... Ciao! Steven PR rtl-optimization/55006 * web.c (web_main): Add the DF_NOTE problem, and explain whatfor. Index: web.c === --- web.c (revision 193454) +++ web.c (working copy) @@ -335,9 +335,16 @@ web_main (void) unsigned int uses_num = 0; rtx insn; + /* Add the flags and problems we need. The DF_EQ_NOTES flag is set so + that uses of registers in REG_EQUAL and REG_EQUIV notes are included + in the web that their DEF belongs to, so that these uses are also + properly renamed. The DF_NOTE problem is added to make sure that + all notes are up-to-date and valid: Re-computing the notes problem + also cleans up all dead REG_EQUAL notes. */ df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_chain_add_problem (DF_UD_CHAIN); + df_note_add_problem (); df_analyze (); df_set_flags (DF_DEFER_INSN_RESCAN); ... that's not very satisfactory, as web doesn't use the DF_NOTE problem, so adding it just to clean things up in the other kind of notes is weird. True. Can't we arrange to clean up the REG_EQUAL/REG_EQUIV notes when we use them, i.e. when DF_EQ_NOTES is set? That could be done, yes. Cleaning up the REG_EQ* notes requires liveness at the insn level, so it'd require a bigger re-organization of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES) over all insn in df_lr_finalize, tracking liveness and calling df_remove_dead_eq_notes on each insn. But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag. Perhaps df_note_add_problem should imply setting DF_EQ_NOTES? Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
That could be done, yes. Cleaning up the REG_EQ* notes requires liveness at the insn level, so it'd require a bigger re-organization of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES) over all insn in df_lr_finalize, tracking liveness and calling df_remove_dead_eq_notes on each insn. But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag. Right. After the DF merge, the de-facto consensus was that individual passes were still responsible for cleaning up the REG_EQ* notes when they change the code, whereas the REG_DEAD/REG_UNUSED notes are entirely handled by DF. As such, the bug here is in the unroller, not in the webizer. So I think that we should try to fix the unroller first. If that's not really doable, then a fallback could be to shield the passes using DF_EQ_NOTES from dangling REG_EQ* notes by means of DF. But using df_note_add_problem for this task seems to be a little far-fetched. Perhaps df_note_add_problem should imply setting DF_EQ_NOTES? Do you mean the opposite, i.e setting DF_EQ_NOTES should imply adding DF_NOTE? -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Nov 19, 2012 at 11:42 PM, Eric Botcazou wrote: That could be done, yes. Cleaning up the REG_EQ* notes requires liveness at the insn level, so it'd require a bigger re-organization of the code. Perhaps adding a new pass (conditional on DF_EQ_NOTES) over all insn in df_lr_finalize, tracking liveness and calling df_remove_dead_eq_notes on each insn. But most passes that use the REG_EQ* notes don't set the DF_EQ_NOTES flag. Right. After the DF merge, the de-facto consensus was that individual passes were still responsible for cleaning up the REG_EQ* notes when they change the code, whereas the REG_DEAD/REG_UNUSED notes are entirely handled by DF. As such, the bug here is in the unroller, not in the webizer. Then it's in -fsplit-ivs-in-unroller. Looking into it. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
I think this should fix it. Can't test it right now, so help appreciated (Honza: hint hint! ;-) The change at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01511/remove_dead_eq_notes.diff (revision 192526) caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55006 Dominique
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Oct 17, 2012 at 6:54 AM, Steven Bosscher stevenb@gmail.com wrote: On Tue, Oct 16, 2012 at 1:38 PM, Paolo Bonzini wrote: Bottom line: This is a bug in df_kill_notes. Yep, and it could cause wrong code generation, couldn't it? Because the new (reg:DI 72) is obviously not equal to (plus:DI (reg:DI 72) (const_int 24)), but it could be propagated to subsequent insns. Right. So I think this patch should be backported to all release branches after regstrapping. Agreed, but let's wait a few days at least to see what happens with the patch on the trunk. Ok after bootstrap, regtest and checking that it actually fixes the bug. I made sure of that before posting the patch ;-) Bootstrapregtest is running on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. I'll commit the patch tomorrow if there are no test suite regressions. Well done to us all for probing until we found the root cause of this bug! Hi Jan, For the test case: main() { configure2(); } Please make main return ZERO by default. When cross testing with qemu/simulator, the wrap exit checks whether the case by verifying the return value. For ARM target, R0 is checked while it may be clobbered(thus non-ZERO) if main does not return any value, which causes failure of test case. Thanks very much. -- Best Regards.
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote: On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote: Il 15/10/2012 14:53, Steven Bosscher ha scritto: I think I've shown above that we're all looking at the wrong pass... I think you have... so we want a patch like this? I don't think so. df_kill_notes is already supposed to take care of this. But it doesn't because if the SET_DEST of an insn is the same as the register dieing in the insn's REG_EQUAL note, the reg is live at the end of the insn and so the note stays: Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90) at ../../trunk/gcc/df-problems.c:2833 2833 rtx *pprev = REG_NOTES (insn); 1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ]) (reg:DI 103 [ ivtmp.17D.1758 ])) -1 (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ]) (expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ]) (const_int 24 [0x18])) (nil void (gdb) undisp 1 (gdb) p debug_bitmap(live) first = 0x1627200 current = 0x1627200 indx = 0 0x1627200 next = (nil) prev = (nil) indx = 0 bits = { 6 7 16 20 72 82 85 87 } $2 = void (gdb) So GCC should be looking at whether the reg in the REG_EQUAL note is dead *before* the insn. Bottom line: This is a bug in df_kill_notes. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Oct 16, 2012 at 12:15 PM, Steven Bosscher wrote: On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote: On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote: Il 15/10/2012 14:53, Steven Bosscher ha scritto: I think I've shown above that we're all looking at the wrong pass... I think you have... so we want a patch like this? I don't think so. df_kill_notes is already supposed to take care of this. But it doesn't because if the SET_DEST of an insn is the same as the register dieing in the insn's REG_EQUAL note, the reg is live at the end of the insn and so the note stays: Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90) at ../../trunk/gcc/df-problems.c:2833 2833 rtx *pprev = REG_NOTES (insn); 1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ]) (reg:DI 103 [ ivtmp.17D.1758 ])) -1 (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ]) (expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ]) (const_int 24 [0x18])) (nil void (gdb) undisp 1 (gdb) p debug_bitmap(live) first = 0x1627200 current = 0x1627200 indx = 0 0x1627200 next = (nil) prev = (nil) indx = 0 bits = { 6 7 16 20 72 82 85 87 } $2 = void (gdb) So GCC should be looking at whether the reg in the REG_EQUAL note is dead *before* the insn. Bottom line: This is a bug in df_kill_notes. I think this should fix it. Can't test it right now, so help appreciated (Honza: hint hint! ;-) Ciao! Steven remove_dead_eq_notes.diff Description: Binary data
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 16/10/2012 12:35, Steven Bosscher ha scritto: On Tue, Oct 16, 2012 at 12:15 PM, Steven Bosscher wrote: On Mon, Oct 15, 2012 at 3:26 PM, Steven Bosscher wrote: On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini wrote: Il 15/10/2012 14:53, Steven Bosscher ha scritto: I think I've shown above that we're all looking at the wrong pass... I think you have... so we want a patch like this? I don't think so. df_kill_notes is already supposed to take care of this. But it doesn't because if the SET_DEST of an insn is the same as the register dieing in the insn's REG_EQUAL note, the reg is live at the end of the insn and so the note stays: Breakpoint 2, df_kill_notes (insn=0x75e3e7e0, live=0x7fffda90) at ../../trunk/gcc/df-problems.c:2833 2833 rtx *pprev = REG_NOTES (insn); 1: debug_rtx(insn) = (insn 79 50 52 8 (set (reg:DI 72 [ ivtmp.17D.1758 ]) (reg:DI 103 [ ivtmp.17D.1758 ])) -1 (expr_list:REG_DEAD (reg:DI 103 [ ivtmp.17D.1758 ]) (expr_list:REG_EQUAL (plus:DI (reg:DI 72 [ ivtmp.17D.1758 ]) (const_int 24 [0x18])) (nil void (gdb) undisp 1 (gdb) p debug_bitmap(live) first = 0x1627200 current = 0x1627200 indx = 0 0x1627200 next = (nil) prev = (nil) indx = 0 bits = { 6 7 16 20 72 82 85 87 } $2 = void (gdb) So GCC should be looking at whether the reg in the REG_EQUAL note is dead *before* the insn. Bottom line: This is a bug in df_kill_notes. Yep, and it could cause wrong code generation, couldn't it? Because the new (reg:DI 72) is obviously not equal to (plus:DI (reg:DI 72) (const_int 24)), but it could be propagated to subsequent insns. So I think this patch should be backported to all release branches after regstrapping. I think this should fix it. Can't test it right now, so help appreciated (Honza: hint hint! ;-) Ok after bootstrap, regtest and checking that it actually fixes the bug. Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Tue, Oct 16, 2012 at 1:38 PM, Paolo Bonzini wrote: Bottom line: This is a bug in df_kill_notes. Yep, and it could cause wrong code generation, couldn't it? Because the new (reg:DI 72) is obviously not equal to (plus:DI (reg:DI 72) (const_int 24)), but it could be propagated to subsequent insns. Right. So I think this patch should be backported to all release branches after regstrapping. Agreed, but let's wait a few days at least to see what happens with the patch on the trunk. Ok after bootstrap, regtest and checking that it actually fixes the bug. I made sure of that before posting the patch ;-) Bootstrapregtest is running on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. I'll commit the patch tomorrow if there are no test suite regressions. Well done to us all for probing until we found the root cause of this bug! Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 14/10/2012 22:59, Steven Bosscher ha scritto: On Sun, Oct 14, 2012 at 9:02 AM, Paolo Bonzini wrote: Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV notes that refer to a dead pseudo? I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this: a = b + 3 // b dies here c = a {REG_EQUAL b+3} The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic. I disagree that it is valid. At least it is risky to consider it valid, because a pass that simulates liveness might end up doing something wrong because of that note. If simulation is done backwards, it doesn't even require any interaction with REG_DEAD notes. I also don't want to compute DF_LR taking EQ_USES into account as real uses for liveness, because that involves recomputing and enlarging the DF_LR sets (all of them, both globally and locally) before LRRD and after LRRD. That's why I implemented the quick-and-dirty liveness computation for the notes: It's non-intrusive on DF_LR and it's cheap. Yes, I agree on that part of the implementation. :) Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 9:38 AM, Paolo Bonzini wrote: I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this: a = b + 3 // b dies here c = a {REG_EQUAL b+3} The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic. I disagree that it is valid. At least it is risky to consider it valid, because a pass that simulates liveness might end up doing something wrong because of that note. If simulation is done backwards, it doesn't even require any interaction with REG_DEAD notes. In any case, if web doesn't properly rename the register in the REG_EQUAL note (which it doesn't do without my patch) and we declare such a note invalid, then we should remove the note. You're right that GCC ends up doing something wrong, that's why Honza's test case fails. With my patch, the registers in the notes are properly renamed and there are no REG_EQUAL notes that refer to dead registers, so the point whether such a note would be valid is moot. After renaming, the note is valid, and the behavior is restored that GCC had before I added DF_RD_PRUNE_DEAD_DEFS. I think we should come to a conclusion of this discussion: Either we drop the notes (e.g. by re-computing the DF_NOTE problem after web) or we update them, like my patch does. I prefer the patch I proposed because it re-instates the behavior GCC had before. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 15/10/2012 10:13, Steven Bosscher ha scritto: I disagree that it is valid. At least it is risky to consider it valid, because a pass that simulates liveness might end up doing something wrong because of that note. If simulation is done backwards, it doesn't even require any interaction with REG_DEAD notes. In any case, if web doesn't properly rename the register in the REG_EQUAL note (which it doesn't do without my patch) and we declare such a note invalid, then we should remove the note. You're right that GCC ends up doing something wrong, that's why Honza's test case fails. I think we should come to a conclusion of this discussion: Either we drop the notes (e.g. by re-computing the DF_NOTE problem after web) or we update them, like my patch does. I prefer the patch I proposed because it re-instates the behavior GCC had before. I prefer to declare the notes invalid and drop the notes. Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 10:19 AM, Paolo Bonzini wrote: I prefer to declare the notes invalid and drop the notes. Then, afaic, our only option is to drop them all in web, as per attached patch. I strongly disagree with this approach though. It destroys information that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can update, and that helps with optimization. This whole discussion about notes being dead has gone in completely the wrong direction. With renaming these notes are valid, and do not refer to dead regs. Perhaps you could be convinced if you look at Honza's test case with the patch of r192413 reverted. The test case still fails with --param max-unroll-times=3, that makes visualizing the problem easier. Ciao! Steven web_destroy_eq_notes.diff Description: Binary data
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 10:37 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Oct 15, 2012 at 10:19 AM, Paolo Bonzini wrote: I prefer to declare the notes invalid and drop the notes. I strongly disagree with this approach though. It destroys information that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can update, and that helps with optimization. PR54916 is a case where dropping the notes would cause a missed optimization in cse-after-loop. With my patch to update the notes, the optimization (CSE of a load-immediate) is retained. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 15/10/2012 10:37, Steven Bosscher ha scritto: I prefer to declare the notes invalid and drop the notes. Then, afaic, our only option is to drop them all in web, as per attached patch. I strongly disagree with this approach though. It destroys information that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can update, and that helps with optimization. With renaming these notes are valid, and do not refer to dead regs I agree it is bad. But I do not understand the last sentence: I suppose you mean that _without_ renaming these notes are valid, on the other hand it is normal that some of the notes will be dropped if you shorten live ranges. Without removing all of the notes you can do something like this: - drop the deferred rescanning from web.c. Instead, make replace_ref return a bool and call df_insn_rescan manually from web_main. - attribute new registers to webs in a separate pass that happens before rewriting, and compute a special version of LR_IN/LR_OUT that uses the rewritten registers. - process instructions in reverse order; before starting the visit of a basic block, initialize the local LR bitmap with the rewritten LR_OUT of the previous step - after rewriting and scanning each statement, simulate liveness using the new defs and uses. - after rewriting each statement, look for EQ_USES referring to registers that are dead just before the statement, and delete REG_EQUAL notes if this is the case Paolo This whole discussion about notes being dead has gone in completely the wrong direction. With renaming these notes are valid, and do not refer to dead regs. Perhaps you could be convinced if you look at Honza's test case with the patch of r192413 reverted. The test case still fails with --param max-unroll-times=3, that makes visualizing the problem easier.
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 1:49 PM, Paolo Bonzini wrote: I strongly disagree with this approach though. It destroys information that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can update, and that helps with optimization. With renaming these notes are valid, and do not refer to dead regs I agree it is bad. But I do not understand the last sentence: I suppose you mean that _without_ renaming these notes are valid, on the other hand it is normal that some of the notes will be dropped if you shorten live ranges. OK, I now got so confused that I looked into this a bit deeper. The situation is something like the following just after unrolling, but before web: 33 r72:DI=[`rowArray'] 34 {r103:DI=r72:DI+0x18;clobber flags:CC;} ... 45 flags:CCNO=cmp([r72:DI+0x20],0) REG_DEAD: r72:DI ;; diamond shape region follows, joining up again in bb 9: 79 r72:DI=r103:DI REG_EQUAL: r72:DI+0x18 On entry to bb9, r72 is not in LR_IN, so after loop unrolling this note is already invalid if we say that a note should not refer to a dead register. But the register dies much earlier. The first place where insn 79 appears is in the .169r.pre dump: ;; basic block 8, loop depth 1, count 0, freq 9100, maybe hot ;; prev block 7, next block 9, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 7 ;; 6 ;; bb 8 artificial_defs: { } ;; bb 8 artificial_uses: { u43(6){ }u44(7){ }u45(16){ }u46(20){ }} ;; lr in6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 82 85 87 ;; lr use 6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 87 ;; lr def 17 [flags] 72 ;; live in 6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 82 85 87 ;; live gen 17 [flags] 72 ;; live kill17 [flags] L49: 50 NOTE_INSN_BASIC_BLOCK 79 r72:DI=r103:DI REG_EQUAL: r72:DI+0x18 Here, r72 is still in LR_IN so the note is valid. Then in .171r.cprop2: ;; basic block 8, loop depth 1, count 0, freq 9100, maybe hot ;; prev block 7, next block 9, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 7 ;; 6 ;; bb 8 artificial_defs: { } ;; bb 8 artificial_uses: { u43(6){ }u44(7){ }u45(16){ }u46(20){ }} ;; lr in6 [bp] 7 [sp] 16 [argp] 20 [frame] 82 85 87 103 ;; lr use 6 [bp] 7 [sp] 16 [argp] 20 [frame] 87 103 ;; lr def 17 [flags] 72 ;; live in 6 [bp] 7 [sp] 16 [argp] 20 [frame] 82 85 87 103 ;; live gen 17 [flags] 72 ;; live kill L49: 50 NOTE_INSN_BASIC_BLOCK 79 r72:DI=r103:DI REG_EQUAL: r72:DI+0x18 So already after CPROP2, the REG_EQUAL note is invalid if we require that they only refer to live registers. This all happens well before any pass uses the DF_RD problem, so this is a pre-existing problem if we consider this kind of REG_EQUAL note to be invalid. Without removing all of the notes you can do something like this: - drop the deferred rescanning from web.c. Instead, make replace_ref return a bool and call df_insn_rescan manually from web_main. - attribute new registers to webs in a separate pass that happens before rewriting, and compute a special version of LR_IN/LR_OUT that uses the rewritten registers. - process instructions in reverse order; before starting the visit of a basic block, initialize the local LR bitmap with the rewritten LR_OUT of the previous step - after rewriting and scanning each statement, simulate liveness using the new defs and uses. - after rewriting each statement, look for EQ_USES referring to registers that are dead just before the statement, and delete REG_EQUAL notes if this is the case I think I've shown above that we're all looking at the wrong pass... Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 15/10/2012 14:53, Steven Bosscher ha scritto: I think I've shown above that we're all looking at the wrong pass... I think you have... so we want a patch like this? Index: df-problems.c === --- df-problems.c (revisione 183719) +++ df-problems.c (copia locale) @@ -3480,6 +3485,18 @@ df_note_bb_compute (unsigned int bb_inde } } + for (use_rec = DF_INSN_UID_EQ_USES (uid); *use_rec; use_rec++) + { + df_ref use = *use_rec; + unsigned int uregno = DF_REF_REGNO (use); + + if (!bitmap_bit_p (live, uregno)) +{ + remove_note (insn, find_reg_equal_equiv_note (insn)); + break; +} + } + if (debug_insn == -1) { /* ??? We could probably do better here, replacing dead Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini bonz...@gnu.org wrote: Il 15/10/2012 14:53, Steven Bosscher ha scritto: I think I've shown above that we're all looking at the wrong pass... I think you have... so we want a patch like this? I don't think so. df_kill_notes is already supposed to take care of this. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Il 13/10/2012 00:25, Steven Bosscher ha scritto: On Fri, Oct 12, 2012 at 11:16 PM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote: 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes I was already working on a patch for this. I'll send it here later tonight. Great, thanks! This is probably most sensible approach even if we will need to recompute liveness before/after webizer. I don't think we have to touch the liveness sets. We can compute an extra set of registers live only for REG_EQUAL/REG_EQUIV notes. Attached is what I had in mind. Untested, etc. it's late (and the Yankees are playing) so I'll get back to properly testing this tomorrow. Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV notes that refer to a dead pseudo? Paolo
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sun, Oct 14, 2012 at 9:02 AM, Paolo Bonzini wrote: Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV notes that refer to a dead pseudo? I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this: a = b + 3 // b dies here c = a {REG_EQUAL b+3} The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic. I also don't want to compute DF_LR taking EQ_USES into account as real uses for liveness, because that involves recomputing and enlarging the DF_LR sets (all of them, both globally and locally) before LRRD and after LRRD. That's why I implemented the quick-and-dirty liveness computation for the notes: It's non-intrusive on DF_LR and it's cheap. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this: a = b + 3 // b dies here c = a {REG_EQUAL b+3} The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic. But if you have a REG_DEAD note for b on the first insn, then you cannot rematerialize the REG_EQUAL note after it, otherwise bad things can happen. See PR rtl-optimization/51505 for an example. -- Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sun, Oct 14, 2012 at 11:25 PM, Eric Botcazou wrote: I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this: a = b + 3 // b dies here c = a {REG_EQUAL b+3} The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic. But if you have a REG_DEAD note for b on the first insn, then you cannot rematerialize the REG_EQUAL note after it, otherwise bad things can happen. See PR rtl-optimization/51505 for an example. That's not the case here. The register is only dead because the webizer renamed one of its live ranges but forgets to rename the EQ_NOTE use. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On 2012.10.12 at 21:43 +0200, Jan Hubicka wrote: I finally tracked down twolf misoptimization triggered by my loop-unroll.c changes. It has turned out to be semi-latent wrong code issue in webizer. What happens is: 1) gcse.c drop REG_EQUAL note on the induction variable 2) loop optimizer unrolls the loop enabling webizer to cleanup 3) webizer do not track reaching refs into REG_EQUAL note because the variable is dead and renames it to unused pseudo Program is stil valid but the REG_EQUAL is bogus. 4) CSE eventually takes the value and put it back into the code 5) init-regs initializes it to 0 and result is a segfault on the following testcase. Fixed by making webizer to not prune dead regs. Will commit it after testing on x86_64-linux. FYI this also fixes the lto/profiledbootstrap breakage, PR 54885. Thanks. -- Markus
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Oct 12, 2012 at 9:43 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, I finally tracked down twolf misoptimization triggered by my loop-unroll.c changes. It has turned out to be semi-latent wrong code issue in webizer. What happens is: 1) gcse.c drop REG_EQUAL note on the induction variable 2) loop optimizer unrolls the loop enabling webizer to cleanup 3) webizer do not track reaching refs into REG_EQUAL note because the variable is dead and renames it to unused pseudo Program is stil valid but the REG_EQUAL is bogus. What do you mean, not track? web.c does track EQ notes via DF this: web.c:315: df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); 4) CSE eventually takes the value and put it back into the code 5) init-regs initializes it to 0 and result is a segfault on the following testcase. Fixed by making webizer to not prune dead regs. Will commit it after testing on x86_64-linux. Honza /* { dg-do run } */ /* { dg-options -O3 -funroll-loops } */ typedef struct rowbox { int startx ; int endx ; int endx1 ; int startx2 ; int ypos ; int desiredL ; } ROWBOX ; ROWBOX rowArray1[2] ; ROWBOX *rowArray = rowArray1; int numRows = 2; int row = 1; int block = 0; double ckt_size_factor ; __attribute__ ((noinline)) configure2() { block = 0 ; for( row = 1 ; row = numRows ; row++ ) { block++ ; if( rowArray[row].endx1 0 ) { block++ ; } } } main() { configure2(); } * web.c (web_main): Do not set DF_RD_PRUNE_DEAD_DEFS flag. Index: web.c === --- web.c (revision 192369) +++ web.c (working copy) @@ -313,7 +313,8 @@ web_main (void) rtx insn; df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); - df_set_flags (DF_RD_PRUNE_DEAD_DEFS); + /* We can not RD_PRUNE_DEAD_DEFS, because we care about REG_EQUAL + notes. */ df_chain_add_problem (DF_UD_CHAIN); df_analyze (); df_set_flags (DF_DEFER_INSN_RESCAN); This is bogus. You're papering over the underlying bug of an invalid REG_EQUAL note. This patch is not OK! I did not add DF_RD_PRUNE_DEAD_DEFS for no reason. You're re-introducing a major compile time hog. Did you do timings on some significant body of code with -fweb enabled? Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
Hi, On your test case, I have this: Web oldreg=72 newreg=111 Web oldreg=72 newreg=112 Web oldreg=72 newreg=139 Web oldreg=72 newreg=145 Web oldreg=72 newreg=151 Web oldreg=72 newreg=157 Web oldreg=72 newreg=163 Web oldreg=72 newreg=169 --- t.c.182r.loop2_done 2012-10-12 22:23:59.0 +0200 +++ t.c.183r.web 2012-10-12 22:24:23.0 +0200 - 79 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 79 r112:DI=r102:DI + REG_EQUAL: r111:DI+0x18 102 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 124 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 150 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 176 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 202 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 228 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 254 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + REG_EQUAL: r111:DI+0x18 - 285 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 285 r139:DI=r114:DI + REG_EQUAL: r111:DI+0x18 - 306 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 306 r145:DI=r141:DI + REG_EQUAL: r111:DI+0x18 - 327 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 327 r151:DI=r147:DI + REG_EQUAL: r111:DI+0x18 - 348 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 348 r157:DI=r153:DI + REG_EQUAL: r111:DI+0x18 - 369 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 369 r163:DI=r159:DI + REG_EQUAL: r111:DI+0x18 - 390 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 390 r169:DI=r165:DI + REG_EQUAL: r111:DI+0x18 - 411 r72:DI=r102:DI - REG_EQUAL: r72:DI+0x18 + 411 r72:DI=r171:DI + REG_EQUAL: r111:DI+0x18 So all appearances of r72 in REG_EQUAL notes have been replaced with r111. That means the constructed webs are wrong. If the REG_EQUAL is for the DEF operand of the insn (which must be a single_set insn, or there wouldn't be a note) then the DEF and the EQ_USE should be in the same web. I don't think this has anything to do with DF_RD_PRUNE_DEAD_DEFS. This is a pre-existing bug in web.c that just happens to be exposed by loop unrolling. Please do not apply the patch, it only papers over another problem in web.c. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Oct 12, 2012 at 9:43 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, I finally tracked down twolf misoptimization triggered by my loop-unroll.c changes. It has turned out to be semi-latent wrong code issue in webizer. What happens is: 1) gcse.c drop REG_EQUAL note on the induction variable 2) loop optimizer unrolls the loop enabling webizer to cleanup 3) webizer do not track reaching refs into REG_EQUAL note because the variable is dead and renames it to unused pseudo Program is stil valid but the REG_EQUAL is bogus. What do you mean, not track? web.c does track EQ notes via DF this: web.c:315: df_set_flags (DF_NO_HARD_REGS + DF_EQ_NOTES); This is bogus. You're papering over the underlying bug of an invalid REG_EQUAL note. This patch is not OK! The REG_EQUAL note is valid until webizer breaks it. DF_EQ_NOTES makes UD chain problem to look into uses of REG_EQUAL and add all reaching definitions into the list. To do the reaching definitions it uses RD problem solution. This solution do not really care about REG_EQUAL notes because it does only over definitions. It however uses results of LIVENESS problem via REG_DEAD notes and the pruning trick. Now th REG_EQUAL note uses dead register (that is aloved), because the liveness is computed ignoring REG_EQUAL notes and thus the definition gets prunned from the list and consequentely webizer misses the link. I did not add DF_RD_PRUNE_DEAD_DEFS for no reason. You're re-introducing a major compile time hog. Did you do timings on some significant body of code with -fweb enabled? Well, I can not think how to fix it without 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes 2) making webizer to drop all REG_EQUAL notes that use dead register (for that it will need to track liveness too) Do you have better ideas? Honza Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote: 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes I was already working on a patch for this. I'll send it here later tonight. Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote: 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes I was already working on a patch for this. I'll send it here later tonight. Great, thanks! This is probably most sensible approach even if we will need to recompute liveness before/after webizer. This bug really took me days to reduce into something sensible. It tends to reproduce only in terribly complex functions with many loops. Honza Ciao! Steven
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Fri, Oct 12, 2012 at 11:16 PM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote: 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes I was already working on a patch for this. I'll send it here later tonight. Great, thanks! This is probably most sensible approach even if we will need to recompute liveness before/after webizer. I don't think we have to touch the liveness sets. We can compute an extra set of registers live only for REG_EQUAL/REG_EQUIV notes. Attached is what I had in mind. Untested, etc. it's late (and the Yankees are playing) so I'll get back to properly testing this tomorrow. Ciao! Steven df_rd_lr_eq.diff Description: Binary data