[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #13 from Richard Biener --- Fixed.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #12 from Richard Biener --- Author: rguenth Date: Thu Aug 11 09:02:04 2016 New Revision: 239357 URL: https://gcc.gnu.org/viewcvs?rev=239357=gcc=rev Log: 2016-08-11 Richard BienerPR tree-optimization/72772 * cfgloopmanip.c (create_preheader): Use split_edge if there is a single loop entry, avoiding degenerate PHIs. * gcc.dg/graphite/pr35356-1.c: Adjust. * gcc.dg/tree-ssa/pr59597.c: Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgloopmanip.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/graphite/pr35356-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #11 from amker at gcc dot gnu.org --- Author: amker Date: Tue Aug 9 15:08:02 2016 New Revision: 239291 URL: https://gcc.gnu.org/viewcvs?rev=239291=gcc=rev Log: PR tree-optimization/72772 * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality for expanded base. gcc/testsuite PR tree-optimization/pr72772 * gcc.dg/tree-ssa/pr72772.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr72772.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-niter.c
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #10 from amker at gcc dot gnu.org --- Author: amker Date: Tue Aug 9 15:01:49 2016 New Revision: 239290 URL: https://gcc.gnu.org/viewcvs?rev=239290=gcc=rev Log: PR tree-optimization/72772 * tree-ssa-loop-niter.h (simplify_using_initial_conditions): Delete parameter STOP. * tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Delete parameter STOP and update calls. Move expand_simple_operations function call from here... (simplify_using_initial_conditions): ...to here. Delete parameter STOP. (tree_simplify_using_condition): Delete parameter STOP. * tree-scalar-evolution.c (simple_iv_with_niters): Update call to simplify_using_initial_conditions. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-scalar-evolution.c trunk/gcc/tree-ssa-loop-niter.c trunk/gcc/tree-ssa-loop-niter.h
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #9 from Richard Biener --- FAIL: gcc.dg/tree-ssa/20030714-2.c scan-tree-dump-times dom2 "if " 3 is because jump-threading doesn't want to thread to a loop header and w/o the patch we have a pre-header with the degenerate PHI which causes it to stop there. With the patch we have an empty pre-header which is threaded as well and thus threading fails. We hit /* One case occurs when there was loop header buried in a jump threading path that crosses loop boundaries. We do not try and thread this elsewhere, so just cancel the jump threading request by clearing the AUX field now. */ if ((bb->loop_father != e2->src->loop_father && !loop_exit_edge_p (e2->src->loop_father, e2)) || (e2->src->loop_father != e2->dest->loop_father && !loop_exit_edge_p (e2->src->loop_father, e2))) { /* Since this case is not handled by our special code to thread through a loop header, we must explicitly cancel the threading request here. */ delete_jump_thread_path (path); e->aux = NULL; continue; } as e2 is the loop entry edge. Registering jump thread: (3, 5) incoming edge; (5, 8) normal; (8, 6) nocopy; and 8->6 is this edge while 8 is the loop preheader. Not sure why we have the above 2nd condition. Not sure why, if we have this condition, we do not simply drop the tail of this path. It looks like mark_threaded_blocks would fix this up if the crossed_header check would be >= 1 and not just > 1. Index: gcc/tree-ssa-threadupdate.c === --- gcc/tree-ssa-threadupdate.c (revision 239117) +++ gcc/tree-ssa-threadupdate.c (working copy) @@ -1531,10 +1531,8 @@ thread_block_1 (basic_block bb, bool nol threading path that crosses loop boundaries. We do not try and thread this elsewhere, so just cancel the jump threading request by clearing the AUX field now. */ - if ((bb->loop_father != e2->src->loop_father - && !loop_exit_edge_p (e2->src->loop_father, e2)) - || (e2->src->loop_father != e2->dest->loop_father - && !loop_exit_edge_p (e2->src->loop_father, e2))) + if (bb->loop_father != e2->src->loop_father + && !loop_exit_edge_p (e2->src->loop_father, e2)) { /* Since this case is not handled by our special code to thread through a loop header, we must explicitly fixes this testcase.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #8 from Richard Biener --- A more light-weight fix for the specific issue would be Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 239117) +++ gcc/tree-cfgcleanup.c (working copy) @@ -344,7 +344,7 @@ tree_forwarder_block_p (basic_block bb, { basic_block dest; /* Protect loop headers. */ - if (bb->loop_father->header == bb) + if (bb_loop_header_p (bb)) return false; dest = EDGE_SUCC (bb, 0)->dest; where bb_loop_header_p only relies on DOM info. Of course other checks are still bogus in the presence of new, undiscovered loops (and overly cautionous in the case of removed loops not marked for removal). This applies to all loop related stuff done in CFG hooks as well (though if LOOPS_NEED_FIXUP is set all the updates they do bogously or fail to do are fixed by the required loop-fixup). So in the end only CFG cleanup routines guarding transforms require up-to-date loop info. The above is nearly the only one (with the following preheader/latch checks falling into the same category as the CFG hook cases). So the above might be a complete fix at this point. Testing that now.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #7 from Richard Biener --- Thus the following "fixes" it. I'm really not sure at which level we should try to fix this - ideally passes doing this kind of transform would reset the upper bound (and niter info). Unfortunately in this case it is really CFG cleanup ... but the thread pass threading destination is a new header and not yet identified as such (the thread pass fails to create the new loop itself, leaving it to loop fixup - in this case CFG cleanup avoids this new loop by mashing it with the existing one). In fact tree_forwarder_block_p guards against the transform (but it doesnt' work because BB is not yet marked as loop header): if (current_loops) { basic_block dest; /* Protect loop headers. */ if (bb->loop_father->header == bb) return false; CFG cleanup runs loop fixup after its job and I'm (not) sure we want to do it at its start as well to catch this case. Index: gcc/loop-init.c === --- gcc/loop-init.c (revision 239117) +++ gcc/loop-init.c (working copy) @@ -306,6 +311,8 @@ fix_loop_structure (bitmap changed_bbs) flow_loop_free (loop); any_deleted = true; } +else if (loop) + loop->any_upper_bound = false; /* If we deleted loops then the cached scalar evolutions refering to those loops become invalid. */ The alternative is to make sure loops are fixed up before CFG cleanup as it guards transforms using loop info: Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 239117) +++ gcc/tree-cfgcleanup.c (working copy) @@ -731,59 +731,19 @@ cleanup_tree_cfg_1 (void) } -/* Remove unreachable blocks and other miscellaneous clean up work. - Return true if the flowgraph was modified, false otherwise. */ - -static bool -cleanup_tree_cfg_noloop (void) -{ - bool changed; - - timevar_push (TV_TREE_CLEANUP_CFG); - - /* Iterate until there are no more cleanups left to do. If any - iteration changed the flowgraph, set CHANGED to true. - - If dominance information is available, there cannot be any unreachable - blocks. */ - if (!dom_info_available_p (CDI_DOMINATORS)) -{ - changed = delete_unreachable_blocks (); - calculate_dominance_info (CDI_DOMINATORS); -} - else -{ - checking_verify_dominators (CDI_DOMINATORS); - changed = false; -} - - changed |= cleanup_tree_cfg_1 (); - - gcc_assert (dom_info_available_p (CDI_DOMINATORS)); - compact_blocks (); - - checking_verify_flow_info (); - - timevar_pop (TV_TREE_CLEANUP_CFG); - - if (changed && current_loops) -loops_state_set (LOOPS_NEED_FIXUP); - - return changed; -} - /* Repairs loop structures. */ @@ -803,17 +764,57 @@ repair_loop_structures (void) timevar_pop (TV_REPAIR_LOOPS); } -/* Cleanup cfg and repair loop structures. */ +/* Cleanup the CFG by removing unreachable blocks and doing other miscellaneous + clean up work. Also repair loop structures if necessary. + + Return true if the flowgraph was modified, false otherwise. */ bool cleanup_tree_cfg (void) { - bool changed = cleanup_tree_cfg_noloop (); + bool changed; + + timevar_push (TV_TREE_CLEANUP_CFG); + + /* Iterate until there are no more cleanups left to do. If any + iteration changed the flowgraph, set CHANGED to true. + + If dominance information is available, there cannot be any unreachable + blocks. */ + if (!dom_info_available_p (CDI_DOMINATORS)) +{ + changed = delete_unreachable_blocks (); + calculate_dominance_info (CDI_DOMINATORS); +} + else +{ + checking_verify_dominators (CDI_DOMINATORS); + changed = false; +} + /* Repair loop structures as we rely on loops being correct with + guarding several of the transforms. */ if (current_loops != NULL && loops_state_satisfies_p (LOOPS_NEED_FIXUP)) repair_loop_structures (); + changed |= cleanup_tree_cfg_1 (); + + gcc_assert (dom_info_available_p (CDI_DOMINATORS)); + compact_blocks (); + + checking_verify_flow_info (); + + timevar_pop (TV_TREE_CLEANUP_CFG); + + if (changed && current_loops != NULL) +{ + /* We are not marking loops for fixup in all cases, thus force it + here. */ + loops_state_set (LOOPS_NEED_FIXUP); + repair_loop_structures (); +} + return changed; } I'm leaning towards this one even though it is more expensive :/ (and we should fix that last comment issue and re-visit the loop-closed SSA update stuff we do in repair_loop_structures).
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #6 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #5) > Ok, so we record a correct upper bound for a loop but then later thread1 > makes > a mess of the loop structures, introducing multiple back-edges by mashing > an outer loop of said loop into it, making the number of iteration upper > bound obviously incorrect. > > And it is CFG cleanup which introduces those multiple back-edges, mashing > the two loops! It does this by treating the outer loop header as > "forwarder". > > Now we could detect this situation in redirect_edge_and_branch or avoid this > loop smashing (in the end loop init will disambiguate the latches again if > requested, re-creating the outer loop). I think we at least want such > smashing late (more aggressive cleanup was introduced because otherwise > we create worse initial RTL and assembler). > > The issue is how to detect this reliably in redirect_edge_and_branch given > it has to work when LOOPS_NEED_FIXUP is set ... we have to reset niter > info and estimates / upper bound whenever we redirect a latch - of both > the source and destination loop. With LOOPS_NEED_FIXUP we only can rely > on headers (well, not even that, and all other cfghook loop fixup stuff > has a similar issue). The other option is to simply wipe niter info whenever > we fixup loops (loop fixup is unfortunately a global thing). > > I believe I saw this issue on another PR as well. OTOH, I noticed that (at least) several optimizers rely on bound information passed along in compilation flow. And it's hard to get accurate bound information by niter analyzer itself, for example, vectorized loops generally have parameterized start/end condition values.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #5 from Richard Biener --- Ok, so we record a correct upper bound for a loop but then later thread1 makes a mess of the loop structures, introducing multiple back-edges by mashing an outer loop of said loop into it, making the number of iteration upper bound obviously incorrect. And it is CFG cleanup which introduces those multiple back-edges, mashing the two loops! It does this by treating the outer loop header as "forwarder". Now we could detect this situation in redirect_edge_and_branch or avoid this loop smashing (in the end loop init will disambiguate the latches again if requested, re-creating the outer loop). I think we at least want such smashing late (more aggressive cleanup was introduced because otherwise we create worse initial RTL and assembler). The issue is how to detect this reliably in redirect_edge_and_branch given it has to work when LOOPS_NEED_FIXUP is set ... we have to reset niter info and estimates / upper bound whenever we redirect a latch - of both the source and destination loop. With LOOPS_NEED_FIXUP we only can rely on headers (well, not even that, and all other cfghook loop fixup stuff has a similar issue). The other option is to simply wipe niter info whenever we fixup loops (loop fixup is unfortunately a global thing). I believe I saw this issue on another PR as well.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 Richard Biener changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #4 from Richard Biener --- Most of the fallout is doing more jump-threading. It looks like threading was also confused by the extra degenerate PHIs. The two ICEs are selective scheduling messing up the loop structure somehow.
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #3 from Richard Biener --- The patch has quite some fallout, including ICEs and wrong-code :/ FAIL: c-c++-common/ubsan/pr71403-1.c -O3 -g execution test FAIL: gcc.dg/graphite/pr35356-1.c scan-tree-dump graphite "if (P_9 >= P_10 \ \\\+ 1 && P_10 >= 0) {" FAIL: gcc.dg/tree-ssa/20030714-2.c scan-tree-dump-times dom2 "if " 3 FAIL: gcc.dg/tree-ssa/ivopt_5.c scan-tree-dump-times ivopts "ivtmp.[0-9_]* = PHI <[^0]" 0 FAIL: gcc.dg/tree-ssa/loadpre2.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre21.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre22.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/pr21417.c scan-tree-dump-times thread4 "FSM jump thread" 1 FAIL: gcc.dg/tree-ssa/pr59597.c scan-tree-dump vrp1 "Cancelling" FAIL: gcc.dg/tree-ssa/ssa-pre-23.c scan-tree-dump pre "Eliminated: 3" FAIL: gcc.target/i386/pr60901.c (internal compiler error) FAIL: gcc.target/i386/pr60901.c (test for excess errors) and with -m32 additionally FAIL: gcc.dg/pr45570.c (internal compiler error) FAIL: gcc.dg/pr45570.c (test for excess errors) FAIL: gcc.dg/pr70920-4.c scan-tree-dump forwprop1 "if (_[0-9]* == 0)" FAIL: gcc.dg/graphite/scop-dsyr2k.c scan-tree-dump-times graphite "number of SCoPs: 1" 1 FAIL: gcc.dg/graphite/scop-dsyrk.c scan-tree-dump-times graphite "number of SCoPs: 1" 1 FAIL: gcc.dg/tree-ssa/pr22051-2.c scan-tree-dump-times optimized "r_. = (int) q" 1 FAIL: gcc.dg/tree-ssa/pr71078-3.c scan-tree-dump forwprop1 "__builtin_copysign" I fixed the PRE testcases sofar, need to investigate the above some more. The pasted patch contained an error, the if (single_edge) condition has to be if (nentry == 1).
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 --- Comment #2 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #1) > Note that for some odd reason creating the preheader causes the PHI to appar. > Fixing that would be appreciated - it's the make_forwarder_block CFG hook > that causes this. Not sure why create_preheader doesn't special-case the > single_entry != NULL case by simply splitting the edge. > Thanks. I am also testing patch to simplify_using_initial_conditions and loop_exits_before_oveflow. I think they worth fixing too because patches are kind like a simplification. Anyway I will send for comment once passes bootstrap
[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2016-08-02 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Note that for some odd reason creating the preheader causes the PHI to appar. Fixing that would be appreciated - it's the make_forwarder_block CFG hook that causes this. Not sure why create_preheader doesn't special-case the single_entry != NULL case by simply splitting the edge. The following fixes the testcase: Index: cfgloopmanip.c === --- cfgloopmanip.c (revision 238938) +++ cfgloopmanip.c (working copy) @@ -1497,7 +1497,7 @@ has_preds_from_loop (basic_block block, basic_block create_preheader (struct loop *loop, int flags) { - edge e, fallthru; + edge e; basic_block dummy; int nentry = 0; bool irred = false; @@ -1544,9 +1544,14 @@ create_preheader (struct loop *loop, int mfb_kj_edge = loop_latch_edge (loop); latch_edge_was_fallthru = (mfb_kj_edge->flags & EDGE_FALLTHRU) != 0; - fallthru = make_forwarder_block (loop->header, mfb_keep_just, NULL); - dummy = fallthru->src; - loop->header = fallthru->dest; + if (single_entry) +dummy = split_edge (single_entry); + else +{ + edge fallthru = make_forwarder_block (loop->header, mfb_keep_just, NULL); + dummy = fallthru->src; + loop->header = fallthru->dest; +} /* Try to be clever in placing the newly created preheader. The idea is to avoid breaking any "fallthruness" relationship between blocks.