[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #28 from Jeffrey A. Law --- Author: law Date: Fri Nov 7 22:55:00 2014 New Revision: 217239 URL: https://gcc.gnu.org/viewcvs?rev=217239&root=gcc&view=rev Log: PR tree-optimization/61515 * tree-ssa-threadedge.c (invalidate_equivalences): Walk the unwinding stack rather than looking at ever SSA_NAME's value. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-threadedge.c
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #27 from Jeffrey A. Law --- Richi, I thought you had mentioned something along those lines as well and I scanned for it yesterday but didn't see it. Maybe it's in a different BZ or something. I'll probably come across it as a I work through the various jump threading BZs ;-)
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #26 from Richard Biener --- (In reply to Jeffrey A. Law from comment #25) > So I think there's another approach. invalidate_equivalences is passed in > the stack of temporary equivalences, which include those created by jump > threading as well as those created by DOM. > > This means that in theory, we could walk that stack (in reverse order) to > find all the equivalences we need to invalidate. If the stack has fewer > entries than we have SSA_NAMES, then we win. > > For the stripped down testcase in this PR we go from ~40 seconds to ~1 > second in the relevant part of the compiler with a hack that does exactly > what's described above. > > I need to do a bit more review/research, but it looks like a promising > alternative. Huh, I thought I suggested that somewhere but you said the information is not available here. But yes, this sounds exactly what we want to do complexity-wise. I suppose we would need to amend the stack of tempoary equivalences to record the basic-block we are creating the actual set of equivalences for (to be able to terminate the walk?)
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #25 from Jeffrey A. Law --- So I think there's another approach. invalidate_equivalences is passed in the stack of temporary equivalences, which include those created by jump threading as well as those created by DOM. This means that in theory, we could walk that stack (in reverse order) to find all the equivalences we need to invalidate. If the stack has fewer entries than we have SSA_NAMES, then we win. For the stripped down testcase in this PR we go from ~40 seconds to ~1 second in the relevant part of the compiler with a hack that does exactly what's described above. I need to do a bit more review/research, but it looks like a promising alternative.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #24 from Richard Biener --- (In reply to Jeffrey A. Law from comment #23) > The piece you're missing in this is that when we seen an assignment to some > SSA_NAME, call it LHS and we've traversed a backedge, then we have to walk > through all the equivalences and see if there's anything that's been marked > as equivalent to LHS and invalidate those. Then we also ahve to invalidate > LHS. > > for (unsigned int i = 1; i < num_ssa_names; i++) > if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs) > record_temporary_equivalence (ssa_name (i), NULL_TREE, stack); > > if (SSA_NAME_VALUE (lhs)) > record_temporary_equivalence (lhs, NULL_TREE, stack); > > > The loop finds handles things equivalent to LHS, the second handles LHS > itself. Both are required for correctness. > > In the past there was a map similar to your implementation, but it's not > sufficient. See pr61289.C and pr61289-2.C. > > The problem is you may need to invalidate equivalences that are created > *outside* tree-ssa-threadedge.c. So any attempts to track inside > tree-ssa-threadedge are not sufficient for correctness. > > What's still not clear to me here is *why* we're calling the invalidation > code in the first place. It's only supposed to be called when we encounter > a statement which produces an output that we can't handle *and* we've > traversed a backedge. The combination of those two events is supposed to be > very rare. Otherwise the invalidation is obviously too expensive. That's > why I suggested in c#12 that we need to look at *why* we're calling the > invalidation code at all. Well, I don't know the code at all so I can just try to fix the complexity in the present algorithm. Please have a look here during stage3.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #23 from Jeffrey A. Law --- The piece you're missing in this is that when we seen an assignment to some SSA_NAME, call it LHS and we've traversed a backedge, then we have to walk through all the equivalences and see if there's anything that's been marked as equivalent to LHS and invalidate those. Then we also ahve to invalidate LHS. for (unsigned int i = 1; i < num_ssa_names; i++) if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs) record_temporary_equivalence (ssa_name (i), NULL_TREE, stack); if (SSA_NAME_VALUE (lhs)) record_temporary_equivalence (lhs, NULL_TREE, stack); The loop finds handles things equivalent to LHS, the second handles LHS itself. Both are required for correctness. In the past there was a map similar to your implementation, but it's not sufficient. See pr61289.C and pr61289-2.C. The problem is you may need to invalidate equivalences that are created *outside* tree-ssa-threadedge.c. So any attempts to track inside tree-ssa-threadedge are not sufficient for correctness. What's still not clear to me here is *why* we're calling the invalidation code in the first place. It's only supposed to be called when we encounter a statement which produces an output that we can't handle *and* we've traversed a backedge. The combination of those two events is supposed to be very rare. Otherwise the invalidation is obviously too expensive. That's why I suggested in c#12 that we need to look at *why* we're calling the invalidation code at all.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 Jakub Jelinek changed: What|Removed |Added Target Milestone|4.9.2 |4.9.3 --- Comment #22 from Jakub Jelinek --- GCC 4.9.2 has been released.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #21 from Richard Biener --- So to recap - apart from really fixing the quadraticness - it would be nice if we can just disable threading over backedges at -O1, thus for !flag_expensive_optimizations. Especially on the 4.9 branch.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #20 from Richard Biener --- Created attachment 33766 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33766&action=edit non-working patch Of course this still walks all SSA names (but only once per BB), so it isn't really removing the quadraticness. Moreover the patch (see attachment) I was testing fails to bootstrap on the 4.9 branch, seemingly miscompiling stage2 gcc (and crashing in building stage2 libgcc in swap_tree_comparison called from IVOPTs). Eventually record_temporary_equivalences_from_stmts_at_dest depends on the exact ordering of record_temporary_equivalence calls (I do the invaldiate ones late). Jeff? As this is a regression from 4.9.0 appearing in 4.9.1 it's quite bad that this is now still unfixed given we are about to release 4.9.2. I'm out of here again ;)
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #19 from Richard Biener --- Btw, are you sure all temporary equivalences are to SSA names only? ISTR we have memory equivalencies as well.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #18 from Richard Biener --- Testing that.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #17 from Richard Biener --- Oops, that's not 100% the same. But /* Now invalidate all equivalencies we have to invalidate. */ for (unsigned int i = 1; i < num_ssa_names; ++i) { tree name = ssa_name (i); if (!name) continue; tree val = SSA_NAME_VALUE (name); if (TREE_CODE (val) == SSA_NAME && bitmap_bit_p (to_invalidate, SSA_NAME_VERSION (val))) record_temporary_equivalence (name, NULL_TREE, stack); } unsigned i; bitmap_iterator bi; EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi) { tree name = ssa_name (i); if (SSA_NAME_VALUE (ssa_name (i))) record_temporary_equivalence (name, NULL_TREE, stack); } would be.
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #16 from Richard Biener --- Btw, why is it recording twice a temporary equivalence? Might be simpler even with /* Now invalidate all equivalencies we have to invalidate. */ unsigned i; sbitmap_iterator bi; EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi) record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);
[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #15 from Richard Biener --- (In reply to rguent...@suse.de from comment #13) > On Tue, 17 Jun 2014, law at redhat dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 > > > > Jeffrey A. Law changed: > > > >What|Removed |Added > > > > CC||law at redhat dot com > >Assignee|unassigned at gcc dot gnu.org |law at redhat dot > > com > > > > --- Comment #12 from Jeffrey A. Law --- > > Fundamentally we don't have a way to know what equivalences we need to > > invalidate. > > Invalidation is, umm, painful. The question in my mind is why are we > > getting > > so many invalidations to start with. That's the first thing to look at. > > Well, it's easy to avoid the quadraticness - you can always create > testcases that need a lot of invalidates. But the current algorithm > really does not scale. > > > Honestly though, I really wonder if handling backedges is worth the effort, > > even though it's important for one benchmark. > > Not sure about that, but trivial improvements to the scalability are > possible here. Walking all SSA names is O(number of stmts) - if you > do that O(number of stmts) time (as you do) that's clearly the bug. Sth like (untested) Index: tree-ssa-threadedge.c === --- tree-ssa-threadedge.c (revision 216464) +++ tree-ssa-threadedge.c (working copy) @@ -287,20 +287,6 @@ fold_assignment_stmt (gimple stmt) } } -/* A new value has been assigned to LHS. If necessary, invalidate any - equivalences that are no longer valid. */ -static void -invalidate_equivalences (tree lhs, vec *stack) -{ - - for (unsigned int i = 1; i < num_ssa_names; i++) -if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs) - record_temporary_equivalence (ssa_name (i), NULL_TREE, stack); - - if (SSA_NAME_VALUE (lhs)) -record_temporary_equivalence (lhs, NULL_TREE, stack); -} - /* Try to simplify each statement in E->dest, ultimately leading to a simplification of the COND_EXPR at the end of E->dest. @@ -335,6 +321,8 @@ record_temporary_equivalences_from_stmts we discover. Note any equivalences we discover are context sensitive (ie, are dependent on traversing E) and must be unwound when we're finished processing E. */ + sbitmap to_invalidate = sbitmap_alloc (num_ssa_names); + bitmap_clear (to_invalidate); for (gsi = gsi_start_bb (e->dest); !gsi_end_p (gsi); gsi_next (&gsi)) { tree cached_lhs = NULL; @@ -379,7 +367,7 @@ record_temporary_equivalences_from_stmts if (backedge_seen) FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF) - invalidate_equivalences (op, stack); + bitmap_set_bit (to_invalidate, SSA_NAME_VERSION (op)); continue; } @@ -419,7 +407,7 @@ record_temporary_equivalences_from_stmts if (backedge_seen) { tree lhs = gimple_get_lhs (stmt); - invalidate_equivalences (lhs, stack); + bitmap_set_bit (to_invalidate, SSA_NAME_VERSION (lhs)); } continue; } @@ -497,8 +485,24 @@ record_temporary_equivalences_from_stmts || is_gimple_min_invariant (cached_lhs))) record_temporary_equivalence (gimple_get_lhs (stmt), cached_lhs, stack); else if (backedge_seen) - invalidate_equivalences (gimple_get_lhs (stmt), stack); + bitmap_set_bit (to_invalidate, + SSA_NAME_VERSION (gimple_get_lhs (stmt))); } + + /* Now invalidate all equivalencies we have to invalidate. */ + unsigned i; + sbitmap_iterator bi; + EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi) +{ + tree name = ssa_name (i); + record_temporary_equivalence (name, NULL_TREE, stack); + + if (name) + record_temporary_equivalence (name, NULL_TREE, stack); +} + + sbitmap_free (to_invalidate); + return stmt; }