[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-07 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #28 from Jeffrey A. Law  ---
Author: law
Date: Fri Nov  7 22:55:00 2014
New Revision: 217239

URL: https://gcc.gnu.org/viewcvs?rev=217239&root=gcc&view=rev
Log:
PR tree-optimization/61515
* tree-ssa-threadedge.c (invalidate_equivalences): Walk the unwinding stack
rather than looking at ever SSA_NAME's value.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-threadedge.c


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-07 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #27 from Jeffrey A. Law  ---
Richi,

I thought you had mentioned something along those lines as well and I scanned
for it yesterday but didn't see it.   Maybe it's in a different BZ or
something.  I'll probably come across it as a I work through the various jump
threading BZs ;-)


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #26 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #25)
> So I think there's another approach.  invalidate_equivalences is passed in
> the stack of temporary equivalences, which include those created by jump
> threading as well as those created by DOM.
> 
> This means that in theory, we could walk that stack (in reverse order) to
> find all the equivalences we need to invalidate.  If the stack has fewer
> entries than we have SSA_NAMES, then we win.
> 
> For the stripped down testcase in this PR we go from ~40 seconds to ~1
> second in the relevant part of the compiler with a hack that does exactly
> what's described above.
> 
> I need to do a bit more review/research, but it looks like a promising
> alternative.

Huh, I thought I suggested that somewhere but you said the information is not
available here.

But yes, this sounds exactly what we want to do complexity-wise.  I suppose
we would need to amend the stack of tempoary equivalences to record the
basic-block we are creating the actual set of equivalences for (to be able
to terminate the walk?)


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-06 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #25 from Jeffrey A. Law  ---
So I think there's another approach.  invalidate_equivalences is passed in the
stack of temporary equivalences, which include those created by jump threading
as well as those created by DOM.

This means that in theory, we could walk that stack (in reverse order) to find
all the equivalences we need to invalidate.  If the stack has fewer entries
than we have SSA_NAMES, then we win.

For the stripped down testcase in this PR we go from ~40 seconds to ~1 second
in the relevant part of the compiler with a hack that does exactly what's
described above.

I need to do a bit more review/research, but it looks like a promising
alternative.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #24 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #23)
> The piece you're missing in this is that when we seen an assignment to some
> SSA_NAME, call it LHS and we've traversed a backedge, then we have to walk
> through all the equivalences and see if there's anything that's been marked
> as equivalent to LHS and invalidate those.  Then we also ahve to invalidate
> LHS.
> 
>  for (unsigned int i = 1; i < num_ssa_names; i++)
> if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs)
>   record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);
> 
>   if (SSA_NAME_VALUE (lhs))
> record_temporary_equivalence (lhs, NULL_TREE, stack);
> 
> 
> The loop finds handles things equivalent to LHS, the second handles LHS
> itself.  Both are required for correctness.
> 
> In the past there was a map similar to your implementation, but it's not
> sufficient.  See pr61289.C and pr61289-2.C.
> 
> The problem is you may need to invalidate equivalences that are created
> *outside* tree-ssa-threadedge.c.  So any attempts to track inside
> tree-ssa-threadedge are not sufficient for correctness.
> 
> What's still not clear to me here is *why* we're calling the invalidation
> code in the first place.  It's only supposed to be called when we encounter
> a statement which produces an output that we can't handle *and* we've
> traversed a backedge.  The combination of those two events is supposed to be
> very rare.  Otherwise the invalidation is obviously too expensive.  That's
> why I suggested in c#12 that we need to look at *why* we're calling the
> invalidation code at all.

Well, I don't know the code at all so I can just try to fix the complexity
in the present algorithm.

Please have a look here during stage3.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-11-04 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #23 from Jeffrey A. Law  ---
The piece you're missing in this is that when we seen an assignment to some
SSA_NAME, call it LHS and we've traversed a backedge, then we have to walk
through all the equivalences and see if there's anything that's been marked as
equivalent to LHS and invalidate those.  Then we also ahve to invalidate LHS.

 for (unsigned int i = 1; i < num_ssa_names; i++)
if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs)
  record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);

  if (SSA_NAME_VALUE (lhs))
record_temporary_equivalence (lhs, NULL_TREE, stack);


The loop finds handles things equivalent to LHS, the second handles LHS itself.
 Both are required for correctness.

In the past there was a map similar to your implementation, but it's not
sufficient.  See pr61289.C and pr61289-2.C.

The problem is you may need to invalidate equivalences that are created
*outside* tree-ssa-threadedge.c.  So any attempts to track inside
tree-ssa-threadedge are not sufficient for correctness.

What's still not clear to me here is *why* we're calling the invalidation code
in the first place.  It's only supposed to be called when we encounter a
statement which produces an output that we can't handle *and* we've traversed a
backedge.  The combination of those two events is supposed to be very rare. 
Otherwise the invalidation is obviously too expensive.  That's why I suggested
in c#12 that we need to look at *why* we're calling the invalidation code at
all.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|4.9.2   |4.9.3

--- Comment #22 from Jakub Jelinek  ---
GCC 4.9.2 has been released.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #21 from Richard Biener  ---
So to recap - apart from really fixing the quadraticness - it would be nice
if we can just disable threading over backedges at -O1, thus for
!flag_expensive_optimizations.  Especially on the 4.9 branch.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #20 from Richard Biener  ---
Created attachment 33766
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33766&action=edit
non-working patch

Of course this still walks all SSA names (but only once per BB), so it isn't
really removing the quadraticness.

Moreover the patch (see attachment) I was testing fails to bootstrap on the
4.9 branch, seemingly miscompiling stage2 gcc (and crashing in building stage2
libgcc in swap_tree_comparison called from IVOPTs).

Eventually record_temporary_equivalences_from_stmts_at_dest depends on the
exact ordering of record_temporary_equivalence calls (I do the invaldiate ones
late).  Jeff?  As this is a regression from 4.9.0 appearing in 4.9.1 it's quite
bad that this is now still unfixed given we are about to release 4.9.2.

I'm out of here again ;)


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #19 from Richard Biener  ---
Btw, are you sure all temporary equivalences are to SSA names only?  ISTR
we have memory equivalencies as well.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #18 from Richard Biener  ---
Testing that.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #17 from Richard Biener  ---
Oops, that's not 100% the same.  But

  /* Now invalidate all equivalencies we have to invalidate.  */
  for (unsigned int i = 1; i < num_ssa_names; ++i)
{
  tree name = ssa_name (i);
  if (!name)
continue;

  tree val = SSA_NAME_VALUE (name);
  if (TREE_CODE (val) == SSA_NAME
  && bitmap_bit_p (to_invalidate, SSA_NAME_VERSION (val)))
record_temporary_equivalence (name, NULL_TREE, stack);
}
  unsigned i;
  bitmap_iterator bi;
  EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi)
{
  tree name = ssa_name (i);
  if (SSA_NAME_VALUE (ssa_name (i)))
record_temporary_equivalence (name, NULL_TREE, stack);
}

would be.


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #16 from Richard Biener  ---
Btw, why is it recording twice a temporary equivalence?  Might be simpler even
with

  /* Now invalidate all equivalencies we have to invalidate.  */
  unsigned i;
  sbitmap_iterator bi;
  EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi)
record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);


[Bug tree-optimization/61515] [4.9/5 Regression] Extremely long compile time for generated code

2014-10-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #15 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #13)
> On Tue, 17 Jun 2014, law at redhat dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515
> > 
> > Jeffrey A. Law  changed:
> > 
> >What|Removed |Added
> > 
> >  CC||law at redhat dot com
> >Assignee|unassigned at gcc dot gnu.org  |law at redhat dot 
> > com
> > 
> > --- Comment #12 from Jeffrey A. Law  ---
> > Fundamentally we don't have a way to know what equivalences we need to
> > invalidate.
> > Invalidation is, umm, painful.  The question in my mind is why are we 
> > getting
> > so many invalidations to start with.  That's the first thing to look at.
> 
> Well, it's easy to avoid the quadraticness - you can always create 
> testcases that need a lot of invalidates.  But the current algorithm
> really does not scale.
> 
> > Honestly though, I really wonder if handling backedges is worth the effort,
> > even though it's important for one benchmark.
> 
> Not sure about that, but trivial improvements to the scalability are
> possible here.  Walking all SSA names is O(number of stmts) - if you
> do that O(number of stmts) time (as you do) that's clearly the bug.

Sth like (untested)

Index: tree-ssa-threadedge.c
===
--- tree-ssa-threadedge.c   (revision 216464)
+++ tree-ssa-threadedge.c   (working copy)
@@ -287,20 +287,6 @@ fold_assignment_stmt (gimple stmt)
 }
 }

-/* A new value has been assigned to LHS.  If necessary, invalidate any
-   equivalences that are no longer valid.  */
-static void
-invalidate_equivalences (tree lhs, vec *stack)
-{
-
-  for (unsigned int i = 1; i < num_ssa_names; i++)
-if (ssa_name (i) && SSA_NAME_VALUE (ssa_name (i)) == lhs)
-  record_temporary_equivalence (ssa_name (i), NULL_TREE, stack);
-
-  if (SSA_NAME_VALUE (lhs))
-record_temporary_equivalence (lhs, NULL_TREE, stack);
-}
-
 /* Try to simplify each statement in E->dest, ultimately leading to
a simplification of the COND_EXPR at the end of E->dest.

@@ -335,6 +321,8 @@ record_temporary_equivalences_from_stmts
  we discover.  Note any equivalences we discover are context
  sensitive (ie, are dependent on traversing E) and must be unwound
  when we're finished processing E.  */
+  sbitmap to_invalidate = sbitmap_alloc (num_ssa_names);
+  bitmap_clear (to_invalidate);
   for (gsi = gsi_start_bb (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   tree cached_lhs = NULL;
@@ -379,7 +367,7 @@ record_temporary_equivalences_from_stmts

  if (backedge_seen)
FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF)
- invalidate_equivalences (op, stack);
+ bitmap_set_bit (to_invalidate, SSA_NAME_VERSION (op));

  continue;
}
@@ -419,7 +407,7 @@ record_temporary_equivalences_from_stmts
  if (backedge_seen)
{
  tree lhs = gimple_get_lhs (stmt);
- invalidate_equivalences (lhs, stack);
+ bitmap_set_bit (to_invalidate, SSA_NAME_VERSION (lhs));
}
  continue;
}
@@ -497,8 +485,24 @@ record_temporary_equivalences_from_stmts
  || is_gimple_min_invariant (cached_lhs)))
record_temporary_equivalence (gimple_get_lhs (stmt), cached_lhs,
stack);
   else if (backedge_seen)
-   invalidate_equivalences (gimple_get_lhs (stmt), stack);
+   bitmap_set_bit (to_invalidate,
+   SSA_NAME_VERSION (gimple_get_lhs (stmt)));
 }
+
+  /* Now invalidate all equivalencies we have to invalidate.  */
+  unsigned i;
+  sbitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (to_invalidate, 0, i, bi)
+{
+  tree name = ssa_name (i);
+  record_temporary_equivalence (name, NULL_TREE, stack);
+
+  if (name)
+   record_temporary_equivalence (name, NULL_TREE, stack);
+}
+
+  sbitmap_free (to_invalidate);
+
   return stmt;
 }