[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-02-13 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2005-02-13 
23:14 ---
Moving to 4.1 as I don't care much if this gets fixed.

-- 
   What|Removed |Added

   Target Milestone|4.0.0   |4.1.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-27 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-27 13:18 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

With the following patch I got some speedup for depth 100.

from: 
 tree iv optimization  :   2.62 (62%) usr   0.27 (82%) sys   2.92 (62%) wall
 tree iv optimization  :   2.33 (59%) usr   0.25 (83%) sys   2.63 (54%) wall

to:
 tree iv optimization  :   1.19 (46%) usr   0.04 (40%) sys   1.26 (45%) wall
 tree iv optimization  :   1.21 (47%) usr   0.05 (45%) sys   1.30 (46%) wall

Basically we're reseting too much information each time scev_reset is
called.  It would even be possible to reset only the part of the scev
database that contains symbols instead of erasing everything.

Do somebody know how to iterate over all the elements of the hash
setting to NULL_TREE the elements that answer true to
chrec_contains_symbols?

Index: tree-scalar-evolution.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -d -u -p -r2.16 tree-scalar-evolution.c
--- tree-scalar-evolution.c 18 Jan 2005 11:36:26 -  2.16
+++ tree-scalar-evolution.c 27 Jan 2005 13:12:36 -
@@ -2490,7 +2490,7 @@ scev_reset (void)
   for (i = 1; i  current_loops-num; i++)
 {
   loop = current_loops-parray[i];
-  if (loop)
+  if (loop  chrec_contains_symbols (loop-nb_iterations))
loop-nb_iterations = NULL_TREE;
 }
 }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-27 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-27 15:00 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 --- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
 2005-01-27 13:18 ---
 Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
 
 With the following patch I got some speedup for depth 100.
 
 from: 
  tree iv optimization  :   2.62 (62%) usr   0.27 (82%) sys   2.92 (62%) wall
  tree iv optimization  :   2.33 (59%) usr   0.25 (83%) sys   2.63 (54%) wall
 
 to:
  tree iv optimization  :   1.19 (46%) usr   0.04 (40%) sys   1.26 (45%) wall
  tree iv optimization  :   1.21 (47%) usr   0.05 (45%) sys   1.30 (46%) wall
 
 Basically we're reseting too much information each time scev_reset is
 called.  It would even be possible to reset only the part of the scev
 database that contains symbols instead of erasing everything.
 
 Do somebody know how to iterate over all the elements of the hash
 setting to NULL_TREE the elements that answer true to
 chrec_contains_symbols?

the patch is below (in stronger form -- only removing entries that
contain invalidated symbols), and indeed helps with this testcase.
It however causes measurable slow down on normal code (see analysis
in one of the previous mails).

Note that your patch is not entirely correct -- in case loop is
unrolled, its number of iterations is changed, so in this case
scev_reset should clear its number of iterations unconditionally
(I think something similar occurs with vectorizer, so we need to
be a bit careful).

Index: tree-loop-linear.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-loop-linear.c,v
retrieving revision 2.6
diff -c -3 -p -r2.6 tree-loop-linear.c
*** tree-loop-linear.c  18 Jan 2005 11:36:24 -  2.6
--- tree-loop-linear.c  24 Jan 2005 01:34:04 -
*** linear_transform_loops (struct loops *lo
*** 373,379 
free_data_refs (datarefs);
  }
free_df ();
!   scev_reset ();
rewrite_into_loop_closed_ssa ();
  #ifdef ENABLE_CHECKING
verify_loop_closed_ssa ();
--- 373,379 
free_data_refs (datarefs);
  }
free_df ();
!   scev_reset (NULL);
rewrite_into_loop_closed_ssa ();
  #ifdef ENABLE_CHECKING
verify_loop_closed_ssa ();
Index: tree-scalar-evolution.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -c -3 -p -r2.16 tree-scalar-evolution.c
*** tree-scalar-evolution.c 18 Jan 2005 11:36:26 -  2.16
--- tree-scalar-evolution.c 24 Jan 2005 01:34:05 -
*** scev_initialize (struct loops *loops)
*** 2475,2484 
loops-parray[i]-nb_iterations = NULL_TREE;
  }
  
! /* Cleans up the information cached by the scalar evolutions analysis.  */
  
  void
! scev_reset (void)
  {
unsigned i;
struct loop *loop;
--- 2475,2539 
loops-parray[i]-nb_iterations = NULL_TREE;
  }
  
! /* Returns true if EXPR references one of the ssa names in set
!INVALIDATED_NAMES.  */
! 
! static bool
! tree_references_names (tree expr, bitmap invalidated_names)
! {
!   if (!expr)
! return false;
! 
!   if (TREE_CODE (expr) == SSA_NAME)
! return bitmap_bit_p (invalidated_names, SSA_NAME_VERSION (expr));
! 
!   switch (TREE_CODE_LENGTH (TREE_CODE (expr)))
! {
! case 4:
!   if (tree_references_names (TREE_OPERAND (expr, 3), invalidated_names))
!   return true;
! case 3:
!   if (tree_references_names (TREE_OPERAND (expr, 2), invalidated_names))
!   return true;
! case 2:
!   if (tree_references_names (TREE_OPERAND (expr, 1), invalidated_names))
!   return true;
! case 1:
!   if (tree_references_names (TREE_OPERAND (expr, 0), invalidated_names))
!   return true;
! case 0:
!   return false;
! 
! default:
!   /* Don't worry about handling strange cases.  This function is only used
!in a way in that it does not matter if we return true here.  */
!   return true;
! }
! }
! 
! /* If the SLOT in the scev cache contains ssa name belonging to the set
!passed in DATA, the function removes it from the cache.  Callback
!for htab_traverse.  */
! 
! static int
! invalidate_names_reference (void **slot, void *data)
! {
!   bitmap invalidated_names = data;
!   struct scev_info_str *elt = *slot;
! 
!   if (tree_references_names (elt-var, invalidated_names)
!   || tree_references_names (elt-chrec, invalidated_names))
! htab_clear_slot (scalar_evolution_info, slot);
! 
!   return 1;
! }
! 
! /* Cleans up the information cached by the scalar evolutions analysis.
!If INVALIDATED_NAMES is provided, only the references to invalidated
!ssa names are removed.  */
  
  void
! scev_reset (bitmap invalidated_names)
  {
unsigned i;
struct loop *loop;
*** scev_reset (void)
*** 2486,2497 
if 

[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-27 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-27 15:12 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 the patch is below (in stronger form -- only removing entries that
 contain invalidated symbols), and indeed helps with this testcase.

Many thanks.

 It however causes measurable slow down on normal code (see analysis
 in one of the previous mails).
 

I see. 

Another idea: would it be possible to insert the invalidated names
during the optimization pass instead of invalidating all the symbols?
This would avoid the first pass over the scev database that search for
symbols.

 Note that your patch is not entirely correct -- in case loop is
 unrolled, its number of iterations is changed, so in this case
 scev_reset should clear its number of iterations unconditionally

or the unroller should do that work on the unrolled loops?



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-27 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-27 19:10 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 Another idea: would it be possible to insert the invalidated names
 during the optimization pass instead of invalidating all the symbols?
 This would avoid the first pass over the scev database that search for
 symbols.

I don't understand this proposal.

  Note that your patch is not entirely correct -- in case loop is
  unrolled, its number of iterations is changed, so in this case
  scev_reset should clear its number of iterations unconditionally
 
 or the unroller should do that work on the unrolled loops?

This seems to be the right approach.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-27 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-27 20:38 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
  Another idea: would it be possible to insert the invalidated names
  during the optimization pass instead of invalidating all the symbols?
  This would avoid the first pass over the scev database that search for
  symbols.
 
 I don't understand this proposal.
 

Anyway, I see that I was wrong: even if you mark the SSA_NAMEs that
are removed in an optimization, and that you invalidate only those
symbols, you still have to walk the scev database to ensure that there
is no other evolution that references the removed SSA_NAMEs.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 10:32 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 Adding the instantiation cache was compile time neutral on normal code,
 so I don't think the effect on scev cost is significant.
 

How that?  You end up querying and caching an evolution for every
instantiation of an SSA_NAME!  

I will quantify the number of queries wrt the number of times this
information is useful.  I think that with my patch, the instantiation
cache information is used zero times on a bootstrap.

 The problem is that we end up calling the instantiate_parameters_1
 function 83214+2453273 (outside + recursive) times (for N=100).
 We also call analyze_scalar_evolution_1 1146317 times.  Many of these
 calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
 Large part of 5142869 of build_int_cst_wide calls is very likely also
 due to scev analysis.  This is not going to be cheap.  Removing the
 instantiation cache definitely would not decrease the number of
 instantiate_parameters_1 calls (might increase it, in fact, although
 I don't know how significantly).
 

This also could be a bad use of the scev analysis.

For Steven: you can have a O(N**3) algorithm very simply:

  loop O(N) times
loop O(N) times
  algorithm in O(N)

 One part of the problem is that loop optimizers need to throw away the
 scev caches after each change to loops, which leads to recounting the
 information too much.  Allowing to invalidate only changed parts helps,
 but seems to be relatively costly on normal code -- you need to scan the
 hash table for things that refer to removed or from some other reason
 invalidated ssa names, which is slow.  

Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time
you'll ask for the evolution of this SSA_NAME you'll analyze the
evolution from scratch.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 10:39 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 
 How?  If the reference is left in symbolic form, it means that you know
 nothing about its value, so the only thing you can do with it is to
 check its equality/inequality, in code like
 
 while (...)
   {
 i = something_weird ();
 
 a[i] = ...;  (a)
 a[i+1] = ...;  (b)
 a[i] = ...;  (c)
   }
 

The following is probably a more frequent case:

i = 0
x = something_weird () or a function parameter
loop
  i = i + 1 
  a[i] = ...
  ... = a[i + x]
endloop

where you end with a symbolic distance vector.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 11:02 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 
 (*) I hope; scev is a mess of mutualy recursive functions --
 analyze_scalar_evolution calling number_of_iterations calling
 instantiate_parameters calling analyze_scalar_evolution again, with
 instantiate_parameters hacked so that the infinite cycle cannot occur is
 my favourite one.  Nobody can say anything sure about behavior of scev
 -- it is not even well defined what analyze_scalar_evolutions will
 return to you, 

It returns to you an evolution that might contain SSA_NAMEs.

 unless you call instantiate_parameters or resolve_mixers
 on the result.
 

And once you call instantiate_parameters on the result you're
guaranteed that what you get does contain only determined constants,
or otherwise the result is chrec_dont_know.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 11:14 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
  Adding the instantiation cache was compile time neutral on normal code,
  so I don't think the effect on scev cost is significant.
  
 
 How that?  You end up querying and caching an evolution for every
 instantiation of an SSA_NAME!  

which is pretty cheap (constant time operation); note that we do an
equivalent lookup in the analyze_scalar_evolution call in any case, also
without causing any compile time problems.  I haven't measured any slowdown on
normal testcases.

 I will quantify the number of queries wrt the number of times this
 information is useful.  I think that with my patch, the instantiation
 cache information is used zero times on a bootstrap.

I don't think so.  Please come up with some numbers, otherwise this
discussion is pointless.

  The problem is that we end up calling the instantiate_parameters_1
  function 83214+2453273 (outside + recursive) times (for N=100).
  We also call analyze_scalar_evolution_1 1146317 times.  Many of these
  calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
  Large part of 5142869 of build_int_cst_wide calls is very likely also
  due to scev analysis.  This is not going to be cheap.  Removing the
  instantiation cache definitely would not decrease the number of
  instantiate_parameters_1 calls (might increase it, in fact, although
  I don't know how significantly).
  
 
 This also could be a bad use of the scev analysis.

partly -- see the analysis at the PR.  However one O(n) per query comes
from scev itself.

 For Steven: you can have a O(N**3) algorithm very simply:
 
   loop O(N) times
 loop O(N) times
   algorithm in O(N)
 
  One part of the problem is that loop optimizers need to throw away the
  scev caches after each change to loops, which leads to recounting the
  information too much.  Allowing to invalidate only changed parts helps,
  but seems to be relatively costly on normal code -- you need to scan the
  hash table for things that refer to removed or from some other reason
  invalidated ssa names, which is slow.  
 
 Just set the SSA_NAME to not_analyzed_yet or NULL_TREE, and the next time
 you'll ask for the evolution of this SSA_NAME you'll analyze the
 evolution from scratch.

This does not work, of course.  When the ssa name is removed, you must
remove all references to it from the cache.  Otherwise you will ICE when
you try to process the relevant entry in (say) instantiate_parameters.
What's worse, the ssa name may get reused by ssa name management, thus
causing you not even be able to note the change, and thus possibly to
misscompilations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 11:16 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  How?  If the reference is left in symbolic form, it means that you know
  nothing about its value, so the only thing you can do with it is to
  check its equality/inequality, in code like
  
  while (...)
{
  i = something_weird ();
  
  a[i] = ...;  (a)
  a[i+1] = ...;  (b)
  a[i] = ...;  (c)
}
  
 
 The following is probably a more frequent case:
 
 i = 0
 x = something_weird () or a function parameter
 loop
   i = i + 1 
   a[i] = ...
   ... = a[i + x]
 endloop
 
 where you end with a symbolic distance vector.

But here x is a loop invariant.  Nothing would change here if we were
keeping the evolutions fully instantiated.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 11:29 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  (*) I hope; scev is a mess of mutualy recursive functions --
  analyze_scalar_evolution calling number_of_iterations calling
  instantiate_parameters calling analyze_scalar_evolution again, with
  instantiate_parameters hacked so that the infinite cycle cannot occur is
  my favourite one.  Nobody can say anything sure about behavior of scev
  -- it is not even well defined what analyze_scalar_evolutions will
  return to you, 
 
 It returns to you an evolution that might contain SSA_NAMEs.

Hmm... if this is all you can tell, I start fear even more; this does
not define any semantics at all :-)

More seriously -- which of the possibilities?  If I have loops like

while (...)
  {
while (...)
  {
x_1 = something ();
  }
x_2 = phi (x_1);
x_3 = x_2 + 1;
  }

What will analyze_scalar_evolutions return for x_3? There are (at least)
three possible valid values:

x_3
x_2 + 1
x_1 + 1

In this example probably the last one will be chosen, but in more
complicated cases you cannot tell (without simulating what scev analysis
does).  It might be better to not export analyze_scalar_evolution at
all and to force it to be used through
instantiate_parameters/resolve_mixers, that have at least a well defined
semantics of return value -- I hope.  Till recently I believed that I
made the functions to behave reasonably in cases when values defined
inside loop are used outside of it (through chain of lcssa phi nodes),
but my attempt to remove the hacks that ensure sane behavior of the
results for ivopts causes misscompilations; I am investigating the cause
now.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 12:03 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 
 --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
 dot cz  2005-01-25 11:14 ---
 Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)
 
  rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
   Adding the instantiation cache was compile time neutral on normal code,
   so I don't think the effect on scev cost is significant.
   
  
  How that?  You end up querying and caching an evolution for every
  instantiation of an SSA_NAME!  
 
 which is pretty cheap (constant time operation); note that we do an
 equivalent lookup in the analyze_scalar_evolution call in any case, also
 without causing any compile time problems.  I haven't measured any slowdown on
 normal testcases.
 
  I will quantify the number of queries wrt the number of times this
  information is useful.  I think that with my patch, the instantiation
  cache information is used zero times on a bootstrap.
 
 I don't think so.  Please come up with some numbers, otherwise this
 discussion is pointless.
 

during a bootstrap: 

instantiation cache queries: 1146452

instantiation cache uses: 247452 of which 143977 were scev_not_known,
the other were SSA_NAMEs.

this was counted with a grep|wc with the following patch: 

Index: tree-scalar-evolution.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-scalar-evolution.c,v
retrieving revision 2.16
diff -d -u -p -r2.16 tree-scalar-evolution.c
--- tree-scalar-evolution.c 18 Jan 2005 11:36:26 -  2.16
+++ tree-scalar-evolution.c 25 Jan 2005 12:00:57 -
@@ -1898,8 +1898,15 @@ get_instantiated_value (htab_t cache, tr
   pattern.var = version;
   info = htab_find (cache, pattern);
 
+  fprintf (stdout, IC_query \n);
+
   if (info)
-return info-chrec;
+{
+  fprintf (stdout, IC_used_once \n);
+  print_generic_expr (stdout, info-chrec, 0);
+  fprintf (stdout, \n);
+  return info-chrec;
+}
   else
 return NULL_TREE;
 }
@@ -1915,6 +1922,8 @@ set_instantiated_value (htab_t cache, tr
   pattern.var = version;
   slot = htab_find_slot (cache, pattern, INSERT);
 
+  fprintf (stdout, IC_query \n);
+
   if (*slot)
 info = *slot;
   else
@@ -1990,7 +1999,8 @@ instantiate_parameters_1 (struct loop *l
 result again.  */
   bitmap_set_bit (already_instantiated, SSA_NAME_VERSION (chrec));
   res = analyze_scalar_evolution (def_loop, chrec);
-  res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, 
cache);
+  if  (res != chrec_dont_know)
+   res = instantiate_parameters_1 (loop, res, allow_superloop_chrecs, 
cache);
   bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec));
 
   /* Store the correct value to the cache.  */
@@ -2000,8 +2010,14 @@ instantiate_parameters_1 (struct loop *l
 case POLYNOMIAL_CHREC:
   op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec),
  allow_superloop_chrecs, cache);
+  if (op0 == chrec_dont_know)
+   return chrec_dont_know;
+
   op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec),
  allow_superloop_chrecs, cache);
+  if (op1 == chrec_dont_know)
+   return chrec_dont_know;
+
   if (CHREC_LEFT (chrec) != op0
  || CHREC_RIGHT (chrec) != op1)
chrec = build_polynomial_chrec (CHREC_VARIABLE (chrec), op0, op1);
@@ -2010,8 +2026,14 @@ instantiate_parameters_1 (struct loop *l
 case PLUS_EXPR:
   op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
  allow_superloop_chrecs, cache);
+  if (op0 == chrec_dont_know)
+   return chrec_dont_know;
+
   op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
  allow_superloop_chrecs, cache);
+  if (op1 == chrec_dont_know)
+   return chrec_dont_know;
+
   if (TREE_OPERAND (chrec, 0) != op0
  || TREE_OPERAND (chrec, 1) != op1)
chrec = chrec_fold_plus (TREE_TYPE (chrec), op0, op1);
@@ -2020,8 +2042,14 @@ instantiate_parameters_1 (struct loop *l
 case MINUS_EXPR:
   op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
  allow_superloop_chrecs, cache);
+  if (op0 == chrec_dont_know)
+   return chrec_dont_know;
+
   op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
  allow_superloop_chrecs, cache);
+  if (op1 == chrec_dont_know)
+return chrec_dont_know;
+
   if (TREE_OPERAND (chrec, 0) != op0
  || TREE_OPERAND (chrec, 1) != op1)
 chrec = chrec_fold_minus (TREE_TYPE (chrec), op0, op1);
@@ -2030,8 +2058,14 @@ instantiate_parameters_1 (struct loop *l
 

[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 12:44 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
 More seriously -- which of the possibilities?  If I have loops like
 
 while (...)
   {
 while (...)
   {
 x_1 = something ();
   }
 x_2 = phi (x_1);
 x_3 = x_2 + 1;
   }
 
 What will analyze_scalar_evolutions return for x_3? There are (at least)
 three possible valid values:
 
 x_3

This would be the answer if analyze_scalar_evolutions would be the
identity function.  If you want, you could change analyze_scalar_evolutions
such that it behaves like that, and decide that the instantiation do
the rest of the work (I mean moving the code that is currently in
analyze_scalar_evolutions to the instantiation phase).

 x_2 + 1

If you decide to reconstruct the tree expression, there is no reason
to stop on a phi node that has a single argument.  Why would you like
to get this answer as the reconstructed tree?

 x_1 + 1
 

IMO this would be the answer, although I didn't checked.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 15:54 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:
  More seriously -- which of the possibilities?  If I have loops like
  
  while (...)
{
  while (...)
{
  x_1 = something ();
}
  x_2 = phi (x_1);
  x_3 = x_2 + 1;
}
  
  What will analyze_scalar_evolutions return for x_3? There are (at least)
  three possible valid values:
  
  x_3
 
 This would be the answer if analyze_scalar_evolutions would be the
 identity function.  If you want, you could change analyze_scalar_evolutions
 such that it behaves like that, and decide that the instantiation do
 the rest of the work (I mean moving the code that is currently in
 analyze_scalar_evolutions to the instantiation phase).
 
  x_2 + 1
 
 If you decide to reconstruct the tree expression, there is no reason
 to stop on a phi node that has a single argument.  Why would you like
 to get this answer as the reconstructed tree?

because this answer preserves loop closed ssa form -- the answer x_1 + 1
copy propagates the value outsied of the loop.  In some applications
this choice could make sense.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-25 16:26 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  If you decide to reconstruct the tree expression, there is no reason
  to stop on a phi node that has a single argument.  Why would you like
  to get this answer as the reconstructed tree?
 
 because this answer preserves loop closed ssa form -- the answer x_1 + 1
 copy propagates the value outsied of the loop.  In some applications
 this choice could make sense.
 

Okay, if it makes sense, you could modify the analyzer such that it
stops reconstructing the tree once it is on the phi following a loop.
This won't change the information provided after the instantiation.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-25 Thread pinskia at gcc dot gnu dot org


-- 
   What|Removed |Added

   Severity|normal  |minor
   Priority|P2  |P3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread sebastian dot pop at cri dot ensmp dot fr

--- Additional Comments From sebastian dot pop at cri dot ensmp dot fr  
2005-01-24 21:36 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 
 Still there remain some inefficiences within the scev analysis itself.
 

Zdenek, have you tried to revert the patch that caches the
instantiation information?  This could speed up things a little.  

On a side note, I wonder how many times we're using this instantiation
cache, in other words whether we pay a high price for the scev
analysis for some (probably) rare cases...



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-24 22:16 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  Still there remain some inefficiences within the scev analysis itself.
  
 
 Zdenek, have you tried to revert the patch that caches the
 instantiation information?  This could speed up things a little.  

 On a side note, I wonder how many times we're using this instantiation
 cache, in other words whether we pay a high price for the scev
 analysis for some (probably) rare cases...

Adding the instantiation cache was compile time neutral on normal code,
so I don't think the effect on scev cost is significant.

The problem is that we end up calling the instantiate_parameters_1
function 83214+2453273 (outside + recursive) times (for N=100).
We also call analyze_scalar_evolution_1 1146317 times.  Many of these
calls create POLYNOMIAL_CHREC nodes (2894131 calls to build3_stat).
Large part of 5142869 of build_int_cst_wide calls is very likely also
due to scev analysis.  This is not going to be cheap.  Removing the
instantiation cache definitely would not decrease the number of
instantiate_parameters_1 calls (might increase it, in fact, although
I don't know how significantly).

One part of the problem is that loop optimizers need to throw away the
scev caches after each change to loops, which leads to recounting the
information too much.  Allowing to invalidate only changed parts helps,
but seems to be relatively costly on normal code -- you need to scan the
hash table for things that refer to removed or from some other reason
invalidated ssa names, which is slow.  And to make things worse, this
change of course means that most of the information is left in the hash
table, i.e. the size of the table grows and these scans get slower and
slower.  The alternative -- checking the validity of entries only when
querying the cache -- is not possible, since we reuse the released
ssa names, so we would be unable to recognize the validity of the
information.

Other part is that scev tries to be too clever.  Without need to
represent nonaffine induction variables (that we do not use anywhere)
we could use more memory efficient representation of ivs.
Without need to handle symbolic references (that we also do not use
anywhere, we could store evolutions in a fully instantiated form, and
we would not need instantiate_parameters/resolve_mixers functions at all.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-24 
22:25 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)


 Other part is that scev tries to be too clever.  Without need to
 represent nonaffine induction variables (that we do not use anywhere)
 we could use more memory efficient representation of ivs.
 Without need to handle symbolic references (that we also do not use
 anywhere, we could store evolutions in a fully instantiated form, and
 we would not need instantiate_parameters/resolve_mixers functions atall.

Uh, symbolic references are or will be used to do data dependence when 
MEM_REF and ARRAY_REF couldn't be generated from the pointers.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-24 23:09 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  Other part is that scev tries to be too clever.  Without need to
  represent nonaffine induction variables (that we do not use anywhere)
  we could use more memory efficient representation of ivs.
  Without need to handle symbolic references (that we also do not use
  anywhere, we could store evolutions in a fully instantiated form, and
  we would not need instantiate_parameters/resolve_mixers functions atall.
 
 Uh, symbolic references are or will be used to do data dependence when 
 MEM_REF and ARRAY_REF couldn't be generated from the pointers.

How?  If the reference is left in symbolic form, it means that you know
nothing about its value, so the only thing you can do with it is to
check its equality/inequality, in code like

while (...)
  {
i = something_weird ();

a[i] = ...;  (a)
a[i+1] = ...;  (b)
a[i] = ...;  (c)
  }

to find that (b) is independent on (a) and (c) and to find the
dependence between (a) and (c), but you do not need scev for this --
value numbering info is enough.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-24 
23:12 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Mon, 24 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:


 --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
 dot cz  2005-01-24 23:09 ---
 Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 Other part is that scev tries to be too clever.  Without need to
 represent nonaffine induction variables (that we do not use anywhere)
 we could use more memory efficient representation of ivs.
 Without need to handle symbolic references (that we also do not use
 anywhere, we could store evolutions in a fully instantiated form, and
 we would not need instantiate_parameters/resolve_mixers functions atall.

 Uh, symbolic references are or will be used to do data dependence when
 MEM_REF and ARRAY_REF couldn't be generated from the pointers.

 How?  If the reference is left in symbolic form, it means that you know
 nothing about its value,

Wrong.

We could know things about it's value, such as ranges. We just don't know 
an exact number.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 00:27 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  Other part is that scev tries to be too clever.  Without need to
  represent nonaffine induction variables (that we do not use anywhere)
  we could use more memory efficient representation of ivs.
  Without need to handle symbolic references (that we also do not use
  anywhere, we could store evolutions in a fully instantiated form, and
  we would not need instantiate_parameters/resolve_mixers functions atall.
 
  Uh, symbolic references are or will be used to do data dependence when
  MEM_REF and ARRAY_REF couldn't be generated from the pointers.
 
  How?  If the reference is left in symbolic form, it means that you know
  nothing about its value,
 
 Wrong.
 
 We could know things about it's value, such as ranges. We just don't know 
 an exact number.

OK.  This is work for VRP, not scev.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-25 
00:39 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



 Uh, symbolic references are or will be used to do data dependence when
 MEM_REF and ARRAY_REF couldn't be generated from the pointers.

 How?  If the reference is left in symbolic form, it means that you know
 nothing about its value,

 Wrong.

 We could know things about it's value, such as ranges. We just don't know
 an exact number.

 OK.  This is work for VRP, not scev.

And once we have it, we can talk about removing the symbolic stuff from 
SCEV :)



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 00:49 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  Uh, symbolic references are or will be used to do data dependence when
  MEM_REF and ARRAY_REF couldn't be generated from the pointers.
 
  How?  If the reference is left in symbolic form, it means that you know
  nothing about its value,
 
  Wrong.
 
  We could know things about it's value, such as ranges. We just don't know
  an exact number.
 
  OK.  This is work for VRP, not scev.
 
 And once we have it, we can talk about removing the symbolic stuff from 
 SCEV :)

Once we use any value range info from SCEV, we can speak about leaving
symbolic references in SCEV until we have a proper VRP :-)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-25 
00:50 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Mon, 25 Jan 2005, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote:


 --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
 dot cz  2005-01-25 00:49 ---
 Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 Uh, symbolic references are or will be used to do data dependence when
 MEM_REF and ARRAY_REF couldn't be generated from the pointers.

 How?  If the reference is left in symbolic form, it means that you know
 nothing about its value,

 Wrong.

 We could know things about it's value, such as ranges. We just don't know
 an exact number.

 OK.  This is work for VRP, not scev.

 And once we have it, we can talk about removing the symbolic stuff from
 SCEV :)

 Once we use any value range info from SCEV, we can speak about leaving
 symbolic references in SCEV until we have a proper VRP :-)
See autovec branch.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-25 
00:52 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 See autovec branch.


You could also look at recent patches posted by sebastian and i for the 
autovect branch that have been adding this support.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-25 
01:11 ---
Which one? I cannot find anything relevant in changelog.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-25 
01:15 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)


 Which one? I cannot find anything relevant in changelog.



 * tree-data-ref.c (analyze_subscript_affine_affine): Implement a
 solution for the FIXME concerning the symbolic affine
 dependence testing; remove the FIXME.
 (can_use_analyze_subscript_affine_affine): New function.
 (analyze_siv_subscript): Use it.

and


2004-12-15  Daniel Berlin  [EMAIL PROTECTED]

 * Makefile.in (tree-chrec.o): Add cfgloop.h

 * tree-chrec.c: Add cfgloop.h, tree-flow.h.
 (evolution_function_is_invariant_p): New function.
 (evolution_function_is_affine_multivariate_p): Use
 evolution_function_is_invariant_p instead of
 evolution_function_is_constant_p.

 * tree-chrec.h: Add prototype for
 evolution_function_is_invariant_p.
 (evolution_function_is_affine_p): Use
 evolution_function_is_invariant_p.

 * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
 are equal overlap on every iteration.

This stuff is just simple symbolic differencing, and checking of 
invariantness of symbols,
But it is indeed starting to se the symbolic scev info



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 01:24 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

  Which one? I cannot find anything relevant in changelog.
 
 
 
  * tree-data-ref.c (analyze_subscript_affine_affine): Implement a
  solution for the FIXME concerning the symbolic affine
  dependence testing; remove the FIXME.
  (can_use_analyze_subscript_affine_affine): New function.
  (analyze_siv_subscript): Use it.
 
 and
 
 
 2004-12-15  Daniel Berlin  [EMAIL PROTECTED]
 
  * Makefile.in (tree-chrec.o): Add cfgloop.h
 
  * tree-chrec.c: Add cfgloop.h, tree-flow.h.
  (evolution_function_is_invariant_p): New function.
  (evolution_function_is_affine_multivariate_p): Use
  evolution_function_is_invariant_p instead of
  evolution_function_is_constant_p.
 
  * tree-chrec.h: Add prototype for
  evolution_function_is_invariant_p.
  (evolution_function_is_affine_p): Use
  evolution_function_is_invariant_p.
 
  * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
  are equal overlap on every iteration.
 
 This stuff is just simple symbolic differencing, and checking of 
 invariantness of symbols,
 But it is indeed starting to se the symbolic scev info

Ugh... OK, if you think it is a right way...  Anyway, I am seriously
considering resurrecting the simple iv analysis for purposes of passes
that are not interested in this fancy stuff.  Overhead of scev is
probably acceptable if it is only used for this type of analysis,
but for other purposes it is clearly overkill.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-25 
01:28 ---
Do remember that this bug is about bad behavior with deeply nested loops 
and we already have other algorithms that are quadratic in the loop nest 
depth.  So I really wouldn't consider this to be a very serious problem, 
but rather something that could be improved. 
 
It is a shame that Seb has so far not commented on the behavior of his 
scev algorithm with respect to the loop nest depth.  Is it expected to 
be cubic in the loop nest depth?  Perhaps he can improve the algorithm? 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-25 
01:42 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)


  * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
  are equal overlap on every iteration.

 This stuff is just simple symbolic differencing, and checking of
 invariantness of symbols,
 But it is indeed starting to se the symbolic scev info

 Ugh... OK, if you think it is a right way...  Anyway, I am seriously
 considering resurrecting the simple iv analysis for purposes of passes
 that are not interested in this fancy stuff.  Overhead of scev is
 probably acceptable if it is only used for this type of analysis,
 but for other purposes it is clearly overkill.

Uh, almost all high level loop optimizations are going to do very badly in 
the depth of the loop nest.

The fact that scev doesn't do so well on a 100 deep loop nest doesn't 
really concern me at all.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 01:52 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 Do remember that this bug is about bad behavior with deeply nested loops 
 and we already have other algorithms that are quadratic in the loop nest 
 depth.  So I really wouldn't consider this to be a very serious problem, 
 but rather something that could be improved. 
  
 It is a shame that Seb has so far not commented on the behavior of his 
 scev algorithm with respect to the loop nest depth.  Is it expected to 
 be cubic in the loop nest depth?  Perhaps he can improve the algorithm? 

the algorithm itself is not cubic with loop depth, but worst case linear
(per query) (*). This time complexity is acceptable, IMHO.

(*) I hope; scev is a mess of mutualy recursive functions --
analyze_scalar_evolution calling number_of_iterations calling
instantiate_parameters calling analyze_scalar_evolution again, with
instantiate_parameters hacked so that the infinite cycle cannot occur is
my favourite one.  Nobody can say anything sure about behavior of scev
-- it is not even well defined what analyze_scalar_evolutions will
return to you, unless you call instantiate_parameters or resolve_mixers
on the result.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-25 01:57 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

   * tree-data-ref.c (analyze_overlapping_iterations): chrecs that
   are equal overlap on every iteration.
 
  This stuff is just simple symbolic differencing, and checking of
  invariantness of symbols,
  But it is indeed starting to se the symbolic scev info
 
  Ugh... OK, if you think it is a right way...  Anyway, I am seriously
  considering resurrecting the simple iv analysis for purposes of passes
  that are not interested in this fancy stuff.  Overhead of scev is
  probably acceptable if it is only used for this type of analysis,
  but for other purposes it is clearly overkill.
 
 Uh, almost all high level loop optimizations are going to do very badly in 
 the depth of the loop nest.

Remove almost and high level from this sentence and I will agree
with you.  This really does not worry me that much.

 The fact that scev doesn't do so well on a 100 deep loop nest doesn't 
 really concern me at all.

Me neither.  Its time and memory consumption on normal testcases however
is not entirely satisfactory (not critical, but there definitely is a
place for improvement), and implementation could (should?) also be
significantly cleaned up.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-23 
14:18 ---
It's definitely not the bitmaps: 
 time   seconds   secondscalls   s/call   s/call  name 
 10.28 25.1425.14 24356740 0.00 0.00  ggc_alloc_stat 
  5.39 38.3213.18  3133403 0.00 0.00  instantiate_parameters_1 
  5.25 51.1512.83 34260186 0.00 0.00  htab_find_slot_with_hash 
  4.41 61.9410.79 40175413 0.00 0.00  build_int_cst_wide 
  4.35 72.5810.64  1139473 0.00 0.00  chrec_convert 
  3.79 81.86 9.28 1810 0.01 0.01  htab_empty 
  3.63 90.73 8.87 22863854 0.00 0.00  build3_stat 
  3.41 99.07 8.34  800 0.01 0.01  for_each_insn_in_loop 
  2.74105.76 6.69   415264 0.00 0.00  follow_ssa_edge 
  2.64112.21 6.45  3085512 0.00 0.00  
find_interesting_uses_stmt 
  2.14117.45 5.24  3197505 0.00 0.00  record_invariant 
  2.13122.66 5.21  9695984 0.00 0.00  
analyze_scalar_evolution_1 
  1.65126.70 4.04 11769469 0.00 0.00  comptypes 
  1.64130.70 4.00 11710741 0.00 0.00  fold_convert_const 
  1.63134.68 3.98 23551936 0.00 0.00  make_node_stat 
  1.44138.19 3.51 11464308 0.00 0.00  force_fit_type 
  1.28141.31 3.12 12009984 0.00 0.00  fold_convert 
  1.25144.37 3.06 21935414 0.00 0.00  eq_scev_info 
  1.18147.26 2.89  9695984 0.00 0.00  analyze_scalar_evolution 
  1.09149.93 2.67   326865 0.00 0.00  chrec_fold_plus_1 
  1.01152.41 2.48  200 0.01 0.94  
tree_ssa_iv_optimize_loop 
  1.01154.88 2.47 29473273 0.00 0.00  is_gimple_min_invariant 
  0.99157.30 2.43 26291603 0.00 0.00  flow_bb_inside_loop_p 
  0.98159.70 2.40 22605936 0.00 0.00  flow_loop_nested_p 
  0.95162.03 2.33  3135046 0.00 0.00  htab_delete 
  0.95164.35 2.32   164459 0.00 0.00  htab_expand 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-23 
14:23 ---
Looks to me like Seb may want to look at this bug too... 

-- 
   What|Removed |Added

 CC||spop at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-23 
14:24 ---
One more, because I also run out of memory pretty quickly: 
--- 
0.000.00   1/22863854 c_build_bind_expr [1608] 
0.000.00   1/22863854 thread_across_edge [392] 
0.000.00 199/22863854 
estimate_numbers_of_iterations [184] 
0.000.00 200/22863854 c_finish_loop [754] 
0.000.019801/22863854 follow_ssa_edge_in_rhs 
cycle 2 [38] 
0.020.06   50602/22863854 add_to_evolution [324] 
0.100.32  247913/22863854 simple_iv cycle 2 [35] 
0.170.55  428920/22863854 analyze_scalar_evolution_1 
cycle 2 [12] 
2.197.24 5648016/22863854 chrec_fold_plus_1 [24] 
2.257.43 5793593/22863854 instantiate_parameters_1 
cycle 2 [11] 
4.15   13.69 10684608/22863854 chrec_convert [10] 
[13]15.68.87   29.30 22863854 build3_stat [13] 
3.86   25.44 22863854/23551936 make_node_stat [15] 
--- 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-23 
14:22 ---
More profile data: 
 
--- 
0.00  187.04   1/1   execute_pass_list [6] 
[7] 76.50.00  187.04   1 tree_ssa_iv_optimize [7] 
2.48  184.56 200/200 tree_ssa_iv_optimize_loop [8] 
0.000.00   1/201 free_loop_data [349] 
0.000.00   2/30137   bitmap_obstack_free [178] 
0.000.00 201/7042572 xcalloc [127] 
0.000.00   3/2369varray_init [1063] 
0.000.00   2/36950   bitmap_obstack_alloc [794] 
--- 
2.48  184.56 200/200 tree_ssa_iv_optimize [7] 
[8] 76.52.48  184.56 200 tree_ssa_iv_optimize_loop [8] 
   30.61  134.21  309011/311017  simple_iv cycle 2 [35] 
6.459.73 3085512/3085512 find_interesting_uses_stmt 
[25] 
0.041.03 200/202 scev_reset [93] 
0.000.76 200/200 determine_use_iv_costs [107] 
0.000.50 200/200 rewrite_uses [142] 
0.000.35 200/202 loop_commit_inserts [181] 
0.110.11   20100/20100   find_interesting_uses_cond 
[241] 
0.020.09 200/311017  number_of_iterations_exit 
cycle 2 [41] 
0.060.04 200/201 free_loop_data [349] 
0.000.09 200/200 create_new_ivs [360] 
0.000.08 200/401 get_loop_body_in_dom_order 
[274] 
0.000.06 200/2601get_loop_body [111] 
0.050.00 200/200 remove_unused_ivs [440] 
0.020.03   19703/99904   
find_interesting_uses_outer_or_nonlin [223] 
0.000.042004/2804add_candidate [413] 
0.000.02 200/200 find_optimal_iv_set [609] 
0.000.02   20507/40407   set_iv [503] 
0.000.02 800/800 add_iv_value_candidates [689] 
0.010.01   80200/26291603 flow_bb_inside_loop_p [44] 
0.010.00 800/30137   bitmap_obstack_free [178] 
0.000.001006/5012force_var_cost [590] 
0.000.00 200/200 iv_ca_free [1067] 
0.000.00 201/3005add_candidate_1 [530] 
0.000.00   20708/370258  zero_p [517] 
0.000.001399/3525413 get_iv [69] 
0.000.00 402/12009984 fold_convert [17] 
0.000.001198/3329551 is_gimple_reg [71] 
0.000.001202/40175413 build_int_cst_wide [26] 
0.000.00 200/200 single_dom_exit [1344] 
0.000.00 402/132285  find_edge [436] 
0.000.001003/8376514 bitmap_set_bit [92] 
0.000.00   20507/20507   contains_abnormal_ssa_name_p 
[1427] 
0.000.00 201/57945   int_cst_value [804] 
0.000.001202/22889750 build_int_cst [141] 
0.000.001006/5034add_cost [1463] 
0.000.001007/22165   cst_and_fits_in_hwi [1748] 
0.000.00 402/1401loop_latch_edge [2001] 
0.000.00 201/1597loop_preheader_edge [1984] 
--- 
[9] 67.8   30.81  135.08  311017+24191806 cycle 2 as a whole [9] 
   13.18   43.06 3133403+18788021 instantiate_parameters_1 
cycle 2 [11] 
5.21   44.15 9695984 analyze_scalar_evolution_1 
cycle 2 [12] 
0.22   20.27  703698 interpret_rhs_modify_expr 
cycle 2 [22] 
2.897.22 9695984 analyze_scalar_evolution 
cycle 2 [32] 
0.576.74  313469+81390   follow_ssa_edge_in_rhs cycle 
2 [38] 
6.690.56  415264+1352605 follow_ssa_edge cycle 2 
[39] 
0.224.74   21906 number_of_iterations_exit 
cycle 2 [41] 
0.051.16   30495 number_of_iterations_in_loop 
cycle 2 [86] 
0.040.07   43784 instantiate_parameters cycle 
2 [347] 
0.050.01   65718 
simplify_using_outer_evolutions cycle 2 [441] 
0.050.00   30295 
compute_overall_effect_of_inner_loop cycle 2 [459] 
--- 
 22088230 chrec_convert [10] 
   

Re: [Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread Daniel Berlin
I believe seb/zdenek already submitted patches for speeding up scev quite 
recently, with the goal of alleviating this problem.
I'm pretty sure they have not been applied yet.




[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-23 
15:01 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

I believe seb/zdenek already submitted patches for speeding up scev quite 
recently, with the goal of alleviating this problem.
I'm pretty sure they have not been applied yet.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-23 15:06 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 I believe seb/zdenek already submitted patches for speeding up scev quite 
 recently, with the goal of alleviating this problem.
 I'm pretty sure they have not been applied yet.

the Sebastian's patch is not in yet; it might help a bit in this case, a
believe, although I suspect a real source of the problem is elsewhere.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-23 
16:38 ---
Sebastian's patch does not help significantly :-(

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-23 
17:14 ---
Also did not help much.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-23 
17:04 ---
One extra N factor seems to be coming from simple_iv
(analyze_scalar_evolution_in_loop).  The function was created before
loop closed ssa form, under assumptions of lcssa it should be possible
to get rid of this.  I am working on patch

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-24 
01:40 ---
The patch that causes ivopts to reset just the relevant parts of the scev cache
(just regtesting) instead of clearing it completely helps a bit -- the compile
time for N=100 gets about 2x better and memory consumption about 10x.

Still there remain some inefficiences within the scev analysis itself.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-24 
01:46 ---
On a side note, PRE also seems to have problems with the testcase.  With the
patch mentioned above, the largest consumers of compile time are ivopts (45%)
and pre (20%).

-- 
   What|Removed |Added

 CC||dberlin at gcc dot gnu dot
   ||org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


Re: [Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread Daniel Berlin

On Sun, 24 Jan 2005, rakdver at gcc dot gnu dot org wrote:
--- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-24 
01:46 ---
On a side note, PRE also seems to have problems with the testcase.  With the
patch mentioned above, the largest consumers of compile time are ivopts (45%)
and pre (20%).
Uh, there was a bug filed about this, and i fixed it, last i looked.


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-24 
01:49 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)



On Sun, 24 Jan 2005, rakdver at gcc dot gnu dot org wrote:


 --- Additional Comments From rakdver at gcc dot gnu dot org  2005-01-24 
 01:46 ---
 On a side note, PRE also seems to have problems with the testcase.  With the
 patch mentioned above, the largest consumers of compile time are ivopts (45%)
 and pre (20%).

Uh, there was a bug filed about this, and i fixed it, last i looked.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2005-01-23 Thread dberlin at dberlin dot org

--- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-24 
01:57 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 On a side note, PRE also seems to have problems with the testcase.  With the
 patch mentioned above, the largest consumers of compile time are ivopts (45%)
 and pre (20%).

 Uh, there was a bug filed about this, and i fixed it, last i looked.

Here it is.
Bug 18673.

N=100 takes .25 seconds now.

I can't make PRE much faster than that, that's almost all hash function 
time.
--Dan


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-12-22 Thread belyshev at depni dot sinp dot msu dot ru

--- Additional Comments From belyshev at depni dot sinp dot msu dot ru  
2004-12-22 23:40 ---
It seems that extra O(N) factor comes from high memory usage. For smaller N
(0..30) compiler behaves like O(2) and for larger ( 50) like O(2.5 .. 3).

gnuplot fit [0:30] A*x**k+B data via A,k,B
[snip]
Final set of parametersAsymptotic Standard Error
=====

A   = 0.000465461  +/- 2.625e-05(5.64%)
k   = 1.90612  +/- 0.01652  (0.8667%)
B   = 0.0328722+/- 0.0007677(2.335%)

gnuplot fit [50:175] A*x**k+B data via A,k,B
[snip]
Final set of parametersAsymptotic Standard Error
=====

A   = 4.76329e-06  +/- 3.493e-07(7.332%)
k   = 3.0033   +/- 0.01405  (0.4678%)
B   = 0.392746 +/- 0.02972  (7.567%)

For N=175 memory usage on my machine exceedes 600 MB

-- 
   What|Removed |Added

   Last reconfirmed|2004-11-25 11:16:30 |2004-12-22 23:40:41
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-12-22 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-12-23 
01:26 ---
hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing) 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-12-22 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2004-12-23 01:29 ---
Subject: Re:  [4.0 Regression] IV-OPTS is O(N^3)

 hmmm maybe the extra O(N) comes from O(N) bitmap operations? (Just guessing) 

that might be the case, but I don't think it is likely (also just
guessing :-)  As far as I can tell all bitmaps in ivopts should be small
for this testcase.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-11-28 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-11-28 
18:42 ---
IVOPTs should behave quadratically on this type of tests; I do not know where
does the extra O(N) factor come from, and I would not expect the times to grow
this fast.

-- 
   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-11-27 Thread neroden at gcc dot gnu dot org


-- 
   What|Removed |Added

OtherBugsDependingO||18693
  nThis||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-11-26 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-11-27 
01:10 ---
Of course it's not very useful to claim that some algorithm
is O(N^x) when you don't say what N is...

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-11-26 Thread belyshev at lubercy dot com

--- Additional Comments From belyshev at lubercy dot com  2004-11-27 01:45 
---
(In reply to comment #3)
 Of course it's not very useful to claim that some algorithm
 is O(N^x) when you don't say what N is...

N is number of nested loops.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


[Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)

2004-11-25 Thread belyshev at lubercy dot com

--- Additional Comments From belyshev at lubercy dot com  2004-11-25 11:16 
---
use this awk script to generate testcase (first arg is number of loops):

BEGIN {
ORS=
print int f ()\n{\tint 
for (j = 0; j  ARGV [1]; j++)
print j j , 
print a;\n\ta = 0;\n
print \tfor (j0 = 0; j0  2; j0 ++)\n
for (j = 1; j  ARGV [1]; j++)
print \tfor (j j  = j j-1 ; j j   2; j j ++)\n
print \ta += 
for (j = 0; j  ARGV [1]-1; j++)
print j j  + 
print j j ;\n\treturn a;\n}\n
}


N loops   Time, s
 50.05
100.17
150.38
201.14
252.81
304.68
357.52
4013.6
4521.8
5025.6
5533.9


-- 
   What|Removed |Added

   Severity|minor   |normal
 Status|UNCONFIRMED |NEW
 Ever Confirmed||1
   Keywords||memory-hog
  Known to fail||4.0.0
   Last reconfirmed|-00-00 00:00:00 |2004-11-25 11:16:30
   date||
Summary|IV-OPTS is slow (and does   |[4.0 Regression] IV-OPTS is
   |not scale)  |O(N^3)
   Target Milestone|--- |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595