Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-20 Thread Richard Biener
On Wed, 19 Sep 2018, Richard Biener wrote:

> 
> The second testcase in the above PR runs into our O(N) bitmap element
> search limitation and spends 8s (60%) of the compile-time in the SSA 
> propagator
> engine (when optimizing).  The patch improves that to 0.9s (15%).  For the 
> first testcase it isn't that bad but still the patch improves CCP from 36% to 
> 14%.
> 
> The "solution" is to use sbitmap instead of a bitmap to avoid
> the linearity when doing add_ssa_edge.  We pay for that (but not
> actually with the testcases) with a linear-time bitmap_first_set_bit
> in process_ssa_edge_worklist.  I do not (yet...) have a testcase
> that overall gets slower with this approach.  I suppose using
> std::set would "solve" the complexity issue but we'd pay
> back with horribly inefficient memory use.  Similarly with
> our sparse bitmap implementation which lacks an ordered
> first_set_bit (it only can get any set bit fast, breaking optimal
> iteration order).
> 
> If we'd only had a O(log n) search sparse bitmap implementation ...
> (Steven posted patches to switch bitmap from/to such one but IIRC
> that at least lacked bitmap_first_set_bit).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

So it turns out that while the bitmap data structure isn't optimal
the issue with the testcase is that we end up with a full-set-universe
for the SSA worklist mostly because we are queueing PHIs via uses
on backedges that are not yet executable (so we'd re-simulate the PHI
anyway once that became excutable - no need to set the individual bits).

So I'm testing the following instead.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-09-20  Richard Biener  

PR tree-optimization/63155
* tree-ssa-propagate.c (add_ssa_edge): Avoid adding PHIs to
the worklist when the edge of the respective argument isn't
executable.

Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 264438)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -168,10 +168,18 @@ add_ssa_edge (tree var)
   FOR_EACH_IMM_USE_FAST (use_p, iter, var)
 {
   gimple *use_stmt = USE_STMT (use_p);
+  basic_block use_bb = gimple_bb (use_stmt);
 
   /* If we did not yet simulate the block wait for this to happen
  and do not add the stmt to the SSA edge worklist.  */
-  if (! (gimple_bb (use_stmt)->flags & BB_VISITED))
+  if (! (use_bb->flags & BB_VISITED))
+   continue;
+
+  /* If this is a use on a not yet executable edge do not bother to
+queue it.  */
+  if (gimple_code (use_stmt) == GIMPLE_PHI
+ && !(EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->flags
+  & EDGE_EXECUTABLE))
continue;
 
   if (prop_simulate_again_p (use_stmt)


Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-20 Thread Richard Biener
On Wed, 19 Sep 2018, Steven Bosscher wrote:

> On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote:
> > If we'd only had a O(log n) search sparse bitmap implementation ...
> > (Steven posted patches to switch bitmap from/to such one but IIRC
> > that at least lacked bitmap_first_set_bit).
> 
> But bitmap_first_set_bit would be easy to implement. Just take the
> left-most node of the splay tree.
> 
> Actually all bit-tests would be easy to implement. It's only
> enumeration and set operations on the tree-views that would be
> complicated fluff (easier to "switch views" than re-implement).
> 
> Impressive that you remember >5yr old patches like that ;-)

;)

Well, it's still useful (but obviously doesn't apply).  Not sure
iff the worst-case behavior of splay-trees makes it a pointless
exercise though ;)

Richard.


Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-19 Thread Steven Bosscher
On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote:
> If we'd only had a O(log n) search sparse bitmap implementation ...
> (Steven posted patches to switch bitmap from/to such one but IIRC
> that at least lacked bitmap_first_set_bit).

But bitmap_first_set_bit would be easy to implement. Just take the
left-most node of the splay tree.

Actually all bit-tests would be easy to implement. It's only
enumeration and set operations on the tree-views that would be
complicated fluff (easier to "switch views" than re-implement).

Impressive that you remember >5yr old patches like that ;-)

Ciao!
Steven


Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-19 Thread Richard Biener
On Wed, 19 Sep 2018, Jakub Jelinek wrote:

> On Wed, Sep 19, 2018 at 03:06:30PM +0200, Richard Biener wrote:
> > 
> > The second testcase in the above PR runs into our O(N) bitmap element
> > search limitation and spends 8s (60%) of the compile-time in the SSA 
> > propagator
> > engine (when optimizing).  The patch improves that to 0.9s (15%).  For the 
> > first testcase it isn't that bad but still the patch improves CCP from 36% 
> > to 
> > 14%.
> > 
> > The "solution" is to use sbitmap instead of a bitmap to avoid
> > the linearity when doing add_ssa_edge.  We pay for that (but not
> > actually with the testcases) with a linear-time bitmap_first_set_bit
> > in process_ssa_edge_worklist.  I do not (yet...) have a testcase
> 
> Perhaps you could remember the first set bit number in an integral variable
> next to the bitmap.  On bitmap_set_bit this would be just a comparison +
> conditional update, in simulate_stmt it would be again conditional on
> clearing the first set bit and in that case it could start from that bit
> onwards rather than from the beginning (EXECUTE_IF_SET_IN_BITMAP), similarly
> in process_ssa_edge_worklist (non-conditional in that case, we are always
> clearing the first bit).  Or instead of tracking exact first bit track
> it lazily, i.e. just remember the smallest bitnum we've set and in
> process_ssa_edge_worklist don't use bitmap_first_set_bit, but walk from
> that minimum using EXECUTE_IF_SET_IN_BITMAP and update the minimum to
> whatever we've found.

I thought about that but as you say it's easily invalidated and we can
only optimize the searching start point.  So it felt somewhat like
a hack ;)

I can still do that if you think that otherwise using an sbitmap
is fine (I'm not 100% convinced about that yet).

> Similarly, empty_p can be done linearly by keeping on the side the count of
> set bits and updating it on set_bit when previously not set, as well on
> clear_bit.

I elided empty_p by re-using the fact that bitmap_first_set_bit
already has to do the walk (and may fail).  Of course using a 
on-the-side count might be more optimal (I was wondering why
sbitmap itself didn't have that).

I wonder if reviving Stevens [patch][RFC] bitmaps as lists *or* trees
makes more sense so we can turn this kind of random-access bitmaps
to tree view.

Richard.


Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-19 Thread Jakub Jelinek
On Wed, Sep 19, 2018 at 03:06:30PM +0200, Richard Biener wrote:
> 
> The second testcase in the above PR runs into our O(N) bitmap element
> search limitation and spends 8s (60%) of the compile-time in the SSA 
> propagator
> engine (when optimizing).  The patch improves that to 0.9s (15%).  For the 
> first testcase it isn't that bad but still the patch improves CCP from 36% to 
> 14%.
> 
> The "solution" is to use sbitmap instead of a bitmap to avoid
> the linearity when doing add_ssa_edge.  We pay for that (but not
> actually with the testcases) with a linear-time bitmap_first_set_bit
> in process_ssa_edge_worklist.  I do not (yet...) have a testcase

Perhaps you could remember the first set bit number in an integral variable
next to the bitmap.  On bitmap_set_bit this would be just a comparison +
conditional update, in simulate_stmt it would be again conditional on
clearing the first set bit and in that case it could start from that bit
onwards rather than from the beginning (EXECUTE_IF_SET_IN_BITMAP), similarly
in process_ssa_edge_worklist (non-conditional in that case, we are always
clearing the first bit).  Or instead of tracking exact first bit track
it lazily, i.e. just remember the smallest bitnum we've set and in
process_ssa_edge_worklist don't use bitmap_first_set_bit, but walk from
that minimum using EXECUTE_IF_SET_IN_BITMAP and update the minimum to
whatever we've found.

Similarly, empty_p can be done linearly by keeping on the side the count of
set bits and updating it on set_bit when previously not set, as well on
clear_bit.

Jakub


[PATCH][RFC] Fix PR63155 (some more)

2018-09-19 Thread Richard Biener


The second testcase in the above PR runs into our O(N) bitmap element
search limitation and spends 8s (60%) of the compile-time in the SSA propagator
engine (when optimizing).  The patch improves that to 0.9s (15%).  For the 
first testcase it isn't that bad but still the patch improves CCP from 36% to 
14%.

The "solution" is to use sbitmap instead of a bitmap to avoid
the linearity when doing add_ssa_edge.  We pay for that (but not
actually with the testcases) with a linear-time bitmap_first_set_bit
in process_ssa_edge_worklist.  I do not (yet...) have a testcase
that overall gets slower with this approach.  I suppose using
std::set would "solve" the complexity issue but we'd pay
back with horribly inefficient memory use.  Similarly with
our sparse bitmap implementation which lacks an ordered
first_set_bit (it only can get any set bit fast, breaking optimal
iteration order).

If we'd only had a O(log n) search sparse bitmap implementation ...
(Steven posted patches to switch bitmap from/to such one but IIRC
that at least lacked bitmap_first_set_bit).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2018-09-19  Richard Biener  

PR tree-optimization/63155
* tree-ssa-propagate.h
(ssa_propagation_engine::process_ssa_edge_worklist): Change
prototype.
* tree-ssa-propagate.c (ssa_edge_worklist): Turn into an sbitmap.
(add_ssa_edge): Adjust.
(ssa_propagation_engine::process_ssa_edge_worklist): If there
was nothing to do return false.
(ssa_prop_init): Allocate and clear ssa_edge_worklist.
(ssa_prop_fini): Adjust.
(ssa_propagation_engine::ssa_propagate): Elide the check
for an empty ssa_edge_worklist by using the
process_ssa_edge_worklist return value.

Index: gcc/tree-ssa-propagate.h
===
--- gcc/tree-ssa-propagate.h(revision 264418)
+++ gcc/tree-ssa-propagate.h(working copy)
@@ -94,7 +94,7 @@ class ssa_propagation_engine
  private:
   /* Internal implementation details.  */
   void simulate_stmt (gimple *stmt);
-  void process_ssa_edge_worklist (void);
+  bool process_ssa_edge_worklist (void);
   void simulate_block (basic_block);
 
 };
Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 264418)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -119,7 +119,7 @@ static int *cfg_order_to_bb;
definition has changed.  SSA edges are def-use edges in the SSA
web.  For each D-U edge, we store the target statement or PHI node
UID in a bitmap.  UIDs order stmts in execution order.   */
-static bitmap ssa_edge_worklist;
+static sbitmap ssa_edge_worklist;
 static vec uid_to_stmt;
 
 /* Return true if the block worklist empty.  */
@@ -175,8 +175,9 @@ add_ssa_edge (tree var)
continue;
 
   if (prop_simulate_again_p (use_stmt)
- && bitmap_set_bit (ssa_edge_worklist, gimple_uid (use_stmt)))
+ && !bitmap_bit_p (ssa_edge_worklist, gimple_uid (use_stmt)))
{
+ bitmap_set_bit (ssa_edge_worklist, gimple_uid (use_stmt));
  uid_to_stmt[gimple_uid (use_stmt)] = use_stmt;
  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -317,11 +318,14 @@ ssa_propagation_engine::simulate_stmt (g
when an SSA edge is added to it in simulate_stmt.  Return true if a stmt
was simulated.  */
 
-void
+bool
 ssa_propagation_engine::process_ssa_edge_worklist (void)
 {
   /* Process the next entry from the worklist.  */
-  unsigned stmt_uid = bitmap_first_set_bit (ssa_edge_worklist);
+  int stmt_uid = bitmap_first_set_bit (ssa_edge_worklist);
+  if (stmt_uid == -1)
+return false;
+
   bitmap_clear_bit (ssa_edge_worklist, stmt_uid);
   gimple *stmt = uid_to_stmt[stmt_uid];
 
@@ -335,6 +339,7 @@ ssa_propagation_engine::process_ssa_edge
 }
 
   simulate_stmt (stmt);
+  return true;
 }
 
 
@@ -412,9 +417,6 @@ ssa_prop_init (void)
   edge_iterator ei;
   basic_block bb;
 
-  /* Worklists of SSA edges.  */
-  ssa_edge_worklist = BITMAP_ALLOC (NULL);
-
   /* Worklist of basic-blocks.  */
   bb_to_cfg_order = XNEWVEC (int, last_basic_block_for_fn (cfun) + 1);
   cfg_order_to_bb = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
@@ -455,6 +457,10 @@ ssa_prop_init (void)
 }
   uid_to_stmt.safe_grow (gimple_stmt_max_uid (cfun));
 
+  /* Worklists of SSA edges.  */
+  ssa_edge_worklist = sbitmap_alloc (gimple_stmt_max_uid (cfun));
+  bitmap_clear (ssa_edge_worklist);
+
   /* Seed the algorithm by adding the successors of the entry block to the
  edge worklist.  */
   FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
@@ -473,7 +479,8 @@ ssa_prop_fini (void)
   BITMAP_FREE (cfg_blocks);
   free (bb_to_cfg_order);
   free (cfg_order_to_bb);
-  BITMAP_FREE (ssa_edge_worklist);
+  sbitmap_free (ssa_edge_worklist);
+  ssa_edge_worklist = NULL;
   

Re: [PATCH][RFC] Fix PR63155

2015-03-19 Thread Richard Biener
On Wed, 18 Mar 2015, Richard Biener wrote:

 On March 18, 2015 4:59:30 PM GMT+01:00, Alan Lawrence alan.lawre...@arm.com 
 wrote:
 Following this patch (r221318), we're seeing what appears to be a
 miscompile of 
 glibc on AArch64. This causes quite a bunch of tests to fail, segfaults
 etc., if 
 LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without
 (same 
 glibc sources). We are still working on a reduced testcase, but this is
 proving 
 difficult...
 
 The patch was supposed to end up in the very same code generation.  
 Thus it's enough to identify the problematic source file and send me 
 preprocessed source and cc1 invocation so I can investigate with a 
 cross.

Ok - I _think_ that live_track_process_def doesn't handle partitions
with more than one member.  It's too closely tied to SSA.

Thus the whole idea of iterating the coalescing doesn't work without
fixing that.  Like conservatively with

Index: gcc/tree-ssa-coalesce.c
===
--- gcc/tree-ssa-coalesce.c (revision 221490)
+++ gcc/tree-ssa-coalesce.c (working copy)
@@ -724,7 +724,10 @@ live_track_clear_var (live_track_p ptr,
   int p;
 
   p = var_to_partition (ptr-map, var);
-  if (p != NO_PARTITION)
+  if (p != NO_PARTITION
+  /* ???  If this partition has more than one member don't do anything.  */
+   ptr-map-var_partition-elements[SSA_NAME_VERSION
+ (partition_to_var (ptr-map, p))].class_count == 1)
 live_track_remove_partition (ptr, p);
 }
 
@@ -780,8 +783,11 @@ live_track_process_def (live_track_p ptr
   if (p == NO_PARTITION)
 return;
 
-  /* Clear the liveness bit.  */
-  live_track_remove_partition (ptr, p);
+  /* Clear the liveness bit.
+ ???  If this partition has more than one member we can't do that.  */
+  if (ptr-map-var_partition-elements[SSA_NAME_VERSION
+ (partition_to_var (ptr-map, p))].class_count == 1)
+live_track_remove_partition (ptr, p);
 
   /* If the bitmap isn't empty now, conflicts need to be added.  */
   root = basevar_index (ptr-map, p);
@@ -789,7 +795,8 @@ live_track_process_def (live_track_p ptr
 {
   b = ptr-live_base_partitions[root];
   EXECUTE_IF_SET_IN_BITMAP (b, 0, x, bi)
-ssa_conflicts_add (graph, p, x);
+   if (x != (unsigned) p)
+ ssa_conflicts_add (graph, p, x);
 }
 }
 

Does that fix it?

Given the issue I'll revert the patch.

Thanks,
Richard.

 Thanks,
 Richard.
 
 --Alan
 
 Richard Biener wrote:
  On Mon, 9 Mar 2015, Richard Biener wrote:
  
  On Fri, 6 Mar 2015, Jeff Law wrote:
 
  On 03/06/15 06:16, Richard Biener wrote:
  This fixes PR63155 and reduces the memory usage at -O0 from
 reported
  10GB (couldn't verify/update on my small box) to 350MB (still
 worse
  compared to 4.8 which needs only 50MB).
 
  It fixes this by no longer computing live info or building a
 conflict
  graph for coalescing of SSA names flowing over abnormal edges
  (which needs to succeed).
 
  Of course this also removes verification that this coalescing is
 valid
  (but computing this has quadratic cost).  With this it turns
  ICEs into miscompiles.
 
  We could restrict verifying that we can perform abnormal
 coalescing
  to ENABLE_CHECKING (and I've wanted a verifier pass that can
 verify
  this multiple times to be able to catch what breaks it and not
 having
  to work back from out-of-SSA ICEing...).
 
  So any opinion on this patch welcome.
 
  Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
  Ok for trunk? ;)
 
  Thanks,
  Richard.
 
  2015-03-06  Richard Biener  rguent...@suse.de
 
PR middle-end/63155
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being
 NULL.
(coalesce_partitions): Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without
 computing
live/conflict.
  I'd personally like to keep the checking when ENABLE_CHECKING.
 
  I haven't followed this discussion real closely, but I wonder if
 some kind of
  blocking approach would work without blowing up the memory
 consumption.
  There's no inherent reason why we have to coalesce everything at
 the same
  time.  We can use a blocking factor and do coalescing on some N
 number of
  SSA_NAMEs at a time.
  Yes, that's possible at (quite?) some compile-time cost.  Note that
 we
  can't really guarantee that the resulting live/conflict problems
 shrink
  significantly enough without sorting the coalesces in a different
 way
  (not after important coalesces but after their basevars).
 
  I suspect we can select an N that ultimately degenerates into the
 current do
  everything together for the common case and only has to iterate
 over blocks
  of SSA_NAMEs in extreme cases.
  I've attached a patch to the PR that adds such a number N after
 which we
  simply stop coalescing.  Of course that doesn't work for abnormals
 where
  we _must_ 

Re: [PATCH][RFC] Fix PR63155

2015-03-18 Thread Alan Lawrence
Following this patch (r221318), we're seeing what appears to be a miscompile of 
glibc on AArch64. This causes quite a bunch of tests to fail, segfaults etc., if 
LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without (same 
glibc sources). We are still working on a reduced testcase, but this is proving 
difficult...


--Alan

Richard Biener wrote:

On Mon, 9 Mar 2015, Richard Biener wrote:


On Fri, 6 Mar 2015, Jeff Law wrote:


On 03/06/15 06:16, Richard Biener wrote:

This fixes PR63155 and reduces the memory usage at -O0 from reported
10GB (couldn't verify/update on my small box) to 350MB (still worse
compared to 4.8 which needs only 50MB).

It fixes this by no longer computing live info or building a conflict
graph for coalescing of SSA names flowing over abnormal edges
(which needs to succeed).

Of course this also removes verification that this coalescing is valid
(but computing this has quadratic cost).  With this it turns
ICEs into miscompiles.

We could restrict verifying that we can perform abnormal coalescing
to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
this multiple times to be able to catch what breaks it and not having
to work back from out-of-SSA ICEing...).

So any opinion on this patch welcome.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk? ;)

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

  PR middle-end/63155
  * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
  (coalesce_partitions): Split out abnormal coalescing to ...
  (perform_abnormal_coalescing): ... this function.
  (coalesce_ssa_name): Perform abnormal coalescing without computing
  live/conflict.

I'd personally like to keep the checking when ENABLE_CHECKING.

I haven't followed this discussion real closely, but I wonder if some kind of
blocking approach would work without blowing up the memory consumption.
There's no inherent reason why we have to coalesce everything at the same
time.  We can use a blocking factor and do coalescing on some N number of
SSA_NAMEs at a time.

Yes, that's possible at (quite?) some compile-time cost.  Note that we
can't really guarantee that the resulting live/conflict problems shrink
significantly enough without sorting the coalesces in a different way
(not after important coalesces but after their basevars).


I suspect we can select an N that ultimately degenerates into the current do
everything together for the common case and only has to iterate over blocks
of SSA_NAMEs in extreme cases.

I've attached a patch to the PR that adds such a number N after which we
simply stop coalescing.  Of course that doesn't work for abnormals where
we _must_ coalesce.  Thus this patch ...

As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder
if we should make some of the checking we do a runtime choice rather
than a compile-time one...).  I'll update the patch.


Ok, like the following which adds a verify_ssa_coalescing () function
(which could in theory be called from IL verification like verify_ssa)
and calls it when ENABLE_CHECKING is defined.

Bootstrap  regtest running on x86_64-unknown-linux-gnu.

It didn't look appropriate for this stage to implement virtual operand
verification.

Ok this way?

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING.
Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.
(verify_ssa_coalescing_worker): New function.
(verify_ssa_coalescing): Likewise.

Index: gcc/tree-ssa-coalesce.c
===
*** gcc/tree-ssa-coalesce.c (revision 221278)
--- gcc/tree-ssa-coalesce.c (working copy)
*** create_outofssa_var_map (coalesce_list_p
*** 1121,1128 


  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH.  Output any debug info to
!DEBUG, if it is nun-NULL.  */

  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
--- 1121,1128 


  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH if not NULL.
!Output any debug info to DEBUG, if it is nun-NULL.  */

  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
*** attempt_coalesce (var_map map, ssa_confl
*** 1154,1160 
  fprintf (debug,  [map: %d, %d] , p1, p2);


!   if (!ssa_conflicts_test_p (graph, p1, p2))
  {
var1 = partition_to_var (map, p1);
var2 = 

Re: [PATCH][RFC] Fix PR63155

2015-03-18 Thread Richard Biener
On March 18, 2015 4:59:30 PM GMT+01:00, Alan Lawrence alan.lawre...@arm.com 
wrote:
Following this patch (r221318), we're seeing what appears to be a
miscompile of 
glibc on AArch64. This causes quite a bunch of tests to fail, segfaults
etc., if 
LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without
(same 
glibc sources). We are still working on a reduced testcase, but this is
proving 
difficult...

The patch was supposed to end up in the very same code generation.  Thus it's 
enough to identify the problematic source file and send me preprocessed source 
and cc1 invocation so I can investigate with a cross.

Thanks,
Richard.

--Alan

Richard Biener wrote:
 On Mon, 9 Mar 2015, Richard Biener wrote:
 
 On Fri, 6 Mar 2015, Jeff Law wrote:

 On 03/06/15 06:16, Richard Biener wrote:
 This fixes PR63155 and reduces the memory usage at -O0 from
reported
 10GB (couldn't verify/update on my small box) to 350MB (still
worse
 compared to 4.8 which needs only 50MB).

 It fixes this by no longer computing live info or building a
conflict
 graph for coalescing of SSA names flowing over abnormal edges
 (which needs to succeed).

 Of course this also removes verification that this coalescing is
valid
 (but computing this has quadratic cost).  With this it turns
 ICEs into miscompiles.

 We could restrict verifying that we can perform abnormal
coalescing
 to ENABLE_CHECKING (and I've wanted a verifier pass that can
verify
 this multiple times to be able to catch what breaks it and not
having
 to work back from out-of-SSA ICEing...).

 So any opinion on this patch welcome.

 Bootstrap and regtest running on x86_64-unknown-linux-gnu.

 Ok for trunk? ;)

 Thanks,
 Richard.

 2015-03-06  Richard Biener  rguent...@suse.de

   PR middle-end/63155
   * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being
NULL.
   (coalesce_partitions): Split out abnormal coalescing to ...
   (perform_abnormal_coalescing): ... this function.
   (coalesce_ssa_name): Perform abnormal coalescing without
computing
   live/conflict.
 I'd personally like to keep the checking when ENABLE_CHECKING.

 I haven't followed this discussion real closely, but I wonder if
some kind of
 blocking approach would work without blowing up the memory
consumption.
 There's no inherent reason why we have to coalesce everything at
the same
 time.  We can use a blocking factor and do coalescing on some N
number of
 SSA_NAMEs at a time.
 Yes, that's possible at (quite?) some compile-time cost.  Note that
we
 can't really guarantee that the resulting live/conflict problems
shrink
 significantly enough without sorting the coalesces in a different
way
 (not after important coalesces but after their basevars).

 I suspect we can select an N that ultimately degenerates into the
current do
 everything together for the common case and only has to iterate
over blocks
 of SSA_NAMEs in extreme cases.
 I've attached a patch to the PR that adds such a number N after
which we
 simply stop coalescing.  Of course that doesn't work for abnormals
where
 we _must_ coalesce.  Thus this patch ...

 As said, it's simple to keep the checking with ENABLE_CHECKING (I
wonder
 if we should make some of the checking we do a runtime choice rather
 than a compile-time one...).  I'll update the patch.
 
 Ok, like the following which adds a verify_ssa_coalescing () function
 (which could in theory be called from IL verification like
verify_ssa)
 and calls it when ENABLE_CHECKING is defined.
 
 Bootstrap  regtest running on x86_64-unknown-linux-gnu.
 
 It didn't look appropriate for this stage to implement virtual
operand
 verification.
 
 Ok this way?
 
 Thanks,
 Richard.
 
 2015-03-06  Richard Biener  rguent...@suse.de
 
 PR middle-end/63155
 * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being
NULL.
 (coalesce_partitions): Call verify_ssa_coalescing if
ENABLE_CHECKING.
 Split out abnormal coalescing to ...
 (perform_abnormal_coalescing): ... this function.
 (coalesce_ssa_name): Perform abnormal coalescing without
computing
 live/conflict.
 (verify_ssa_coalescing_worker): New function.
 (verify_ssa_coalescing): Likewise.
 
 Index: gcc/tree-ssa-coalesce.c
 ===
 *** gcc/tree-ssa-coalesce.c (revision 221278)
 --- gcc/tree-ssa-coalesce.c (working copy)
 *** create_outofssa_var_map (coalesce_list_p
 *** 1121,1128 
 
 
   /* Attempt to coalesce ssa versions X and Y together using the
partition
 !mapping in MAP and checking conflicts in GRAPH.  Output any
debug info to
 !DEBUG, if it is nun-NULL.  */
 
   static inline bool
   attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
 --- 1121,1128 
 
 
   /* Attempt to coalesce ssa versions X and Y together using the
partition
 !mapping in MAP and checking conflicts in GRAPH if not 

Re: [PATCH][RFC] Fix PR63155

2015-03-18 Thread Andrew Pinski
On Wed, Mar 18, 2015 at 8:59 AM, Alan Lawrence alan.lawre...@arm.com wrote:
 Following this patch (r221318), we're seeing what appears to be a miscompile
 of glibc on AArch64. This causes quite a bunch of tests to fail, segfaults
 etc., if LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs
 without (same glibc sources). We are still working on a reduced testcase,
 but this is proving difficult...

I was just debugging the same issue.  From what I can tell vfprintf is
being miscompiled.
An example code which shows the issue but not self-contained yet as I
thought it was related to varargs but I suspect it is more related to
the computed gotos inside vprintf in glibc.
#include stdarg.h
#include stdio.h

int g(int a, va_list ap)
{
  int b;
  b = va_arg(ap, int);
  if (b != SECOND)
__builtin_abort ();
  if (a != FIRST)
__builtin_abort ();
  printf(first: %d.\n, a);
  printf(second: %d.\n, b);
}

int f(int a, ...)
{
  int b;
  va_list ap;
  va_start(ap, a);
  g(a, ap);
  va_end (ap);
  return 0;
}

int main(void)
{
  return f(2, FIRST, SECOND);
}
--- CUT ---
With the new glibc I get:
first: 4194304.
second: 1.

But with the old one I get :
first: 16.
second: 32.


Thanks,
Andrew Pinski



 --Alan


 Richard Biener wrote:

 On Mon, 9 Mar 2015, Richard Biener wrote:

 On Fri, 6 Mar 2015, Jeff Law wrote:

 On 03/06/15 06:16, Richard Biener wrote:

 This fixes PR63155 and reduces the memory usage at -O0 from reported
 10GB (couldn't verify/update on my small box) to 350MB (still worse
 compared to 4.8 which needs only 50MB).

 It fixes this by no longer computing live info or building a conflict
 graph for coalescing of SSA names flowing over abnormal edges
 (which needs to succeed).

 Of course this also removes verification that this coalescing is valid
 (but computing this has quadratic cost).  With this it turns
 ICEs into miscompiles.

 We could restrict verifying that we can perform abnormal coalescing
 to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
 this multiple times to be able to catch what breaks it and not having
 to work back from out-of-SSA ICEing...).

 So any opinion on this patch welcome.

 Bootstrap and regtest running on x86_64-unknown-linux-gnu.

 Ok for trunk? ;)

 Thanks,
 Richard.

 2015-03-06  Richard Biener  rguent...@suse.de

   PR middle-end/63155
   * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
   (coalesce_partitions): Split out abnormal coalescing to ...
   (perform_abnormal_coalescing): ... this function.
   (coalesce_ssa_name): Perform abnormal coalescing without computing
   live/conflict.

 I'd personally like to keep the checking when ENABLE_CHECKING.

 I haven't followed this discussion real closely, but I wonder if some
 kind of
 blocking approach would work without blowing up the memory consumption.
 There's no inherent reason why we have to coalesce everything at the
 same
 time.  We can use a blocking factor and do coalescing on some N number
 of
 SSA_NAMEs at a time.

 Yes, that's possible at (quite?) some compile-time cost.  Note that we
 can't really guarantee that the resulting live/conflict problems shrink
 significantly enough without sorting the coalesces in a different way
 (not after important coalesces but after their basevars).

 I suspect we can select an N that ultimately degenerates into the
 current do
 everything together for the common case and only has to iterate over
 blocks
 of SSA_NAMEs in extreme cases.

 I've attached a patch to the PR that adds such a number N after which we
 simply stop coalescing.  Of course that doesn't work for abnormals where
 we _must_ coalesce.  Thus this patch ...

 As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder
 if we should make some of the checking we do a runtime choice rather
 than a compile-time one...).  I'll update the patch.


 Ok, like the following which adds a verify_ssa_coalescing () function
 (which could in theory be called from IL verification like verify_ssa)
 and calls it when ENABLE_CHECKING is defined.

 Bootstrap  regtest running on x86_64-unknown-linux-gnu.

 It didn't look appropriate for this stage to implement virtual operand
 verification.

 Ok this way?

 Thanks,
 Richard.

 2015-03-06  Richard Biener  rguent...@suse.de

 PR middle-end/63155
 * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
 (coalesce_partitions): Call verify_ssa_coalescing if
 ENABLE_CHECKING.
 Split out abnormal coalescing to ...
 (perform_abnormal_coalescing): ... this function.
 (coalesce_ssa_name): Perform abnormal coalescing without computing
 live/conflict.
 (verify_ssa_coalescing_worker): New function.
 (verify_ssa_coalescing): Likewise.

 Index: gcc/tree-ssa-coalesce.c
 ===
 *** gcc/tree-ssa-coalesce.c (revision 221278)
 

Re: [PATCH][RFC] Fix PR63155

2015-03-09 Thread Richard Biener
On Mon, 9 Mar 2015, Richard Biener wrote:

 On Fri, 6 Mar 2015, Jeff Law wrote:
 
  On 03/06/15 06:16, Richard Biener wrote:
   
   This fixes PR63155 and reduces the memory usage at -O0 from reported
   10GB (couldn't verify/update on my small box) to 350MB (still worse
   compared to 4.8 which needs only 50MB).
   
   It fixes this by no longer computing live info or building a conflict
   graph for coalescing of SSA names flowing over abnormal edges
   (which needs to succeed).
   
   Of course this also removes verification that this coalescing is valid
   (but computing this has quadratic cost).  With this it turns
   ICEs into miscompiles.
   
   We could restrict verifying that we can perform abnormal coalescing
   to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
   this multiple times to be able to catch what breaks it and not having
   to work back from out-of-SSA ICEing...).
   
   So any opinion on this patch welcome.
   
   Bootstrap and regtest running on x86_64-unknown-linux-gnu.
   
   Ok for trunk? ;)
   
   Thanks,
   Richard.
   
   2015-03-06  Richard Biener  rguent...@suse.de
   
 PR middle-end/63155
 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
 (coalesce_partitions): Split out abnormal coalescing to ...
 (perform_abnormal_coalescing): ... this function.
 (coalesce_ssa_name): Perform abnormal coalescing without computing
 live/conflict.
  I'd personally like to keep the checking when ENABLE_CHECKING.
  
  I haven't followed this discussion real closely, but I wonder if some kind 
  of
  blocking approach would work without blowing up the memory consumption.
  There's no inherent reason why we have to coalesce everything at the same
  time.  We can use a blocking factor and do coalescing on some N number of
  SSA_NAMEs at a time.
 
 Yes, that's possible at (quite?) some compile-time cost.  Note that we
 can't really guarantee that the resulting live/conflict problems shrink
 significantly enough without sorting the coalesces in a different way
 (not after important coalesces but after their basevars).
  
  I suspect we can select an N that ultimately degenerates into the current 
  do
  everything together for the common case and only has to iterate over blocks
  of SSA_NAMEs in extreme cases.
 
 I've attached a patch to the PR that adds such a number N after which we
 simply stop coalescing.  Of course that doesn't work for abnormals where
 we _must_ coalesce.  Thus this patch ...
 
 As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder
 if we should make some of the checking we do a runtime choice rather
 than a compile-time one...).  I'll update the patch.

Ok, like the following which adds a verify_ssa_coalescing () function
(which could in theory be called from IL verification like verify_ssa)
and calls it when ENABLE_CHECKING is defined.

Bootstrap  regtest running on x86_64-unknown-linux-gnu.

It didn't look appropriate for this stage to implement virtual operand
verification.

Ok this way?

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING.
Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.
(verify_ssa_coalescing_worker): New function.
(verify_ssa_coalescing): Likewise.

Index: gcc/tree-ssa-coalesce.c
===
*** gcc/tree-ssa-coalesce.c (revision 221278)
--- gcc/tree-ssa-coalesce.c (working copy)
*** create_outofssa_var_map (coalesce_list_p
*** 1121,1128 
  
  
  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH.  Output any debug info to
!DEBUG, if it is nun-NULL.  */
  
  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
--- 1121,1128 
  
  
  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH if not NULL.
!Output any debug info to DEBUG, if it is nun-NULL.  */
  
  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
*** attempt_coalesce (var_map map, ssa_confl
*** 1154,1160 
  fprintf (debug,  [map: %d, %d] , p1, p2);
  
  
!   if (!ssa_conflicts_test_p (graph, p1, p2))
  {
var1 = partition_to_var (map, p1);
var2 = partition_to_var (map, p2);
--- 1154,1161 
  fprintf (debug,  [map: %d, %d] , p1, p2);
  
  
!   if (!graph
!   || !ssa_conflicts_test_p (graph, p1, p2))
  {
var1 = partition_to_var 

Re: [PATCH][RFC] Fix PR63155

2015-03-09 Thread Jeff Law

On 03/09/15 03:42, Richard Biener wrote:

On Fri, 6 Mar 2015, Jeff Law wrote:


On 03/06/15 06:16, Richard Biener wrote:


This fixes PR63155 and reduces the memory usage at -O0 from reported
10GB (couldn't verify/update on my small box) to 350MB (still worse
compared to 4.8 which needs only 50MB).

It fixes this by no longer computing live info or building a conflict
graph for coalescing of SSA names flowing over abnormal edges
(which needs to succeed).

Of course this also removes verification that this coalescing is valid
(but computing this has quadratic cost).  With this it turns
ICEs into miscompiles.

We could restrict verifying that we can perform abnormal coalescing
to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
this multiple times to be able to catch what breaks it and not having
to work back from out-of-SSA ICEing...).

So any opinion on this patch welcome.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk? ;)

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.

I'd personally like to keep the checking when ENABLE_CHECKING.

I haven't followed this discussion real closely, but I wonder if some kind of
blocking approach would work without blowing up the memory consumption.
There's no inherent reason why we have to coalesce everything at the same
time.  We can use a blocking factor and do coalescing on some N number of
SSA_NAMEs at a time.


Yes, that's possible at (quite?) some compile-time cost.  Note that we
can't really guarantee that the resulting live/conflict problems shrink
significantly enough without sorting the coalesces in a different way
(not after important coalesces but after their basevars).
Yea, it's a class time/space tradeoff.  I guess it comes down to how 
much compile-time pain we'll take for reducing memory usage.


It may also be the case that some blocking factors are actually faster 
than doing everything at once, even for more common input graph sizes.


I actually ran into this when looking at the liveness computations for 
into-ssa eons ago.  We were computing liveness in parallel, but a 
blocking of 1 object is actually best:


https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01301.html



Jeff


Re: [PATCH][RFC] Fix PR63155

2015-03-09 Thread Jeff Law

On 03/09/15 07:01, Richard Biener wrote:


Ok, like the following which adds a verify_ssa_coalescing () function
(which could in theory be called from IL verification like verify_ssa)
and calls it when ENABLE_CHECKING is defined.

Bootstrap  regtest running on x86_64-unknown-linux-gnu.

It didn't look appropriate for this stage to implement virtual operand
verification.

Ok this way?

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING.
Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.
(verify_ssa_coalescing_worker): New function.
(verify_ssa_coalescing): Likewise.

Looks good to me.

jeff



Re: [PATCH][RFC] Fix PR63155

2015-03-09 Thread Richard Biener
On Fri, 6 Mar 2015, Jeff Law wrote:

 On 03/06/15 06:16, Richard Biener wrote:
  
  This fixes PR63155 and reduces the memory usage at -O0 from reported
  10GB (couldn't verify/update on my small box) to 350MB (still worse
  compared to 4.8 which needs only 50MB).
  
  It fixes this by no longer computing live info or building a conflict
  graph for coalescing of SSA names flowing over abnormal edges
  (which needs to succeed).
  
  Of course this also removes verification that this coalescing is valid
  (but computing this has quadratic cost).  With this it turns
  ICEs into miscompiles.
  
  We could restrict verifying that we can perform abnormal coalescing
  to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
  this multiple times to be able to catch what breaks it and not having
  to work back from out-of-SSA ICEing...).
  
  So any opinion on this patch welcome.
  
  Bootstrap and regtest running on x86_64-unknown-linux-gnu.
  
  Ok for trunk? ;)
  
  Thanks,
  Richard.
  
  2015-03-06  Richard Biener  rguent...@suse.de
  
  PR middle-end/63155
  * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
  (coalesce_partitions): Split out abnormal coalescing to ...
  (perform_abnormal_coalescing): ... this function.
  (coalesce_ssa_name): Perform abnormal coalescing without computing
  live/conflict.
 I'd personally like to keep the checking when ENABLE_CHECKING.
 
 I haven't followed this discussion real closely, but I wonder if some kind of
 blocking approach would work without blowing up the memory consumption.
 There's no inherent reason why we have to coalesce everything at the same
 time.  We can use a blocking factor and do coalescing on some N number of
 SSA_NAMEs at a time.

Yes, that's possible at (quite?) some compile-time cost.  Note that we
can't really guarantee that the resulting live/conflict problems shrink
significantly enough without sorting the coalesces in a different way
(not after important coalesces but after their basevars).
 
 I suspect we can select an N that ultimately degenerates into the current do
 everything together for the common case and only has to iterate over blocks
 of SSA_NAMEs in extreme cases.

I've attached a patch to the PR that adds such a number N after which we
simply stop coalescing.  Of course that doesn't work for abnormals where
we _must_ coalesce.  Thus this patch ...

As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder
if we should make some of the checking we do a runtime choice rather
than a compile-time one...).  I'll update the patch.

Thanks,
Richard.


[PATCH][RFC] Fix PR63155

2015-03-06 Thread Richard Biener

This fixes PR63155 and reduces the memory usage at -O0 from reported
10GB (couldn't verify/update on my small box) to 350MB (still worse
compared to 4.8 which needs only 50MB).

It fixes this by no longer computing live info or building a conflict
graph for coalescing of SSA names flowing over abnormal edges
(which needs to succeed).

Of course this also removes verification that this coalescing is valid
(but computing this has quadratic cost).  With this it turns
ICEs into miscompiles.

We could restrict verifying that we can perform abnormal coalescing
to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
this multiple times to be able to catch what breaks it and not having
to work back from out-of-SSA ICEing...).

So any opinion on this patch welcome.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk? ;)

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.

Index: gcc/tree-ssa-coalesce.c
===
*** gcc/tree-ssa-coalesce.c (revision 221237)
--- gcc/tree-ssa-coalesce.c (working copy)
*** create_outofssa_var_map (coalesce_list_p
*** 1121,1128 
  
  
  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH.  Output any debug info to
!DEBUG, if it is nun-NULL.  */
  
  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
--- 1121,1128 
  
  
  /* Attempt to coalesce ssa versions X and Y together using the partition
!mapping in MAP and checking conflicts in GRAPH if not NULL.
!Output any debug info to DEBUG, if it is nun-NULL.  */
  
  static inline bool
  attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
*** attempt_coalesce (var_map map, ssa_confl
*** 1154,1160 
  fprintf (debug,  [map: %d, %d] , p1, p2);
  
  
!   if (!ssa_conflicts_test_p (graph, p1, p2))
  {
var1 = partition_to_var (map, p1);
var2 = partition_to_var (map, p2);
--- 1154,1161 
  fprintf (debug,  [map: %d, %d] , p1, p2);
  
  
!   if (!graph
!   || !ssa_conflicts_test_p (graph, p1, p2))
  {
var1 = partition_to_var (map, p1);
var2 = partition_to_var (map, p2);
*** attempt_coalesce (var_map map, ssa_confl
*** 1168,1177 
  
/* z is the new combined partition.  Remove the other partition from
 the list, and merge the conflicts.  */
!   if (z == p1)
!   ssa_conflicts_merge (graph, p1, p2);
!   else
!   ssa_conflicts_merge (graph, p2, p1);
  
if (debug)
fprintf (debug, : Success - %d\n, z);
--- 1169,1181 
  
/* z is the new combined partition.  Remove the other partition from
 the list, and merge the conflicts.  */
!   if (graph)
!   {
! if (z == p1)
!   ssa_conflicts_merge (graph, p1, p2);
! else
!   ssa_conflicts_merge (graph, p2, p1);
!   }
  
if (debug)
fprintf (debug, : Success - %d\n, z);
*** attempt_coalesce (var_map map, ssa_confl
*** 1185,1208 
  }
  
  
! /* Attempt to Coalesce partitions in MAP which occur in the list CL using
!GRAPH.  Debug output is sent to DEBUG if it is non-NULL.  */
  
  static void
! coalesce_partitions (var_map map, ssa_conflicts_p graph, coalesce_list_p cl,
!FILE *debug)
  {
-   int x = 0, y = 0;
-   tree var1, var2;
-   int cost;
basic_block bb;
edge e;
edge_iterator ei;
  
-   /* First, coalesce all the copies across abnormal edges.  These are not 
placed
-  in the coalesce list because they do not need to be sorted, and simply
-  consume extra memory/compilation time in large programs.  */
- 
FOR_EACH_BB_FN (bb, cfun)
  {
FOR_EACH_EDGE (e, ei, bb-preds)
--- 1189,1204 
  }
  
  
! /* Perform all abnormal coalescing on MAP.
!Debug output is sent to DEBUG if it is non-NULL.  */
  
  static void
! perform_abnormal_coalescing (var_map map, FILE *debug)
  {
basic_block bb;
edge e;
edge_iterator ei;
  
FOR_EACH_BB_FN (bb, cfun)
  {
FOR_EACH_EDGE (e, ei, bb-preds)
*** coalesce_partitions (var_map map, ssa_co
*** 1226,1236 
if (debug)
  fprintf (debug, Abnormal coalesce: );
  
!   if (!attempt_coalesce (map, graph, v1, v2, debug))
  fail_abnormal_edge_coalesce (v1, v2);
  }
  }
  }
  
/* Now process the items in the coalesce list.  */
  
--- 1222,1244 
if (debug)
  

Re: [PATCH][RFC] Fix PR63155

2015-03-06 Thread Jeff Law

On 03/06/15 06:16, Richard Biener wrote:


This fixes PR63155 and reduces the memory usage at -O0 from reported
10GB (couldn't verify/update on my small box) to 350MB (still worse
compared to 4.8 which needs only 50MB).

It fixes this by no longer computing live info or building a conflict
graph for coalescing of SSA names flowing over abnormal edges
(which needs to succeed).

Of course this also removes verification that this coalescing is valid
(but computing this has quadratic cost).  With this it turns
ICEs into miscompiles.

We could restrict verifying that we can perform abnormal coalescing
to ENABLE_CHECKING (and I've wanted a verifier pass that can verify
this multiple times to be able to catch what breaks it and not having
to work back from out-of-SSA ICEing...).

So any opinion on this patch welcome.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk? ;)

Thanks,
Richard.

2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/63155
* tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
(coalesce_partitions): Split out abnormal coalescing to ...
(perform_abnormal_coalescing): ... this function.
(coalesce_ssa_name): Perform abnormal coalescing without computing
live/conflict.

I'd personally like to keep the checking when ENABLE_CHECKING.

I haven't followed this discussion real closely, but I wonder if some 
kind of blocking approach would work without blowing up the memory 
consumption.  There's no inherent reason why we have to coalesce 
everything at the same time.  We can use a blocking factor and do 
coalescing on some N number of SSA_NAMEs at a time.


I suspect we can select an N that ultimately degenerates into the 
current do everything together for the common case and only has to 
iterate over blocks of SSA_NAMEs in extreme cases.


Jeff