Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, 19 Sep 2018, Richard Biener wrote: > > The second testcase in the above PR runs into our O(N) bitmap element > search limitation and spends 8s (60%) of the compile-time in the SSA > propagator > engine (when optimizing). The patch improves that to 0.9s (15%). For the > first testcase it isn't that bad but still the patch improves CCP from 36% to > 14%. > > The "solution" is to use sbitmap instead of a bitmap to avoid > the linearity when doing add_ssa_edge. We pay for that (but not > actually with the testcases) with a linear-time bitmap_first_set_bit > in process_ssa_edge_worklist. I do not (yet...) have a testcase > that overall gets slower with this approach. I suppose using > std::set would "solve" the complexity issue but we'd pay > back with horribly inefficient memory use. Similarly with > our sparse bitmap implementation which lacks an ordered > first_set_bit (it only can get any set bit fast, breaking optimal > iteration order). > > If we'd only had a O(log n) search sparse bitmap implementation ... > (Steven posted patches to switch bitmap from/to such one but IIRC > that at least lacked bitmap_first_set_bit). > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK for trunk? So it turns out that while the bitmap data structure isn't optimal the issue with the testcase is that we end up with a full-set-universe for the SSA worklist mostly because we are queueing PHIs via uses on backedges that are not yet executable (so we'd re-simulate the PHI anyway once that became excutable - no need to set the individual bits). So I'm testing the following instead. Bootstrap & regtest running on x86_64-unknown-linux-gnu. Richard. 2018-09-20 Richard Biener PR tree-optimization/63155 * tree-ssa-propagate.c (add_ssa_edge): Avoid adding PHIs to the worklist when the edge of the respective argument isn't executable. Index: gcc/tree-ssa-propagate.c === --- gcc/tree-ssa-propagate.c(revision 264438) +++ gcc/tree-ssa-propagate.c(working copy) @@ -168,10 +168,18 @@ add_ssa_edge (tree var) FOR_EACH_IMM_USE_FAST (use_p, iter, var) { gimple *use_stmt = USE_STMT (use_p); + basic_block use_bb = gimple_bb (use_stmt); /* If we did not yet simulate the block wait for this to happen and do not add the stmt to the SSA edge worklist. */ - if (! (gimple_bb (use_stmt)->flags & BB_VISITED)) + if (! (use_bb->flags & BB_VISITED)) + continue; + + /* If this is a use on a not yet executable edge do not bother to +queue it. */ + if (gimple_code (use_stmt) == GIMPLE_PHI + && !(EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->flags + & EDGE_EXECUTABLE)) continue; if (prop_simulate_again_p (use_stmt)
Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, 19 Sep 2018, Steven Bosscher wrote: > On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote: > > If we'd only had a O(log n) search sparse bitmap implementation ... > > (Steven posted patches to switch bitmap from/to such one but IIRC > > that at least lacked bitmap_first_set_bit). > > But bitmap_first_set_bit would be easy to implement. Just take the > left-most node of the splay tree. > > Actually all bit-tests would be easy to implement. It's only > enumeration and set operations on the tree-views that would be > complicated fluff (easier to "switch views" than re-implement). > > Impressive that you remember >5yr old patches like that ;-) ;) Well, it's still useful (but obviously doesn't apply). Not sure iff the worst-case behavior of splay-trees makes it a pointless exercise though ;) Richard.
Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote: > If we'd only had a O(log n) search sparse bitmap implementation ... > (Steven posted patches to switch bitmap from/to such one but IIRC > that at least lacked bitmap_first_set_bit). But bitmap_first_set_bit would be easy to implement. Just take the left-most node of the splay tree. Actually all bit-tests would be easy to implement. It's only enumeration and set operations on the tree-views that would be complicated fluff (easier to "switch views" than re-implement). Impressive that you remember >5yr old patches like that ;-) Ciao! Steven
Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, 19 Sep 2018, Jakub Jelinek wrote: > On Wed, Sep 19, 2018 at 03:06:30PM +0200, Richard Biener wrote: > > > > The second testcase in the above PR runs into our O(N) bitmap element > > search limitation and spends 8s (60%) of the compile-time in the SSA > > propagator > > engine (when optimizing). The patch improves that to 0.9s (15%). For the > > first testcase it isn't that bad but still the patch improves CCP from 36% > > to > > 14%. > > > > The "solution" is to use sbitmap instead of a bitmap to avoid > > the linearity when doing add_ssa_edge. We pay for that (but not > > actually with the testcases) with a linear-time bitmap_first_set_bit > > in process_ssa_edge_worklist. I do not (yet...) have a testcase > > Perhaps you could remember the first set bit number in an integral variable > next to the bitmap. On bitmap_set_bit this would be just a comparison + > conditional update, in simulate_stmt it would be again conditional on > clearing the first set bit and in that case it could start from that bit > onwards rather than from the beginning (EXECUTE_IF_SET_IN_BITMAP), similarly > in process_ssa_edge_worklist (non-conditional in that case, we are always > clearing the first bit). Or instead of tracking exact first bit track > it lazily, i.e. just remember the smallest bitnum we've set and in > process_ssa_edge_worklist don't use bitmap_first_set_bit, but walk from > that minimum using EXECUTE_IF_SET_IN_BITMAP and update the minimum to > whatever we've found. I thought about that but as you say it's easily invalidated and we can only optimize the searching start point. So it felt somewhat like a hack ;) I can still do that if you think that otherwise using an sbitmap is fine (I'm not 100% convinced about that yet). > Similarly, empty_p can be done linearly by keeping on the side the count of > set bits and updating it on set_bit when previously not set, as well on > clear_bit. I elided empty_p by re-using the fact that bitmap_first_set_bit already has to do the walk (and may fail). Of course using a on-the-side count might be more optimal (I was wondering why sbitmap itself didn't have that). I wonder if reviving Stevens [patch][RFC] bitmaps as lists *or* trees makes more sense so we can turn this kind of random-access bitmaps to tree view. Richard.
Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, Sep 19, 2018 at 03:06:30PM +0200, Richard Biener wrote: > > The second testcase in the above PR runs into our O(N) bitmap element > search limitation and spends 8s (60%) of the compile-time in the SSA > propagator > engine (when optimizing). The patch improves that to 0.9s (15%). For the > first testcase it isn't that bad but still the patch improves CCP from 36% to > 14%. > > The "solution" is to use sbitmap instead of a bitmap to avoid > the linearity when doing add_ssa_edge. We pay for that (but not > actually with the testcases) with a linear-time bitmap_first_set_bit > in process_ssa_edge_worklist. I do not (yet...) have a testcase Perhaps you could remember the first set bit number in an integral variable next to the bitmap. On bitmap_set_bit this would be just a comparison + conditional update, in simulate_stmt it would be again conditional on clearing the first set bit and in that case it could start from that bit onwards rather than from the beginning (EXECUTE_IF_SET_IN_BITMAP), similarly in process_ssa_edge_worklist (non-conditional in that case, we are always clearing the first bit). Or instead of tracking exact first bit track it lazily, i.e. just remember the smallest bitnum we've set and in process_ssa_edge_worklist don't use bitmap_first_set_bit, but walk from that minimum using EXECUTE_IF_SET_IN_BITMAP and update the minimum to whatever we've found. Similarly, empty_p can be done linearly by keeping on the side the count of set bits and updating it on set_bit when previously not set, as well on clear_bit. Jakub
Re: [PATCH][RFC] Fix PR63155
On Wed, 18 Mar 2015, Richard Biener wrote: > On March 18, 2015 4:59:30 PM GMT+01:00, Alan Lawrence > wrote: > >Following this patch (r221318), we're seeing what appears to be a > >miscompile of > >glibc on AArch64. This causes quite a bunch of tests to fail, segfaults > >etc., if > >LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without > >(same > >glibc sources). We are still working on a reduced testcase, but this is > >proving > >difficult... > > The patch was supposed to end up in the very same code generation. > Thus it's enough to identify the problematic source file and send me > preprocessed source and cc1 invocation so I can investigate with a > cross. Ok - I _think_ that live_track_process_def doesn't handle partitions with more than one member. It's too closely tied to SSA. Thus the whole idea of iterating the coalescing doesn't work without fixing that. Like conservatively with Index: gcc/tree-ssa-coalesce.c === --- gcc/tree-ssa-coalesce.c (revision 221490) +++ gcc/tree-ssa-coalesce.c (working copy) @@ -724,7 +724,10 @@ live_track_clear_var (live_track_p ptr, int p; p = var_to_partition (ptr->map, var); - if (p != NO_PARTITION) + if (p != NO_PARTITION + /* ??? If this partition has more than one member don't do anything. */ + && ptr->map->var_partition->elements[SSA_NAME_VERSION + (partition_to_var (ptr->map, p))].class_count == 1) live_track_remove_partition (ptr, p); } @@ -780,8 +783,11 @@ live_track_process_def (live_track_p ptr if (p == NO_PARTITION) return; - /* Clear the liveness bit. */ - live_track_remove_partition (ptr, p); + /* Clear the liveness bit. + ??? If this partition has more than one member we can't do that. */ + if (ptr->map->var_partition->elements[SSA_NAME_VERSION + (partition_to_var (ptr->map, p))].class_count == 1) +live_track_remove_partition (ptr, p); /* If the bitmap isn't empty now, conflicts need to be added. */ root = basevar_index (ptr->map, p); @@ -789,7 +795,8 @@ live_track_process_def (live_track_p ptr { b = ptr->live_base_partitions[root]; EXECUTE_IF_SET_IN_BITMAP (b, 0, x, bi) -ssa_conflicts_add (graph, p, x); + if (x != (unsigned) p) + ssa_conflicts_add (graph, p, x); } } Does that fix it? Given the issue I'll revert the patch. Thanks, Richard. > Thanks, > Richard. > > >--Alan > > > >Richard Biener wrote: > >> On Mon, 9 Mar 2015, Richard Biener wrote: > >> > >>> On Fri, 6 Mar 2015, Jeff Law wrote: > >>> > On 03/06/15 06:16, Richard Biener wrote: > > This fixes PR63155 and reduces the memory usage at -O0 from > >reported > > 10GB (couldn't verify/update on my small box) to 350MB (still > >worse > > compared to 4.8 which needs only 50MB). > > > > It fixes this by no longer computing live info or building a > >conflict > > graph for coalescing of SSA names flowing over abnormal edges > > (which needs to succeed). > > > > Of course this also removes verification that this coalescing is > >valid > > (but computing this has quadratic cost). With this it turns > > ICEs into miscompiles. > > > > We could restrict verifying that we can perform abnormal > >coalescing > > to ENABLE_CHECKING (and I've wanted a verifier pass that can > >verify > > this multiple times to be able to catch what breaks it and not > >having > > to work back from out-of-SSA ICEing...). > > > > So any opinion on this patch welcome. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Ok for trunk? ;) > > > > Thanks, > > Richard. > > > > 2015-03-06 Richard Biener > > > > PR middle-end/63155 > > * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being > >NULL. > > (coalesce_partitions): Split out abnormal coalescing to ... > > (perform_abnormal_coalescing): ... this function. > > (coalesce_ssa_name): Perform abnormal coalescing without > >computing > > live/conflict. > I'd personally like to keep the checking when ENABLE_CHECKING. > > I haven't followed this discussion real closely, but I wonder if > >some kind of > blocking approach would work without blowing up the memory > >consumption. > There's no inherent reason why we have to coalesce everything at > >the same > time. We can use a blocking factor and do coalescing on some N > >number of > SSA_NAMEs at a time. > >>> Yes, that's possible at (quite?) some compile-time cost. Note that > >we > >>> can't really guarantee that the resulting live/conflict problems > >shrink > >>> significantly enough without sorting the coalesces in a different > >way > >>> (not after important coalesces but after their basevars). > >>> > I suspect we can select an N that ultim
Re: [PATCH][RFC] Fix PR63155
On Wed, Mar 18, 2015 at 8:59 AM, Alan Lawrence wrote: > Following this patch (r221318), we're seeing what appears to be a miscompile > of glibc on AArch64. This causes quite a bunch of tests to fail, segfaults > etc., if LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs > without (same glibc sources). We are still working on a reduced testcase, > but this is proving difficult... I was just debugging the same issue. From what I can tell vfprintf is being miscompiled. An example code which shows the issue but not self-contained yet as I thought it was related to varargs but I suspect it is more related to the computed gotos inside vprintf in glibc. #include #include int g(int a, va_list ap) { int b; b = va_arg(ap, int); if (b != SECOND) __builtin_abort (); if (a != FIRST) __builtin_abort (); printf("first: %d.\n", a); printf("second: %d.\n", b); } int f(int a, ...) { int b; va_list ap; va_start(ap, a); g(a, ap); va_end (ap); return 0; } int main(void) { return f(2, FIRST, SECOND); } --- CUT --- With the new glibc I get: first: 4194304. second: 1. But with the old one I get : first: 16. second: 32. Thanks, Andrew Pinski > > --Alan > > > Richard Biener wrote: >> >> On Mon, 9 Mar 2015, Richard Biener wrote: >> >>> On Fri, 6 Mar 2015, Jeff Law wrote: >>> On 03/06/15 06:16, Richard Biener wrote: > > This fixes PR63155 and reduces the memory usage at -O0 from reported > 10GB (couldn't verify/update on my small box) to 350MB (still worse > compared to 4.8 which needs only 50MB). > > It fixes this by no longer computing live info or building a conflict > graph for coalescing of SSA names flowing over abnormal edges > (which needs to succeed). > > Of course this also removes verification that this coalescing is valid > (but computing this has quadratic cost). With this it turns > ICEs into miscompiles. > > We could restrict verifying that we can perform abnormal coalescing > to ENABLE_CHECKING (and I've wanted a verifier pass that can verify > this multiple times to be able to catch what breaks it and not having > to work back from out-of-SSA ICEing...). > > So any opinion on this patch welcome. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > Ok for trunk? ;) > > Thanks, > Richard. > > 2015-03-06 Richard Biener > > PR middle-end/63155 > * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. > (coalesce_partitions): Split out abnormal coalescing to ... > (perform_abnormal_coalescing): ... this function. > (coalesce_ssa_name): Perform abnormal coalescing without computing > live/conflict. I'd personally like to keep the checking when ENABLE_CHECKING. I haven't followed this discussion real closely, but I wonder if some kind of blocking approach would work without blowing up the memory consumption. There's no inherent reason why we have to coalesce everything at the same time. We can use a blocking factor and do coalescing on some N number of SSA_NAMEs at a time. >>> >>> Yes, that's possible at (quite?) some compile-time cost. Note that we >>> can't really guarantee that the resulting live/conflict problems shrink >>> significantly enough without sorting the coalesces in a different way >>> (not after important coalesces but after their basevars). >>> I suspect we can select an N that ultimately degenerates into the current "do everything together" for the common case and only has to iterate over blocks of SSA_NAMEs in extreme cases. >>> >>> I've attached a patch to the PR that adds such a number N after which we >>> simply stop coalescing. Of course that doesn't work for abnormals where >>> we _must_ coalesce. Thus this patch ... >>> >>> As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder >>> if we should make some of the checking we do a runtime choice rather >>> than a compile-time one...). I'll update the patch. >> >> >> Ok, like the following which adds a verify_ssa_coalescing () function >> (which could in theory be called from IL verification like verify_ssa) >> and calls it when ENABLE_CHECKING is defined. >> >> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >> >> It didn't look appropriate for this stage to implement virtual operand >> verification. >> >> Ok this way? >> >> Thanks, >> Richard. >> >> 2015-03-06 Richard Biener >> >> PR middle-end/63155 >> * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare. >> * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. >> (coalesce_partitions): Call verify_ssa_coalescing if >> ENABLE_CHECKING. >> Split out abnormal coalescing to ... >> (perform_abnormal_coalescing): ... this function. >> (coalesce_ssa_name): Perform abnormal coalescing
Re: [PATCH][RFC] Fix PR63155
On March 18, 2015 4:59:30 PM GMT+01:00, Alan Lawrence wrote: >Following this patch (r221318), we're seeing what appears to be a >miscompile of >glibc on AArch64. This causes quite a bunch of tests to fail, segfaults >etc., if >LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without >(same >glibc sources). We are still working on a reduced testcase, but this is >proving >difficult... The patch was supposed to end up in the very same code generation. Thus it's enough to identify the problematic source file and send me preprocessed source and cc1 invocation so I can investigate with a cross. Thanks, Richard. >--Alan > >Richard Biener wrote: >> On Mon, 9 Mar 2015, Richard Biener wrote: >> >>> On Fri, 6 Mar 2015, Jeff Law wrote: >>> On 03/06/15 06:16, Richard Biener wrote: > This fixes PR63155 and reduces the memory usage at -O0 from >reported > 10GB (couldn't verify/update on my small box) to 350MB (still >worse > compared to 4.8 which needs only 50MB). > > It fixes this by no longer computing live info or building a >conflict > graph for coalescing of SSA names flowing over abnormal edges > (which needs to succeed). > > Of course this also removes verification that this coalescing is >valid > (but computing this has quadratic cost). With this it turns > ICEs into miscompiles. > > We could restrict verifying that we can perform abnormal >coalescing > to ENABLE_CHECKING (and I've wanted a verifier pass that can >verify > this multiple times to be able to catch what breaks it and not >having > to work back from out-of-SSA ICEing...). > > So any opinion on this patch welcome. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > Ok for trunk? ;) > > Thanks, > Richard. > > 2015-03-06 Richard Biener > > PR middle-end/63155 > * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being >NULL. > (coalesce_partitions): Split out abnormal coalescing to ... > (perform_abnormal_coalescing): ... this function. > (coalesce_ssa_name): Perform abnormal coalescing without >computing > live/conflict. I'd personally like to keep the checking when ENABLE_CHECKING. I haven't followed this discussion real closely, but I wonder if >some kind of blocking approach would work without blowing up the memory >consumption. There's no inherent reason why we have to coalesce everything at >the same time. We can use a blocking factor and do coalescing on some N >number of SSA_NAMEs at a time. >>> Yes, that's possible at (quite?) some compile-time cost. Note that >we >>> can't really guarantee that the resulting live/conflict problems >shrink >>> significantly enough without sorting the coalesces in a different >way >>> (not after important coalesces but after their basevars). >>> I suspect we can select an N that ultimately degenerates into the >current "do everything together" for the common case and only has to iterate >over blocks of SSA_NAMEs in extreme cases. >>> I've attached a patch to the PR that adds such a number N after >which we >>> simply stop coalescing. Of course that doesn't work for abnormals >where >>> we _must_ coalesce. Thus this patch ... >>> >>> As said, it's simple to keep the checking with ENABLE_CHECKING (I >wonder >>> if we should make some of the checking we do a runtime choice rather >>> than a compile-time one...). I'll update the patch. >> >> Ok, like the following which adds a verify_ssa_coalescing () function >> (which could in theory be called from IL verification like >verify_ssa) >> and calls it when ENABLE_CHECKING is defined. >> >> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >> >> It didn't look appropriate for this stage to implement virtual >operand >> verification. >> >> Ok this way? >> >> Thanks, >> Richard. >> >> 2015-03-06 Richard Biener >> >> PR middle-end/63155 >> * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare. >> * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being >NULL. >> (coalesce_partitions): Call verify_ssa_coalescing if >ENABLE_CHECKING. >> Split out abnormal coalescing to ... >> (perform_abnormal_coalescing): ... this function. >> (coalesce_ssa_name): Perform abnormal coalescing without >computing >> live/conflict. >> (verify_ssa_coalescing_worker): New function. >> (verify_ssa_coalescing): Likewise. >> >> Index: gcc/tree-ssa-coalesce.c >> === >> *** gcc/tree-ssa-coalesce.c (revision 221278) >> --- gcc/tree-ssa-coalesce.c (working copy) >> *** create_outofssa_var_map (coalesce_list_p >> *** 1121,1128 >> >> >> /* Attempt to coalesce ssa versions X and Y together using the >partition >> !mapping in MAP and checking conflicts in GR
Re: [PATCH][RFC] Fix PR63155
Following this patch (r221318), we're seeing what appears to be a miscompile of glibc on AArch64. This causes quite a bunch of tests to fail, segfaults etc., if LD_LIBRARY_PATH leads to a libc.so.6 built with that patch vs without (same glibc sources). We are still working on a reduced testcase, but this is proving difficult... --Alan Richard Biener wrote: On Mon, 9 Mar 2015, Richard Biener wrote: On Fri, 6 Mar 2015, Jeff Law wrote: On 03/06/15 06:16, Richard Biener wrote: This fixes PR63155 and reduces the memory usage at -O0 from reported 10GB (couldn't verify/update on my small box) to 350MB (still worse compared to 4.8 which needs only 50MB). It fixes this by no longer computing live info or building a conflict graph for coalescing of SSA names flowing over abnormal edges (which needs to succeed). Of course this also removes verification that this coalescing is valid (but computing this has quadratic cost). With this it turns ICEs into miscompiles. We could restrict verifying that we can perform abnormal coalescing to ENABLE_CHECKING (and I've wanted a verifier pass that can verify this multiple times to be able to catch what breaks it and not having to work back from out-of-SSA ICEing...). So any opinion on this patch welcome. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Ok for trunk? ;) Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. I'd personally like to keep the checking when ENABLE_CHECKING. I haven't followed this discussion real closely, but I wonder if some kind of blocking approach would work without blowing up the memory consumption. There's no inherent reason why we have to coalesce everything at the same time. We can use a blocking factor and do coalescing on some N number of SSA_NAMEs at a time. Yes, that's possible at (quite?) some compile-time cost. Note that we can't really guarantee that the resulting live/conflict problems shrink significantly enough without sorting the coalesces in a different way (not after important coalesces but after their basevars). I suspect we can select an N that ultimately degenerates into the current "do everything together" for the common case and only has to iterate over blocks of SSA_NAMEs in extreme cases. I've attached a patch to the PR that adds such a number N after which we simply stop coalescing. Of course that doesn't work for abnormals where we _must_ coalesce. Thus this patch ... As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder if we should make some of the checking we do a runtime choice rather than a compile-time one...). I'll update the patch. Ok, like the following which adds a verify_ssa_coalescing () function (which could in theory be called from IL verification like verify_ssa) and calls it when ENABLE_CHECKING is defined. Bootstrap & regtest running on x86_64-unknown-linux-gnu. It didn't look appropriate for this stage to implement virtual operand verification. Ok this way? Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare. * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING. Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. (verify_ssa_coalescing_worker): New function. (verify_ssa_coalescing): Likewise. Index: gcc/tree-ssa-coalesce.c === *** gcc/tree-ssa-coalesce.c (revision 221278) --- gcc/tree-ssa-coalesce.c (working copy) *** create_outofssa_var_map (coalesce_list_p *** 1121,1128 /* Attempt to coalesce ssa versions X and Y together using the partition !mapping in MAP and checking conflicts in GRAPH. Output any debug info to !DEBUG, if it is nun-NULL. */ static inline bool attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y, --- 1121,1128 /* Attempt to coalesce ssa versions X and Y together using the partition !mapping in MAP and checking conflicts in GRAPH if not NULL. !Output any debug info to DEBUG, if it is nun-NULL. */ static inline bool attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y, *** attempt_coalesce (var_map map, ssa_confl *** 1154,1160 fprintf (debug, " [map: %d, %d] ", p1, p2); ! if (!ssa_conflicts_test_p (graph, p1, p2)) { var1 = partition_to_var (map, p1); var2 = partition_to_var (map, p2); --- 11
Re: [PATCH][RFC] Fix PR63155
On 03/09/15 07:01, Richard Biener wrote: Ok, like the following which adds a verify_ssa_coalescing () function (which could in theory be called from IL verification like verify_ssa) and calls it when ENABLE_CHECKING is defined. Bootstrap & regtest running on x86_64-unknown-linux-gnu. It didn't look appropriate for this stage to implement virtual operand verification. Ok this way? Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare. * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING. Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. (verify_ssa_coalescing_worker): New function. (verify_ssa_coalescing): Likewise. Looks good to me. jeff
Re: [PATCH][RFC] Fix PR63155
On 03/09/15 03:42, Richard Biener wrote: On Fri, 6 Mar 2015, Jeff Law wrote: On 03/06/15 06:16, Richard Biener wrote: This fixes PR63155 and reduces the memory usage at -O0 from reported 10GB (couldn't verify/update on my small box) to 350MB (still worse compared to 4.8 which needs only 50MB). It fixes this by no longer computing live info or building a conflict graph for coalescing of SSA names flowing over abnormal edges (which needs to succeed). Of course this also removes verification that this coalescing is valid (but computing this has quadratic cost). With this it turns ICEs into miscompiles. We could restrict verifying that we can perform abnormal coalescing to ENABLE_CHECKING (and I've wanted a verifier pass that can verify this multiple times to be able to catch what breaks it and not having to work back from out-of-SSA ICEing...). So any opinion on this patch welcome. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Ok for trunk? ;) Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. I'd personally like to keep the checking when ENABLE_CHECKING. I haven't followed this discussion real closely, but I wonder if some kind of blocking approach would work without blowing up the memory consumption. There's no inherent reason why we have to coalesce everything at the same time. We can use a blocking factor and do coalescing on some N number of SSA_NAMEs at a time. Yes, that's possible at (quite?) some compile-time cost. Note that we can't really guarantee that the resulting live/conflict problems shrink significantly enough without sorting the coalesces in a different way (not after important coalesces but after their basevars). Yea, it's a class time/space tradeoff. I guess it comes down to how much compile-time pain we'll take for reducing memory usage. It may also be the case that some blocking factors are actually faster than doing everything at once, even for more common input graph sizes. I actually ran into this when looking at the liveness computations for into-ssa eons ago. We were computing liveness in parallel, but a blocking of 1 object is actually best: https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01301.html Jeff
Re: [PATCH][RFC] Fix PR63155
On Mon, 9 Mar 2015, Richard Biener wrote: > On Fri, 6 Mar 2015, Jeff Law wrote: > > > On 03/06/15 06:16, Richard Biener wrote: > > > > > > This fixes PR63155 and reduces the memory usage at -O0 from reported > > > 10GB (couldn't verify/update on my small box) to 350MB (still worse > > > compared to 4.8 which needs only 50MB). > > > > > > It fixes this by no longer computing live info or building a conflict > > > graph for coalescing of SSA names flowing over abnormal edges > > > (which needs to succeed). > > > > > > Of course this also removes verification that this coalescing is valid > > > (but computing this has quadratic cost). With this it turns > > > ICEs into miscompiles. > > > > > > We could restrict verifying that we can perform abnormal coalescing > > > to ENABLE_CHECKING (and I've wanted a verifier pass that can verify > > > this multiple times to be able to catch what breaks it and not having > > > to work back from out-of-SSA ICEing...). > > > > > > So any opinion on this patch welcome. > > > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > > > Ok for trunk? ;) > > > > > > Thanks, > > > Richard. > > > > > > 2015-03-06 Richard Biener > > > > > > PR middle-end/63155 > > > * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. > > > (coalesce_partitions): Split out abnormal coalescing to ... > > > (perform_abnormal_coalescing): ... this function. > > > (coalesce_ssa_name): Perform abnormal coalescing without computing > > > live/conflict. > > I'd personally like to keep the checking when ENABLE_CHECKING. > > > > I haven't followed this discussion real closely, but I wonder if some kind > > of > > blocking approach would work without blowing up the memory consumption. > > There's no inherent reason why we have to coalesce everything at the same > > time. We can use a blocking factor and do coalescing on some N number of > > SSA_NAMEs at a time. > > Yes, that's possible at (quite?) some compile-time cost. Note that we > can't really guarantee that the resulting live/conflict problems shrink > significantly enough without sorting the coalesces in a different way > (not after important coalesces but after their basevars). > > > I suspect we can select an N that ultimately degenerates into the current > > "do > > everything together" for the common case and only has to iterate over blocks > > of SSA_NAMEs in extreme cases. > > I've attached a patch to the PR that adds such a number N after which we > simply stop coalescing. Of course that doesn't work for abnormals where > we _must_ coalesce. Thus this patch ... > > As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder > if we should make some of the checking we do a runtime choice rather > than a compile-time one...). I'll update the patch. Ok, like the following which adds a verify_ssa_coalescing () function (which could in theory be called from IL verification like verify_ssa) and calls it when ENABLE_CHECKING is defined. Bootstrap & regtest running on x86_64-unknown-linux-gnu. It didn't look appropriate for this stage to implement virtual operand verification. Ok this way? Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.h (verify_ssa_coalescing): Declare. * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Call verify_ssa_coalescing if ENABLE_CHECKING. Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. (verify_ssa_coalescing_worker): New function. (verify_ssa_coalescing): Likewise. Index: gcc/tree-ssa-coalesce.c === *** gcc/tree-ssa-coalesce.c (revision 221278) --- gcc/tree-ssa-coalesce.c (working copy) *** create_outofssa_var_map (coalesce_list_p *** 1121,1128 /* Attempt to coalesce ssa versions X and Y together using the partition !mapping in MAP and checking conflicts in GRAPH. Output any debug info to !DEBUG, if it is nun-NULL. */ static inline bool attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y, --- 1121,1128 /* Attempt to coalesce ssa versions X and Y together using the partition !mapping in MAP and checking conflicts in GRAPH if not NULL. !Output any debug info to DEBUG, if it is nun-NULL. */ static inline bool attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y, *** attempt_coalesce (var_map map, ssa_confl *** 1154,1160 fprintf (debug, " [map: %d, %d] ", p1, p2); ! if (!ssa_conflicts_test_p (graph, p1, p2)) { var1 = partition_to_var (map, p1); var2 = partition_to_var (map, p2); --- 1154,1161 fprintf (debug, " [map: %d, %d] ", p1, p
Re: [PATCH][RFC] Fix PR63155
On Fri, 6 Mar 2015, Jeff Law wrote: > On 03/06/15 06:16, Richard Biener wrote: > > > > This fixes PR63155 and reduces the memory usage at -O0 from reported > > 10GB (couldn't verify/update on my small box) to 350MB (still worse > > compared to 4.8 which needs only 50MB). > > > > It fixes this by no longer computing live info or building a conflict > > graph for coalescing of SSA names flowing over abnormal edges > > (which needs to succeed). > > > > Of course this also removes verification that this coalescing is valid > > (but computing this has quadratic cost). With this it turns > > ICEs into miscompiles. > > > > We could restrict verifying that we can perform abnormal coalescing > > to ENABLE_CHECKING (and I've wanted a verifier pass that can verify > > this multiple times to be able to catch what breaks it and not having > > to work back from out-of-SSA ICEing...). > > > > So any opinion on this patch welcome. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Ok for trunk? ;) > > > > Thanks, > > Richard. > > > > 2015-03-06 Richard Biener > > > > PR middle-end/63155 > > * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. > > (coalesce_partitions): Split out abnormal coalescing to ... > > (perform_abnormal_coalescing): ... this function. > > (coalesce_ssa_name): Perform abnormal coalescing without computing > > live/conflict. > I'd personally like to keep the checking when ENABLE_CHECKING. > > I haven't followed this discussion real closely, but I wonder if some kind of > blocking approach would work without blowing up the memory consumption. > There's no inherent reason why we have to coalesce everything at the same > time. We can use a blocking factor and do coalescing on some N number of > SSA_NAMEs at a time. Yes, that's possible at (quite?) some compile-time cost. Note that we can't really guarantee that the resulting live/conflict problems shrink significantly enough without sorting the coalesces in a different way (not after important coalesces but after their basevars). > I suspect we can select an N that ultimately degenerates into the current "do > everything together" for the common case and only has to iterate over blocks > of SSA_NAMEs in extreme cases. I've attached a patch to the PR that adds such a number N after which we simply stop coalescing. Of course that doesn't work for abnormals where we _must_ coalesce. Thus this patch ... As said, it's simple to keep the checking with ENABLE_CHECKING (I wonder if we should make some of the checking we do a runtime choice rather than a compile-time one...). I'll update the patch. Thanks, Richard.
Re: [PATCH][RFC] Fix PR63155
On 03/06/15 06:16, Richard Biener wrote: This fixes PR63155 and reduces the memory usage at -O0 from reported 10GB (couldn't verify/update on my small box) to 350MB (still worse compared to 4.8 which needs only 50MB). It fixes this by no longer computing live info or building a conflict graph for coalescing of SSA names flowing over abnormal edges (which needs to succeed). Of course this also removes verification that this coalescing is valid (but computing this has quadratic cost). With this it turns ICEs into miscompiles. We could restrict verifying that we can perform abnormal coalescing to ENABLE_CHECKING (and I've wanted a verifier pass that can verify this multiple times to be able to catch what breaks it and not having to work back from out-of-SSA ICEing...). So any opinion on this patch welcome. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Ok for trunk? ;) Thanks, Richard. 2015-03-06 Richard Biener PR middle-end/63155 * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL. (coalesce_partitions): Split out abnormal coalescing to ... (perform_abnormal_coalescing): ... this function. (coalesce_ssa_name): Perform abnormal coalescing without computing live/conflict. I'd personally like to keep the checking when ENABLE_CHECKING. I haven't followed this discussion real closely, but I wonder if some kind of blocking approach would work without blowing up the memory consumption. There's no inherent reason why we have to coalesce everything at the same time. We can use a blocking factor and do coalescing on some N number of SSA_NAMEs at a time. I suspect we can select an N that ultimately degenerates into the current "do everything together" for the common case and only has to iterate over blocks of SSA_NAMEs in extreme cases. Jeff