[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #18 from Jakub Jelinek --- Author: jakub Date: Fri Mar 28 10:25:34 2014 New Revision: 208893 URL: http://gcc.gnu.org/viewcvs?rev=208893&root=gcc&view=rev Log: PR ipa/60315 * g++.dg/torture/pr60315.C: Add -std=c++11 to dg-options. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/g++.dg/torture/pr60315.C
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #17 from rguenther at suse dot de --- On March 26, 2014 10:58:18 PM CET, hubicka at ucw dot cz wrote: >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 > >--- Comment #16 from Jan Hubicka --- >> forwprop would do that, but the enum is unsigned int while the >> switch value is int and thus simplify_gimple_switch bails out >> because the conversion is not value-preserving. >> >> So the frontend would need to be changed here or we need to >> "complicate" the transform by not looking at the type of >> the existing switch argument but instead by looking at the >> actual switch label values to see if their value would be >> preserved. But yes, that enum -> int conversion asked for >> by the C++ standard seems to be common that this should be >> worth the trouble. > >Yep, it seems that the "complicate" transform is actually the most >generic >thing to do. (we won't need to modify all FE's and we will likely get >more >simplifications done) Shall I try to dig into it or you know how to do >that >better? I've posted a patch but it causes some regressions I need to investigate.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #16 from Jan Hubicka --- > forwprop would do that, but the enum is unsigned int while the > switch value is int and thus simplify_gimple_switch bails out > because the conversion is not value-preserving. > > So the frontend would need to be changed here or we need to > "complicate" the transform by not looking at the type of > the existing switch argument but instead by looking at the > actual switch label values to see if their value would be > preserved. But yes, that enum -> int conversion asked for > by the C++ standard seems to be common that this should be > worth the trouble. Yep, it seems that the "complicate" transform is actually the most generic thing to do. (we won't need to modify all FE's and we will likely get more simplifications done) Shall I try to dig into it or you know how to do that better?
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #15 from rguenther at suse dot de --- On Wed, 26 Mar 2014, hubicka at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 > > --- Comment #14 from Jan Hubicka --- > The compile time hog issue is fixed now. We still fix the predicates for > switch statement (to get pass NOP_EXPR) since it seems very common pattern. > Richard: I suppose we can't fold away the NOP_EXPR easily earlier? forwprop would do that, but the enum is unsigned int while the switch value is int and thus simplify_gimple_switch bails out because the conversion is not value-preserving. So the frontend would need to be changed here or we need to "complicate" the transform by not looking at the type of the existing switch argument but instead by looking at the actual switch label values to see if their value would be preserved. But yes, that enum -> int conversion asked for by the C++ standard seems to be common that this should be worth the trouble.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #14 from Jan Hubicka --- The compile time hog issue is fixed now. We still fix the predicates for switch statement (to get pass NOP_EXPR) since it seems very common pattern. Richard: I suppose we can't fold away the NOP_EXPR easily earlier?
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #13 from Jan Hubicka --- Author: hubicka Date: Wed Mar 26 02:11:57 2014 New Revision: 208831 URL: http://gcc.gnu.org/viewcvs?rev=208831&root=gcc&view=rev Log: PR ipa/60315 * cif-code.def (UNREACHABLE) New code. * ipa-inline.c (inline_small_functions): Skip edges to __builtlin_unreachable. (estimate_edge_growth): Allow edges to __builtlin_unreachable. * ipa-inline-analysis.c (edge_set_predicate): Redirect edges with false predicate to __bulitin_unreachable. (set_cond_stmt_execution_predicate): Fix issue when invert_tree_comparison returns ERROR_MARK. * ipa-pure-const.c (propagate_pure_const, propagate_nothrow): Do not propagate to inline clones. * cgraph.c (verify_edge_corresponds_to_fndecl): Allow redirection to unreachable. * ipa-cp.c (create_specialized_node): Be ready for new node to appear. * cgraphclones.c (cgraph_clone_node): If call destination is already ureachable, do not redirect it back. * tree-inline.c (fold_marked_statements): Hanlde calls becoming unreachable. Added: trunk/gcc/testsuite/g++.dg/torture/pr60315.C Modified: trunk/gcc/ChangeLog trunk/gcc/cgraph.c trunk/gcc/cgraphclones.c trunk/gcc/cif-code.def trunk/gcc/ipa-cp.c trunk/gcc/ipa-inline-analysis.c trunk/gcc/ipa-inline.c trunk/gcc/ipa-inline.h trunk/gcc/testsuite/ChangeLog
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #12 from Jan Hubicka --- Created attachment 32442 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32442&action=edit Better patch I am attaching more complete patch. There is quite bad wrong code bug in pure-const that updates declaration because it proves that one of inline clones is pure which does not imply that the offline copy is. The patch also need fix to tree-inline.c that I sent earlier and Richard requested a cleanup for http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00134.html Will do that tomorrow.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #11 from Jan Hubicka --- Created attachment 32439 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32439&action=edit Patch I am testing this patch implements the trick of redirecting call edges to UNREACHABLE. It solves the compile time issues quite well. I wonder why I did not get this idea previously - basically it was always on the TODO list to remove edges corresponding to provably unreachable code, but for that we need bit more support because the known to be dead calls needs to be recorded for verifier and inline-transform machinery. Calls to UNREACHABLE does as well as special purpose markers without needs for changess across the cgraph code.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #10 from Jan Hubicka --- Actually the problem here seems to be that we soon work out that most of edges are never executed, yet we still inlining them. The metrics are not growing then so we take time to hit the limits. I guess with ability to redirect edges to unreachable, we can kill nodes early. BTW the cache is not really intended to help the updates, it only avoids repeated recomputations. This is not really a dataflow problem - we only walk the inline trees that (modulo bugs) should not grow arbitrarily large because of inlining limits. Not seeing that the switch is controlled by parameters sucks indeed. Will add code going thorugh casts.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Jan Hubicka changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #9 from Jan Hubicka --- I will take a look... It seems we manage to incredibly deep inline trees.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #8 from Jakub Jelinek --- BTW, the #c0 testcase regressed with r179083.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #7 from Richard Biener --- Note that we seem to fail to update BB predicates for switch stmts. size:0.00, time:0.00, predicate:(true) size:3.00, time:2.00, predicate:(not inlined) size:2.00, time:2.00, predicate:(op1 changed) size:8.00, time:3.20, predicate:(op1 changed) && (op1 != 2) I cannot interpret the size 2 case (what is "op1 changed"?), but in that case we actually shrink compared to not inlining. As && op1 != 2 makes it more restrictive it's odd that that increases the metrics. As inliner I would inline all the (op1 changed) cases, thus in the above case op1 == 2. We seem to inline fully until hitting the case where only recursive edges are left. Even for only two switch cases we inline 5(!) calls into Test. Ah, the switch isn't handled by the predicates because BB 3 predicate:(op1 != 1) s.1_4 = (int) s_2(D); freq:0.80 size: 0 time: 0 switch (s.1_4) , case 0: , case 1: > it is not unmodified_parm_or_parm_agg_item (). The parameter is unsigned int so the cast is not value-preserving. Of course in general we can't rely on "proper" predicates so we need to avoid exploding reliably.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #6 from Richard Biener --- Btw, the smaller testcase (E4 case commented) shows exactly the same behavior, we just seem to be exponential so only adding E4 makes it "really" bad.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #5 from Richard Biener --- Hmm, ok - it is supposed to only account for the extra call edges in the inlined bodies. The actual issue seems to be Deciding on inlining of small functions. Starting with size 114. Enqueueing calls in Test::Test(Scale) [with Scale scale = (Scale)3u]/14. Enqueueing calls in Test::Test(Scale) [with Scale scale = (Scale)3u]/13. Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0 ... Considering Test::Test(Scale) [with Scale scale = (Scale)2u]/9 with 27 size to be inlined into Test::Test(Scale) [with Scale scale = (Scale)3u]/13 in t.C:21 Estimated growth after inlined into all is +27 insns. Estimated badness is -1073741827, frequency 0.16. Badness calculation for Test::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test::Test(Scale) [with Scale scale = (Scale)2u]/10 size growth -3, time 0 inline hints: in_scc declared_inline -1073741827: Growth -3 <= 0 Processing frequency Test::Test(Scale) [with Scale scale = (Scale)2u] Called by Test::Test(Scale) [with Scale scale = (Scale)3u] that is normal or hot Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)0u]/2 Known to be false: not inlined, op1 != 0, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test::Test(Scale) [with Scale scale = (Scale)0u]/3, badness -1073741827 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test::Test(Scale) [with Scale scale = (Scale)1u]/6, badness -1073741827 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0 Inlined into Test::Test(Scale) [with Scale scale = (Scale)3u] which now has time 13 and size 24,net change of -3. New minimal size reached: 111 Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0 Considering Test::Test(Scale) [with Scale scale = (Scale)1u]/5 with 27 size to be inlined into Test::Test(Scale) [with Scale scale = (Scale)0u]/2 in t.C:20 Estimated growth after inlined into all is +27 insns. Estimated badness is -1073741827, frequency 0.12. Badness calculation for Test::Test(Scale) [with Scale scale = (Scale)0u]/2 -> Test::Test(Scale) [with Scale scale = (Scale)1u]/6 size growth -3, time 0 inline hints: declared_inline -1073741827: Growth -3 <= 0 ... so we are inlining all over the place but don't really arrive at a point where no further useful inlining is left and inlining never has a growth effect in our theory and we continue to think that we shrink the overall unit by further inlining. Meanwhile the cgraph is full of millions of calls (but estimated to be never reached?): Considering Test::Test(Scale) [with Scale scale = (Scale)0u]/2 with 24 size to be inlined into Test::Test(Scale) [with Scale scale = (Scale)0u]/76 in t.C:19 Estimated growth after inlined into all is +24 insns. Estimated badness is -1073741827, frequency 0.00. Badness calculation for Test::Test(Scale) [with Scale scale =
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #4 from Richard Biener --- When calling do_estimate_edge_size to compute the effect on caller size when inlining an edge we call estimate_node_size_and_time which eventually recurses down to estimate_calls_size_and_time (why!? call edges in the callee are irrelevant when inlining the call into the caller!). Doesn't this just want to add(?) e->call_stmt_size/time? At the moment estimate_calls_size_and_time recurses to estimate_edge_size_and_time ... and I don't see _any_ prevention of running in cgraph cycles here. (and the cache isn't populated before computing an edges size/time is). In fact, Index: gcc/ipa-inline-analysis.c === --- gcc/ipa-inline-analysis.c (revision 207960) +++ gcc/ipa-inline-analysis.c (working copy) @@ -3011,21 +3011,11 @@ estimate_calls_size_and_time (struct cgr struct inline_edge_summary *es = inline_edge_summary (e); if (!es->predicate || evaluate_predicate (es->predicate, possible_truths)) - { - if (e->inline_failed) - { - /* Predicates of calls shall not use NOT_CHANGED codes, -sowe do not need to compute probabilities. */ - estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE, - known_vals, known_binfos, - known_aggs, hints); - } - else - estimate_calls_size_and_time (e->callee, size, time, hints, - possible_truths, - known_vals, known_binfos, - known_aggs); - } + /* Predicates of calls shall not use NOT_CHANGED codes, + sowe do not need to compute probabilities. */ + estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE, +known_vals, known_binfos, +known_aggs, hints); } for (e = node->indirect_calls; e; e = e->next_callee) { fixes this and I cannot make sense of calling estimate_calls_size_and_time for the callee of an edge that we are not going to inline (or that is already inlined? I still find those if (e->inline_failed) checks odd). If it's supposed to account for inline bodies in the caller then we should have updated the inline_summary () of the caller, not have to recurse here - and we _do_ seem to (inline_merge_summary).
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #3 from Richard Biener --- This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process)
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Richard Biener changed: What|Removed |Added Blocks||60243 --- Comment #2 from Richard Biener --- We have only 1 billion (!) calls to estimate_calls_size_and_time (and 2 billion recursions in that function - I suppose callgrind/kcachegrind overflowed the entry counter even ...). Also 3 billion calls to evaluate_predicate. Sth is getting seriously out-of-hands here ;) That is, it's ultimately called from do_estimate_edge_size (but only 36 calls here - still a _lot_ for this testcase). Looks related to what I found in PR60243.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Richard Biener changed: What|Removed |Added Keywords||compile-time-hog CC||hubicka at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Target Milestone|--- |4.8.3
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Marc Glisse changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-22 Component|c++ |ipa Summary|template constructor switch |[4.8/4.9 Regression] |optimization|template constructor switch ||optimization Ever confirmed|0 |1 --- Comment #1 from Marc Glisse --- It does finish, eventually, but it spends a lot of time in inline_small_functions... ipa inlining heuristics : 627.09 (96%) usr 0.02 ( 3%) sys 629.01 (96%) wall 69964 kB (27%) ggc