[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-28 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #18 from Jakub Jelinek  ---
Author: jakub
Date: Fri Mar 28 10:25:34 2014
New Revision: 208893

URL: http://gcc.gnu.org/viewcvs?rev=208893&root=gcc&view=rev
Log:
PR ipa/60315
* g++.dg/torture/pr60315.C: Add -std=c++11 to dg-options.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/torture/pr60315.C


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-27 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #17 from rguenther at suse dot de  ---
On March 26, 2014 10:58:18 PM CET, hubicka at ucw dot cz
 wrote:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315
>
>--- Comment #16 from Jan Hubicka  ---
>> forwprop would do that, but the enum is unsigned int while the
>> switch value is int and thus simplify_gimple_switch bails out
>> because the conversion is not value-preserving.
>> 
>> So the frontend would need to be changed here or we need to
>> "complicate" the transform by not looking at the type of
>> the existing switch argument but instead by looking at the
>> actual switch label values to see if their value would be
>> preserved.  But yes, that enum -> int conversion asked for
>> by the C++ standard seems to be common that this should be
>> worth the trouble.
>
>Yep, it seems that the "complicate" transform is actually the most
>generic
>thing to do. (we won't need to modify all FE's and we will likely get
>more
>simplifications done) Shall I try to dig into it or you know how to do
>that
>better?

I've posted a patch but it causes some regressions I need to investigate.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-26 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #16 from Jan Hubicka  ---
> forwprop would do that, but the enum is unsigned int while the
> switch value is int and thus simplify_gimple_switch bails out
> because the conversion is not value-preserving.
> 
> So the frontend would need to be changed here or we need to
> "complicate" the transform by not looking at the type of
> the existing switch argument but instead by looking at the
> actual switch label values to see if their value would be
> preserved.  But yes, that enum -> int conversion asked for
> by the C++ standard seems to be common that this should be
> worth the trouble.

Yep, it seems that the "complicate" transform is actually the most generic
thing to do. (we won't need to modify all FE's and we will likely get more
simplifications done) Shall I try to dig into it or you know how to do that
better?


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-26 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #15 from rguenther at suse dot de  ---
On Wed, 26 Mar 2014, hubicka at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315
> 
> --- Comment #14 from Jan Hubicka  ---
> The compile time hog issue is fixed now.  We still fix the predicates for
> switch statement (to get pass NOP_EXPR) since it seems very common pattern. 
> Richard: I suppose we can't fold away the NOP_EXPR easily earlier?

forwprop would do that, but the enum is unsigned int while the
switch value is int and thus simplify_gimple_switch bails out
because the conversion is not value-preserving.

So the frontend would need to be changed here or we need to
"complicate" the transform by not looking at the type of
the existing switch argument but instead by looking at the
actual switch label values to see if their value would be
preserved.  But yes, that enum -> int conversion asked for
by the C++ standard seems to be common that this should be
worth the trouble.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-25 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #14 from Jan Hubicka  ---
The compile time hog issue is fixed now.  We still fix the predicates for
switch statement (to get pass NOP_EXPR) since it seems very common pattern. 
Richard: I suppose we can't fold away the NOP_EXPR easily earlier?


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-25 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #13 from Jan Hubicka  ---
Author: hubicka
Date: Wed Mar 26 02:11:57 2014
New Revision: 208831

URL: http://gcc.gnu.org/viewcvs?rev=208831&root=gcc&view=rev
Log:

PR ipa/60315
* cif-code.def (UNREACHABLE) New code.
* ipa-inline.c (inline_small_functions): Skip edges to
__builtlin_unreachable.
(estimate_edge_growth): Allow edges to __builtlin_unreachable.
* ipa-inline-analysis.c (edge_set_predicate): Redirect edges with false
predicate to __bulitin_unreachable.
(set_cond_stmt_execution_predicate): Fix issue when invert_tree_comparison
returns ERROR_MARK.
* ipa-pure-const.c (propagate_pure_const, propagate_nothrow): Do not
propagate to inline clones.
* cgraph.c (verify_edge_corresponds_to_fndecl): Allow redirection
to unreachable.
* ipa-cp.c (create_specialized_node): Be ready for new node to appear.
* cgraphclones.c (cgraph_clone_node): If call destination is already
ureachable, do not redirect it back.
* tree-inline.c (fold_marked_statements): Hanlde calls becoming
unreachable.

Added:
trunk/gcc/testsuite/g++.dg/torture/pr60315.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cgraph.c
trunk/gcc/cgraphclones.c
trunk/gcc/cif-code.def
trunk/gcc/ipa-cp.c
trunk/gcc/ipa-inline-analysis.c
trunk/gcc/ipa-inline.c
trunk/gcc/ipa-inline.h
trunk/gcc/testsuite/ChangeLog


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-24 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #12 from Jan Hubicka  ---
Created attachment 32442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32442&action=edit
Better patch

I am attaching more complete patch.  There is quite bad wrong code bug in
pure-const that updates declaration because it proves that one of inline clones
is pure which does not imply that the offline copy is.
The patch also need fix to tree-inline.c that I sent earlier and Richard
requested a cleanup for http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00134.html
Will do that tomorrow.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-24 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #11 from Jan Hubicka  ---
Created attachment 32439
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32439&action=edit
Patch I am testing

this patch implements the trick of redirecting call edges to UNREACHABLE.  It
solves the compile time issues quite well. I wonder why I did not get this idea
previously - basically it was always on the TODO list to remove edges
corresponding to provably unreachable code, but for that we need bit more
support because the known to be dead calls needs to be recorded for verifier
and inline-transform machinery.

Calls to UNREACHABLE does as well as special purpose markers without needs for
changess across the cgraph code.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-24 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #10 from Jan Hubicka  ---
Actually the problem here seems to be that we soon work out that most of edges
are never executed, yet we still inlining them. The metrics are not growing
then so we take time to hit the limits.  I guess with ability to redirect edges
to unreachable, we can kill nodes early.

BTW the cache is not really intended to help the updates, it only avoids
repeated recomputations.  This is not really a dataflow problem - we only walk
the inline trees that (modulo bugs) should not grow arbitrarily large because
of inlining limits.

Not seeing that the switch is controlled by parameters sucks indeed. Will add
code going thorugh casts.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-24 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #9 from Jan Hubicka  ---
I will take a look... It seems we manage to incredibly deep inline trees.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-03-18 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
BTW, the #c0 testcase regressed with r179083.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #7 from Richard Biener  ---
Note that we seem to fail to update BB predicates for switch stmts.

size:0.00, time:0.00, predicate:(true)
size:3.00, time:2.00, predicate:(not inlined)
size:2.00, time:2.00, predicate:(op1 changed)
size:8.00, time:3.20, predicate:(op1 changed) && (op1 != 2)

I cannot interpret the size 2 case (what is "op1 changed"?), but in that
case we actually shrink compared to not inlining.  As && op1 != 2 makes
it more restrictive it's odd that that increases the metrics.

As inliner I would inline all the (op1 changed) cases, thus in the above
case op1 == 2.

We seem to inline fully until hitting the case where only recursive edges
are left.  Even for only two switch cases we inline 5(!) calls into Test.

Ah, the switch isn't handled by the predicates because

 BB 3 predicate:(op1 != 1)
  s.1_4 = (int) s_2(D);
freq:0.80 size:  0 time:  0
  switch (s.1_4) , case 0: , case 1: >

it is not unmodified_parm_or_parm_agg_item ().  The parameter is unsigned int
so the cast is not value-preserving.

Of course in general we can't rely on "proper" predicates so we need to avoid
exploding reliably.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #6 from Richard Biener  ---
Btw, the smaller testcase (E4 case commented) shows exactly the same behavior,
we just seem to be exponential so only adding E4 makes it "really" bad.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #5 from Richard Biener  ---
Hmm, ok - it is supposed to only account for the extra call edges in the
inlined
bodies.  The actual issue seems to be

Deciding on inlining of small functions.  Starting with size 114.
Enqueueing calls in Test::Test(Scale) [with Scale scale = (Scale)3u]/14.
Enqueueing calls in Test::Test(Scale) [with Scale scale = (Scale)3u]/13.
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)3u]/13 ->
Test::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)2u]/9
   Known to be false: not inlined, op1 != 2, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)3u]/13 ->
Test::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5
   Known to be false: not inlined, op1 != 1, op1 changed
   size:0 time:0
...

Considering Test::Test(Scale) [with Scale scale = (Scale)2u]/9 with 27
size
 to be inlined into Test::Test(Scale) [with Scale scale = (Scale)3u]/13
in t.C:21
 Estimated growth after inlined into all is +27 insns.
 Estimated badness is -1073741827, frequency 0.16.
Badness calculation for Test::Test(Scale) [with Scale scale =
(Scale)3u]/13 -> Test::Test(Scale) [with Scale scale = (Scale)2u]/10
  size growth -3, time 0 inline hints: in_scc declared_inline
  -1073741827: Growth -3 <= 0
Processing frequency Test::Test(Scale) [with Scale scale = (Scale)2u]
  Called by Test::Test(Scale) [with Scale scale = (Scale)3u] that is
normal or hot
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)0u]/2
   Known to be false: not inlined, op1 != 0, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 ->
Test::Test(Scale) [with Scale scale = (Scale)0u]/3, badness -1073741827
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5
   Known to be false: not inlined, op1 != 1, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 ->
Test::Test(Scale) [with Scale scale = (Scale)1u]/6, badness -1073741827
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)2u]/9
   Known to be false: not inlined, op1 != 2, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 ->
Test::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
  enqueuing call Test::Test(Scale) [with Scale scale = (Scale)2u]/30 ->
Test::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)3u]/13
   Known to be false: not inlined, op1 != 3, op1 changed
   size:0 time:0
 Inlined into Test::Test(Scale) [with Scale scale = (Scale)3u] which now
has time 13 and size 24,net change of -3.
New minimal size reached: 111
   Estimating body: Test::Test(Scale) [with Scale scale = (Scale)1u]/5
   Known to be false: not inlined, op1 != 1, op1 changed
   size:0 time:0

Considering Test::Test(Scale) [with Scale scale = (Scale)1u]/5 with 27
size
 to be inlined into Test::Test(Scale) [with Scale scale = (Scale)0u]/2
in t.C:20
 Estimated growth after inlined into all is +27 insns.
 Estimated badness is -1073741827, frequency 0.12.
Badness calculation for Test::Test(Scale) [with Scale scale =
(Scale)0u]/2 -> Test::Test(Scale) [with Scale scale = (Scale)1u]/6
  size growth -3, time 0 inline hints: declared_inline
  -1073741827: Growth -3 <= 0
...

so we are inlining all over the place but don't really arrive at a point
where no further useful inlining is left and inlining never has a growth
effect in our theory and we continue to think that we shrink the overall
unit by further inlining.  Meanwhile the cgraph is full of millions of
calls (but estimated to be never reached?):

Considering Test::Test(Scale) [with Scale scale = (Scale)0u]/2 with 24
size
 to be inlined into Test::Test(Scale) [with Scale scale = (Scale)0u]/76
in t.C:19
 Estimated growth after inlined into all is +24 insns.
 Estimated badness is -1073741827, frequency 0.00.
Badness calculation for Test::Test(Scale) [with Scale scale =

[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #4 from Richard Biener  ---
When calling do_estimate_edge_size to compute the effect on caller size when
inlining an edge we call estimate_node_size_and_time which eventually recurses
down to estimate_calls_size_and_time (why!?  call edges in the callee are
irrelevant when inlining the call into the caller!).  Doesn't this just want
to add(?) e->call_stmt_size/time?  At the moment estimate_calls_size_and_time
recurses to estimate_edge_size_and_time ... and I don't see _any_ prevention
of running in cgraph cycles here.  (and the cache isn't populated before
computing an edges size/time is).

In fact,

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   (revision 207960)
+++ gcc/ipa-inline-analysis.c   (working copy)
@@ -3011,21 +3011,11 @@ estimate_calls_size_and_time (struct cgr
   struct inline_edge_summary *es = inline_edge_summary (e);
   if (!es->predicate
  || evaluate_predicate (es->predicate, possible_truths))
-   {
- if (e->inline_failed)
-   {
- /* Predicates of calls shall not use NOT_CHANGED codes,
-sowe do not need to compute probabilities.  */
- estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
-  known_vals, known_binfos,
-  known_aggs, hints);
-   }
- else
-   estimate_calls_size_and_time (e->callee, size, time, hints,
- possible_truths,
- known_vals, known_binfos,
- known_aggs);
-   }
+   /* Predicates of calls shall not use NOT_CHANGED codes,
+  sowe do not need to compute probabilities.  */
+   estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
+known_vals, known_binfos,
+known_aggs, hints);
 }
   for (e = node->indirect_calls; e; e = e->next_callee)
 {

fixes this and I cannot make sense of calling estimate_calls_size_and_time
for the callee of an edge that we are not going to inline (or that is already
inlined?  I still find those if (e->inline_failed) checks odd).  If it's
supposed to account for inline bodies in the caller then we should have
updated the inline_summary () of the caller, not have to recurse here - and
we _do_ seem to (inline_merge_summary).


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

--- Comment #3 from Richard Biener  ---
This whole thing updating keys and such should use a proper lattice of
per-cgraph and per-edge node sizes/times which can be updated with a
less ad-hoc algorithm than the current one which easily and completely explodes
... :/  (and the 'cache' is completely ineffective during the update process)


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Richard Biener  changed:

   What|Removed |Added

 Blocks||60243

--- Comment #2 from Richard Biener  ---
We have only 1 billion (!) calls to estimate_calls_size_and_time (and 2 billion
recursions in that function - I suppose callgrind/kcachegrind overflowed the
entry counter even ...).  Also 3 billion calls to evaluate_predicate.

Sth is getting seriously out-of-hands here ;)

That is, it's ultimately called from do_estimate_edge_size (but only 36
calls here - still a _lot_ for this testcase).

Looks related to what I found in PR60243.


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-22 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Richard Biener  changed:

   What|Removed |Added

   Keywords||compile-time-hog
 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
   Target Milestone|--- |4.8.3


[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization

2014-02-22 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-22
  Component|c++ |ipa
Summary|template constructor switch |[4.8/4.9 Regression]
   |optimization|template constructor switch
   ||optimization
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
It does finish, eventually, but it spends a lot of time in
inline_small_functions...

 ipa inlining heuristics : 627.09 (96%) usr   0.02 ( 3%) sys 629.01 (96%) wall 
 69964 kB (27%) ggc