This fixes the quadratic behavior in the IPA inline analysis that arises when we inline a function into the same caller many times via inline_to_all_callers, as we do in the testcase for PR37448.  In that case the overall summary of the caller is updated for each inlined call instead of just once.
The solution is to keep track of which nodes we inlined to and delay the summary update.  Honza, I believe this is safe as the callers' summary should only matter for further inline decisions and not this one (inlining into all callers).  Double-checking is appreciated - there should be no code changes by this patch (I verified this on the PR37448 testcase).

This brings down compile-time of IPA inlining heuristics from

 ipa inlining heuristics : 260.14 (84%) usr   0.50 (23%) sys 260.91 (84%) wall  102217 kB (10%) ggc

(that's an optimized, non-checking compiler) to

 ipa inlining heuristics :   3.52 ( 3%) usr   0.55 (12%) sys   4.21 ( 3%) wall  102216 kB (12%) ggc

(that's an unoptimized, checking-enabled compiler; "before" on that unoptimized compiler is ipa inlining heuristics : 935.06 (89%)).

We still do a lot of inlining here, so we see

 integration             :  21.22 (17%) usr   0.45 (10%) sys  21.68 (17%) wall  142979 kB (17%) ggc

which is expected (same unoptimized, checking-enabled compiler).

Bootstrap & regtest are running on x86_64-unknown-linux-gnu; ok for trunk and branch(es)?

Thanks,
Richard.

2016-02-18  Richard Biener  <rguent...@suse.de>

	PR ipa/37448
	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
	hash-set, do not update overall summaries here.  Renamed from ...
	(inline_to_all_callers): ... this which is now wrapping the
	above and performing delayed overall summary update.

Index: gcc/ipa-inline.c
===================================================================
*** gcc/ipa-inline.c	(revision 233498)
--- gcc/ipa-inline.c	(working copy)
*************** flatten_function (struct cgraph_node *no
*** 2163,2169 ****
     recursion.  */
  
  static bool
! inline_to_all_callers (struct cgraph_node *node, void *data)
  {
    int *num_calls = (int *)data;
    bool callee_removed = false;
--- 2163,2170 ----
     recursion.  */
  
  static bool
! inline_to_all_callers_1 (struct cgraph_node *node, void *data,
! 			 hash_set<cgraph_node *> *callers)
  {
    int *num_calls = (int *)data;
    bool callee_removed = false;
*************** inline_to_all_callers (struct cgraph_nod
*** 2193,2199 ****
  	       inline_summaries->get (node->callers->caller)->size);
  	}
  
!       inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
        if (dump_file)
  	fprintf (dump_file,
  		 " Inlined into %s which now has %i size\n",
--- 2194,2203 ----
  	       inline_summaries->get (node->callers->caller)->size);
  	}
  
!       /* Remember which callers we inlined to, delaying updating the
! 	 overall summary.  */
!       callers->add (node->callers->caller);
!       inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
        if (dump_file)
  	fprintf (dump_file,
  		 " Inlined into %s which now has %i size\n",
*************** inline_to_all_callers (struct cgraph_nod
*** 2211,2216 ****
--- 2215,2237 ----
        return false;
      }
  
+ /* Wrapper around inline_to_all_callers_1 doing delayed overall summary
+    update.  */
+ 
+ static bool
+ inline_to_all_callers (struct cgraph_node *node, void *data)
+ {
+   hash_set<cgraph_node *> callers;
+   bool res = inline_to_all_callers_1 (node, data, &callers);
+   /* Perform the delayed update of the overall summary of all callers
+      processed.  This avoids quadratic behavior in the cases where
+      we have a lot of calls to the same function.  */
+   for (hash_set<cgraph_node *>::iterator i = callers.begin ();
+        i != callers.end (); ++i)
+     inline_update_overall_summary (*i);
+   return res;
+ }
+ 
  /* Output overall time estimate.  */
  
  static void
  dump_overall_stats (void)
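
For illustration only, below is a minimal standalone sketch of the delayed-update pattern the patch uses.  This is not GCC code: the toy_* names are made up and std::unordered_set stands in for GCC's hash_set; it merely shows recomputing each distinct caller's summary once after all inlining instead of once per inlined call.

#include <unordered_set>
#include <vector>

/* Toy stand-ins; these names are hypothetical and only illustrate the
   pattern, they are not the GCC API.  */
struct toy_node { int size; };

static void
toy_update_overall_summary (toy_node *)
{
  /* Imagine an expensive recomputation of the caller's summary here.  */
}

static void
toy_inline_one_call (toy_node *caller)
{
  caller->size += 1;
}

/* Inline a callee into all of its call sites, recomputing each caller's
   overall summary exactly once afterwards instead of once per call.  */
static void
toy_inline_to_all_callers (const std::vector<toy_node *> &callers_of_calls)
{
  std::unordered_set<toy_node *> touched;
  for (toy_node *caller : callers_of_calls)
    {
      toy_inline_one_call (caller);
      /* Remember the caller; do not recompute its summary here.  */
      touched.insert (caller);
    }
  /* Delayed update: one recomputation per distinct caller.  */
  for (toy_node *caller : touched)
    toy_update_overall_summary (caller);
}

With N calls from the same caller this does one summary recomputation instead of N, which is where the quadratic behavior goes away.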