[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener changed:
What |Removed |Added
Last reconfirmed|2018-11-19 00:00:00 |2024-2-19
--- Comment #28 from Richard Biener ---
Original testcase at -O2:
 callgraph functions expansion : 11.94 ( 51%) 2.19 ( 42%) 14.14 ( 49%) 570M ( 40%)
 callgraph ipa passes : 10.00 ( 43%) 0.70 ( 13%) 10.70 ( 37%) 601M ( 42%)
 ipa profile : 4.68 ( 20%) 0.00 ( 0%) 4.68 ( 16%) 0 ( 0%)
 TOTAL : 23.36 5.22 28.60 1430M
23.36user 5.27system 0:28.65elapsed 99%CPU (0avgtext+0avgdata 1152100maxresident)k
0inputs+0outputs (0major+315833minor)pagefaults 0swaps
Jakub's testcase at -O2:
 callgraph functions expansion : 12.66 ( 30%) 2.21 ( 16%) 14.87 ( 27%) 505M ( 15%)
 callgraph ipa passes : 18.28 ( 44%) 0.65 ( 5%) 18.94 ( 34%) 601M ( 18%)
 ipa profile : 4.20 ( 10%) 0.00 ( 0%) 4.20 ( 8%) 0 ( 0%)
 preprocessing : 1.47 ( 4%) 3.27 ( 24%) 4.81 ( 9%) 417M ( 12%)
 lexical analysis : 2.24 ( 5%) 4.08 ( 30%) 6.34 ( 11%) 0 ( 0%)
 early inlining heuristics : 2.83 ( 7%) 0.04 ( 0%) 2.97 ( 5%) 1658k ( 0%)
 inline parameters : 3.01 ( 7%) 0.21 ( 2%) 3.16 ( 6%) 29M ( 1%)
 tree CFG construction : 3.44 ( 8%) 0.15 ( 1%) 3.52 ( 6%) 599M ( 18%)
 tree operand scan : 4.47 ( 11%) 0.26 ( 2%) 4.80 ( 9%) 93M ( 3%)
 TOTAL : 41.73 13.57 55.32 3422M
41.73user 13.67system 0:55.42elapsed 99%CPU (0avgtext+0avgdata 2374596maxresident)k
0inputs+0outputs (0major+536990minor)pagefaults 0swaps
So, apart from the faster machine, things are still as Honza said in the last comment.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #27 from Jan Hubicka ---
The profile_estimate issue is still here; the inliner and early inliner issues seem solved. It seems that ipa_profile just orders the nodes for propagation the wrong way: we propagate from callers to callees, while the toposorter produces an order meant for propagation in the opposite direction. operand_scan seems slow too.
Time variable usr sys wall GGC
 phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1237 kB ( 0%)
 phase parsing : 6.63 ( 9%) 6.77 ( 77%) 13.41 ( 17%) 655497 kB ( 20%)
 phase opt and generate : 64.47 ( 91%) 2.07 ( 23%) 66.57 ( 83%) 2603397 kB ( 80%)
 garbage collection : 0.64 ( 1%) 0.00 ( 0%) 0.65 ( 1%) 0 kB ( 0%)
 dump files : 0.05 ( 0%) 0.01 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
 callgraph construction : 0.91 ( 1%) 0.01 ( 0%) 0.83 ( 1%) 399235 kB ( 12%)
 callgraph optimization : 0.37 ( 1%) 0.00 ( 0%) 0.43 ( 1%) 0 kB ( 0%)
 callgraph functions expansion : 15.98 ( 22%) 1.20 ( 14%) 17.18 ( 21%) 297309 kB ( 9%)
 callgraph ipa passes : 40.57 ( 57%) 0.40 ( 5%) 40.99 ( 51%) 617751 kB ( 19%)
 ipa function summary : 0.14 ( 0%) 0.00 ( 0%) 0.14 ( 0%) 1807 kB ( 0%)
 ipa dead code removal : 0.22 ( 0%) 0.00 ( 0%) 0.24 ( 0%) 0 kB ( 0%)
 ipa cp : 0.97 ( 1%) 0.03 ( 0%) 1.03 ( 1%) 327514 kB ( 10%)
 ipa inlining heuristics : 0.72 ( 1%) 0.00 ( 0%) 0.63 ( 1%) 84183 kB ( 3%)
 ipa function splitting : 0.02 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
 ipa various optimizations : 0.69 ( 1%) 0.20 ( 2%) 0.89 ( 1%) 128398 kB ( 4%)
 ipa reference : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
 ipa profile : 18.24 ( 26%) 0.00 ( 0%) 18.25 ( 23%) 0 kB ( 0%)
 ipa pure const : 0.45 ( 1%) 0.00 ( 0%) 0.46 ( 1%) 0 kB ( 0%)
 ipa icf : 0.17 ( 0%) 0.02 ( 0%) 0.17 ( 0%) 0 kB ( 0%)
 ipa SRA : 0.21 ( 0%) 0.00 ( 0%) 0.21 ( 0%) 102 kB ( 0%)
 ipa free inline summary : 0.03 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
 cfg cleanup : 0.00 ( 0%) 0.01 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
 trivially dead code : 0.12 ( 0%) 0.03 ( 0%) 0.12 ( 0%) 0 kB ( 0%)
 df scan insns : 0.85 ( 1%) 0.14 ( 2%) 1.28 ( 2%) 46 kB ( 0%)
 df multiple defs : 0.30 ( 0%) 0.06 ( 1%) 0.31 ( 0%) 0 kB ( 0%)
 df reaching defs : 0.69 ( 1%) 0.05 ( 1%) 0.63 ( 1%) 0 kB ( 0%)
 df live regs : 0.49 ( 1%) 0.02 ( 0%) 0.57 ( 1%) 0 kB ( 0%)
 df live regs : 0.19 ( 0%) 0.01 ( 0%) 0.12 ( 0%) 0 kB ( 0%)
 df must-initialized regs : 0.10 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 0 kB ( 0%)
 df use-def / def-use chains : 0.44 ( 1%) 0.05 ( 1%) 0.40 ( 1%) 0 kB ( 0%)
 df reg dead/unused notes : 1.35 ( 2%) 0.09 ( 1%) 1.15 ( 1%) 747 kB ( 0%)
 register information : 0.16 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 0 kB ( 0%)
 alias analysis : 0.16 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 436 kB ( 0%)
 alias stmt walking : 0.49 ( 1%) 0.07 ( 1%) 0.67 ( 1%) 0 kB ( 0%)
 register scan : 0.04 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
 rebuild jump labels : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
 preprocessing : 2.37 ( 3%) 2.37 ( 27%) 4.49 ( 6%) 383477 kB ( 12%)
 lexical analysis : 1.88 ( 3%) 2.13 ( 24%) 4.20 ( 5%) 0 kB ( 0%)
 parser (global) : 0.01 ( 0%) 0.01 ( 0%) 0.03 ( 0%) 1442 kB ( 0%)
 parser function body : 2.19 ( 3%) 2.26 ( 26%) 4.50 ( 6%) 270577 kB ( 8%)
 early inlining heuristics : 2.80 ( 4%) 0.03 ( 0%) 2.81 ( 4%) 3076 kB ( 0%)
 inline parameters : 6.43 ( 9%) 0.14 ( 2%) 6.74 ( 8%) 31127 kB ( 1%)
 integration : 0.17 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 6789 kB ( 0%)
 tree gimplify : 1.01 ( 1%) 0.03 ( 0%) 1.15 ( 1%) 610970 kB ( 19%)
 tree eh : 0.50 ( 1%) 0.03 ( 0%) 0.44 ( 1%) 0 kB ( 0%)
 tree CFG construction : 3.50 ( 5%) 0.02 ( 0%) 3.74 ( 5%) 628087 kB ( 19%)
 tree CFG cleanup
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #26 from Martin Jambor ---
With the new IPA-SRA, the situation has improved quite a bit. See below, where old-ipa-sra is trunk r275981 and new-ipa-sra is trunk r275982 (the arrival of the new IPA-SRA):
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/old-ipa-sra/inst/bin/gcc -O0 -fno-inline -S pr60243.c
real=64.20 user=63.37
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/old-ipa-sra/inst/bin/gcc -O1 -fno-inline -S pr60243.c
real=90.80 user=89.84
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/old-ipa-sra/inst/bin/gcc -O2 -S pr60243.c
real=235.18 user=233.77
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/old-ipa-sra/inst/bin/gcc -O2 -fno-inline -S pr60243.c
real=198.59 user=197.27
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/new-ipa-sra/inst/bin/gcc -O2 -S pr60243.c
real=114.68 user=113.76
$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0 ~/gcc/new-ipa-sra/inst/bin/gcc -O2 -fno-inline -S pr60243.c
real=88.40 user=87.41
$ taskset -c 0 ~/gcc/new-ipa-sra/inst/bin/gcc -O2 -S pr60243.c -ftime-report
(showing only IPA passes and passes taking more than 1% of usr time)
 phase parsing : 9.57 ( 8%) 6.93 ( 75%) 16.51 ( 13%) 655448 kB ( 20%)
 phase opt and generate : 105.13 ( 92%) 2.34 ( 25%) 107.83 ( 87%) 2619926 kB ( 80%)
 callgraph functions expansion : 18.05 ( 16%) 1.34 ( 14%) 19.71 ( 16%) 302442 kB ( 9%)
 callgraph ipa passes : 77.51 ( 68%) 0.50 ( 5%) 78.06 ( 63%) 623696 kB ( 19%)
 ipa function summary : 0.15 ( 0%) 0.01 ( 0%) 0.16 ( 0%) 1494 kB ( 0%)
 ipa dead code removal : 0.32 ( 0%) 0.00 ( 0%) 0.29 ( 0%) 0 kB ( 0%)
 ipa cp : 1.10 ( 1%) 0.05 ( 1%) 1.13 ( 1%) 326688 kB ( 10%)
 ipa inlining heuristics : 17.85 ( 16%) 0.06 ( 1%) 17.82 ( 14%) 83762 kB ( 3%)
 ipa function splitting : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
 ipa various optimizations : 0.63 ( 1%) 0.28 ( 3%) 0.96 ( 1%) 131752 kB ( 4%)
 ipa reference : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 kB ( 0%)
 ipa profile : 14.66 ( 13%) 0.00 ( 0%) 14.67 ( 12%) 0 kB ( 0%)
 ipa pure const : 0.36 ( 0%) 0.04 ( 0%) 0.60 ( 0%) 0 kB ( 0%)
 ipa icf : 0.17 ( 0%) 0.01 ( 0%) 0.19 ( 0%) 0 kB ( 0%)
 ipa SRA : 0.21 ( 0%) 0.00 ( 0%) 0.23 ( 0%) 102 kB ( 0%)
 ipa free inline summary : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
 preprocessing : 4.20 ( 4%) 3.31 ( 36%) 7.77 ( 6%) 384133 kB ( 12%)
 lexical analysis : 2.46 ( 2%) 1.80 ( 19%) 3.95 ( 3%) 0 kB ( 0%)
 parser function body : 2.71 ( 2%) 1.82 ( 20%) 4.57 ( 4%) 269874 kB ( 8%)
 early inlining heuristics : 12.82 ( 11%) 0.03 ( 0%) 12.71 ( 10%) 4031 kB ( 0%)
 inline parameters : 8.01 ( 7%) 0.12 ( 1%) 8.27 ( 7%) 30845 kB ( 1%)
 tree CFG construction : 5.23 ( 5%) 0.04 ( 0%) 5.03 ( 4%) 628095 kB ( 19%)
 tree SSA rewrite : 3.42 ( 3%) 0.02 ( 0%) 3.39 ( 3%) 93305 kB ( 3%)
 tree operand scan : 17.53 ( 15%) 0.26 ( 3%) 17.77 ( 14%) 96568 kB ( 3%)
Essentially, -O2 -fno-inline is now as fast as -O1 -fno-inline.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener changed:
What |Removed |Added
Last reconfirmed|2017-11-17 00:00:00 |2018-11-19
--- Comment #25 from Richard Biener ---
(In reply to Martin Liška from comment #24)
> Can the bug be marked as resolved?
I don't see how. Jakub's testcase:
 ipa inlining heuristics : 27.66 ( 8%) 0.00 ( 0%) 27.66 ( 8%) 0 kB ( 0%)
 ipa profile : 18.72 ( 6%) 0.00 ( 0%) 18.71 ( 5%) 0 kB ( 0%)
 ipa SRA : 190.05 ( 58%) 1.44 ( 9%) 191.77 ( 56%) 717305 kB ( 22%)
 early inlining heuristics : 24.01 ( 7%) 0.01 ( 0%) 24.18 ( 7%) 2357 kB ( 0%)
 tree operand scan : 13.67 ( 4%) 0.68 ( 4%) 14.12 ( 4%) 95009 kB ( 3%)
 TOTAL : 325.67 16.14 343.04 3319727 kB
So it's all IPA plus a little bit of the operand scanner.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #24 from Martin Liška --- Can the bug be marked as resolved?
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #23 from Eric Gallager ---
(In reply to Jan Hubicka from comment #22)
> > The IPA SRA time is all spent in compute_fn_summary via convert_callers.
> > Not sure why that's necessary here? Martin, in r152368 you reduced those
> > to once-per-caller but obviously if each function calls each other function
> > as in this testcase this is still O(n^2). Why's the summary not simply
> > recomputed when we process the caller next? Thus at most N times?
>
> This is because the summary needs to be ready for the early inliner to decide
> whether the caller is good for inlining or not. I think we can simply mark it
> as dirty and compute it on demand from the inliner.
>
> I also finally have working patches for incremental update of the inline
> summary in the IPA inliner.
Cool, looking forward to seeing those patches!
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #22 from Jan Hubicka ---
> The IPA SRA time is all spent in compute_fn_summary via convert_callers.
> Not sure why that's necessary here? Martin, in r152368 you reduced those
> to once-per-caller but obviously if each function calls each other function
> as in this testcase this is still O(n^2). Why's the summary not simply
> recomputed when we process the caller next? Thus at most N times?
This is because the summary needs to be ready for the early inliner to decide whether the caller is good for inlining or not. I think we can simply mark it as dirty and compute it on demand from the inliner.
I also finally have working patches for incremental update of the inline summary in the IPA inliner.
Honza
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener changed:
What |Removed |Added
CC||jamborm at gcc dot gnu.org
--- Comment #21 from Richard Biener ---
Current trunk at -O2 -fno-checking (with otherwise checking enabled):
Time variable usr sys wall GGC
 phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1245 kB ( 0%)
 phase parsing : 16.72 ( 6%) 15.15 ( 75%) 31.86 ( 10%) 612162 kB ( 18%)
 phase opt and generate : 272.51 ( 94%) 5.08 ( 25%) 277.63 ( 90%) 2719266 kB ( 82%)
 ipa inlining heuristics : 31.82 ( 11%) 0.00 ( 0%) 31.85 ( 10%) 0 kB ( 0%)
 ipa profile : 9.92 ( 3%) 0.00 ( 0%) 9.93 ( 3%) 0 kB ( 0%)
 ipa SRA : 153.77 ( 53%) 1.81 ( 9%) 155.54 ( 50%) 741949 kB ( 22%)
 early inlining heuristics : 24.54 ( 8%) 0.03 ( 0%) 24.65 ( 8%) 2987 kB ( 0%)
At -O -g we can also see, to my surprise:
 tree CFG construction : 6.27 ( 4%) 0.04 ( 0%) 6.28 ( 4%) 628095 kB ( 15%)
 tree operand scan : 3.78 ( 3%) 0.99 ( 4%) 5.01 ( 3%) 47597 kB ( 1%)
 tree CFG cleanup : 7.51 ( 5%) 0.05 ( 0%) 7.71 ( 5%) 0 kB ( 0%)
The tree CFG construction time is _entirely_ spent in assign_discriminators! That's because expand_location is costly and the discriminator_per_locus hashtable calls it all the time. It's also because the testcase sits on a single line. The whole code seems odd to me as well, given that it doesn't handle trailing or leading UNKNOWN_LOCATION stmts very well. I also wonder why it is done at CFG construction time.
The IPA SRA time is all spent in compute_fn_summary via convert_callers. Not sure why that's necessary here? Martin, in r152368 you reduced those to once-per-caller but obviously if each function calls each other function as in this testcase this is still O(n^2). Why's the summary not simply recomputed when we process the caller next? Thus at most N times?
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #20 from rguenther at suse dot de ---
On Sun, 19 Nov 2017, hubicka at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
>
> --- Comment #19 from Jan Hubicka ---
> Author: hubicka
> Date: Sun Nov 19 18:55:30 2017
> New Revision: 254934
>
> URL: https://gcc.gnu.org/viewcvs?rev=254934&root=gcc&view=rev
> Log:
>   PR ipa/60243
>   * tree-inline.c (estimate_num_insns): Set to 1 at least.
>
> Modified:
>   trunk/gcc/ChangeLog
>   trunk/gcc/tree-inline.c
While this fixes the new regression, the apparent quadratic behavior of IPA SRA remains. I'll add the testcase to our "random" set of testcases in the C++ bench.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #19 from Jan Hubicka ---
Author: hubicka
Date: Sun Nov 19 18:55:30 2017
New Revision: 254934
URL: https://gcc.gnu.org/viewcvs?rev=254934&root=gcc&view=rev
Log:
  PR ipa/60243
  * tree-inline.c (estimate_num_insns): Set to 1 at least.
Modified:
  trunk/gcc/ChangeLog
  trunk/gcc/tree-inline.c
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #18 from Jan Hubicka ---
Returning MIN(1, count) indeed seems like a very good idea to me. We need to keep those in control :) I am testing a patch for that.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #17 from Richard Biener ---
So, add a comment in the asm to make the testcase test the same thing as originally for this PR (memory use then seems to peak at ~2GB).
Execution times (seconds)
 phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1182 kB ( 0%) ggc
 phase parsing : 6.53 ( 3%) usr 7.40 (73%) sys 13.93 ( 6%) wall 611643 kB (19%) ggc
 phase opt and generate : 202.70 (97%) usr 2.70 (27%) sys 205.41 (94%) wall 2569108 kB (81%) ggc
 ipa profile : 14.77 ( 7%) usr 0.00 ( 0%) sys 14.77 ( 7%) wall 0 kB ( 0%) ggc
 ipa SRA : 127.88 (61%) usr 0.89 ( 9%) sys 129.17 (59%) wall 619431 kB (19%) ggc
 early inlining heuristics : 3.74 ( 2%) usr 0.00 ( 0%) sys 3.64 ( 2%) wall 1928 kB ( 0%) ggc
 tree CFG construction : 8.73 ( 4%) usr 0.05 ( 0%) sys 8.77 ( 4%) wall 651524 kB (20%) ggc
 tree operand scan : 10.61 ( 5%) usr 0.33 ( 3%) sys 10.77 ( 5%) wall 95009 kB ( 3%) ggc
 scheduling 2 : 3.69 ( 2%) usr 0.02 ( 0%) sys 3.80 ( 2%) wall 502 kB ( 0%) ggc
 TOTAL : 209.23 10.10 219.35 3181942 kB
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener changed:
What |Removed |Added
Keywords||memory-hog
Status|UNCONFIRMED |NEW
Last reconfirmed||2017-11-17
Ever confirmed|0 |1
Known to fail||7.2.1
--- Comment #16 from Richard Biener ---
Not by this commit. Jakub's testcase is still slow in GCC 7 (and uses >28GB of memory - ick, it didn't even finish compiling). We seem to blow up during early inlining here, because we get BBs with millions of
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : : : "memory");
 ...
Counting those as zero size probably isn't wise if we don't "optimize" them during inlining... This issue likely hides the underlying old issue.
    case GIMPLE_ASM:
      {
        int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));
        /* 1000 means infinity. This avoids overflows later
           with very long asm statements.  */
        if (count > 1000)
          count = 1000;
        return count;
      }
should return MIN (1, count) even if in this case the asm doesn't generate any code.
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Eric Gallager changed:
What |Removed |Added
CC||egallager at gcc dot gnu.org
--- Comment #15 from Eric Gallager ---
(In reply to Jan Hubicka from comment #14)
> Author: hubicka
> Date: Fri Mar 28 19:50:28 2014
> New Revision: 208916
>
> URL: http://gcc.gnu.org/viewcvs?rev=208916&root=gcc&view=rev
> Log:
>   PR ipa/60243
>   * ipa-inline.c (want_inline_small_function_p): Short circuit large
>   functions; reorganize to make cheap checks first.
>   (inline_small_functions): Do not estimate growth when dumping;
>   it is expensive.
>   * ipa-inline.h (inline_summary): Add min_size.
>   (growth_likely_positive): New function.
>   * ipa-inline-analysis.c (dump_inline_summary): Add min_size.
>   (set_cond_stmt_execution_predicate): Cleanup.
>   (estimate_edge_size_and_time): Compute min_size.
>   (estimate_calls_size_and_time): Likewise.
>   (estimate_node_size_and_time): Likewise.
>   (inline_update_overall_summary): Update min_size.
>   (do_estimate_edge_time): Likewise.
>   (do_estimate_edge_size): Update.
>   (do_estimate_edge_hints): Update.
>   (growth_likely_positive): New function.
>
> Modified:
>   trunk/gcc/ChangeLog
>   trunk/gcc/ipa-inline-analysis.c
>   trunk/gcc/ipa-inline.c
>   trunk/gcc/ipa-inline.h
Did this fix it?
[Bug ipa/60243] IPA is slow on large cgraph tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 Bug 60243 depends on bug 60315, which changed state. Bug 60315 Summary: [4.8 Regression] template constructor switch optimization https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #14 from Jan Hubicka hubicka at gcc dot gnu.org ---
Author: hubicka
Date: Fri Mar 28 19:50:28 2014
New Revision: 208916
URL: http://gcc.gnu.org/viewcvs?rev=208916&root=gcc&view=rev
Log:
  PR ipa/60243
  * ipa-inline.c (want_inline_small_function_p): Short circuit large
  functions; reorganize to make cheap checks first.
  (inline_small_functions): Do not estimate growth when dumping;
  it is expensive.
  * ipa-inline.h (inline_summary): Add min_size.
  (growth_likely_positive): New function.
  * ipa-inline-analysis.c (dump_inline_summary): Add min_size.
  (set_cond_stmt_execution_predicate): Cleanup.
  (estimate_edge_size_and_time): Compute min_size.
  (estimate_calls_size_and_time): Likewise.
  (estimate_node_size_and_time): Likewise.
  (inline_update_overall_summary): Update min_size.
  (do_estimate_edge_time): Likewise.
  (do_estimate_edge_size): Update.
  (do_estimate_edge_hints): Update.
  (growth_likely_positive): New function.
Modified:
  trunk/gcc/ChangeLog
  trunk/gcc/ipa-inline-analysis.c
  trunk/gcc/ipa-inline.c
  trunk/gcc/ipa-inline.h
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #13 from Jan Hubicka hubicka at gcc dot gnu.org --- BTW, when compiled with the C++ FE, we seem to have an important bottleneck in linemap_macro_map_lookup.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #12 from rguenther at suse dot de rguenther at suse dot de ---
On Sun, 2 Mar 2014, hubicka at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
>
> --- Comment #11 from Jan Hubicka hubicka at gcc dot gnu.org ---
> Created attachment 32244
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32244&action=edit
> WIP patch
>
> This patch cuts some redundant work on estimating the size of functions that
> will be too large to be inlined anyway. Currently the inliner spends a lot of
> time computing properties of these functions (since small and inlinable
> functions are also fast to estimate).
>
> The patch doesn't really save much time building libreoffice/firefox. I will
> experiment with it a bit more.
Does it help PR60315? That one is an even more extreme example.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #11 from Jan Hubicka hubicka at gcc dot gnu.org ---
Created attachment 32244
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32244&action=edit
WIP patch
This patch cuts some redundant work on estimating the size of functions that will be too large to be inlined anyway. Currently the inliner spends a lot of time computing properties of these functions (since small and inlinable functions are also fast to estimate).
The patch doesn't really save much time building libreoffice/firefox. I will experiment with it a bit more.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 09:29:34 2014
New Revision: 207879
URL: http://gcc.gnu.org/viewcvs?rev=207879&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de
  PR ipa/60243
  * ipa-prop.c: Include stringpool.h and tree-ssanames.h.
  (ipa_modify_call_arguments): Emit an argument load explicitly and
  preserve virtual SSA form there and for the replacement call.
  Do not update SSA form nor free dominance info.
Modified:
  trunk/gcc/ChangeLog
  trunk/gcc/ipa-prop.c
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 14:25:47 2014
New Revision: 207899
URL: http://gcc.gnu.org/viewcvs?rev=207899&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de
  PR ipa/60243
  * tree-inline.c (estimate_num_insns): Avoid calling cgraph_get_node
  for all calls.
Modified:
  trunk/gcc/ChangeLog
  trunk/gcc/tree-inline.c
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #10 from Jan Hubicka hubicka at ucw dot cz ---
> --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org ---
> estimate_calls_size_and_time is quite high on the profile - called via
> do_estimate_edge_size it walks callgraph edges O(n^2).
>
> It seems that the idea of having a cache is worse than devising an
> algorithm to compute sizes and times for the whole cgraph at once?
Yep, the problem is that they change as the inlining progresses, since we propagate predicates on them on each inline. I will check the testcase.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener rguenth at gcc dot gnu.org changed:
What |Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #3 from Richard Biener rguenth at gcc dot gnu.org ---
estimate_calls_size_and_time is quite high on the profile - called via do_estimate_edge_size it walks callgraph edges O(n^2). It seems that the idea of having a cache is worse than devising an algorithm to compute sizes and times for the whole cgraph at once?
The next high thing on the profile is ipa_propagate_frequency_1 called from do_estimate_growth (same thing, walks over all call edges again). The ipa-profile slowness is the same - ipa_propagate_frequency.
The testcase has N cgraph nodes and N^2/2 call edges, so it's quite unusual of course.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- Oh, and ipa_profile_generate_summary is dominated by symtab_get_node () hashtable lookup ...
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> Oh, and ipa_profile_generate_summary is dominated by symtab_get_node ()
> hashtable lookup ...
here:
int
estimate_num_insns (gimple stmt, eni_weights *weights)
{
      /* Do not special case builtins where we see the body.
         This just confuse inliner.  */
...
      else if (!(decl = gimple_call_fndecl (stmt))
               || !(node = cgraph_get_node (decl))
               || node->definition)
        ;
A simple re-org will fix that. I'll do that.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 32162
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32162&action=edit
patch 1
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 32163
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32163&action=edit
patch 2
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- -O2 -fno-inline
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Jakub Jelinek jakub at gcc dot gnu.org changed:
What |Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
So:
#define A(n) static void test##n (int);
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
D(1)
#undef A
#define E(m, n) if (n > m) test##n (i);
#define F(m, n) E(m, n##0) E(m, n##1) E(m, n##2) E(m, n##3) E(m, n##4) E(m, n##5) E(m, n##6) E(m, n##7) E(m, n##8) E(m, n##9)
#define G(m, n) F(m, n##0) F(m, n##1) F(m, n##2) F(m, n##3) F(m, n##4) F(m, n##5) F(m, n##6) F(m, n##7) F(m, n##8) F(m, n##9)
#define H(m, n) G(m, n##0) G(m, n##1) G(m, n##2) G(m, n##3) G(m, n##4) G(m, n##5) G(m, n##6) G(m, n##7) G(m, n##8) G(m, n##9)
#define A(n) \
static void test##n (int i)\
{\
  asm ("" : : : "memory");\
  H(n, 1)\
}
D(1)
int
main ()
{
  test1000 (5);
  return 0;
}
so that we have something for the testsuite?