[Bug middle-end/110489] Slow building virtual.c.i from p11-kit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #5 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:18e5aeaef294428fc8458c2c70a9ac3a537c35d6

commit r14-2209-g18e5aeaef294428fc8458c2c70a9ac3a537c35d6
Author: Richard Biener
Date:   Fri Jun 30 09:46:48 2023 +0200

    middle-end/110489 - avoid useless work on statistics

    When we call statistics_fini_pass we unconditionally allocate the
    statistics hash and traverse it.  When a TU has many small functions
    this can take considerable time.  The following avoids this by
    never allocating the hash from this function.

            PR middle-end/110489
            * statistics.cc (curr_statistics_hash): Add argument
            indicating whether we should allocate the hash.
            (statistics_fini_pass): If the hash isn't allocated
            only print the summary header.
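The idea in the commit message can be illustrated with a small standalone sketch. This is not the code from statistics.cc: the names `StatsTable`, `current_stats`, `record_counter` and `finish_pass` are hypothetical, and a single `std::unordered_map` stands in for GCC's per-pass hash tables. It only shows the pattern of an accessor that takes an "allocate" flag, so the end-of-pass hook can skip both allocation and traversal when nothing was recorded.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a per-pass statistics hash.
using StatsTable = std::unordered_map<std::string, long>;

// Null until some counter is actually recorded.
static std::unique_ptr<StatsTable> g_stats;

// Return the current table.  With alloc == false the table is never
// created here, so callers may get nullptr back.
StatsTable *current_stats(bool alloc = true)
{
  if (!g_stats && alloc)
    g_stats.reset(new StatsTable);
  return g_stats.get();
}

// Recording a counter is the only path that allocates the table.
void record_counter(const std::string &name, long delta)
{
  (*current_stats())[name] += delta;
}

// End-of-pass hook.  Always prints the summary header; walks the table
// only if it already exists.  Returns the number of counters printed.
int finish_pass(std::FILE *out)
{
  assert(out != nullptr);
  std::fprintf(out, "Pass statistics:\n");

  StatsTable *table = current_stats(/*alloc=*/false);
  if (!table)
    return 0;  // nothing was recorded: no allocation, no traversal

  int printed = 0;
  for (const auto &entry : *table)
    {
      std::fprintf(out, "%s: %ld\n", entry.first.c_str(), entry.second);
      ++printed;
    }
  table->clear();
  return printed;
}
```

With this shape, a TU made of many small functions that never record a counter pays only for printing the header in each `finish_pass` call.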
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #4 from Richard Biener ---
There is possibly an opportunity to optimize some of our infrastructure for the case where we have only 3 basic blocks (the minimum: ENTRY, bb2 and EXIT). For example, dominance doesn't need to be "computed" there; the only special case to consider is that EXIT is not reachable. The testcase at hand seems to consist of forwarders. In general, single-BB functions can be quite common in C++ code as well.

A lot of passes could excuse themselves as well (the next special case is BB2 having a backedge to itself). There is quite some constant overhead all over the place even when we do nothing in the end.
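A minimal sketch of what such a fast path could look like, outside of GCC: the `Cfg` type and `trivial_idoms` function are hypothetical and do not correspond to anything in dominance.cc. Block indices follow GCC's convention that ENTRY is block 0, EXIT is block 1 and the first real block is bb2. The sketch fills in immediate dominators directly for the three-block shape, handles the unreachable-EXIT case and a self-loop on bb2, and reports failure for anything else so a caller could fall back to the general algorithm.

```cpp
#include <cassert>
#include <vector>

constexpr int ENTRY_BLOCK = 0;  // GCC numbers ENTRY as block 0
constexpr int EXIT_BLOCK = 1;   // and EXIT as block 1
constexpr int BB2 = 2;          // first (here: only) real block
constexpr int NO_IDOM = -1;     // marks "no immediate dominator"

// Hypothetical miniature CFG: succs[b] lists the successors of block b.
struct Cfg {
  std::vector<std::vector<int>> succs;
};

// Fast path for the ENTRY -> bb2 -> EXIT shape.  Returns true and fills
// idom when the CFG has exactly three blocks in that shape; returns
// false otherwise so the caller can run the general computation.
bool trivial_idoms(const Cfg &cfg, std::vector<int> &idom)
{
  if (cfg.succs.size() != 3)
    return false;
  if (cfg.succs[ENTRY_BLOCK] != std::vector<int>{BB2})
    return false;

  bool exit_reachable = false;
  for (int succ : cfg.succs[BB2])
    {
      if (succ == EXIT_BLOCK)
        exit_reachable = true;
      else if (succ != BB2)  // a self-loop on bb2 changes nothing
        return false;
    }

  idom.assign(3, NO_IDOM);
  idom[BB2] = ENTRY_BLOCK;
  if (exit_reachable)
    idom[EXIT_BLOCK] = BB2;  // otherwise EXIT stays without a dominator
  assert(idom[ENTRY_BLOCK] == NO_IDOM);
  return true;
}
```

A backedge from bb2 to itself does not affect the result, which is why the loop simply skips it.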
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2023-06-30
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener ---
Samples: 45K of event 'cycles', Event count (approx.): 51356148788

Overhead  Samples  Command  Shared Object  Symbol
   2.57%     1169  cc1      libc-2.31.so   [.] _int_malloc
   1.54%      700  cc1      cc1            [.] bitmap_set_bit
   1.52%      692  cc1      libc-2.31.so   [.] malloc
   1.32%      602  cc1      libc-2.31.so   [.] _int_free
   1.31%      598  cc1      cc1            [.] record_reg_classes
   1.04%      476  cc1      cc1            [.] constrain_operands
   0.81%      368  cc1      cc1            [.] solve_constraints
   0.79%      360  cc1      cc1            [.] cse_insn
   0.78%      357  cc1      cc1            [.] ggc_internal_alloc
   0.76%      347  cc1      libc-2.31.so   [.] free
   0.73%      330  cc1      cc1            [.] statistics_fini_pass

The profile points at things I've seen multiple times, but I think it would be good to investigate why memory allocation is so high up in the profile. There are some users, like dom_info::dom_init, that hit hard on the allocator without good reason. That is only a few allocations per function, but as seen, this testcase has many functions. Likewise, ipa_sra_summarize_function seems to spend 99% of its cost in memory allocation. Doing more on-demand initialization might help here.

I have a patch for the statistics_fini_pass hit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #2 from Andrew Pinski ---
So I took a look at the sources; there are very many small functions. This might be the reason why the "dump files" timevar takes a long time: it is called for each pass and for each function. Maybe that can be improved. I suspect the register allocator and scheduler costs are due to a small setup cost which, multiplied by many functions, adds up.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #1 from Andrew Pinski ---
The only ones that stick out are:

 dump files     :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)      0 (  0%)
 integrated RA  :   1.75 (  7%)   0.11 (  2%)   2.10 (  7%)   147M ( 24%)
 scheduling 2   :   1.55 (  6%)   0.14 (  3%)   1.35 (  4%)  2653k (  0%)

Nothing else sticks out really (but they do add up).