[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:18e5aeaef294428fc8458c2c70a9ac3a537c35d6

commit r14-2209-g18e5aeaef294428fc8458c2c70a9ac3a537c35d6
Author: Richard Biener 
Date:   Fri Jun 30 09:46:48 2023 +0200

middle-end/110489 - avoid useless work on statistics

When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it.  When a TU has many small
functions this can take considerable time.  The following avoids
this by never allocating the hash from this function.

PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.
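
A minimal sketch of the idea in this commit message, for illustration only.
It is not the code from statistics.cc: StatsTable, g_tables, record_counter
and fini_pass are hypothetical stand-ins, and the table is a plain
std::unordered_map.  The only point shown is that the accessor takes a flag
saying whether it may allocate, and the end-of-pass path passes false.

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a per-pass statistics table; not GCC's real type.
using StatsTable = std::unordered_map<std::string, long>;

// One lazily created table per pass index (fixed size for illustration only).
static std::unique_ptr<StatsTable> g_tables[8];

// Return the table for PASS.  When ALLOC is false and no table exists yet,
// return nullptr instead of creating one.
static StatsTable *curr_stats_table(int pass, bool alloc = true)
{
  if (!g_tables[pass] && alloc)
    g_tables[pass] = std::make_unique<StatsTable>();
  return g_tables[pass].get();
}

// Recording a counter is the only path that allocates.
static void record_counter(int pass, const std::string &id, long n)
{
  (*curr_stats_table(pass))[id] += n;
}

// End-of-pass reporting never allocates.  Returns the number of counters
// printed so the behaviour can be checked.
static int fini_pass(int pass, std::FILE *out)
{
  std::fprintf(out, "pass %d summary\n", pass);
  StatsTable *t = curr_stats_table(pass, /*alloc=*/false);
  if (!t)
    return 0;  // nothing was recorded: header only, no traversal
  int printed = 0;
  for (const auto &kv : *t) {
    std::fprintf(out, "  %s: %ld\n", kv.first.c_str(), kv.second);
    ++printed;
  }
  return printed;
}
```

With this shape, a pass that recorded nothing pays for one null check at the
end instead of a hash allocation plus a traversal.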

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #4 from Richard Biener  ---
There is possibly an opportunity to optimize some of our infrastructure for
the case where we have 3 basic blocks (the minimum: ENTRY, bb2 and EXIT).  For
example, dominance doesn't need to be "computed" at all; the only special case
to consider is that EXIT is not reachable.

The testcase at hand seems to consist of forwarders.  In general, single-BB
functions can be quite common in C++ code as well.  A lot of passes
could excuse themselves early too (the next special case is BB2 having a
backedge to itself).  There is quite some constant overhead all over the place
even when we do nothing in the end.
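
To make the three-block special case concrete, here is a rough sketch of
such a fast path.  The names (TinyCfg, trivial_idoms, kEntry, kBb2, kExit)
are made up for this example and do not correspond to GCC's dominance code;
the assumption is a CFG where ENTRY -> BB2 always exists and the only
question is whether BB2 has an edge to EXIT.

```cpp
#include <vector>

// Block indices for the minimal CFG (illustrative names, not GCC's).
enum { kEntry = 0, kBb2 = 1, kExit = 2, kNoIdom = -1 };

struct TinyCfg {
  int n_blocks;           // total number of blocks, including ENTRY and EXIT
  bool bb2_reaches_exit;  // is there an edge BB2 -> EXIT?
};

// Fill IDOM for a three-block CFG without running a dominator algorithm.
// Returns false when the CFG is larger and a real computation is needed.
static bool trivial_idoms(const TinyCfg &cfg, std::vector<int> &idom)
{
  if (cfg.n_blocks != 3)
    return false;
  idom.assign(3, kNoIdom);
  idom[kBb2] = kEntry;    // ENTRY -> BB2 is the only way into BB2
  if (cfg.bb2_reaches_exit)
    idom[kExit] = kBb2;   // otherwise EXIT is unreachable and keeps kNoIdom
  return true;
}
```

A self-loop on BB2 does not change the result, since BB2 is still entered
only from ENTRY.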

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

Richard Biener  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2023-06-30
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
Samples: 45K of event 'cycles', Event count (approx.): 51356148788
Overhead  Samples  Command  Shared Object  Symbol
   2.57%     1169  cc1      libc-2.31.so   [.] _int_malloc
   1.54%      700  cc1      cc1            [.] bitmap_set_bit
   1.52%      692  cc1      libc-2.31.so   [.] malloc
   1.32%      602  cc1      libc-2.31.so   [.] _int_free
   1.31%      598  cc1      cc1            [.] record_reg_classes
   1.04%      476  cc1      cc1            [.] constrain_operands
   0.81%      368  cc1      cc1            [.] solve_constraints
   0.79%      360  cc1      cc1            [.] cse_insn
   0.78%      357  cc1      cc1            [.] ggc_internal_alloc
   0.76%      347  cc1      libc-2.31.so   [.] free
   0.73%      330  cc1      cc1            [.] statistics_fini_pass

It's pointing at things I've seen multiple times, but I think it would be
good to investigate why memory allocation is so high up in the profile.
Some users, like dom_info::dom_init, hit the allocator hard without good
reason.  That is only a few allocations per function, but as seen this
testcase has many functions.  Likewise ipa_sra_summarize_function seems to
spend 99% of its cost in memory allocation.  Doing more on-demand
initialization might help here.
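
As a rough illustration of what on-demand initialization could look like
for per-function scratch data: LazyScratch below is a hypothetical class,
not dom_info or any other GCC type, and the allocation counter exists only
to make the effect visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-function analysis state; not GCC's dom_info.
class LazyScratch {
public:
  explicit LazyScratch(std::size_t n_blocks) : n_blocks_(n_blocks) {}

  // The work array is created on first use rather than in the constructor.
  std::vector<int> &order()
  {
    if (order_.empty() && n_blocks_ != 0) {
      order_.assign(n_blocks_, -1);
      ++allocations;
    }
    return order_;
  }

  static int allocations;  // how often the array was really allocated

private:
  std::size_t n_blocks_;
  std::vector<int> order_;
};

int LazyScratch::allocations = 0;
```

Constructing the object for a function whose analysis is never queried then
costs no heap allocation at all, which matters when a TU has very many small
functions.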

I have a patch for the statistics_fini_pass hit.

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #2 from Andrew Pinski  ---
I took a look at the sources; there are very many small functions.
This might be why the "dump files" timevar takes a long time: it is hit
for each pass and for each function. Maybe that can be improved.

I suspect the register allocator and scheduling costs are due to a small
setup cost which, multiplied by many functions, adds up.

[Bug middle-end/110489] Slow building virtual.c.i from p11-kit

2023-06-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #1 from Andrew Pinski  ---
The only ones that stick out are:
 dump files     :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)      0  (  0%)
 integrated RA  :   1.75 (  7%)   0.11 (  2%)   2.10 (  7%)    147M ( 24%)
 scheduling 2   :   1.55 (  6%)   0.14 (  3%)   1.35 (  4%)   2653k (  0%)

Nothing else really sticks out (but they do add up).