https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|hubicka at gcc dot gnu.org  |unassigned at gcc dot gnu.org
--- Comment #38 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It took until GCC 10, but I finally managed to implement the incremental
update here.
Memory use is about 1.1GB, but the inliner finishes quite quickly:
Time variable                                usr           sys          wall                 GGC
phase setup                      :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)      1237 kB (  0%)
phase parsing                    :   1.29 (  2%)   1.24 (  6%)   2.54 (  3%)    247897 kB (  6%)
phase lang. deferred             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)         0 kB (  0%)
phase opt and generate           :  56.81 ( 98%)  19.35 ( 94%)  76.27 ( 97%)   3859026 kB ( 94%)
garbage collection               :   0.84 (  1%)   0.10 (  0%)   0.93 (  1%)         0 kB (  0%)
dump files                       :   3.28 (  6%)   1.85 (  9%)   5.30 (  7%)         0 kB (  0%)
callgraph construction           :   0.70 (  1%)   0.28 (  1%)   1.07 (  1%)     99328 kB (  2%)
callgraph optimization           :   1.38 (  2%)   0.74 (  4%)   2.03 (  3%)      1026 kB (  0%)
callgraph functions expansion    :  47.27 ( 81%)  15.51 ( 75%)  62.89 ( 80%)   2827825 kB ( 69%)
callgraph ipa passes             :   8.19 ( 14%)   3.26 ( 16%)  11.45 ( 15%)    709147 kB ( 17%)
ipa function summary             :   0.34 (  1%)   0.08 (  0%)   0.43 (  1%)     97794 kB (  2%)
ipa dead code removal            :   0.25 (  0%)   0.01 (  0%)   0.27 (  0%)         0 kB (  0%)
ipa inheritance graph            :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)         0 kB (  0%)
ipa devirtualization             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)         0 kB (  0%)
ipa cp                           :   0.23 (  0%)   0.02 (  0%)   0.27 (  0%)      7169 kB (  0%)
ipa inlining heuristics          :   0.19 (  0%)   0.00 (  0%)   0.22 (  0%)         0 kB (  0%)
ipa function splitting           :   0.02 (  0%)   0.01 (  0%)   0.06 (  0%)         0 kB (  0%)
ipa comdats                      :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)         0 kB (  0%)
ipa various optimizations        :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)         0 kB (  0%)
ipa reference                    :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)         0 kB (  0%)
ipa profile                      :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)         0 kB (  0%)
ipa pure const                   :   0.45 (  1%)   0.15 (  1%)   0.47 (  1%)         0 kB (  0%)
ipa icf                          :   0.22 (  0%)   0.01 (  0%)   0.23 (  0%)         0 kB (  0%)
ipa SRA                          :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)      5120 kB (  0%)
ipa free lang data               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)         0 kB (  0%)
ipa free inline summary          :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)         0 kB (  0%)
cfg construction                 :   0.07 (  0%)   0.01 (  0%)   0.19 (  0%)         0 kB (  0%)
cfg cleanup                      :   0.73 (  1%)   0.23 (  1%)   0.95 (  1%)         0 kB (  0%)
trivially dead code              :   0.30 (  1%)   0.06 (  0%)   0.30 (  0%)         0 kB (  0%)
df scan insns                    :   0.81 (  1%)   0.21 (  1%)   0.93 (  1%)      3072 kB (  0%)
df multiple defs                 :   0.28 (  0%)   0.06 (  0%)   0.41 (  1%)         0 kB (  0%)
df reaching defs                 :   1.48 (  3%)   0.20 (  1%)   1.63 (  2%)         0 kB (  0%)
df live regs                     :   1.12 (  2%)   0.26 (  1%)   1.33 (  2%)         0 kB (  0%)
df live&initialized regs         :   0.51 (  1%)   0.19 (  1%)   0.66 (  1%)         0 kB (  0%)
df must-initialized regs         :   0.11 (  0%)   0.06 (  0%)   0.14 (  0%)         0 kB (  0%)
df use-def / def-use chains      :   0.36 (  1%)   0.04 (  0%)   0.43 (  1%)         0 kB (  0%)
df reg dead/unused notes         :   1.69 (  3%)   0.20 (  1%)   1.81 (  2%)     12288 kB (  0%)
register information             :   0.38 (  1%)   0.04 (  0%)   0.39 (  0%)         0 kB (  0%)
alias analysis                   :   0.82 (  1%)   0.17 (  1%)   1.15 (  1%)     36865 kB (  1%)
alias stmt walking               :   0.06 (  0%)   0.04 (  0%)   0.07 (  0%)         0 kB (  0%)
register scan                    :   0.07 (  0%)   0.03 (  0%)   0.11 (  0%)         0 kB (  0%)
rebuild jump labels              :   0.16 (  0%)   0.06 (  0%)   0.14 (  0%)         0 kB (  0%)
preprocessing                    :   0.39 (  1%)   0.32 (  2%)   0.49 (  1%)     44508 kB (  1%)
lexical analysis                 :   0.32 (  1%)   0.39 (  2%)   0.73 (  1%)         0 kB (  0%)
parser (global)                  :   0.11 (  0%)   0.08 (  0%)   0.27 (  0%)     38009 kB (  1%)
parser function body             :   0.48 (  1%)   0.45 (  2%)   1.06 (  1%)    165379 kB (  4%)
early inlining heuristics        :   0.14 (  0%)   0.03 (  0%)   0.16 (  0%)     51712 kB (  1%)
inline parameters                :   0.51 (  1%)   0.16 (  1%)   0.72 (  1%)    134145 kB (  3%)
integration                      :   0.39 (  1%)   0.06 (  0%)   0.44 (  1%)     70655 kB (  2%)
tree gimplify                    :   0.25 (  0%)   0.15 (  1%)   0.41 (  1%)    153090 kB (  4%)
tree eh                          :   0.05 (  0%)   0.01 (  0%)   0.05 (  0%)         0 kB (  0%)
tree CFG construction            :   0.12 (  0%)   0.08 (  0%)   0.15 (  0%)     78337 kB (  2%)
tree CFG cleanup                 :   0.58 (  1%)   0.17 (  1%)   0.90 (  1%)         0 kB (  0%)
tree tail merge                  :   0.10 (  0%)   0.04 (  0%)   0.10 (  0%)         0 kB (  0%)
tree VRP                         :   0.76 (  1%)   0.22 (  1%)   1.09 (  1%)    147458 kB (  4%)
tree Early VRP                   :   0.15 (  0%)   0.13 (  1%)   0.17 (  0%)     68609 kB (  2%)
tree copy propagation            :   0.22 (  0%)   0.09 (  0%)   0.21 (  0%)         0 kB (  0%)
tree PTA                         :   1.04 (  2%)   0.44 (  2%)   1.72 (  2%)      6144 kB (  0%)
tree PHI insertion               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)         0 kB (  0%)
tree SSA rewrite                 :   0.13 (  0%)   0.05 (  0%)   0.17 (  0%)     34302 kB (  1%)
tree SSA other                   :   0.19 (  0%)   0.16 (  1%)   0.35 (  0%)      9216 kB (  0%)
tree SSA incremental             :   0.06 (  0%)   0.01 (  0%)   0.06 (  0%)         0 kB (  0%)
tree operand scan                :   0.20 (  0%)   0.12 (  1%)   0.25 (  0%)     75284 kB (  2%)
dominator optimization           :   0.51 (  1%)   0.25 (  1%)   0.79 (  1%)     10240 kB (  0%)
backwards jump threading         :   0.26 (  0%)   0.13 (  1%)   0.40 (  1%)         0 kB (  0%)
tree SRA                         :   0.05 (  0%)   0.06 (  0%)   0.10 (  0%)         0 kB (  0%)
isolate eroneous paths           :   0.07 (  0%)   0.02 (  0%)   0.15 (  0%)         0 kB (  0%)
tree CCP                         :   0.50 (  1%)   0.22 (  1%)   0.83 (  1%)      8192 kB (  0%)
tree split crit edges            :   0.01 (  0%)   0.01 (  0%)   0.05 (  0%)         0 kB (  0%)
tree reassociation               :   0.13 (  0%)   0.13 (  1%)   0.17 (  0%)         0 kB (  0%)
tree PRE                         :   0.80 (  1%)   0.22 (  1%)   1.36 (  2%)     83969 kB (  2%)
tree FRE                         :   0.65 (  1%)   0.33 (  2%)   1.05 (  1%)     46080 kB (  1%)
tree code sinking                :   0.10 (  0%)   0.01 (  0%)   0.11 (  0%)         0 kB (  0%)
tree linearize phis              :   0.18 (  0%)   0.15 (  1%)   0.22 (  0%)     68609 kB (  2%)
tree backward propagate          :   0.08 (  0%)   0.03 (  0%)   0.04 (  0%)         0 kB (  0%)
tree forward propagate           :   0.24 (  0%)   0.08 (  0%)   0.22 (  0%)         0 kB (  0%)
tree phiprop                     :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)         0 kB (  0%)
tree conservative DCE            :   0.24 (  0%)   0.08 (  0%)   0.48 (  1%)         0 kB (  0%)
tree aggressive DCE              :   0.43 (  1%)   0.17 (  1%)   0.49 (  1%)    137218 kB (  3%)
tree buildin call DCE            :   0.03 (  0%)   0.01 (  0%)   0.05 (  0%)         0 kB (  0%)
tree DSE                         :   0.11 (  0%)   0.01 (  0%)   0.18 (  0%)         0 kB (  0%)
PHI merge                        :   0.05 (  0%)   0.07 (  0%)   0.06 (  0%)         0 kB (  0%)
loopless fn                      :   0.00 (  0%)   0.01 (  0%)   0.02 (  0%)         0 kB (  0%)
tree loop invariant motion       :   0.04 (  0%)   0.01 (  0%)   0.03 (  0%)         0 kB (  0%)
complete unrolling               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)         0 kB (  0%)
tree copy headers                :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)         0 kB (  0%)
tree SSA uncprop                 :   0.16 (  0%)   0.07 (  0%)   0.31 (  0%)         0 kB (  0%)
tree NRV optimization            :   0.08 (  0%)   0.03 (  0%)   0.07 (  0%)      1536 kB (  0%)
tree switch conversion           :   0.02 (  0%)   0.01 (  0%)   0.02 (  0%)         0 kB (  0%)
tree switch lowering             :   0.02 (  0%)   0.03 (  0%)   0.06 (  0%)         0 kB (  0%)
gimple CSE sin/cos               :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)         0 kB (  0%)
gimple widening/fma detection    :   0.06 (  0%)   0.01 (  0%)   0.03 (  0%)         0 kB (  0%)
tree strlen optimization         :   0.11 (  0%)   0.02 (  0%)   0.18 (  0%)     68609 kB (  2%)
dominance frontiers              :   0.03 (  0%)   0.02 (  0%)   0.03 (  0%)         0 kB (  0%)
dominance computation            :   2.37 (  4%)   1.13 (  5%)   3.83 (  5%)         0 kB (  0%)
control dependences              :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)         0 kB (  0%)
out of ssa                       :   0.33 (  1%)   0.10 (  0%)   0.38 (  0%)     11776 kB (  0%)
expand vars                      :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)         0 kB (  0%)
expand                           :   0.61 (  1%)   0.22 (  1%)   0.95 (  1%)    124618 kB (  3%)
post expand cleanups             :   0.22 (  0%)   0.07 (  0%)   0.27 (  0%)     30720 kB (  1%)
lower subreg                     :   0.06 (  0%)   0.02 (  0%)   0.04 (  0%)         0 kB (  0%)
jump                             :   0.13 (  0%)   0.03 (  0%)   0.17 (  0%)         0 kB (  0%)
forward prop                     :   0.74 (  1%)   0.29 (  1%)   0.89 (  1%)         0 kB (  0%)
CSE                              :   0.68 (  1%)   0.27 (  1%)   0.77 (  1%)      1468 kB (  0%)
dead code elimination            :   0.36 (  1%)   0.10 (  0%)   0.46 (  1%)         0 kB (  0%)
dead store elim1                 :   0.38 (  1%)   0.07 (  0%)   0.43 (  1%)         0 kB (  0%)
dead store elim2                 :   0.43 (  1%)   0.04 (  0%)   0.62 (  1%)         0 kB (  0%)
loop analysis                    :   0.12 (  0%)   0.04 (  0%)   0.09 (  0%)         0 kB (  0%)
loop init                        :   1.05 (  2%)   0.52 (  3%)   1.66 (  2%)    245251 kB (  6%)
loop invariant motion            :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)         0 kB (  0%)
loop fini                        :   0.43 (  1%)   0.18 (  1%)   0.61 (  1%)         0 kB (  0%)
CPROP                            :   0.10 (  0%)   0.08 (  0%)   0.21 (  0%)         0 kB (  0%)
PRE                              :   0.07 (  0%)   0.01 (  0%)   0.06 (  0%)         0 kB (  0%)
CSE 2                            :   0.36 (  1%)   0.13 (  1%)   0.46 (  1%)      1536 kB (  0%)
branch prediction                :   0.25 (  0%)   0.10 (  0%)   0.17 (  0%)      6656 kB (  0%)
combiner                         :   0.57 (  1%)   0.11 (  1%)   0.75 (  1%)      4096 kB (  0%)
if-conversion                    :   0.21 (  0%)   0.11 (  1%)   0.35 (  0%)         0 kB (  0%)
mode switching                   :   0.01 (  0%)   0.01 (  0%)   0.04 (  0%)         0 kB (  0%)
integrated RA                    :   3.51 (  6%)   1.24 (  6%)   4.80 (  6%)   1578520 kB ( 38%)
LRA non-specific                 :   1.16 (  2%)   0.51 (  2%)   1.68 (  2%)      3584 kB (  0%)
LRA virtuals elimination         :   0.25 (  0%)   0.04 (  0%)   0.37 (  0%)         0 kB (  0%)
LRA reload inheritance           :   0.11 (  0%)   0.05 (  0%)   0.14 (  0%)         0 kB (  0%)
LRA create live ranges           :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)         0 kB (  0%)
LRA hard reg assignment          :   0.11 (  0%)   0.06 (  0%)   0.19 (  0%)         0 kB (  0%)
reload                           :   0.14 (  0%)   0.03 (  0%)   0.21 (  0%)         0 kB (  0%)
reload CSE regs                  :   0.77 (  1%)   0.16 (  1%)   0.87 (  1%)      4608 kB (  0%)
ree                              :   0.23 (  0%)   0.04 (  0%)   0.32 (  0%)      5120 kB (  0%)
thread pro- & epilogue           :   0.54 (  1%)   0.17 (  1%)   0.59 (  1%)     56321 kB (  1%)
if-conversion 2                  :   0.11 (  0%)   0.04 (  0%)   0.16 (  0%)         0 kB (  0%)
combine stack adjustments        :   0.10 (  0%)   0.02 (  0%)   0.03 (  0%)         0 kB (  0%)
peephole 2                       :   0.27 (  0%)   0.01 (  0%)   0.30 (  0%)      9728 kB (  0%)
hard reg cprop                   :   0.46 (  1%)   0.09 (  0%)   0.46 (  1%)         0 kB (  0%)
scheduling 2                     :   2.90 (  5%)   0.46 (  2%)   3.30 (  4%)     29555 kB (  1%)
machine dep reorg                :   0.32 (  1%)   0.13 (  1%)   0.38 (  0%)         0 kB (  0%)
reorder blocks                   :   0.12 (  0%)   0.09 (  0%)   0.26 (  0%)         0 kB (  0%)
shorten branches                 :   0.18 (  0%)   0.06 (  0%)   0.24 (  0%)         0 kB (  0%)
reg stack                        :   0.07 (  0%)   0.00 (  0%)   0.08 (  0%)         0 kB (  0%)
final                            :   1.44 (  2%)   0.52 (  3%)   1.95 (  2%)     73729 kB (  2%)
variable output                  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)         0 kB (  0%)
symout                           :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)         0 kB (  0%)
tree if-combine                  :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)         0 kB (  0%)
straight-line strength reduction :   0.14 (  0%)   0.02 (  0%)   0.18 (  0%)         0 kB (  0%)
store merging                    :   0.02 (  0%)   0.01 (  0%)   0.03 (  0%)         0 kB (  0%)
initialize rtl                   :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)        12 kB (  0%)
address lowering                 :   0.00 (  0%)   0.02 (  0%)   0.04 (  0%)         0 kB (  0%)
early local passes               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)         0 kB (  0%)
unaccounted optimizations        :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)         0 kB (  0%)
rest of compilation              :   6.34 ( 11%)   2.47 ( 12%)   8.47 ( 11%)    155650 kB (  4%)
unaccounted post reload          :   0.04 (  0%)   0.01 (  0%)   0.05 (  0%)         0 kB (  0%)
unaccounted late compilation     :   0.01 (  0%)   0.03 (  0%)   0.00 (  0%)         0 kB (  0%)
remove unused locals             :   0.16 (  0%)   0.07 (  0%)   0.22 (  0%)         0 kB (  0%)
address taken                    :   0.17 (  0%)   0.03 (  0%)   0.15 (  0%)         0 kB (  0%)
repair loop structures           :   0.01 (  0%)   0.02 (  0%)   0.04 (  0%)         0 kB (  0%)
TOTAL                            :  58.11         20.59         78.83          4108169 kB
So we still have a memory use issue, at least. Since the original reporter
says 700MB, I guess it is a 30% regression?
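A quick check of that estimate, as a sketch only. It assumes "about 1.1GB" means 1100MB and that the reporter's 700MB was measured in a comparable way, which this comment does not confirm. Taken at face value, the two figures give a larger increase than 30%.

```python
def percent_increase(baseline_mb: float, current_mb: float) -> float:
    """Relative growth of current over baseline, in percent."""
    return (current_mb - baseline_mb) / baseline_mb * 100.0


baseline_mb = 700.0   # figure quoted from the original reporter
current_mb = 1100.0   # assumption: "about 1.1GB" read as 1100MB

growth = percent_increase(baseline_mb, current_mb)
print(f"{growth:.0f}% more memory than the {baseline_mb:.0f}MB baseline")
```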
Memory use from parsing to late opts is:
Analyzing compilation unit
{GC madv_dontneed 336k} {GC 262144k -> 235813k}
{GC released 14336k madv_dontneed 472k} {GC 472181k -> 411705k}
Performing interprocedural optimizations
<*free_lang_data> {heap 13388k} <visibility> {heap 15528k}
<build_ssa_passes> {heap 15528k} <opt_local_passes> {heap 19652k}
<remove_symbols> {heap 84292k} <targetclone> {heap 84292k}
<free-fnsummary> {heap 84292k}
Streaming LTO
<whole-program> {GC released 16384k madv_dontneed 808k}
{GC 870998k -> 510544k} {heap 104288k}
<profile_estimate> {heap 104288k} <icf> {heap 113524k}
<devirt> {heap 113524k} <cp> {heap 113524k} <sra> {heap 113524k}
<fnsummary> {heap 113524k} <inline> {heap 113524k}
<pure-const> {heap 113524k} <free-fnsummary> {heap 113524k}
<static-var> {heap 113524k} <single-use> {heap 113524k}
<comdats> {heap 113524k}
Assembling functions:
So starting with about 472MB of GGC memory and 133MB of heap, we get to about
870MB (870998k) for GGC during earlyopts, it seems.
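The GGC peak can be read off the log mechanically. The sketch below is only an illustration: the regular expressions are my assumption about the shape of the `{GC before -> after}` and `{heap N}` entries as they appear in this particular dump, and `LOG` holds a shortened excerpt of it.

```python
import re

# Shortened excerpt of the log above (values in "k" as printed by the compiler).
LOG = """\
{GC madv_dontneed 336k} {GC 262144k -> 235813k}
{GC released 14336k madv_dontneed 472k} {GC 472181k -> 411705k}
<*free_lang_data> {heap 13388k} <visibility> {heap 15528k}
<whole-program> {GC released 16384k madv_dontneed 808k}
{GC 870998k -> 510544k} {heap 104288k} <icf> {heap 113524k}
"""

GC_RE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")   # size before and after a collection
HEAP_RE = re.compile(r"\{heap (\d+)k\}")         # heap size reported after a pass


def summarize(log: str) -> dict:
    """Collect GC before/after pairs and heap samples, and report the peaks."""
    collections = [(int(before), int(after)) for before, after in GC_RE.findall(log)]
    heaps = [int(h) for h in HEAP_RE.findall(log)]
    return {
        "collections": collections,
        "peak_ggc_k": max(before for before, _ in collections),
        "peak_heap_k": max(heaps),
    }


summary = summarize(LOG)
print(f"peak GGC before collection: {summary['peak_ggc_k']}k")
print(f"peak heap: {summary['peak_heap_k']}k")
```

The largest "before" value is the one from the collection right after `<whole-program>`, which is where the 870MB figure comes from.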
It seems a lot of memory is taken by IRA, too: integrated RA accounts for
38% of GGC in the report above.
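To put a number on the IRA share, here is a small sketch that ranks a few rows of the time report by GGC. It assumes each row has been joined onto a single line, and the helper name `parse_ggc_kb` is hypothetical, not part of any GCC tooling. Only a handful of leaf passes are sampled, and the total is the GGC figure from the TOTAL row.

```python
# A few leaf-pass rows from the report, one row per line.
ROWS = """\
integrated RA : 3.51 ( 6%) 1.24 ( 6%) 4.80 ( 6%) 1578520 kB ( 38%)
loop init : 1.05 ( 2%) 0.52 ( 3%) 1.66 ( 2%) 245251 kB ( 6%)
parser function body : 0.48 ( 1%) 0.45 ( 2%) 1.06 ( 1%) 165379 kB ( 4%)
tree gimplify : 0.25 ( 0%) 0.15 ( 1%) 0.41 ( 1%) 153090 kB ( 4%)
tree VRP : 0.76 ( 1%) 0.22 ( 1%) 1.09 ( 1%) 147458 kB ( 4%)
"""
TOTAL_GGC_KB = 4108169  # GGC column of the TOTAL row


def parse_ggc_kb(row: str) -> tuple[str, int]:
    """Return (timevar name, GGC kB) for one joined report row."""
    name, rest = row.split(" : ", 1)
    ggc_kb = int(rest.split(" kB")[0].split()[-1])  # last number before "kB"
    return name.strip(), ggc_kb


usage = dict(parse_ggc_kb(line) for line in ROWS.splitlines())
ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
ira_share = usage["integrated RA"] / TOTAL_GGC_KB * 100.0

for name, ggc_kb in ranked:
    print(f"{name:<22} {ggc_kb:>8} kB")
print(f"integrated RA share of total GGC: {ira_share:.1f}%")
```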
I was originally assigned for the inliner issue, which is solved now :)