https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|hubicka at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #38 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
.... it is GCC 10, but I finally managed to implement the incremental update
here. Memory use is about 1.1GB, but the inliner finishes quite quickly:

Time variable                         usr           sys          wall            GGC
 phase setup                      :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1237 kB (  0%)
 phase parsing                    :   1.29 (  2%)   1.24 (  6%)   2.54 (  3%)   247897 kB (  6%)
 phase lang. deferred             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 phase opt and generate           :  56.81 ( 98%)  19.35 ( 94%)  76.27 ( 97%)  3859026 kB ( 94%)
 garbage collection               :   0.84 (  1%)   0.10 (  0%)   0.93 (  1%)        0 kB (  0%)
 dump files                       :   3.28 (  6%)   1.85 (  9%)   5.30 (  7%)        0 kB (  0%)
 callgraph construction           :   0.70 (  1%)   0.28 (  1%)   1.07 (  1%)    99328 kB (  2%)
 callgraph optimization           :   1.38 (  2%)   0.74 (  4%)   2.03 (  3%)     1026 kB (  0%)
 callgraph functions expansion    :  47.27 ( 81%)  15.51 ( 75%)  62.89 ( 80%)  2827825 kB ( 69%)
 callgraph ipa passes             :   8.19 ( 14%)   3.26 ( 16%)  11.45 ( 15%)   709147 kB ( 17%)
 ipa function summary             :   0.34 (  1%)   0.08 (  0%)   0.43 (  1%)    97794 kB (  2%)
 ipa dead code removal            :   0.25 (  0%)   0.01 (  0%)   0.27 (  0%)        0 kB (  0%)
 ipa inheritance graph            :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 ipa devirtualization             :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa cp                           :   0.23 (  0%)   0.02 (  0%)   0.27 (  0%)     7169 kB (  0%)
 ipa inlining heuristics          :   0.19 (  0%)   0.00 (  0%)   0.22 (  0%)        0 kB (  0%)
 ipa function splitting           :   0.02 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa comdats                      :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 ipa various optimizations        :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa reference                    :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)        0 kB (  0%)
 ipa profile                      :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa pure const                   :   0.45 (  1%)   0.15 (  1%)   0.47 (  1%)        0 kB (  0%)
 ipa icf                          :   0.22 (  0%)   0.01 (  0%)   0.23 (  0%)        0 kB (  0%)
 ipa SRA                          :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)     5120 kB (  0%)
 ipa free lang data               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa free inline summary          :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)        0 kB (  0%)
 cfg construction                 :   0.07 (  0%)   0.01 (  0%)   0.19 (  0%)        0 kB (  0%)
 cfg cleanup                      :   0.73 (  1%)   0.23 (  1%)   0.95 (  1%)        0 kB (  0%)
 trivially dead code              :   0.30 (  1%)   0.06 (  0%)   0.30 (  0%)        0 kB (  0%)
 df scan insns                    :   0.81 (  1%)   0.21 (  1%)   0.93 (  1%)     3072 kB (  0%)
 df multiple defs                 :   0.28 (  0%)   0.06 (  0%)   0.41 (  1%)        0 kB (  0%)
 df reaching defs                 :   1.48 (  3%)   0.20 (  1%)   1.63 (  2%)        0 kB (  0%)
 df live regs                     :   1.12 (  2%)   0.26 (  1%)   1.33 (  2%)        0 kB (  0%)
 df live&initialized regs         :   0.51 (  1%)   0.19 (  1%)   0.66 (  1%)        0 kB (  0%)
 df must-initialized regs         :   0.11 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 df use-def / def-use chains      :   0.36 (  1%)   0.04 (  0%)   0.43 (  1%)        0 kB (  0%)
 df reg dead/unused notes         :   1.69 (  3%)   0.20 (  1%)   1.81 (  2%)    12288 kB (  0%)
 register information             :   0.38 (  1%)   0.04 (  0%)   0.39 (  0%)        0 kB (  0%)
 alias analysis                   :   0.82 (  1%)   0.17 (  1%)   1.15 (  1%)    36865 kB (  1%)
 alias stmt walking               :   0.06 (  0%)   0.04 (  0%)   0.07 (  0%)        0 kB (  0%)
 register scan                    :   0.07 (  0%)   0.03 (  0%)   0.11 (  0%)        0 kB (  0%)
 rebuild jump labels              :   0.16 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 preprocessing                    :   0.39 (  1%)   0.32 (  2%)   0.49 (  1%)    44508 kB (  1%)
 lexical analysis                 :   0.32 (  1%)   0.39 (  2%)   0.73 (  1%)        0 kB (  0%)
 parser (global)                  :   0.11 (  0%)   0.08 (  0%)   0.27 (  0%)    38009 kB (  1%)
 parser function body             :   0.48 (  1%)   0.45 (  2%)   1.06 (  1%)   165379 kB (  4%)
 early inlining heuristics        :   0.14 (  0%)   0.03 (  0%)   0.16 (  0%)    51712 kB (  1%)
 inline parameters                :   0.51 (  1%)   0.16 (  1%)   0.72 (  1%)   134145 kB (  3%)
 integration                      :   0.39 (  1%)   0.06 (  0%)   0.44 (  1%)    70655 kB (  2%)
 tree gimplify                    :   0.25 (  0%)   0.15 (  1%)   0.41 (  1%)   153090 kB (  4%)
 tree eh                          :   0.05 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree CFG construction            :   0.12 (  0%)   0.08 (  0%)   0.15 (  0%)    78337 kB (  2%)
 tree CFG cleanup                 :   0.58 (  1%)   0.17 (  1%)   0.90 (  1%)        0 kB (  0%)
 tree tail merge                  :   0.10 (  0%)   0.04 (  0%)   0.10 (  0%)        0 kB (  0%)
 tree VRP                         :   0.76 (  1%)   0.22 (  1%)   1.09 (  1%)   147458 kB (  4%)
 tree Early VRP                   :   0.15 (  0%)   0.13 (  1%)   0.17 (  0%)    68609 kB (  2%)
 tree copy propagation            :   0.22 (  0%)   0.09 (  0%)   0.21 (  0%)        0 kB (  0%)
 tree PTA                         :   1.04 (  2%)   0.44 (  2%)   1.72 (  2%)     6144 kB (  0%)
 tree PHI insertion               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree SSA rewrite                 :   0.13 (  0%)   0.05 (  0%)   0.17 (  0%)    34302 kB (  1%)
 tree SSA other                   :   0.19 (  0%)   0.16 (  1%)   0.35 (  0%)     9216 kB (  0%)
 tree SSA incremental             :   0.06 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 tree operand scan                :   0.20 (  0%)   0.12 (  1%)   0.25 (  0%)    75284 kB (  2%)
 dominator optimization           :   0.51 (  1%)   0.25 (  1%)   0.79 (  1%)    10240 kB (  0%)
 backwards jump threading         :   0.26 (  0%)   0.13 (  1%)   0.40 (  1%)        0 kB (  0%)
 tree SRA                         :   0.05 (  0%)   0.06 (  0%)   0.10 (  0%)        0 kB (  0%)
 isolate eroneous paths           :   0.07 (  0%)   0.02 (  0%)   0.15 (  0%)        0 kB (  0%)
 tree CCP                         :   0.50 (  1%)   0.22 (  1%)   0.83 (  1%)     8192 kB (  0%)
 tree split crit edges            :   0.01 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree reassociation               :   0.13 (  0%)   0.13 (  1%)   0.17 (  0%)        0 kB (  0%)
 tree PRE                         :   0.80 (  1%)   0.22 (  1%)   1.36 (  2%)    83969 kB (  2%)
 tree FRE                         :   0.65 (  1%)   0.33 (  2%)   1.05 (  1%)    46080 kB (  1%)
 tree code sinking                :   0.10 (  0%)   0.01 (  0%)   0.11 (  0%)        0 kB (  0%)
 tree linearize phis              :   0.18 (  0%)   0.15 (  1%)   0.22 (  0%)    68609 kB (  2%)
 tree backward propagate          :   0.08 (  0%)   0.03 (  0%)   0.04 (  0%)        0 kB (  0%)
 tree forward propagate           :   0.24 (  0%)   0.08 (  0%)   0.22 (  0%)        0 kB (  0%)
 tree phiprop                     :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree conservative DCE            :   0.24 (  0%)   0.08 (  0%)   0.48 (  1%)        0 kB (  0%)
 tree aggressive DCE              :   0.43 (  1%)   0.17 (  1%)   0.49 (  1%)   137218 kB (  3%)
 tree buildin call DCE            :   0.03 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree DSE                         :   0.11 (  0%)   0.01 (  0%)   0.18 (  0%)        0 kB (  0%)
 PHI merge                        :   0.05 (  0%)   0.07 (  0%)   0.06 (  0%)        0 kB (  0%)
 loopless fn                      :   0.00 (  0%)   0.01 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree loop invariant motion       :   0.04 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 complete unrolling               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 tree copy headers                :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree SSA uncprop                 :   0.16 (  0%)   0.07 (  0%)   0.31 (  0%)        0 kB (  0%)
 tree NRV optimization            :   0.08 (  0%)   0.03 (  0%)   0.07 (  0%)     1536 kB (  0%)
 tree switch conversion           :   0.02 (  0%)   0.01 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree switch lowering             :   0.02 (  0%)   0.03 (  0%)   0.06 (  0%)        0 kB (  0%)
 gimple CSE sin/cos               :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)        0 kB (  0%)
 gimple widening/fma detection    :   0.06 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 tree strlen optimization         :   0.11 (  0%)   0.02 (  0%)   0.18 (  0%)    68609 kB (  2%)
 dominance frontiers              :   0.03 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 dominance computation            :   2.37 (  4%)   1.13 (  5%)   3.83 (  5%)        0 kB (  0%)
 control dependences              :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 out of ssa                       :   0.33 (  1%)   0.10 (  0%)   0.38 (  0%)    11776 kB (  0%)
 expand vars                      :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)        0 kB (  0%)
 expand                           :   0.61 (  1%)   0.22 (  1%)   0.95 (  1%)   124618 kB (  3%)
 post expand cleanups             :   0.22 (  0%)   0.07 (  0%)   0.27 (  0%)    30720 kB (  1%)
 lower subreg                     :   0.06 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 jump                             :   0.13 (  0%)   0.03 (  0%)   0.17 (  0%)        0 kB (  0%)
 forward prop                     :   0.74 (  1%)   0.29 (  1%)   0.89 (  1%)        0 kB (  0%)
 CSE                              :   0.68 (  1%)   0.27 (  1%)   0.77 (  1%)     1468 kB (  0%)
 dead code elimination            :   0.36 (  1%)   0.10 (  0%)   0.46 (  1%)        0 kB (  0%)
 dead store elim1                 :   0.38 (  1%)   0.07 (  0%)   0.43 (  1%)        0 kB (  0%)
 dead store elim2                 :   0.43 (  1%)   0.04 (  0%)   0.62 (  1%)        0 kB (  0%)
 loop analysis                    :   0.12 (  0%)   0.04 (  0%)   0.09 (  0%)        0 kB (  0%)
 loop init                        :   1.05 (  2%)   0.52 (  3%)   1.66 (  2%)   245251 kB (  6%)
 loop invariant motion            :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 loop fini                        :   0.43 (  1%)   0.18 (  1%)   0.61 (  1%)        0 kB (  0%)
 CPROP                            :   0.10 (  0%)   0.08 (  0%)   0.21 (  0%)        0 kB (  0%)
 PRE                              :   0.07 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 CSE 2                            :   0.36 (  1%)   0.13 (  1%)   0.46 (  1%)     1536 kB (  0%)
 branch prediction                :   0.25 (  0%)   0.10 (  0%)   0.17 (  0%)     6656 kB (  0%)
 combiner                         :   0.57 (  1%)   0.11 (  1%)   0.75 (  1%)     4096 kB (  0%)
 if-conversion                    :   0.21 (  0%)   0.11 (  1%)   0.35 (  0%)        0 kB (  0%)
 mode switching                   :   0.01 (  0%)   0.01 (  0%)   0.04 (  0%)        0 kB (  0%)
 integrated RA                    :   3.51 (  6%)   1.24 (  6%)   4.80 (  6%)  1578520 kB ( 38%)
 LRA non-specific                 :   1.16 (  2%)   0.51 (  2%)   1.68 (  2%)     3584 kB (  0%)
 LRA virtuals elimination         :   0.25 (  0%)   0.04 (  0%)   0.37 (  0%)        0 kB (  0%)
 LRA reload inheritance           :   0.11 (  0%)   0.05 (  0%)   0.14 (  0%)        0 kB (  0%)
 LRA create live ranges           :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 LRA hard reg assignment          :   0.11 (  0%)   0.06 (  0%)   0.19 (  0%)        0 kB (  0%)
 reload                           :   0.14 (  0%)   0.03 (  0%)   0.21 (  0%)        0 kB (  0%)
 reload CSE regs                  :   0.77 (  1%)   0.16 (  1%)   0.87 (  1%)     4608 kB (  0%)
 ree                              :   0.23 (  0%)   0.04 (  0%)   0.32 (  0%)     5120 kB (  0%)
 thread pro- & epilogue           :   0.54 (  1%)   0.17 (  1%)   0.59 (  1%)    56321 kB (  1%)
 if-conversion 2                  :   0.11 (  0%)   0.04 (  0%)   0.16 (  0%)        0 kB (  0%)
 combine stack adjustments        :   0.10 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 peephole 2                       :   0.27 (  0%)   0.01 (  0%)   0.30 (  0%)     9728 kB (  0%)
 hard reg cprop                   :   0.46 (  1%)   0.09 (  0%)   0.46 (  1%)        0 kB (  0%)
 scheduling 2                     :   2.90 (  5%)   0.46 (  2%)   3.30 (  4%)    29555 kB (  1%)
 machine dep reorg                :   0.32 (  1%)   0.13 (  1%)   0.38 (  0%)        0 kB (  0%)
 reorder blocks                   :   0.12 (  0%)   0.09 (  0%)   0.26 (  0%)        0 kB (  0%)
 shorten branches                 :   0.18 (  0%)   0.06 (  0%)   0.24 (  0%)        0 kB (  0%)
 reg stack                        :   0.07 (  0%)   0.00 (  0%)   0.08 (  0%)        0 kB (  0%)
 final                            :   1.44 (  2%)   0.52 (  3%)   1.95 (  2%)    73729 kB (  2%)
 variable output                  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 symout                           :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree if-combine                  :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 straight-line strength reduction :   0.14 (  0%)   0.02 (  0%)   0.18 (  0%)        0 kB (  0%)
 store merging                    :   0.02 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 initialize rtl                   :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)       12 kB (  0%)
 address lowering                 :   0.00 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 early local passes               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 unaccounted optimizations        :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 rest of compilation              :   6.34 ( 11%)   2.47 ( 12%)   8.47 ( 11%)   155650 kB (  4%)
 unaccounted post reload          :   0.04 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 unaccounted late compilation     :   0.01 (  0%)   0.03 (  0%)   0.00 (  0%)        0 kB (  0%)
 remove unused locals             :   0.16 (  0%)   0.07 (  0%)   0.22 (  0%)        0 kB (  0%)
 address taken                    :   0.17 (  0%)   0.03 (  0%)   0.15 (  0%)        0 kB (  0%)
 repair loop structures           :   0.01 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 TOTAL                            :  58.11         20.59         78.83         4108169 kB

So we still have a memory use issue at least. Since the original reporter says
700MB, I guess it is a 30% regression?

Memory use from parsing to late opts is:

Analyzing compilation unit
 {GC madv_dontneed 336k} {GC 262144k -> 235813k} {GC released 14336k madv_dontneed 472k} {GC 472181k -> 411705k}
Performing interprocedural optimizations
 <*free_lang_data> {heap 13388k} <visibility> {heap 15528k} <build_ssa_passes> {heap 15528k} <opt_local_passes> {heap 19652k} <remove_symbols> {heap 84292k} <targetclone> {heap 84292k} <free-fnsummary> {heap 84292k}
Streaming LTO
 <whole-program> {GC released 16384k madv_dontneed 808k} {GC 870998k -> 510544k} {heap 104288k} <profile_estimate> {heap 104288k} <icf> {heap 113524k} <devirt> {heap 113524k} <cp> {heap 113524k} <sra> {heap 113524k} <fnsummary> {heap 113524k} <inline> {heap 113524k} <pure-const> {heap 113524k} <free-fnsummary> {heap 113524k} <static-var> {heap 113524k} <single-use> {heap 113524k} <comdats> {heap 113524k}
Assembling functions:

So starting with about 472MB of GGC memory and 133MB of heap, we get to about
870MB for GGC during earlyopts, it seems.

It seems a lot of memory is taken by IRA, too.

I was originally assigned for the inliner issue, which is however solved now :)
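
The GGC and heap figures discussed above can be cross-checked by pulling the `{GC A -> B}` and `{heap N}` markers out of the log mechanically. The Python below is only a rough sketch under stated assumptions: the function name `summarize_memory_log`, the two regular expressions, and the shortened `LOG` excerpt are illustrative and are not part of GCC or of the original comment. The numeric values in `LOG` are copied from the log quoted in the comment.

```python
import re

# Shortened excerpt of the log quoted in the comment; values copied verbatim.
LOG = (
    "Analyzing compilation unit {GC madv_dontneed 336k} {GC 262144k -> 235813k} "
    "{GC released 14336k madv_dontneed 472k} {GC 472181k -> 411705k}"
    "Performing interprocedural optimizations <*free_lang_data> {heap 13388k} "
    "<remove_symbols> {heap 84292k} <whole-program> "
    "{GC released 16384k madv_dontneed 808k} {GC 870998k -> 510544k} "
    "{heap 104288k} <icf> {heap 113524k}"
)

# Assumed marker shapes, inferred only from the quoted log text.
GC_RE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")
HEAP_RE = re.compile(r"\{heap (\d+)k\}")


def summarize_memory_log(log):
    """Return the GC (before, after) pairs and the peak GGC/heap sizes in kB."""
    collections = [(int(a), int(b)) for a, b in GC_RE.findall(log)]
    heaps = [int(h) for h in HEAP_RE.findall(log)]
    return {
        "gc_collections": collections,
        # Peak GGC use is taken as the largest pre-collection size.
        "peak_ggc_kb": max((before for before, _ in collections), default=0),
        "peak_heap_kb": max(heaps, default=0),
    }


if __name__ == "__main__":
    summary = summarize_memory_log(LOG)
    print(summary["peak_ggc_kb"], summary["peak_heap_kb"])  # prints: 870998 113524
```

On this excerpt the largest pre-collection GGC size is 870998k, which is the roughly 870MB figure mentioned above. Markers such as `{GC released ...}` and `{GC madv_dontneed ...}` are deliberately not matched, since they do not report a before/after size.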