https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|hubicka at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #38 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
... it is GCC 10 by now, but I finally managed to implement the incremental
update here.
Memory use is about 1.1GB, but the inliner finishes quite quickly:

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1237 kB (  0%)
 phase parsing                      :   1.29 (  2%)   1.24 (  6%)   2.54 (  3%)   247897 kB (  6%)
 phase lang. deferred               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 phase opt and generate             :  56.81 ( 98%)  19.35 ( 94%)  76.27 ( 97%)  3859026 kB ( 94%)
 garbage collection                 :   0.84 (  1%)   0.10 (  0%)   0.93 (  1%)        0 kB (  0%)
 dump files                         :   3.28 (  6%)   1.85 (  9%)   5.30 (  7%)        0 kB (  0%)
 callgraph construction             :   0.70 (  1%)   0.28 (  1%)   1.07 (  1%)    99328 kB (  2%)
 callgraph optimization             :   1.38 (  2%)   0.74 (  4%)   2.03 (  3%)     1026 kB (  0%)
 callgraph functions expansion      :  47.27 ( 81%)  15.51 ( 75%)  62.89 ( 80%)  2827825 kB ( 69%)
 callgraph ipa passes               :   8.19 ( 14%)   3.26 ( 16%)  11.45 ( 15%)   709147 kB ( 17%)
 ipa function summary               :   0.34 (  1%)   0.08 (  0%)   0.43 (  1%)    97794 kB (  2%)
 ipa dead code removal              :   0.25 (  0%)   0.01 (  0%)   0.27 (  0%)        0 kB (  0%)
 ipa inheritance graph              :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 ipa devirtualization               :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa cp                             :   0.23 (  0%)   0.02 (  0%)   0.27 (  0%)     7169 kB (  0%)
 ipa inlining heuristics            :   0.19 (  0%)   0.00 (  0%)   0.22 (  0%)        0 kB (  0%)
 ipa function splitting             :   0.02 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa comdats                        :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 ipa various optimizations          :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa reference                      :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)        0 kB (  0%)
 ipa profile                        :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)        0 kB (  0%)
 ipa pure const                     :   0.45 (  1%)   0.15 (  1%)   0.47 (  1%)        0 kB (  0%)
 ipa icf                            :   0.22 (  0%)   0.01 (  0%)   0.23 (  0%)        0 kB (  0%)
 ipa SRA                            :   0.13 (  0%)   0.00 (  0%)   0.14 (  0%)     5120 kB (  0%)
 ipa free lang data                 :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 ipa free inline summary            :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)        0 kB (  0%)
 cfg construction                   :   0.07 (  0%)   0.01 (  0%)   0.19 (  0%)        0 kB (  0%)
 cfg cleanup                        :   0.73 (  1%)   0.23 (  1%)   0.95 (  1%)        0 kB (  0%)
 trivially dead code                :   0.30 (  1%)   0.06 (  0%)   0.30 (  0%)        0 kB (  0%)
 df scan insns                      :   0.81 (  1%)   0.21 (  1%)   0.93 (  1%)     3072 kB (  0%)
 df multiple defs                   :   0.28 (  0%)   0.06 (  0%)   0.41 (  1%)        0 kB (  0%)
 df reaching defs                   :   1.48 (  3%)   0.20 (  1%)   1.63 (  2%)        0 kB (  0%)
 df live regs                       :   1.12 (  2%)   0.26 (  1%)   1.33 (  2%)        0 kB (  0%)
 df live&initialized regs           :   0.51 (  1%)   0.19 (  1%)   0.66 (  1%)        0 kB (  0%)
 df must-initialized regs           :   0.11 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 df use-def / def-use chains        :   0.36 (  1%)   0.04 (  0%)   0.43 (  1%)        0 kB (  0%)
 df reg dead/unused notes           :   1.69 (  3%)   0.20 (  1%)   1.81 (  2%)    12288 kB (  0%)
 register information               :   0.38 (  1%)   0.04 (  0%)   0.39 (  0%)        0 kB (  0%)
 alias analysis                     :   0.82 (  1%)   0.17 (  1%)   1.15 (  1%)    36865 kB (  1%)
 alias stmt walking                 :   0.06 (  0%)   0.04 (  0%)   0.07 (  0%)        0 kB (  0%)
 register scan                      :   0.07 (  0%)   0.03 (  0%)   0.11 (  0%)        0 kB (  0%)
 rebuild jump labels                :   0.16 (  0%)   0.06 (  0%)   0.14 (  0%)        0 kB (  0%)
 preprocessing                      :   0.39 (  1%)   0.32 (  2%)   0.49 (  1%)    44508 kB (  1%)
 lexical analysis                   :   0.32 (  1%)   0.39 (  2%)   0.73 (  1%)        0 kB (  0%)
 parser (global)                    :   0.11 (  0%)   0.08 (  0%)   0.27 (  0%)    38009 kB (  1%)
 parser function body               :   0.48 (  1%)   0.45 (  2%)   1.06 (  1%)   165379 kB (  4%)
 early inlining heuristics          :   0.14 (  0%)   0.03 (  0%)   0.16 (  0%)    51712 kB (  1%)
 inline parameters                  :   0.51 (  1%)   0.16 (  1%)   0.72 (  1%)   134145 kB (  3%)
 integration                        :   0.39 (  1%)   0.06 (  0%)   0.44 (  1%)    70655 kB (  2%)
 tree gimplify                      :   0.25 (  0%)   0.15 (  1%)   0.41 (  1%)   153090 kB (  4%)
 tree eh                            :   0.05 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree CFG construction              :   0.12 (  0%)   0.08 (  0%)   0.15 (  0%)    78337 kB (  2%)
 tree CFG cleanup                   :   0.58 (  1%)   0.17 (  1%)   0.90 (  1%)        0 kB (  0%)
 tree tail merge                    :   0.10 (  0%)   0.04 (  0%)   0.10 (  0%)        0 kB (  0%)
 tree VRP                           :   0.76 (  1%)   0.22 (  1%)   1.09 (  1%)   147458 kB (  4%)
 tree Early VRP                     :   0.15 (  0%)   0.13 (  1%)   0.17 (  0%)    68609 kB (  2%)
 tree copy propagation              :   0.22 (  0%)   0.09 (  0%)   0.21 (  0%)        0 kB (  0%)
 tree PTA                           :   1.04 (  2%)   0.44 (  2%)   1.72 (  2%)     6144 kB (  0%)
 tree PHI insertion                 :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree SSA rewrite                   :   0.13 (  0%)   0.05 (  0%)   0.17 (  0%)    34302 kB (  1%)
 tree SSA other                     :   0.19 (  0%)   0.16 (  1%)   0.35 (  0%)     9216 kB (  0%)
 tree SSA incremental               :   0.06 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 tree operand scan                  :   0.20 (  0%)   0.12 (  1%)   0.25 (  0%)    75284 kB (  2%)
 dominator optimization             :   0.51 (  1%)   0.25 (  1%)   0.79 (  1%)    10240 kB (  0%)
 backwards jump threading           :   0.26 (  0%)   0.13 (  1%)   0.40 (  1%)        0 kB (  0%)
 tree SRA                           :   0.05 (  0%)   0.06 (  0%)   0.10 (  0%)        0 kB (  0%)
 isolate eroneous paths             :   0.07 (  0%)   0.02 (  0%)   0.15 (  0%)        0 kB (  0%)
 tree CCP                           :   0.50 (  1%)   0.22 (  1%)   0.83 (  1%)     8192 kB (  0%)
 tree split crit edges              :   0.01 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree reassociation                 :   0.13 (  0%)   0.13 (  1%)   0.17 (  0%)        0 kB (  0%)
 tree PRE                           :   0.80 (  1%)   0.22 (  1%)   1.36 (  2%)    83969 kB (  2%)
 tree FRE                           :   0.65 (  1%)   0.33 (  2%)   1.05 (  1%)    46080 kB (  1%)
 tree code sinking                  :   0.10 (  0%)   0.01 (  0%)   0.11 (  0%)        0 kB (  0%)
 tree linearize phis                :   0.18 (  0%)   0.15 (  1%)   0.22 (  0%)    68609 kB (  2%)
 tree backward propagate            :   0.08 (  0%)   0.03 (  0%)   0.04 (  0%)        0 kB (  0%)
 tree forward propagate             :   0.24 (  0%)   0.08 (  0%)   0.22 (  0%)        0 kB (  0%)
 tree phiprop                       :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree conservative DCE              :   0.24 (  0%)   0.08 (  0%)   0.48 (  1%)        0 kB (  0%)
 tree aggressive DCE                :   0.43 (  1%)   0.17 (  1%)   0.49 (  1%)   137218 kB (  3%)
 tree buildin call DCE              :   0.03 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree DSE                           :   0.11 (  0%)   0.01 (  0%)   0.18 (  0%)        0 kB (  0%)
 PHI merge                          :   0.05 (  0%)   0.07 (  0%)   0.06 (  0%)        0 kB (  0%)
 loopless fn                        :   0.00 (  0%)   0.01 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree loop invariant motion         :   0.04 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 complete unrolling                 :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 tree copy headers                  :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree SSA uncprop                   :   0.16 (  0%)   0.07 (  0%)   0.31 (  0%)        0 kB (  0%)
 tree NRV optimization              :   0.08 (  0%)   0.03 (  0%)   0.07 (  0%)     1536 kB (  0%)
 tree switch conversion             :   0.02 (  0%)   0.01 (  0%)   0.02 (  0%)        0 kB (  0%)
 tree switch lowering               :   0.02 (  0%)   0.03 (  0%)   0.06 (  0%)        0 kB (  0%)
 gimple CSE sin/cos                 :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)        0 kB (  0%)
 gimple widening/fma detection      :   0.06 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 tree strlen optimization           :   0.11 (  0%)   0.02 (  0%)   0.18 (  0%)    68609 kB (  2%)
 dominance frontiers                :   0.03 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 dominance computation              :   2.37 (  4%)   1.13 (  5%)   3.83 (  5%)        0 kB (  0%)
 control dependences                :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 out of ssa                         :   0.33 (  1%)   0.10 (  0%)   0.38 (  0%)    11776 kB (  0%)
 expand vars                        :   0.04 (  0%)   0.02 (  0%)   0.06 (  0%)        0 kB (  0%)
 expand                             :   0.61 (  1%)   0.22 (  1%)   0.95 (  1%)   124618 kB (  3%)
 post expand cleanups               :   0.22 (  0%)   0.07 (  0%)   0.27 (  0%)    30720 kB (  1%)
 lower subreg                       :   0.06 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 jump                               :   0.13 (  0%)   0.03 (  0%)   0.17 (  0%)        0 kB (  0%)
 forward prop                       :   0.74 (  1%)   0.29 (  1%)   0.89 (  1%)        0 kB (  0%)
 CSE                                :   0.68 (  1%)   0.27 (  1%)   0.77 (  1%)     1468 kB (  0%)
 dead code elimination              :   0.36 (  1%)   0.10 (  0%)   0.46 (  1%)        0 kB (  0%)
 dead store elim1                   :   0.38 (  1%)   0.07 (  0%)   0.43 (  1%)        0 kB (  0%)
 dead store elim2                   :   0.43 (  1%)   0.04 (  0%)   0.62 (  1%)        0 kB (  0%)
 loop analysis                      :   0.12 (  0%)   0.04 (  0%)   0.09 (  0%)        0 kB (  0%)
 loop init                          :   1.05 (  2%)   0.52 (  3%)   1.66 (  2%)   245251 kB (  6%)
 loop invariant motion              :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 loop fini                          :   0.43 (  1%)   0.18 (  1%)   0.61 (  1%)        0 kB (  0%)
 CPROP                              :   0.10 (  0%)   0.08 (  0%)   0.21 (  0%)        0 kB (  0%)
 PRE                                :   0.07 (  0%)   0.01 (  0%)   0.06 (  0%)        0 kB (  0%)
 CSE 2                              :   0.36 (  1%)   0.13 (  1%)   0.46 (  1%)     1536 kB (  0%)
 branch prediction                  :   0.25 (  0%)   0.10 (  0%)   0.17 (  0%)     6656 kB (  0%)
 combiner                           :   0.57 (  1%)   0.11 (  1%)   0.75 (  1%)     4096 kB (  0%)
 if-conversion                      :   0.21 (  0%)   0.11 (  1%)   0.35 (  0%)        0 kB (  0%)
 mode switching                     :   0.01 (  0%)   0.01 (  0%)   0.04 (  0%)        0 kB (  0%)
 integrated RA                      :   3.51 (  6%)   1.24 (  6%)   4.80 (  6%)  1578520 kB ( 38%)
 LRA non-specific                   :   1.16 (  2%)   0.51 (  2%)   1.68 (  2%)     3584 kB (  0%)
 LRA virtuals elimination           :   0.25 (  0%)   0.04 (  0%)   0.37 (  0%)        0 kB (  0%)
 LRA reload inheritance             :   0.11 (  0%)   0.05 (  0%)   0.14 (  0%)        0 kB (  0%)
 LRA create live ranges             :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 LRA hard reg assignment            :   0.11 (  0%)   0.06 (  0%)   0.19 (  0%)        0 kB (  0%)
 reload                             :   0.14 (  0%)   0.03 (  0%)   0.21 (  0%)        0 kB (  0%)
 reload CSE regs                    :   0.77 (  1%)   0.16 (  1%)   0.87 (  1%)     4608 kB (  0%)
 ree                                :   0.23 (  0%)   0.04 (  0%)   0.32 (  0%)     5120 kB (  0%)
 thread pro- & epilogue             :   0.54 (  1%)   0.17 (  1%)   0.59 (  1%)    56321 kB (  1%)
 if-conversion 2                    :   0.11 (  0%)   0.04 (  0%)   0.16 (  0%)        0 kB (  0%)
 combine stack adjustments          :   0.10 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 peephole 2                         :   0.27 (  0%)   0.01 (  0%)   0.30 (  0%)     9728 kB (  0%)
 hard reg cprop                     :   0.46 (  1%)   0.09 (  0%)   0.46 (  1%)        0 kB (  0%)
 scheduling 2                       :   2.90 (  5%)   0.46 (  2%)   3.30 (  4%)    29555 kB (  1%)
 machine dep reorg                  :   0.32 (  1%)   0.13 (  1%)   0.38 (  0%)        0 kB (  0%)
 reorder blocks                     :   0.12 (  0%)   0.09 (  0%)   0.26 (  0%)        0 kB (  0%)
 shorten branches                   :   0.18 (  0%)   0.06 (  0%)   0.24 (  0%)        0 kB (  0%)
 reg stack                          :   0.07 (  0%)   0.00 (  0%)   0.08 (  0%)        0 kB (  0%)
 final                              :   1.44 (  2%)   0.52 (  3%)   1.95 (  2%)    73729 kB (  2%)
 variable output                    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 symout                             :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)        0 kB (  0%)
 tree if-combine                    :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 straight-line strength reduction   :   0.14 (  0%)   0.02 (  0%)   0.18 (  0%)        0 kB (  0%)
 store merging                      :   0.02 (  0%)   0.01 (  0%)   0.03 (  0%)        0 kB (  0%)
 initialize rtl                     :   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)       12 kB (  0%)
 address lowering                   :   0.00 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 early local passes                 :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)        0 kB (  0%)
 unaccounted optimizations          :   0.01 (  0%)   0.02 (  0%)   0.03 (  0%)        0 kB (  0%)
 rest of compilation                :   6.34 ( 11%)   2.47 ( 12%)   8.47 ( 11%)   155650 kB (  4%)
 unaccounted post reload            :   0.04 (  0%)   0.01 (  0%)   0.05 (  0%)        0 kB (  0%)
 unaccounted late compilation       :   0.01 (  0%)   0.03 (  0%)   0.00 (  0%)        0 kB (  0%)
 remove unused locals               :   0.16 (  0%)   0.07 (  0%)   0.22 (  0%)        0 kB (  0%)
 address taken                      :   0.17 (  0%)   0.03 (  0%)   0.15 (  0%)        0 kB (  0%)
 repair loop structures             :   0.01 (  0%)   0.02 (  0%)   0.04 (  0%)        0 kB (  0%)
 TOTAL                              :  58.11         20.59         78.83         4108169 kB

So we still have a memory use issue at least.  Since the original reporter
says 700MB, I guess it is a 30% regression?
Memory use from parsing to late opts is:
Analyzing compilation unit
 {GC madv_dontneed 336k} {GC 262144k -> 235813k} {GC released 14336k
madv_dontneed 472k} {GC 472181k -> 411705k}Performing interprocedural
optimizations
 <*free_lang_data> {heap 13388k} <visibility> {heap 15528k} <build_ssa_passes>
{heap 15528k} <opt_local_passes> {heap 19652k} <remove_symbols> {heap 84292k}
<targetclone> {heap 84292k} <free-fnsummary> {heap 84292k}Streaming LTO
 <whole-program> {GC released 16384k madv_dontneed 808k} {GC 870998k ->
510544k} {heap 104288k} <profile_estimate> {heap 104288k} <icf> {heap 113524k}
<devirt> {heap 113524k} <cp> {heap 113524k} <sra> {heap 113524k} <fnsummary>
{heap 113524k} <inline> {heap 113524k} <pure-const> {heap 113524k}
<free-fnsummary> {heap 113524k} <static-var> {heap 113524k} <single-use> {heap
113524k} <comdats> {heap 113524k}Assembling functions:

So starting with about 472MB of GGC memory and 133MB of heap, we get to about
870MB for GGC during earlyopts, it seems.
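
The GGC figures quoted above can be pulled out of the log mechanically. This is a minimal sketch, not part of GCC: the function name `gc_collections` and the embedded `LOG` excerpt are assumptions made for illustration, and the pattern only covers the `{GC <before>k -> <after>k}` entries that appear in the log above (the `released` and `madv_dontneed` entries are ignored on purpose).

```python
import re

# Excerpt copied from the log above, including its line wrapping.
LOG = """ {GC madv_dontneed 336k} {GC 262144k -> 235813k} {GC released 14336k
madv_dontneed 472k} {GC 472181k -> 411705k}Performing interprocedural
optimizations
 <whole-program> {GC released 16384k madv_dontneed 808k} {GC 870998k ->
510544k} {heap 104288k}"""

# "{GC 472181k -> 411705k}": GGC size before and after one collection.
# \s+ also matches the newline where the mail client wrapped an entry.
GC_ENTRY = re.compile(r"\{GC\s+(\d+)k\s+->\s+(\d+)k\}")


def gc_collections(log):
    """Return (before_k, after_k) pairs in the order they appear."""
    return [(int(before), int(after)) for before, after in GC_ENTRY.findall(log)]


if __name__ == "__main__":
    pairs = gc_collections(LOG)
    for before, after in pairs:
        print(f"GGC {before}k -> {after}k (freed {before - after}k)")
    print("largest GGC size before a collection:", max(b for b, _ in pairs), "k")
```

On this excerpt the largest pre-collection size is 870998k, which is on the order of 870 MB, and the collection before it starts from 472181k.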

Seems a lot of memory is taken by IRA, too (integrated RA accounts for
1578520 kB, 38% of GGC, in the report above).
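
To see which timers dominate GGC allocation, the time report can be sorted by its GGC column. The following is only a sketch under stated assumptions: `ggc_by_pass` and `SAMPLE` are names made up for this example, the sample holds four rows copied from the report above, and the pattern assumes the four-column `usr / sys / wall / GGC` layout shown there. It tolerates rows whose GGC field has been wrapped onto the next line.

```python
import re

# Four rows copied from the report above; the GGC field is left wrapped
# onto its own line to show that the pattern copes with it.
SAMPLE = """\
 phase parsing                      :   1.29 (  2%)   1.24 (  6%)   2.54 (  3%)
 247897 kB (  6%)
 integrated RA                      :   3.51 (  6%)   1.24 (  6%)   4.80 (  6%)
1578520 kB ( 38%)
 loop init                          :   1.05 (  2%)   0.52 (  3%)   1.66 (  2%)
 245251 kB (  6%)
 scheduling 2                       :   2.90 (  5%)   0.46 (  2%)   3.30 (  4%)
  29555 kB (  1%)
"""

# name : usr (p%) sys (p%) wall (p%) ggc kB (p%)
ROW = re.compile(
    r"^\s*(?P<name>\S[^:\n]*?)\s*:"
    r"\s*(?P<usr>[\d.]+)\s*\(\s*\d+%\)"
    r"\s*(?P<sys>[\d.]+)\s*\(\s*\d+%\)"
    r"\s*(?P<wall>[\d.]+)\s*\(\s*\d+%\)"
    r"\s*(?P<ggc_kb>\d+)\s*kB\s*\(\s*(?P<ggc_pct>\d+)%\)",
    re.MULTILINE,
)


def ggc_by_pass(report):
    """Return (timer name, GGC kB) pairs, largest allocation first."""
    rows = [(m.group("name"), int(m.group("ggc_kb"))) for m in ROW.finditer(report)]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, kb in ggc_by_pass(SAMPLE):
        print(f"{name:<30} {kb:>9} kB")
```

Note that the timers in the report overlap (the phase and callgraph rows include the per-pass rows), so the per-row numbers should be compared with each other rather than summed.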

I was originally assigned for the inliner issue, which is however solved now :)
