------- Comment #31 from rguenth at gcc dot gnu dot org  2010-01-26 11:35 -------
Updated timings and memory:

> ~/bin/maxmem2.sh /usr/bin/time gcc-4.5 -S -o /dev/null lgwam.c
32.62user 1.48system 0:34.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+333822minor)pagefaults 0swaps
total: 1333745 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.4.3/bin/gcc -S -o /dev/null lgwam.c
35.01user 1.54system 0:36.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+329139minor)pagefaults 0swaps
total: 1306898 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.3.4/bin/gcc -S -o /dev/null lgwam.c
27.42user 1.83system 0:29.61elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
16inputs+0outputs (0major+374338minor)pagefaults 0swaps
total: 1341721 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.2.3/bin/gcc -S -o /dev/null lgwam.c
15.33user 0.80system 0:16.31elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
256inputs+0outputs (2major+197427minor)pagefaults 0swaps
total: 598733 kB

time-report for 4.5 trunk (only interesting parts):

 expand                     :   6.72 (20%) usr   0.72 (26%) sys   7.45 (20%) wall  140609 kB (35%) ggc
 integrated RA              :   8.56 (25%) usr   0.12 ( 4%) sys   8.76 (24%) wall    5151 kB ( 1%) ggc
 reload                     :   7.72 (23%) usr   0.22 ( 8%) sys   7.94 (21%) wall
 TOTAL                      :  34.17             2.79             37.15             402913 kB

Memory usage is still high compared to 4.2.

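To put the "still high compared to 4.2" remark in numbers, here is a minimal sketch that computes each release's user time and memory relative to 4.2.3. The figures are copied from the runs above; treating the `total:` value as the memory measurement is an assumption about what maxmem2.sh reports, and the function name `ratios_vs_baseline` is made up for this illustration.

```python
# Figures copied from the /usr/bin/time and maxmem2.sh output above.
# Layout: version -> (user seconds, "total:" value in kB).
# Assumption: "total:" is the memory figure reported by maxmem2.sh.
runs = {
    "4.5":   (32.62, 1333745),
    "4.4.3": (35.01, 1306898),
    "4.3.4": (27.42, 1341721),
    "4.2.3": (15.33, 598733),
}


def ratios_vs_baseline(runs, baseline="4.2.3"):
    """Return version -> (time ratio, memory ratio) relative to the baseline."""
    base_time, base_mem = runs[baseline]
    return {
        version: (user_time / base_time, mem / base_mem)
        for version, (user_time, mem) in runs.items()
    }


if __name__ == "__main__":
    for version, (t, m) in ratios_vs_baseline(runs).items():
        print(f"gcc {version}: {t:.2f}x user time, {m:.2f}x memory vs 4.2.3")
```

By this measure all three newer releases use a bit more than twice the memory of 4.2.3 on this test case.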
At -O2 the picture is similar (memory peaks at 2.5GB):

Execution times (seconds)
 garbage collection         :   0.64 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction     :   0.31 ( 0%) usr   0.08 ( 1%) sys   0.39 ( 0%) wall   20525 kB ( 3%) ggc
 callgraph optimization     :   1.40 ( 1%) usr   0.03 ( 0%) sys   1.46 ( 1%) wall     639 kB ( 0%) ggc
 ipa cp                     :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall    9607 kB ( 1%) ggc
 ipa reference              :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const             :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall     199 kB ( 0%) ggc
 cfg cleanup                :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    1168 kB ( 0%) ggc
 trivially dead code        :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs           :   0.86 ( 0%) usr   0.01 ( 0%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs           :   2.69 ( 1%) usr   0.96 (12%) sys   3.94 ( 2%) wall       0 kB ( 0%) ggc
 df live regs               :   7.15 ( 3%) usr   0.07 ( 1%) sys   7.13 ( 3%) wall       0 kB ( 0%) ggc
 df live&initialized regs   :   3.33 ( 2%) usr   0.06 ( 1%) sys   3.36 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.85 ( 0%) usr   0.03 ( 0%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes   :   6.31 ( 3%) usr   0.05 ( 1%) sys  12.84 ( 6%) wall   18222 kB ( 2%) ggc
 register information       :   3.42 ( 2%) usr   0.00 ( 0%) sys   3.46 ( 2%) wall       0 kB ( 0%) ggc
 alias analysis             :   1.14 ( 1%) usr   0.01 ( 0%) sys   1.17 ( 1%) wall   10447 kB ( 1%) ggc
 alias stmt walking         :   4.39 ( 2%) usr   0.26 ( 3%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
 register scan              :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       7 kB ( 0%) ggc
 rebuild jump labels        :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing              :   0.64 ( 0%) usr   0.35 ( 4%) sys   0.93 ( 0%) wall   23140 kB ( 3%) ggc
 lexical analysis           :   0.30 ( 0%) usr   0.54 ( 7%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 parser                     :   0.69 ( 0%) usr   0.38 ( 5%) sys   1.09 ( 0%) wall   38129 kB ( 5%) ggc
 inline heuristics          :   1.18 ( 1%) usr   0.01 ( 0%) sys   1.15 ( 1%) wall   29832 kB ( 4%) ggc
 integration                :   3.42 ( 2%) usr   0.45 ( 6%) sys   3.66 ( 2%) wall  175322 kB (22%) ggc
 tree gimplify              :   1.08 ( 1%) usr   0.09 ( 1%) sys   1.17 ( 1%) wall  104718 kB (13%) ggc
 tree eh                    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction      :   0.14 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall   11817 kB ( 1%) ggc
 tree CFG cleanup           :   1.23 ( 1%) usr   0.00 ( 0%) sys   1.21 ( 1%) wall      60 kB ( 0%) ggc
 tree VRP                   :   3.08 ( 1%) usr   0.01 ( 0%) sys   3.09 ( 1%) wall    6719 kB ( 1%) ggc
 tree copy propagation      :   1.22 ( 1%) usr   0.02 ( 0%) sys   1.14 ( 0%) wall     585 kB ( 0%) ggc
 tree find ref. vars        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.12 ( 0%) wall    6045 kB ( 1%) ggc
 tree PTA                   :   0.62 ( 0%) usr   0.00 ( 0%) sys   0.68 ( 0%) wall    2178 kB ( 0%) ggc
 tree PHI insertion         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     185 kB ( 0%) ggc
 tree SSA rewrite           :   2.92 ( 1%) usr   0.06 ( 1%) sys   3.04 ( 1%) wall   54148 kB ( 7%) ggc
 tree SSA other             :   0.24 ( 0%) usr   0.08 ( 1%) sys   0.34 ( 0%) wall     589 kB ( 0%) ggc
 tree SSA incremental       :   3.46 ( 2%) usr   0.03 ( 0%) sys   3.50 ( 2%) wall     187 kB ( 0%) ggc
 tree operand scan          :   1.81 ( 1%) usr   0.31 ( 4%) sys   2.40 ( 1%) wall   53424 kB ( 7%) ggc
 dominator optimization     :   0.73 ( 0%) usr   0.00 ( 0%) sys   0.73 ( 0%) wall    5666 kB ( 1%) ggc
 tree SRA                   :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall       2 kB ( 0%) ggc
 tree CCP                   :   0.95 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%) wall     617 kB ( 0%) ggc
 tree PHI const/copy prop   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       8 kB ( 0%) ggc
 tree reassociation         :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     280 kB ( 0%) ggc
 tree PRE                   :   2.02 ( 1%) usr   1.01 (13%) sys   3.02 ( 1%) wall    2092 kB ( 0%) ggc
 tree FRE                   :   2.62 ( 1%) usr   1.12 (14%) sys   3.75 ( 2%) wall    1674 kB ( 0%) ggc
 tree code sinking          :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    1126 kB ( 0%) ggc
 tree linearize phis        :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 0%) wall       4 kB ( 0%) ggc
 tree forward propagate     :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall      75 kB ( 0%) ggc
 tree phiprop               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE      :   0.61 ( 0%) usr   0.20 ( 2%) sys   0.91 ( 0%) wall       5 kB ( 0%) ggc
 tree aggressive DCE        :   0.65 ( 0%) usr   0.09 ( 1%) sys   0.80 ( 0%) wall    1486 kB ( 0%) ggc
 tree buildin call DCE      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE                   :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall      34 kB ( 0%) ggc
 PHI merge                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      26 kB ( 0%) ggc
 tree loop bounds           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 tree loop invariant motion :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       4 kB ( 0%) ggc
 scev constant prop         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     112 kB ( 0%) ggc
 complete unrolling         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     379 kB ( 0%) ggc
 tree iv optimization       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     473 kB ( 0%) ggc
 tree loop init             :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall     390 kB ( 0%) ggc
 tree copy headers          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     330 kB ( 0%) ggc
 tree SSA uncprop           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies     :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers        :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation      :   1.26 ( 1%) usr   0.04 ( 0%) sys   1.31 ( 1%) wall       0 kB ( 0%) ggc
 control dependences        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 expand                     :  10.92 ( 5%) usr   0.70 ( 9%) sys  11.74 ( 5%) wall  124907 kB (16%) ggc
 forward prop               :   2.08 ( 1%) usr   0.04 ( 0%) sys   2.06 ( 1%) wall    3631 kB ( 0%) ggc
 CSE                        :   3.19 ( 1%) usr   0.00 ( 0%) sys   3.14 ( 1%) wall     512 kB ( 0%) ggc
 dead code elimination      :   1.24 ( 1%) usr   0.01 ( 0%) sys   1.25 ( 1%) wall       0 kB ( 0%) ggc
 dead store elim1           :   1.26 ( 1%) usr   0.04 ( 0%) sys   1.33 ( 1%) wall     965 kB ( 0%) ggc
 dead store elim2           :   1.51 ( 1%) usr   0.02 ( 0%) sys   1.54 ( 1%) wall    7466 kB ( 1%) ggc
 loop analysis              :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     229 kB ( 0%) ggc
 loop invariant motion      :   0.10 ( 0%) usr   0.03 ( 0%) sys   0.13 ( 0%) wall       2 kB ( 0%) ggc
 CPROP                      :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     452 kB ( 0%) ggc
 PRE                        :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      15 kB ( 0%) ggc
 CSE 2                      :   3.20 ( 1%) usr   0.00 ( 0%) sys   3.22 ( 1%) wall     494 kB ( 0%) ggc
 branch prediction          :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall    2539 kB ( 0%) ggc
 combiner                   :   1.42 ( 1%) usr   0.02 ( 0%) sys   1.42 ( 1%) wall   11673 kB ( 1%) ggc
 if-conversion              :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     338 kB ( 0%) ggc
 regmove                    :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA              :  78.17 (37%) usr   0.46 ( 6%) sys  78.94 (34%) wall    4788 kB ( 1%) ggc
 reload                     :  25.79 (12%) usr   0.07 ( 1%) sys  26.09 (11%) wall   26499 kB ( 3%) ggc
 reload CSE regs            :   3.00 ( 1%) usr   0.05 ( 1%) sys   3.08 ( 1%) wall   15550 kB ( 2%) ggc
 thread pro- & epilogue     :   1.39 ( 1%) usr   0.01 ( 0%) sys   1.48 ( 1%) wall    1006 kB ( 0%) ggc
 if-conversion 2            :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     138 kB ( 0%) ggc
 combine stack adjustments  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2                 :   0.67 ( 0%) usr   0.01 ( 0%) sys   0.69 ( 0%) wall    3275 kB ( 0%) ggc
 hard reg cprop             :   1.33 ( 1%) usr   0.02 ( 0%) sys   1.32 ( 1%) wall     979 kB ( 0%) ggc
 scheduling 2               :   6.59 ( 3%) usr   0.05 ( 1%) sys   6.63 ( 3%) wall     294 kB ( 0%) ggc
 machine dep reorg          :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.68 ( 0%) wall      10 kB ( 0%) ggc
 reorder blocks             :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall    2878 kB ( 0%) ggc
 final                      :   1.63 ( 1%) usr   0.04 ( 0%) sys   1.64 ( 1%) wall     201 kB ( 0%) ggc
 tree if-combine            :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall       1 kB ( 0%) ggc
 plugin execution           :   0.00 ( 0%) usr   0.03 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                      : 214.12             8.02            229.70             793292 kB

At -O1 we seem to never finish early inlining into testsuite though...
(the loop over all callees calling check-inline-limits, which again loops
over all callees, looks quadratic - at least because we do not consider the
duplicates in the caller? We seem to be looping over edges but check limits
for !one_only. Why do we not consider edges individually and avoid calling
cgraph_check_inline_limits with !one_only at all?).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3/4.5 Regression] gcc    |[4.3/4.4/4.5 Regression] gcc
                   |4.3.1 cannot compile big    |4.3.1 cannot compile big
                   |function                    |function


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37448
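
The quadratic behaviour suspected in the early-inlining remark above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not GCC's cgraph code, every name in it is hypothetical, and it merely counts loop iterations for a caller with n call edges when a whole-caller limit check is run once per edge, compared with a check that looks at one edge at a time.

```python
# Toy model only: these names are hypothetical and are NOT GCC's cgraph API.
# A caller is modelled as a list of callee size estimates, one per call edge.


def check_limits_all_edges(callee_sizes, limit):
    """Whole-caller check: re-walks every call edge. Returns (ok, steps)."""
    growth = 0
    steps = 0
    for size in callee_sizes:
        growth += size
        steps += 1
    return growth <= limit, steps


def check_limits_one_edge(size, limit):
    """Per-edge check: inspects a single call edge. Returns (ok, steps)."""
    return size <= limit, 1


def walk_with_whole_caller_check(callee_sizes, limit):
    """Run the whole-caller check once per edge; total steps grow as n * n."""
    total = 0
    for _ in callee_sizes:
        _, steps = check_limits_all_edges(callee_sizes, limit)
        total += steps
    return total


def walk_with_per_edge_check(callee_sizes, limit):
    """Run the per-edge check once per edge; total steps grow as n."""
    total = 0
    for size in callee_sizes:
        _, steps = check_limits_one_edge(size, limit)
        total += steps
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        sizes = [10] * n
        print(n,
              walk_with_whole_caller_check(sizes, 10**9),
              walk_with_per_edge_check(sizes, 10**9))
```

With a very large function that contains many call sites, the gap between the two step counts is the kind of blow-up the comment is asking about; whether the real check can be made per-edge without changing inlining decisions is left open here.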