------- Comment #21 from hubicka at gcc dot gnu dot org 2006-07-24 11:54 -------
OK, some summary ;)
Mainline (after the first three patches) at -O now peaks at 450MB (only because of the register allocator's conflict matrix; otherwise it is about 150MB). Still not quite icc's 12 seconds/200MB, but we are out of regression land for -O relative to 4.0. I tested 3.0 and it bombs on the testcase; 2.95, however, compiles it quite fluently with a 200MB peak, though it needs 6 minutes.

 life analysis       :  25.92 (16%) usr   0.01 ( 0%) sys  26.18 (15%) wall    2565 kB ( 1%) ggc
 inline heuristics   :  15.15 ( 9%) usr   0.01 ( 0%) sys  15.27 ( 9%) wall    1486 kB ( 1%) ggc
 integration         :  21.37 (13%) usr   0.12 ( 5%) sys  21.66 (13%) wall   33445 kB (19%) ggc
 tree SSA to normal  :  27.73 (17%) usr   0.03 ( 1%) sys  27.93 (16%) wall      17 kB ( 0%) ggc
 local alloc         :   7.33 ( 4%) usr   0.03 ( 1%) sys   7.41 ( 4%) wall    1855 kB ( 1%) ggc
 global alloc        :  13.67 ( 8%) usr   0.73 (32%) sys  15.85 ( 9%) wall   14178 kB ( 8%) ggc
 reload CSE regs     :  30.88 (19%) usr   0.04 ( 2%) sys  31.09 (18%) wall    2393 kB ( 1%) ggc
 TOTAL               : 164.46             2.27            169.53             173593 kB

It would be interesting to see how the dataflow branch scores here after re-merging from mainline. Hopefully the integration and register allocation issues will be tracked there.

The inliner is still quadratic in time because of the quadratic split_block and cgraph_node. Both can be made linear quite easily (split_block by always renumbering the smaller area of the block, and cgraph_node by producing hashtables for nodes with many edges), but I am not sure I want to do that for 4.2. The inline heuristics might be trickier to speed up. I don't know about reload. Oprofile might be handy ;)

-O2 exposes a problem in PRE that DannyB has a fix for. Regmove and into-SSA can also be sped up significantly by the patches I attached; I will commit them once testing converges.

-O3 turns the testcase into quite a different one (the gigantic basic block is turned into many basic blocks by inlining the min/max functions).
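
To illustrate the split_block idea mentioned above (always renumbering the smaller area of the block), here is a minimal sketch on a toy model. It is not GCC code: the array layout and the names stmt_block, split_block_at and peel_front are all hypothetical, and a real implementation would also have to move the CFG edges to whichever half keeps the old block. The point is only the counting: when the fresh block id always goes to the smaller half, each statement changes owner at most about log2(n) times overall, whereas always relabeling the tail costs a quadratic number of updates when statements are peeled off the front of one huge block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC's data structures: statements live in one
   array, and a block is the contiguous range [first, last) of that array.  */
enum { MAX_STMTS = 1024, MAX_BLOCKS = 1024 };

struct block { size_t first, last; };

static int stmt_block[MAX_STMTS];       /* owning block id of each statement */
static struct block blocks[MAX_BLOCKS];
static int n_blocks;
static unsigned long relabel_count;     /* number of owner updates performed */

/* Reset the model to a single block 0 holding statements [0, n).  */
static void init_blocks(size_t n)
{
    assert(n <= MAX_STMTS);
    for (size_t i = 0; i < n; i++)
        stmt_block[i] = 0;
    blocks[0].first = 0;
    blocks[0].last = n;
    n_blocks = 1;
    relabel_count = 0;
}

/* Split block B before statement index POS.  If SMALLER_ONLY is nonzero the
   fresh block id is given to whichever half has fewer statements; otherwise
   it always goes to the tail, as a naive split would do.
   Returns the id of the new block.  */
static int split_block_at(int b, size_t pos, int smaller_only)
{
    assert(n_blocks < MAX_BLOCKS);
    assert(blocks[b].first < pos && pos < blocks[b].last);

    int fresh = n_blocks++;
    size_t head = pos - blocks[b].first;
    size_t tail = blocks[b].last - pos;

    if (smaller_only && head < tail) {
        /* Head is smaller: it becomes the new block, B keeps the tail.  */
        blocks[fresh].first = blocks[b].first;
        blocks[fresh].last = pos;
        blocks[b].first = pos;
    } else {
        /* Tail becomes the new block, B keeps the head.  */
        blocks[fresh].first = pos;
        blocks[fresh].last = blocks[b].last;
        blocks[b].last = pos;
    }
    /* Only the statements of the fresh block need a new owner.  */
    for (size_t i = blocks[fresh].first; i < blocks[fresh].last; i++) {
        stmt_block[i] = fresh;
        relabel_count++;
    }
    return fresh;
}

/* Peel one statement at a time off the front of an N-statement block,
   roughly the pattern of inlining many calls in one huge block.
   Returns the total number of owner updates.  */
static unsigned long peel_front(size_t n, int smaller_only)
{
    init_blocks(n);
    int big = 0;
    for (size_t pos = 1; pos < n; pos++) {
        int fresh = split_block_at(big, pos, smaller_only);
        /* Continue splitting whichever block still holds the tail.  */
        if (blocks[fresh].last == n)
            big = fresh;
    }
    return relabel_count;
}
```

In this model, peel_front(n, 0) performs n*(n-1)/2 owner updates while peel_front(n, 1) performs n-1, which is the difference between the quadratic and the cheap behaviour described above.
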
At -O3 a few problems are still visible: FRE consumes an unbounded amount of memory, and we fail to synthesize fmin/fmax operators where we ought to. If the FRE problem is fixed, I would say it should no longer be considered a 4.2 blocker.

Honza

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071