
hubicka at gcc dot gnu dot org Mon, 24 Jul 2006 04:54:21 -0700


------- Comment #21 from hubicka at gcc dot gnu dot org  2006-07-24 11:54 -------
OK, some summary ;)


Mainline (after the first three patches) at -O now peaks at 450MB (just because
of the register allocator's conflict matrix; otherwise it is about 150MB).
Still not quite icc's 12 seconds/200MB, but we are out of regression land for
-O relative to 4.0.  I tested 3.0 and it bombs on the testcase; 2.95, however,
compiles it quite smoothly with a 200MB peak, though it needs 6 minutes.
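
To give a feel for why a conflict matrix can dominate peak memory, here is a
minimal sketch that assumes the matrix is a dense square bit matrix with one
row per allocno, rounded up to 64-bit words.  The function name and the
64-bit word size are assumptions for illustration, not GCC's actual code, and
the allocno count of the testcase is not stated in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for a dense N x N conflict bit matrix
   whose rows are padded to whole 64-bit words.  Not taken from GCC.  */
static uint64_t
conflict_matrix_bytes (uint64_t n_allocnos)
{
  /* Keep the product below 2^64.  */
  assert (n_allocnos <= UINT32_MAX);

  uint64_t row_words = (n_allocnos + 63) / 64;   /* words per row, rounded up */
  return n_allocnos * row_words * sizeof (uint64_t);
}
```

Under these assumptions a purely illustrative 50,000 allocnos would need
312,800,000 bytes, which is the same order of magnitude as the 300MB
difference quoted above.  The cost grows quadratically with the number of
allocnos.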

 life analysis         :  25.92 (16%) usr   0.01 ( 0%) sys  26.18 (15%) wall    2565 kB ( 1%) ggc
 inline heuristics     :  15.15 ( 9%) usr   0.01 ( 0%) sys  15.27 ( 9%) wall    1486 kB ( 1%) ggc
 integration           :  21.37 (13%) usr   0.12 ( 5%) sys  21.66 (13%) wall   33445 kB (19%) ggc
 tree SSA to normal    :  27.73 (17%) usr   0.03 ( 1%) sys  27.93 (16%) wall      17 kB ( 0%) ggc
 local alloc           :   7.33 ( 4%) usr   0.03 ( 1%) sys   7.41 ( 4%) wall    1855 kB ( 1%) ggc
 global alloc          :  13.67 ( 8%) usr   0.73 (32%) sys  15.85 ( 9%) wall   14178 kB ( 8%) ggc
 reload CSE regs       :  30.88 (19%) usr   0.04 ( 2%) sys  31.09 (18%) wall    2393 kB ( 1%) ggc
 TOTAL                 : 164.46             2.27           169.53             173593 kB

It would be interesting to see how the dataflow branch scores here after
re-merging from mainline.  Hopefully the integration and register allocation
issues will be tracked there.

The inliner is still quadratic in time because of quadratic behaviour in
split_block and cgraph_node.  Both can be made linear quite easily (split_block
by always renumbering the smaller part of the block, and cgraph_node by
building hashtables for nodes with many edges), but I am not sure I want to do
that for 4.2.  Inline heuristics might be trickier to speed up.
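
The "renumber the smaller part" idea for split_block can be sketched as
follows.  This is a toy model, not GCC's split_block: the array stmt_block,
struct block_range and split_block_smaller are all hypothetical names, and a
block is assumed to own a contiguous range of statement indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: stmt_block[i] is the id of the block owning
   statement i; a block owns the contiguous range [begin, end).  */
struct block_range { size_t begin, end; int id; };

/* Split *bb before statement AT.  Whichever half is smaller is relabelled
   with NEW_ID and described in *OTHER; *bb shrinks to the larger half and
   keeps its id.  Returns the number of statements relabelled.  */
static size_t
split_block_smaller (int *stmt_block, struct block_range *bb, size_t at,
                     int new_id, struct block_range *other)
{
  assert (bb->begin <= at && at <= bb->end);

  size_t head = at - bb->begin;   /* statements before the split point */
  size_t tail = bb->end - at;     /* statements from the split point on */
  size_t lo, hi;

  if (head <= tail)
    {
      lo = bb->begin; hi = at;    /* relabel the head, keep the tail */
      bb->begin = at;
    }
  else
    {
      lo = at; hi = bb->end;      /* relabel the tail, keep the head */
      bb->end = at;
    }

  for (size_t i = lo; i < hi; i++)
    stmt_block[i] = new_id;

  other->begin = lo;
  other->end = hi;
  other->id = new_id;
  return hi - lo;
}
```

In this model a split never touches more than half of the block's statements,
so repeatedly splitting small pieces off one gigantic block costs work
proportional to the pieces rather than to the whole block each time.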

I dunno about reload.  Oprofile might be handy ;)

-O2 exposes a problem in PRE that DannyB has a fix for.  Regmove and into-SSA
can also be significantly sped up by the patches I attached; I will commit them
once testing converges.

-O3 turns the testcase into quite a different one (the gigantic basic block is
turned into many basic blocks by inlining the min/max functions).
There, a few problems are still visible: FRE consumes an unbounded amount of
memory, and we fail to synthesize fmin/fmax operators where we ought to.
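
For illustration, the kind of source pattern involved might look like the
sketch below.  The testcase's real min/max functions are not shown in this
report, so min_d, max_d and clamp_d are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical helpers of the kind the testcase presumably inlines.
   Each conditional becomes a branch, and so extra basic blocks, unless
   the compiler recognizes it as a min/max operation.  */
static inline double
min_d (double a, double b)
{
  return a < b ? a : b;
}

static inline double
max_d (double a, double b)
{
  return a > b ? a : b;
}

/* Two such helpers chained: after inlining this is two conditionals.  */
static double
clamp_d (double x, double lo, double hi)
{
  assert (lo <= hi);
  return min_d (max_d (x, lo), hi);
}
```

Note that these conditionals and the C library's fmin/fmax differ when an
argument is NaN, so whether the rewrite is valid depends on the
floating-point options in effect.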

If the FRE problem is fixed, I would say this should no longer be considered a
4.2 blocker.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
