http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.11.22 13:13:03
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 13:13:03 UTC ---
It's a very large monolithic function, and as usual we have a gazillion
local I/O state variables.

On trunk with release checking I see:

A quarter of the testcase:

 alias stmt walking    :  15.94 (59%) usr   0.03 ( 8%) sys  16.00 (57%) wall    1845 kB ( 2%) ggc
 TOTAL                 :  27.08             0.38            27.89              96306 kB

Half of the testcase:

 alias stmt walking    :  63.31 (68%) usr   0.51 (31%) sys  64.06 (67%) wall    3684 kB ( 2%) ggc
 TOTAL                 :  93.52             1.66            95.57             241871 kB

All of the testcase:

 alias stmt walking    : 259.19 (73%) usr   0.78 (26%) sys 261.79 (72%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 356.27             2.98           361.57             690719 kB

so it's definitely nearly quadratic: each doubling of the testcase roughly
quadruples the alias stmt walking time (but that's expected).
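
As a rough check of that scaling, the exponent can be estimated from the usr
times quoted above.  This is only a back-of-the-envelope sketch: it assumes
the amount of code grows in proportion to the fraction of the testcase that
is compiled, and the names used are made up for illustration.

```python
import math

# (fraction of testcase, alias stmt walking usr seconds, TOTAL usr seconds)
# Numbers copied from the trunk timing reports above.
runs = [
    (0.25, 15.94, 27.08),
    (0.50, 63.31, 93.52),
    (1.00, 259.19, 356.27),
]

def scaling_exponents(column):
    """Return log(time ratio) / log(size ratio) for consecutive runs.

    column 0 selects the alias stmt walking time, column 1 the TOTAL time.
    An exponent of 2 would mean exactly quadratic growth.
    """
    exps = []
    for (s0, *t0), (s1, *t1) in zip(runs, runs[1:]):
        exps.append(math.log(t1[column] / t0[column]) / math.log(s1 / s0))
    return exps

alias_exps = scaling_exponents(0)
total_exps = scaling_exponents(1)
print("alias stmt walking:", alias_exps)
print("TOTAL             :", total_exps)
```

The exponents for alias stmt walking come out very close to 2, and those for
TOTAL somewhat below 2, which fits "nearly quadratic".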

4.5.x for a quarter of the testcase has:

 alias stmt walking    :  93.10 (88%) usr   0.03 ( 8%) sys  93.31 (87%) wall       0 kB ( 0%) ggc
 TOTAL                 : 106.11             0.40           106.93              87895 kB

so trunk is already a lot better.

Removing the alias stmt walk timevar gets us to the following on trunk
(quarter of the testcase again):

 tree PRE              :  12.93 (47%) usr   0.00 ( 0%) sys  12.98 (46%) wall    3607 kB ( 4%) ggc
 TOTAL                 :  27.57             0.34            27.99              96324 kB

What is costly is translating things through the loop bodies.  We can
improve this a lot by properly marking the I/O structs as dead before
their first use and after their last use.  The proposed virtual kill
stmts could be used for that.  We could also build this kind of liveness
information up-front and use it to limit the walking (but that again is
only trivial for non-address-taken variables, which the I/O structs are
not).
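
To illustrate why such a marker helps, here is a toy model.  It is not GCC
code and not how the alias walker is implemented; the statement kinds
("kill", "query", "store") and the function names are invented for this
sketch.  Each block has its own I/O temporary, and a lookup at block entry
walks backwards until it finds a statement that defines or kills that
temporary.

```python
def make_function(n_blocks, with_kills):
    """Build a toy statement list with one private I/O temporary per block."""
    stmts = []
    for b in range(n_blocks):
        tmp = f"io_{b}"
        if with_kills:
            stmts.append(("kill", tmp))   # hypothetical "dead before here" marker
        stmts.append(("query", tmp))      # lookup of tmp's value at block entry
        stmts.append(("store", tmp))      # the block then initialises tmp
    return stmts

def walk_steps(stmts):
    """Count how many statements all queries visit while walking backwards."""
    steps = 0
    for i, (kind, var) in enumerate(stmts):
        if kind != "query":
            continue
        for j in range(i - 1, -1, -1):
            steps += 1
            pkind, pvar = stmts[j]
            # Stop at the first earlier statement that defines or kills var.
            if pvar == var and pkind in ("store", "kill"):
                break
    return steps

for n in (100, 200):
    print(n, walk_steps(make_function(n, False)), walk_steps(make_function(n, True)))
```

Without the markers every lookup runs to the start of the function, so the
step count grows quadratically with the number of blocks; with them each
lookup stops after one step.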

Anyway, confirmed.  Fortran I/O and array descriptor temporaries really
need re-use (I proposed a patch for I/O ones once but it was shot down
because of async-I/O).

Removing all prints from the testcase gives:

 alias stmt walking    : 132.44 (61%) usr   0.79 (30%) sys 133.47 (61%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 216.55             2.67           219.78             645229 kB

As none of the arrays is address-taken, we really look for CSE opportunities
all the way up to the very start of the function.  (PRE translates the
in-loop references from the any (a /= b) loop to the loop header using the
constant initial index and tries to CSE that, but it doesn't actually
succeed.  That is another bug, of course: it should look the value up from
original or A.0, respectively.)
