http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636



William J. Schmidt <wschmidt at gcc dot gnu.org> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org



--- Comment #44 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2013-03-04 17:53:17 UTC ---

Compiling mgrid.f on powerpc64-unknown-linux-gnu as follows:



$ gfortran -S -m32 -O3 -mcpu=power7 -fpeel-loops -funroll-loops -ffast-math \
    -fvect-cost-model mgrid.f



I examined the assembly generated for revisions 193330, 193331 (this issue),
and 196171 (PR55334).  For both 193331 and 196171, the inliner is much more
aggressive than in 193330; in particular, it inlines several copies of some
pretty large functions.



For -m32, I am not seeing any specialization of resid_, so although the change
in 196171 helped a little, it appears that this was by reducing overall code
size.  There weren't any changes in inlining decisions.  Of course there is a
lot of distance between 193331 and 196171, so it is not a perfect comparison,
though it appears 196171 is where -m32 received a slight boost.



Anyway, the non-inlined call tree for 193330 is:



 main
  MAIN__
   resid_ (x4)
    comm3_
   psinv_ (x3)
    comm3_
   norm2u3_ (x2)
   interp_ (x2)
   setup_
   rprj3_ (x4)
   zran3_



The non-inlined call tree for 193331 is:



 main
  MAIN__
   comm3_ (x5)
   resid_
    comm3_
   norm2u3_ (x2)
   setup_
   zran3_



So with 193331 we have the following additional inlines:



  3 inlines of resid_,  size = 1068, total size = 3204
  3 inlines of psinv_,  size = 1046, total size = 3138
  2 inlines of interp_, size = 1544, total size = 3088
  4 inlines of rprj3_,  size = 220,  total size = 880
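
As a cross-check, the counts in the list above can be rederived from the two
call trees.  The following is only a rough sketch: the dictionary and function
names are made up for illustration, and it assumes that every call site that
disappeared from MAIN__ between the two revisions was inlined.

```python
# Static call sites directly under MAIN__, copied from the two call trees.
calls_r193330 = {"resid_": 4, "psinv_": 3, "norm2u3_": 2, "interp_": 2,
                 "setup_": 1, "rprj3_": 4, "zran3_": 1}
calls_r193331 = {"comm3_": 5, "resid_": 1, "norm2u3_": 2,
                 "setup_": 1, "zran3_": 1}

# Lines of assembly per procedure (including labels), from the list above.
size = {"resid_": 1068, "psinv_": 1046, "interp_": 1544, "rprj3_": 220}


def additional_inlines(before, after):
    """Per callee, the number of call sites that disappeared.

    Assumption: a call site that is gone was inlined, not removed otherwise.
    """
    return {name: n - after.get(name, 0)
            for name, n in before.items()
            if n > after.get(name, 0)}


inlines = additional_inlines(calls_r193330, calls_r193331)
totals = {name: count * size[name] for name, count in inlines.items()}

for name, count in inlines.items():
    print(f"{count} inlines of {name}, size = {size[name]}, "
          f"total size = {totals[name]}")
print("duplicated lines overall:", sum(totals.values()))
```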



Here "size" is the number of lines of assembly code of the called procedure,
including labels, so it's just a rough measure.  The number of static call
sites of comm3_ was also reduced by one (MAIN__ has five, where three inlined
copies each of resid_ and psinv_ would account for six), but I don't know
whether it was inlined or specialized away.
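
One way such a per-procedure "size" could be measured is sketched below.  This
is not the script used for the numbers above; it is a hypothetical helper that
assumes ELF-style GCC output, where a procedure begins at a global label such
as "resid_:" and ends at a matching ".size resid_, .-resid_" directive.  Local
labels (those starting with ".") are counted as ordinary lines.

```python
def procedure_sizes(asm_text):
    """Return {procedure: line count}, counting from the global label
    up to (but not including) the matching .size directive."""
    sizes = {}
    current, count = None, 0
    for line in asm_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if current is None:
            # A global label opens a procedure; ".L5:" style labels do not.
            if stripped.endswith(":") and not stripped.startswith("."):
                current, count = stripped[:-1], 1
        elif (stripped.startswith(".size")
              and stripped[len(".size"):].split(",")[0].strip() == current):
            sizes[current] = count
            current = None
        else:
            count += 1
    return sizes


# Tiny made-up input, just to show the counting rule.
sample = """\
\t.globl resid_
\t.type resid_, @function
resid_:
\tmflr 0
.L2:
\tblr
\t.size resid_, .-resid_
"""
print(procedure_sizes(sample))
```

To apply it to a real file, one would pass the contents of mgrid.s as the
argument.  Data objects with their own labels would confuse this simple scan,
so the result is at best as rough as the measure described above.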



These are pretty large procedures to be duplicating, particularly to be
duplicating more than once.  Looking at resid_, it already generates spill code
on its own, so putting 3 copies of this in its caller isn't likely to be very
helpful.  Of these, I think only rprj3_ looks like a reasonable inline
candidate.



Total lines of the assembly files are:



  8660 r193330/mgrid.s
 16398 r193331/mgrid.s
 14592 r196171/mgrid.s



Inlining leaves behind out-of-line copies that are no longer reachable, so
removing the unreachable procedures gives:



  7765 r193330/mgrid.s
 12591 r193331/mgrid.s
 10795 r196171/mgrid.s



With r196171 the reachable code is still about 40% larger than r193330 (where
some reasonable inlining was already being done).  This is better than the 60%
bloat with r193331 but still seems too high.  Again, these are rough measures
but I think they are indicative.
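
The percentages follow directly from the line counts listed above.  A minimal
sketch of the arithmetic, with r193330 as the baseline (the variable and
function names are illustrative only):

```python
# Line counts of mgrid.s per revision, copied from the two lists above.
total = {"r193330": 8660, "r193331": 16398, "r196171": 14592}
reachable = {"r193330": 7765, "r193331": 12591, "r196171": 10795}


def growth_pct(counts, baseline="r193330"):
    """Percentage growth of each revision relative to the baseline."""
    base = counts[baseline]
    return {rev: round(100 * (n - base) / base, 1)
            for rev, n in counts.items()}


# Lines belonging to procedures that became unreachable after inlining.
unreachable = {rev: total[rev] - reachable[rev] for rev in total}

print("growth, reachable code (%):", growth_pct(reachable))
print("growth, whole file (%):    ", growth_pct(total))
print("unreachable lines:         ", unreachable)
```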



Without knowing anything about the inliner, I think the inlining heuristics
probably need to take more account of code size than they seem to do at the
moment, particularly when making more than one copy of a procedure and thus
reducing spatial locality.
