http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636
William J. Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org

--- Comment #44 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2013-03-04 17:53:17 UTC ---
Compiling mgrid.f on powerpc64-unknown-linux-gnu as follows:

$ gfortran -S -m32 -O3 -mcpu=power7 -fpeel-loops -funroll-loops -ffast-math -fvect-cost-model mgrid.f

I examined the assembly generated for revisions 193330, 193331 (this issue), and 196171 (PR55334). What I'm seeing is that for both 193331 and 196171, the inliner is much more aggressive, and in particular is inlining several copies of some pretty large functions.

For -m32, I am not seeing any specialization of resid_, so although the change in 196171 helped a little, it appears that this was by reducing overall code size. There weren't any changes in inlining decisions. Of course there is a lot of distance between 193331 and 196171, so it is not a perfect comparison, though it appears 196171 is where -m32 received a slight boost.

Anyway, the non-inlined call tree for 193330 is:

main
  MAIN__
    resid_ (x4)
      comm3_
    psinv_ (x3)
      comm3_
    norm2u3_ (x2)
    interp_ (x2)
    setup_
    rprj3_ (x4)
    zran3_

The non-inlined call tree for 193331 is:

main
  MAIN__
    comm3_ (x5)
    resid_
      comm3_
    norm2u3_ (x2)
    setup_
    zran3_

So with 193331 we have the following additional inlines:

3 inlines of resid_,  size = 1068, total size = 3204
3 inlines of psinv_,  size = 1046, total size = 3138
2 inlines of interp_, size = 1544, total size = 3088
4 inlines of rprj3_,  size =  220, total size =  880

Here "size" is the number of lines of assembly code of the called procedure, including labels, so it's just a rough measure. The number of static call sites of comm3_ was also reduced by one, but I don't know whether it was inlined or specialized away.

These are pretty large procedures to be duplicating, particularly to be duplicating more than once.
Looking at resid_, it already generates spill code on its own, so putting 3 copies of this in its caller isn't likely to be very helpful. Of these, I think only rprj3_ looks like a reasonable inline candidate.

Total lines of the assembly files are:

   8660 r193330/mgrid.s
  16398 r193331/mgrid.s
  14592 r196171/mgrid.s

Inlining creates unreachable code, so removing the unreachable procedures gives:

   7765 r193330/mgrid.s
  12591 r193331/mgrid.s
  10795 r196171/mgrid.s

With r196171 the reachable code is still about 40% larger than r193330 (where some reasonable inlining was already being done). This is better than the 60% bloat with r193331 but still seems too high. Again, these are rough measures but I think they are indicative.

Without knowing anything about the inliner, I think the inlining heuristics probably need to take more account of code size than they seem to do at the moment, particularly when making more than one copy of a procedure and thus reducing spatial locality.
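
For anyone who wants to re-check the size arithmetic above, here is a rough Python sketch. The numbers are copied from this comment. The function and variable names are made up for illustration and are not part of GCC or any existing tool.

```python
# Rough re-computation of the size figures quoted in this comment.
# All names are illustrative assumptions; only the numbers come from the comment.

# (callee, additional inlined copies, assembly lines per copy) for r193331
extra_inlines = [
    ("resid_", 3, 1068),
    ("psinv_", 3, 1046),
    ("interp_", 2, 1544),
    ("rprj3_", 4, 220),
]

# Reachable lines of assembly per revision (unreachable procedures removed)
reachable_lines = {
    "r193330": 7765,
    "r193331": 12591,
    "r196171": 10795,
}


def duplicated_lines(inlines):
    """Sum of copies * size over the additional inlined callees."""
    return sum(copies * size for _, copies, size in inlines)


def growth_percent(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return 100.0 * (other - baseline) / baseline


if __name__ == "__main__":
    print("lines in additional inlined copies:", duplicated_lines(extra_inlines))
    base = reachable_lines["r193330"]
    for rev in ("r193331", "r196171"):
        pct = growth_percent(base, reachable_lines[rev])
        print(f"{rev}: {pct:.1f}% more reachable lines than r193330")
```

This gives roughly 62% for r193331 and 39% for r196171, in line with the "60%" and "about 40%" figures quoted above. Like the line counts themselves, it is only a rough measure.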