------- Comment #22 from hubicka at gcc dot gnu dot org 2008-02-06 19:22 -------

Yes, there are a number of unlucky variables. However, the real cause here seems to be a consistently wrong profile guiding regalloc to optimize for the cold portions of the function, rather than a real increase in register pressure due to inlining.
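To make the hot/cold distinction concrete, here is a minimal sketch. It is a hypothetical function, not code from this PR; the name sum_checked and the UNLIKELY macro are invented for illustration. The loop body is the hot region where sum and i are live on every iteration, and the error exit is the cold region. __builtin_expect is used only to state the intended profile explicitly; if the estimated profile instead treated the error exit as hot, the allocator could end up favouring the wrong values.

```c
#include <stddef.h>

/* Hypothetical illustration only; not taken from the PR's test case. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sums n non-negative values. On the first negative value it sets
   *error to 1 and returns -1; otherwise it sets *error to 0. */
long sum_checked(const int *data, size_t n, int *error)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Hot path: sum and i are live on every iteration, so these
           are the values that should get registers first. */
        if (UNLIKELY(data[i] < 0)) {
            /* Cold path: rarely executed. A profile that wrongly
               marked this block as hot would raise the priority of
               values used only here. */
            *error = 1;
            return -1;
        }
        sum += data[i];
    }

    *error = 0;
    return sum;
}
```

Comparing the generated assembly with and without the hint is one way to see whether the allocator's choices follow the profile; the result depends on the compiler version and target.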
In general, the inlining operation itself only decreases register pressure: you don't have to pin function parameters and the return value to fixed registers, and you know precisely which registers survive the body, so you don't need to save caller-saved registers when it is not needed. The losses from inlining with our regalloc are partly due to callee-saved registers sometimes being more effective, in a way imitating live range splitting. Increased register pressure is an effect of propagation from the function body into the rest of the program, but it is not that bad either: at least, all the inlining heuristic/RA bugs turned out to be something else.

The high speedup from the forwprop patch in 64-bit mode (and the slowdown in 32-bit mode) is actually also register allocation related: the internal loop, consisting of a sequence of ++ operations, ends up with extra copy instructions without the forwprop patch, while with the patch we produce a normal induction variable. On 32-bit, however, this results in regalloc putting this variable on the stack, because its live-range heuristics then give it lower priority.

For 32-bit data, the britten 32-bit SPEC tester peaked at 760, while we now get 620 on peak with -fomit-frame-pointer. A roughly 20% regression on a rather simple, commonly used codebase definitely makes us look stupid, all the more so given that ICC 7.x did 820 on the same machine. The 64-bit tester is at approximately 830 versus 740.

Honza

-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761