Hi,

On Wed, Apr 17, 2013 at 12:43:59PM -0600, Jeff Law wrote:
> On 04/17/2013 09:49 AM, Martin Jambor wrote:
> >
> >The reason why it helps so much is that before register allocation
> >there are instructions moving the value of actual arguments from
> >"originally hard" registers (e.g. SI, DI, etc.) to a pseudo at the
> >beginning of each function.  When the argument is live across a
> >function call, the pseudo is likely to be assigned to a callee-saved
> >register and then also accessed from that register, even in the first
> >BB, making it require a prologue, though it could be fetched from the
> >original one.  When we convert all uses (at least in the first BB) to
> >the original register, the preparatory stage of shrink wrapping is
> >often capable of moving the register moves to a later BB, thus
> >creating fast paths which do not require a prologue and epilogue.
> I noticed similar effects when looking at range splitting.  Being
> able to move those calls into a deeper control level in the CFG
> would definitely be an improvement.
>
> >
> >We believe this change in the pipeline should not bring about any
> >negative effects.  During gcc bootstrap, the number of instructions
> >changed by pass_cprop_hardreg dropped, but by only 1.2%.  We have also
> >run SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all
> >run time differences could be attributed to noise.  The changes in
> >binary sizes were also small:
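To make the first quoted paragraph concrete, here is a minimal C sketch of the kind of function it describes (the names foo and bar are hypothetical).  On x86-64, x arrives in %edi; because it is also needed after the call to bar, its pseudo will typically be given a callee-saved register such as %ebx.  If the x == 0 test in the first BB reads that callee-saved register, the prologue that saves it has to run before the test; if the test reads %edi instead, shrink wrapping can move the save onto the path that contains the call.  Actual code generation depends on GCC version, target and options; __attribute__((noinline)) is a GCC extension used here only to keep bar out of line.

```c
/* Hypothetical callee, kept out of line so foo really contains a call. */
__attribute__((noinline)) int bar(int y)
{
  return y * 2;
}

int foo(int x)
{
  /* Fast path: needs no callee-saved register, hence no prologue,
     provided x is read from its incoming register here.  */
  if (x == 0)
    return 0;

  /* x is live across the call, so its pseudo tends to be allocated
     to a callee-saved register.  */
  return bar(x) + x;
}
```

Compiling such a function with and without the proposed pass ordering and comparing where the register save ends up is one way to observe the effect.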
>
> Did anyone ponder just doing the hard register propagation on
> argument registers prior to the prologue/epilogue handling, then the
> full blown propagation pass in its current location in the pipeline?

I did not, because I did not think it would be substantially faster
than running the pass as-is twice.  I may be wrong, but it would still
have to look at all statements and examine them at a very similar level
of detail (to look for clobbers and manage value_data_entry chains),
and it would not really do that much less work fiddling with its own
data structures.

What would very likely be a working alternative for shrink-wrapping
is to have shrink-wrapping preparation invoke
copyprop_hardreg_forward_1 on the first BB and the few BBs it tries to
move stuff across.  But of course that would be a bit ugly, so I think
we should do it only if there is a reason not to move the pass (or
schedule it twice).

I also have not tried scheduling the hard register copy propagation
pass twice and measuring the impact on compile times.  Any suggestion
what might be a good testcase for that?

Thanks,

Martin

>
> That would get you the benefit you're seeking and minimize other
> effects.  Of course if you try that and get effectively the same
> results as moving the full propagation pass before prologue/epilogue
> handling then the complexity of only propagating argument registers
> early is clearly not needed and we'd probably want to go with your
> patch as-is.
>
>
> jeff
>
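For readers less familiar with regcprop, here is a toy sketch of forward copy propagation within a single basic block, in the spirit of what copyprop_hardreg_forward_1 does: every read of a register is rewritten to the register that first held the same value, and writes or clobbers invalidate the equivalences that depended on the old contents.  The IR, the names and the flat value[] table are hypothetical simplifications, not GCC code; the real pass also tracks modes and keeps value_data_entry chains.

```c
#define NREGS 8

/* Hypothetical toy IR, not RTL: an instruction copies one register into
   another, reads a register, or clobbers one (e.g. a call).  */
typedef enum { INSN_COPY, INSN_USE, INSN_CLOBBER } insn_kind;

typedef struct {
  insn_kind kind;
  int dst; /* register written (COPY, CLOBBER) */
  int src; /* register read (COPY, USE) */
} insn;

/* Forget every equivalence that relies on the old contents of REG.  */
static void kill_reg(int value[NREGS], int reg)
{
  for (int r = 0; r < NREGS; r++)
    if (value[r] == reg)
      value[r] = r;
  value[reg] = reg;
}

/* value[r] names the register that first held r's current contents
   (loosely like regcprop's "oldest" register).  Reads are rewritten to
   use that register.  */
static void copyprop_block(insn *insns, int n)
{
  int value[NREGS];
  for (int r = 0; r < NREGS; r++)
    value[r] = r;

  for (int i = 0; i < n; i++) {
    insn *in = &insns[i];
    switch (in->kind) {
    case INSN_USE:
      in->src = value[in->src];
      break;
    case INSN_COPY: {
      int oldest = value[in->src];
      in->src = oldest;
      if (oldest == value[in->dst])
        break; /* dst already holds this value; nothing changes */
      kill_reg(value, in->dst);
      value[in->dst] = oldest;
      break;
    }
    case INSN_CLOBBER:
      kill_reg(value, in->dst);
      break;
    }
  }
}
```

With r0 standing for an incoming argument register and r5 for the callee-saved register its pseudo was given, the block "r5 = r0; use r5; clobber r0 (call); use r5" has its first use rewritten to read r0, while the use after the call keeps reading r5.  That first rewrite is the kind of change that lets shrink-wrapping preparation sink the copy out of the first BB.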