Re: [PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue

Martin Jambor Thu, 18 Apr 2013 15:09:40 -0700

Hi,

On Wed, Apr 17, 2013 at 12:43:59PM -0600, Jeff Law wrote:
> On 04/17/2013 09:49 AM, Martin Jambor wrote:
> >
> >The reason why it helps so much is that before register allocation
> >there are instructions moving the value of actual arguments from
> >"originally hard" register (e.g. SI, DI, etc.) to a pseudo at the
> >beginning of each function.  When the argument is live across a
> >function call, the pseudo is likely to be assigned to a callee-saved
> >register and then also accessed from that register, even in the first
> >BB, making it require prologue, though it could be fetched from the
> >original one.  When we convert all uses (at least in the first BB) to
> >the original register, the preparatory stage of shrink wrapping is
> >often capable of moving the register moves to a later BB, thus
> >creating fast paths which do not require prologue and epilogue.
> I noticed similar effects when looking at range splitting.  Being
> able to move those calls into a deeper control level in the CFG
> would definitely be an improvement.
> 
> >
> >We believe this change in the pipeline should not bring about any
> >negative effects.  During gcc bootstrap, the number of instructions
> >changed by pass_cprop_hardreg dropped, but by only 1.2%.  We have also
> >run SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all
> >run time differences could be attributed to noise.  The changes in
> >binary sizes were also small:


> Did anyone ponder just doing the hard register propagation on
> argument registers prior to the prologue/epilogue handling, then the
> full blown propagation pass in its current location in the pipeline?

I did not because I did not think it would be substantially faster
than running the pass as-is twice.  I may be wrong but it would still
have to look at all statements and examine them at a very similar level
of detail (to look for clobbers and manage value_data_entry chains)
and it would not really do that much less work fiddling with its own
data structures.
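
To make the bookkeeping mentioned above concrete, here is a toy model of
forward copy propagation over one basic block.  It is only a sketch and
not GCC's code: the instruction set, NREGS, struct insn, invalidate and
propagate_block are all hypothetical names, and a single copy_of array
stands in for the real value chains.  It does show why even a restricted
pass has to visit every instruction, because any clobber can end an
equivalence.

```c
#include <assert.h>

enum { NREGS = 8 };                      /* hypothetical register count */
enum op { OP_COPY, OP_USE, OP_CLOBBER };

/* OP_COPY: dst <- src.  OP_USE: reads src.  OP_CLOBBER: overwrites dst. */
struct insn { enum op op; int dst; int src; };

/* REG gets a new value: forget every equivalence that involves it. */
static void invalidate(int copy_of[NREGS], int reg)
{
  for (int r = 0; r < NREGS; r++)
    if (copy_of[r] == reg)
      copy_of[r] = r;
  copy_of[reg] = reg;
}

/* Rewrite each source operand to the oldest register known to hold the
   same value.  Returns the number of operands changed.  */
static int propagate_block(struct insn *insns, int n)
{
  int copy_of[NREGS];
  int changed = 0;

  for (int r = 0; r < NREGS; r++)
    copy_of[r] = r;                      /* nothing is known on entry */

  for (int i = 0; i < n; i++) {
    struct insn *in = &insns[i];

    if (in->op == OP_CLOBBER) {
      assert(in->dst >= 0 && in->dst < NREGS);
      invalidate(copy_of, in->dst);
      continue;
    }

    assert(in->src >= 0 && in->src < NREGS);
    int root = copy_of[in->src];
    if (root != in->src) {
      in->src = root;                    /* use the original register */
      changed++;
    }

    /* A copy makes dst equivalent to root, unless it is a no-op. */
    if (in->op == OP_COPY && root != in->dst) {
      assert(in->dst >= 0 && in->dst < NREGS);
      invalidate(copy_of, in->dst);
      copy_of[in->dst] = root;
    }
  }
  return changed;
}
```

With register 0 playing the incoming argument register and register 3 a
callee-saved one, the sequence "copy 3 <- 0; use 3; clobber 0; use 3" has
its first use rewritten to register 0, while the use after the clobber
keeps register 3.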

What would very likely be a working alternative for shrink-wrapping is
to have shrink-wrapping preparation invoke copyprop_hardreg_forward_1
on the first BB and the few BBs it tries to move stuff across.  But of
course that would be a bit ugly and so I think we should do it only if
there is a reason not to move the pass (or schedule it twice).
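
For reference, the kind of function the quoted explanation is about can
be sketched in C as follows.  This is an illustrative assumption, not a
testcase from the PR: helper and process are hypothetical names, and
whether a given compiler version and target actually shrink-wraps the
function depends on the options used.

```c
/* Hypothetical stand-in for an out-of-line callee.  In a real testcase it
   would live in another translation unit so that the call is not inlined. */
int helper(int x)
{
  return 2 * x + 1;
}

int process(int arg)
{
  /* Fast path: only reads arg, which arrives in a hard argument register. */
  if (arg == 0)
    return 0;

  /* Slow path: arg is live across the call, so it needs a register that
     survives the call, typically a callee-saved one. */
  return helper(arg) + arg;
}
```

If the comparison on the fast path reads the callee-saved copy of arg,
the prologue has to run before it.  If it reads the incoming argument
register instead, the copy can sink below the early return.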

I also have not tried scheduling the hard register copy propagation
pass twice and measuring the impact on compile times.  Any suggestions
as to what might be a good testcase for that?

Thanks,

Martin

> 
> That would get you the benefit you're seeking and minimize other
> effects.  Of course if you try that and get effectively the same
> results as moving the full propagation pass before prologue/epilogue
> handling then the complexity of only propagating argument registers
> early is clearly not needed and we'd probably want to go with your
> patch as-is.
> 
> 
> jeff
>
