Steve Ellcey <steve.ell...@imgtec.com> writes:
> On Thu, 2015-08-13 at 02:14 -0700, Matthew Fortune wrote:
> > Hi Steve,
> >
> > Overall, I don't think these optimizations are ready to include. In principle
> > the idea looks good but it is done at the wrong point in the compiler in my
> > opinion.
> >
> > The biggest concern I have is that the analysis should be possible at (or
> > prior to) the point where the prologue/epilogue are expanded. I don't think
> > it is safe enough to post-process the code and delete the stack allocation.
> 
> I think that to do this, what we would have to do is introduce a new
> pass at the tree level (just before expanding to rtl) where we could do
> the analysis of whether or not the outgoing argument area is needed or
> not.  Then we could use that info during expand_prologue to reset
> frame->args_size if the space is not needed.

One thought was that this problem seems to fit into the same category as ipa-ra;
while I don't know how that is implemented, or whether it runs early enough, it
may be worth seeing if the extra information can be calculated there.

> > There is at least one other optimization idea that competes with this one
> > which is to allow LRA to use the argument save area for arbitrary spills when
> > it is not used for spilling arguments or to prepare varargs. I think we need
> > to at least consider how the frame header removal will interact with such
> > an optimization.
> 
> I am not sure how this would work.  It seems better to just not allocate
> the space if it is not needed and then LRA can separately allocate
> whatever it needs for its own use (if any).  I'll add Robert to the cc
> list on this in case he has any ideas since he did the LRA
> implementation for MIPS.

I think between these two there will always be one optimization that has to
come first and win. If we decide prior to expansion whether an outgoing argument
area is needed (and therefore also decide if an incoming argument area is
available in any given function) then we will of course preclude any use of
this area for spilling/locals in the callee. The saving when re-using this
area is that the callee doesn't have to do stack allocation, which could be
a performance win if it is called in a loop. Removing the stack allocation from
the caller is not as big a win.

Perhaps balancing the two optimizations (if/when we do the LRA one) can be
fit in later without too much trouble.

Thanks,
Matthew
