Hello,

On Mon, 7 Jun 2021, Jeff Law wrote:

> 
> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
> building a new processor and we've come across an issue where I think we need
> upstream discussion.
> 
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores
> require the displacement to be aligned.  This is an artifact of how these
> instructions are encoded.
> 
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned.  From a correctness standpoint that works perfectly.
> Unfortunately, it's a significant performance hit on some standard benchmarks
> (spec) where we have a great number of spills of vector objects into the stack
> at unaligned offsets in the hot parts of the code.
> 
> 
> We've considered 3 possible approaches to solve this problem.
> 
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned.  The downside is
> this potentially burns a lot of stack space, but in practice the cost was
> minimal (16 bytes in a 9k frame).  From a performance standpoint this works
> perfectly.
> 
> 2. Abuse the register elimination code to create a second pointer into the
> stack.  Spills would start as <virtual> + offset, then either get eliminated
> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
> isn't properly aligned. We started a bit down this path, but with #1 working
> so well, we didn't get this approach to proof-of-concept.
> 
> 3. Hack up the post-reload optimizers to fix things up as best as we can. 
> This may still be advantageous, but again with #1 working so well, we didn't
> explore this in any significant way.  We may still look at this at some point
> in other contexts.
> 
> Here's what we're playing with.  Obviously we'd need a target hook to 
> drive this behavior.  I was thinking that we'd pass in any slot offset 
> alignment requirements (from the target hook) to assign_stack_local and 
> that would bubble down to this point in try_fit_stack_local:

Why is the machinery involving STACK_SLOT_ALIGNMENT and 
spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for 
backing stack slots) not working for you?  If everything is set up 
correctly the input alignment to try_fit_stack_local ought to be correct 
already.
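
To illustrate the arithmetic only (a standalone sketch, not GCC code; 
align_up and padding_for are made-up names, and the alignment is assumed 
to be a power of two): once the alignment that reaches the slot allocator 
already reflects the target's requirement, the slot offset just needs 
rounding up to it, and the padding is the only cost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: round OFFSET up to the next
   multiple of ALIGN.  ALIGN is assumed to be a power of two.  */
static int64_t
align_up (int64_t offset, int64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & -align;
}

/* Hypothetical helper: bytes of stack wasted by aligning OFFSET.  */
static int64_t
padding_for (int64_t offset, int64_t align)
{
  return align_up (offset, align) - offset;
}
```

With made-up numbers, a slot that would start at offset 40 with a 
required alignment of 32 moves to offset 64, wasting 24 bytes; an offset 
that is already aligned wastes nothing.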


Ciao,
Michael.
