Hello,

On Mon, 7 Jun 2021, Jeff Law wrote:

> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
> building a new processor and we've come across an issue where I think we need
> upstream discussion.
>
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores
> require the displacement to be aligned.  This is an artifact of how these
> instructions are encoded.
>
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned.  From a correctness point that works perfectly.
> Unfortunately, it's a significant performance hit on some standard benchmarks
> (spec) where we have a great number of spills of vector objects into the stack
> at unaligned offsets in the hot parts of the code.
>
> We've considered 3 possible approaches to solve this problem.
>
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned.  The downside is
> this potentially burns a lot of stack space, but in practice the cost was
> minimal (16 bytes in a 9k frame)  From a performance standpoint this works
> perfectly.
>
> 2. Abuse the register elimination code to create a second pointer into the
> stack.  Spills would start as <virtual> + offset, then either get eliminated
> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
> wasn't properly aligned.  We started a bit down this path, but with #1 working
> so well, we didn't get this approach to proof-of-concept.
>
> 3. Hack up the post-reload optimizers to fix things up as best as we can.
> This may still be advantageous, but again with #1 working so well, we didn't
> explore this in any significant way.  We may still look at this at some point
> in other contexts.
>
> Here's what we're playing with.  Obviously we'd need a target hook to
> drive this behavior.
> I was thinking that we'd pass in any slot offset
> alignment requirements (from the target hook) to assign_stack_local and
> that would bubble down to this point in try_fit_stack_local:

Why is the machinery involving STACK_SLOT_ALIGNMENT and
spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
backing stack slots) not working for you?  If everything is set up
correctly, the input alignment to try_fit_stack_local ought to be correct
already.

Ciao,
Michael.
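
For illustration, the padding discussed in approach 1 amounts to rounding
each slot offset up to the required alignment before placing the slot.  The
following is a minimal standalone sketch, not GCC code: align_up and
toy_assign_slot are hypothetical names, offsets are modelled as growing
upward from 0, and the real assign_stack_local / try_fit_stack_local deal
with frame growth direction and other details omitted here.

```c
#include <assert.h>

/* Round OFFSET up to the next multiple of ALIGN.
   Assumption: ALIGN is a nonzero power of two.  */
static unsigned long
align_up (unsigned long offset, unsigned long align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical toy allocator: place a slot of SIZE bytes at the first
   offset >= *FRAME_END that is a multiple of ALIGN, advance *FRAME_END
   past the slot, and return the slot's offset.  Any gap between the old
   *FRAME_END and the returned offset is the padding that approach 1
   spends to keep the displacement aligned.  */
static unsigned long
toy_assign_slot (unsigned long *frame_end, unsigned long size,
                 unsigned long align)
{
  unsigned long offset = align_up (*frame_end, align);
  *frame_end = offset + size;
  return offset;
}
```

In this toy model, an 8-byte slot followed by a 32-byte vector slot with
32-byte alignment puts the vector slot at offset 32 rather than 8, so 24
bytes are spent on padding in exchange for an aligned displacement.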