https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585

--- Comment #14 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #13)
> 
> You mean stores like the following?
> 
> (insn 13 12 14 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 150 virtual-stack-vars)
>                 (const_int 112 [0x70])) [1 a+48 S16 A128])
>         (vec_select:V4SI (reg:V4SI 190)
>             (parallel [
>                     (const_int 2 [0x2])
>                     (const_int 3 [0x3])
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                 ]))) t.c:14 -1
>      (nil))
> 
> I wonder why dse can't simply force the rhs to a register?  Of course if
> power really has stores that do this vec_select but no non-store with
> the operation then this might not be valid ...

Right, the problem is our limited selection of vector stores on POWER8.  We can
either use stvx, which requires that the address be 16-byte aligned, or we can
use stxvd2x (what you see here), which has the odd property of being a
big-endian store even on a little-endian system.  We can't force the rhs to a
register because that would have the unwanted side effect of converting an
unaligned store to a forcibly aligned store (stvx masks off the low-order 4
bits of the address).
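
To make the two behaviours concrete, here is a rough C model of them.  It is
only a sketch and not GCC or ISA code: memory is a plain byte buffer, and the
helper names (masked_store_offset, swap_doublewords, store_masked,
store_unaligned) are made up for illustration.  The first helper models the
address masking done by an stvx-style store; the second is the element
permutation from the quoted insn (parallel [2 3 0 1]), i.e. swapping the two
doublewords of a V4SI value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of an stvx-style store address: the low-order 4 bits of the
   offset are ignored, so the 16 bytes always land on a 16-byte boundary. */
static size_t masked_store_offset(size_t offset)
{
    return offset & ~(size_t)0xF;
}

/* The vec_select from the quoted insn, parallel [2 3 0 1]:
   swap the two doublewords of a four-word vector. */
static void swap_doublewords(const uint32_t in[4], uint32_t out[4])
{
    out[0] = in[2];
    out[1] = in[3];
    out[2] = in[0];
    out[3] = in[1];
}

/* Aligning store: writes at the masked offset, not the requested one. */
static void store_masked(uint8_t *mem, size_t offset, const uint32_t v[4])
{
    memcpy(mem + masked_store_offset(offset), v, 16);
}

/* Unaligned store: writes exactly where it was asked to. */
static void store_unaligned(uint8_t *mem, size_t offset, const uint32_t v[4])
{
    memcpy(mem + offset, v, 16);
}
```

With an offset that is already a multiple of 16 (such as the 112 in the insn
above) both stores in this model write to the same place; with any other
offset, store_masked silently writes somewhere else, which is why the
aligning form is only safe when the alignment is known.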

Now, for parameters, this is legal because the stack slots are 16-byte aligned,
but DSE doesn't know that.  I don't think we want to pollute DSE with extra
logic for this architectural anomaly, so it's probably best if we do some more
work to figure out when we can safely use the aligning store.  (Something
that's been on the back burner for a while, but this discovery makes it more
important.)

For POWER9, we have unaligned stores with proper endian behavior, so this is a
problem only on POWER8.

> 
> Now, in the end this example just shows that lowering register passing
> only at RTL expansion leads to a load of missed optimizations regarding
> to parameter setup ... some scheme to apply the lowering on GIMPLE already
> would be interesting to explore (but albeit quite a bit of work).  We'd
> have a second set of "parameter decls" somewhere, like in struct function,
> and use that when the IL is on lowered form.  Same for DECL_RESULT of course.
> And then the interesting part is whether to expose the stack in some way or
> restrict the lowering to decomposition/combining to registers.

Right...lots of work here, and of course some added complexity to ABI
implementation for new and existing targets.  But passing small
structures/arrays entirely in registers is a not-uncommon ABI feature, and
even just exposing those early could be helpful.  (Complications set in when
you run out of registers halfway through a structure and so forth, so even
just lowering that sort of thing early would not be trivial.)
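
The "run out of registers halfway through a structure" case can be pictured
with a toy model.  This is not the rule of any real ABI: it assumes 8
parameter registers of 8 bytes each, and that an aggregate simply takes
consecutive free registers with the remainder going to the stack.  The names
NUM_ARG_REGS, REG_BYTES, arg_split and place_aggregate are hypothetical.

```c
#include <stddef.h>

/* Hypothetical register file: 8 parameter registers of 8 bytes each. */
enum { NUM_ARG_REGS = 8, REG_BYTES = 8 };

struct arg_split {
    size_t regs_used;    /* registers holding part of the aggregate */
    size_t stack_bytes;  /* bytes that did not fit in registers */
};

/* Place an aggregate of SIZE bytes, given the index of the next free
   parameter register.  Whatever does not fit goes to the stack. */
static struct arg_split place_aggregate(size_t size, size_t next_free_reg)
{
    size_t needed = (size + REG_BYTES - 1) / REG_BYTES;
    size_t avail = next_free_reg < NUM_ARG_REGS
                       ? NUM_ARG_REGS - next_free_reg
                       : 0;
    struct arg_split r;
    size_t in_regs;

    r.regs_used = needed < avail ? needed : avail;
    in_regs = r.regs_used * REG_BYTES;
    r.stack_bytes = size > in_regs ? size - in_regs : 0;
    return r;
}
```

In this model a 32-byte structure placed when only two registers remain ends
up with 16 bytes in registers and 16 bytes on the stack, which is the kind of
split an early lowering would have to represent somehow.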
