On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> So it's not easily possible within the current infrastructure.  But it
> >> does look like ARM might eventually benefit from something like STV on
> >> x86?
> >> 
> >
> > I'm not sure.  The problem with trying to do this in RTL is that you'd
> > have to be able to decide from two pseudos whether they come from
> > extracts that are sequential.  When coming in from a hard register
> > that's easy, yes.  When coming in from a load, or any other operation
> > that produces pseudos, that becomes harder.
> 
> Yeah.
> 
> Just in case anyone reading the above is tempted to implement STV for
> AArch64: I think it would set a bad precedent if we had a paste-&-adjust
> version of the x86 pass.  AFAIK, the target capabilities and constraints
> are mostly modelled correctly using existing mechanisms, so I don't
> think there's anything particularly target-specific about the process
> of forcing things to be on the general or SIMD/FP side.
> 
> So if we did have an STV-ish thing for AArch64, I think it should be
> a target-independent pass that uses hooks and recog, even if the pass
> is initially enabled for AArch64 only.

Agreed - maybe some of the x86 code can be leveraged, but of course
the cost modeling is the most difficult to get right - IIRC the x86
backend resorts to backend-specific tuning flags rather than trying
to get rtx_cost or insn_cost "correct" here.

> (FWIW, on the patch itself, I tend to agree that this is really an
> SLP optimisation.  If the vectoriser fails to see the benefit, or if
> it fails to handle more complex cases, then it would be good to try
> to fix that.)

Also agreed - but costing is hard ;)

Richard.
