On Tue, Mar 17, 2026 at 5:53 PM Andrew Stubbs wrote:
>
> On 17/03/2026 14:44, Richard Biener wrote:
> > On Tue, Mar 17, 2026 at 3:12 PM Michael Matz wrote:
> >>
> >> Hello,
> >>
> >> On Tue, 17 Mar 2026, Richard Biener via Gcc wrote:
> >>
> >>> The issue is that (mem:<vectype> (reg:<vectype>)) does not play
> >>> nicely with the idea that a (mem:...) accesses contiguous memory
> >>
> >> That's the big thing indeed.  If it were only MEM_ATTRs the solution is
> >> simple: assert that there aren't any on those vMEMs (or as Andrew suggests
> >> later, only a sane subset).  I think there are more places that
> >> conceptually assume such a contiguous access, like disambiguation and
> >> similar, without MEM_ATTRs, in the sense that if those places think they
> >> have figured out a lower bound of the base address and an upper bound of
> >> the access size, then they assume that nothing outside that range is
> >> accessed.
> >
> > That there can be (well-defined) conflicts within a scatter (WAW conflicts)
> > does not help either.  Either a RTL representation would disallow that, but
> > then intrinsics cannot map to this scheme, or we somehow have to deal with it.
> > I guess it should be a black box, meaning you cannot combine or split
> > a scatter into/from multiple scatters.
>
> I don't think I understand this point. When can scatters get combined?

Suppose there's a (vec_concat:V4DI (reg:V2DI) (reg:V2DI)) and
both V2DI are from gathers.  For combining scatters there might be
a V4DI scatter pattern, so presumably two back-to-back scatters
could be combined by (vec_concat ..) on the address vector of the MEM?

I'm saying we would need to disallow this.
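
To make the hazard concrete, here is a minimal scalar model in C (not GCC
code).  The names model_scatter, two_v2_scatters and one_v4_scatter are
hypothetical, and the rule that the highest lane wins on a WAW conflict is
an assumption of the sketch; targets may define this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a scatter: lane i stores val[i] to mem[idx[i]].
   Lanes are processed in increasing order, so on a WAW conflict
   (duplicate index) the highest lane wins.  This ordering is an
   assumption of the sketch, not a statement about any target.  */
static void
model_scatter (int64_t *mem, size_t mem_len,
               const size_t *idx, const int64_t *val, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    {
      assert (idx[i] < mem_len);
      mem[idx[i]] = val[i];
    }
}

/* Two back-to-back V2 scatters, A then B.  Both write element 1.  */
static int64_t
two_v2_scatters (void)
{
  int64_t mem[4] = { 0, 0, 0, 0 };
  const size_t idx_a[2] = { 1, 2 }, idx_b[2] = { 1, 3 };
  const int64_t val_a[2] = { 10, 20 }, val_b[2] = { 30, 40 };
  model_scatter (mem, 4, idx_a, val_a, 2);
  model_scatter (mem, 4, idx_b, val_b, 2);
  return mem[1];
}

/* One V4 scatter built by concatenating the address and value vectors.
   The result for element 1 depends on which half lands in the low lanes.  */
static int64_t
one_v4_scatter (int a_in_low_lanes)
{
  int64_t mem[4] = { 0, 0, 0, 0 };
  const size_t idx_ab[4] = { 1, 2, 1, 3 }, idx_ba[4] = { 1, 3, 1, 2 };
  const int64_t val_ab[4] = { 10, 20, 30, 40 }, val_ba[4] = { 30, 40, 10, 20 };
  if (a_in_low_lanes)
    model_scatter (mem, 4, idx_ab, val_ab, 4);
  else
    model_scatter (mem, 4, idx_ba, val_ba, 4);
  return mem[1];
}
```

In this model the combined scatter matches the two separate ones only when
A occupies the low lanes; with the halves swapped, element 1 ends up with
A's value instead of B's.  Generic code could only do such a combine if the
conflict order were part of the documented semantics.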

> How is this different from the existing scatter_load patterns? Or is it
> just that those are unspecs that have never allowed transformations
> outside the backend?

Yes.

> Or is it just that scalar MEM can get combined and the optimizer might
> try the same thing with vector MEM?

That's the logic by which such a transform would be "obvious"(ly good).

> I'm not proposing any changes to the middle-end representation or
> features, nor any enforced changes to backend capabilities.  The change
> in representation will primarily be more convenient, and if it adds more
> expressability for the future then good.
>
> I *am* asking what the unintended consequences might be, so if this is
> one of those then thank you.

I think we need to document exactly what a MEM of a vector address is
in terms of a RTL abstract machine, otherwise we cannot work on it
with generic code.  That abstract machine might be dependent on
target hooks (though that's generally not my preference).  How CPUs
deal with WAW conflicts in scatters makes this a bit awkward.  If
we document that a MEM of a vector address is effectively an UNSPEC, then
we win nothing?

What "common" code that handles MEMs at the moment are you
trying to rely on?

> >> I also think that all those could be fixed as well (e.g. by giving up).
> >>
> >> Furthermore I think we at some point do need a proper representation of the
> >> concept behind scatter/gather MEMs, where "proper" is not "a RTL
> >> vec_concat of MEMs".  If we went that vec_concat route when vector modes
> >> were introduced and we had represented vector REGs as a vec_concat as
> >> well, instead of the current top-level RTL REG, we would all be mad by
> >> now.
> >>
> >> So, IMHO a top-level construct for "N-lane MEM access with N-lane
> >> addresses" is the right thing to do (and was, for a long time).  The only
> >> question is specifics: should it be a MEM, or a new top-level code?
> >> Should the only difference between a MEM as-of-now and the vMEM be the
> >> fact that the address has a vector mode?  Or flags on the MEM?
> >>
> >> (IMHO: MEM with vMODE addresses is enough, but see below for a case of
> >> new toplevel code).
> >>
> >> Which transformations should be allowed to be represented within the
> >> addresses?  Should it only be a vMODE REG?  Could it be more, like the
> >> scalar offset that's added to all lanes that Andrew's architecture would
> >> have, or a scalar scale that's multiplied to each lane?  How to represent
> >> that?  If the vMEM would be a separate top-level RTL, it could have two
> >> slots, one for the base addresses (vMODE), and one for an arithmetic
> >> scalar transform applied to each lane (word_mode).  With a MEM that's more
> >> complicated and would somehow have to be wrapped in the vMODE address.
> >> But the latter might be convenient in other places as well, for instance
> >> when calculating such address vector without actual memory access.
> >>
> >> And so on...
> >>
> >> But I think when Andrew wants to put in the work to make this ... well,
> >> work, then it would be good for GCC.
> >
> > I think the recent discussion on how to represent (len-)masking and else
> > values also comes into play here given at least we have masked variants
> > of gathers and scatters.
>
> I've not been following this (the "len" stuff is not usually relevant to
> GCN), so I'm not completely sure which issue you're referring to.
>
> I agree that masking is an issue here, because ...
>
>    (set (mem ....)
>         (vec_merge
>             (src)
>             (mem .... "0")
>             (mask)))
>
> ... is potentially different to ...
>
>    (set (reg 123)
>         (mem ...))
>    (set (reg 123)
>         (vec_merge
>             (src)
>             (reg 123)
>             (mask)))
>    (set (mem ...)
>         (reg 123))
>
> However, this was already an issue for contiguous vectors, so while it
> could be a new problem for GCN (good catch!), surely this is an old
> problem on other architectures?

Yes, it's a representational issue (for that RTL abstract machine).
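
A minimal C sketch of the difference between the two forms quoted above
(not GCC code).  The instrumented struct traced_mem and the helper names
are hypothetical, and the sketch assumes the single-insn form is a true
masked store that never touches inactive lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLANES 4

/* Hypothetical instrumented memory: data plus a per-element write count.  */
struct traced_mem
{
  int data[NLANES];
  unsigned writes[NLANES];
};

static void
traced_store (struct traced_mem *m, size_t i, int v)
{
  assert (i < NLANES);
  m->data[i] = v;
  m->writes[i]++;
}

/* Form 1: masked store.  Assumed here to leave inactive lanes untouched.  */
static void
masked_store (struct traced_mem *m, const int *src, const bool *mask)
{
  for (size_t i = 0; i < NLANES; i++)
    if (mask[i])
      traced_store (m, i, src[i]);
}

/* Form 2: load all lanes, merge in a register, store all lanes back.  */
static void
load_merge_store (struct traced_mem *m, const int *src, const bool *mask)
{
  int reg[NLANES];
  for (size_t i = 0; i < NLANES; i++)
    reg[i] = m->data[i];
  for (size_t i = 0; i < NLANES; i++)
    reg[i] = mask[i] ? src[i] : reg[i];
  for (size_t i = 0; i < NLANES; i++)
    traced_store (m, i, reg[i]);
}

/* Number of writes that hit lane 1, which is inactive in the mask.  */
static unsigned
writes_to_inactive_lane (bool use_load_merge_store)
{
  struct traced_mem m = { { 1, 2, 3, 4 }, { 0, 0, 0, 0 } };
  const int src[NLANES] = { 10, 20, 30, 40 };
  const bool mask[NLANES] = { true, false, true, false };
  if (use_load_merge_store)
    load_merge_store (&m, src, mask);
  else
    masked_store (&m, src, mask);
  return m.writes[1];
}
```

Both forms leave the same values in memory in this single-threaded model,
but only the second one writes the inactive lane.  That extra access is
what can matter for faults or for concurrent writers.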

> (This is not currently a problem on GCN because maskload gives an
> unbreakable UNSPEC.)

Likewise on x86, but reportedly not on aarch64.

Richard.

>
> Andrew
>
>
