On 18/03/2026 07:39, Richard Biener wrote:
On Tue, Mar 17, 2026 at 5:53 PM Andrew Stubbs <[email protected]> wrote:

On 17/03/2026 14:44, Richard Biener wrote:
On Tue, Mar 17, 2026 at 3:12 PM Michael Matz <[email protected]> wrote:

Hello,

On Tue, 17 Mar 2026, Richard Biener via Gcc wrote:

The issue is that (mem:<vectype> (reg:<vectype>)) does not play
nicely with the idea that a (mem:...) accesses contiguous memory

That's the big thing indeed.  If it were only MEM_ATTRs, the solution would
be simple: assert that there aren't any on those vMEMs (or, as Andrew suggests
later, only a sane subset).  I think there are more places that
conceptually assume such a contiguous access, like disambiguation and
similar, without MEM_ATTRs, in the sense that if those places think they
have figured out a lower bound of the base address and an upper bound of
the access size, then they assume that nothing outside that range is
accessed.

That there can be (well-defined) conflicts within a scatter (WAW conflicts) does
not help either.  Either an RTL representation disallows that, in which case
intrinsics cannot map to this scheme, or we somehow have to deal with it.
I guess it should be a black box, meaning you cannot combine or split
a scatter into/from multiple scatters.

I don't think I understand this point. When can scatters get combined?

Suppose there's a (vec_concat:V4DI (reg:V2DI) (reg:V2DI)) and
both V2DI are from gathers.  For combining scatters there might be
a V4DI scatter pattern, so presumably two back-to-back scatters
could be combined by (vec_concat ..) on the address vector of the MEM?

I'm saying we would need to disallow this.
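Spelled out, the transform that would need to be disallowed might look
something like this (a sketch only; the register names are invented):

    ;; Two back-to-back V2DI gathers feeding a vec_concat ...
    (set (reg:V2DI lo) (mem:V2DI (reg:V2DI addrs_lo)))
    (set (reg:V2DI hi) (mem:V2DI (reg:V2DI addrs_hi)))
    (set (reg:V4DI x)  (vec_concat:V4DI (reg:V2DI lo) (reg:V2DI hi)))

    ;; ... which combine might be tempted to rewrite into a single
    ;; V4DI gather by concatenating the address vectors instead:
    (set (reg:V4DI x)
         (mem:V4DI (vec_concat:V4DI (reg:V2DI addrs_lo)
                                    (reg:V2DI addrs_hi))))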

OK, we do not use vec_concat anywhere in GCN, AFAIK. It's not a natural operation for this architecture, although you could implement it using a permute and vec_merge sequence.

I can see this being a pitfall, but it's not something that would happen by accident, right? The machine description would need to define a number of insns that are composable in just that way, or is there some smarts somewhere beyond the "combine" pattern matching?

How is this different from the existing scatter_load patterns? Or is it
just that those are unspecs that have never allowed transformations
outside the backend?

Yes.

Or is it just that scalar MEM can get combined and the optimizer might
try the same thing with vector MEM?

That's the logic by which such a transform would be "obvious"(ly good).

I'm not proposing any changes to the middle-end representation or
features, nor any enforced changes to backend capabilities.  The change
in representation will primarily be more convenient, and if it adds more
expressability for the future then good.

I *am* asking what the unintended consequences might be, so if this is
one of those then thank you.

I think we need to document exactly what a MEM of a vector address is
in terms of a RTL abstract machine, otherwise we cannot work on it
with generic code.  That abstract machine might be dependent on
target hooks (though that's generally not my preference).  How CPUs
deal with WAW conflicts in scatters makes this a bit awkward.  If
we document that a MEM of a vector address is effectively an UNSPEC, we
win nothing?

A MEM with a vector of addresses is equivalent to N independent MEMs with scalar addresses. This much is clear, I think.
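As a minimal executable sketch of that equivalence (plain Python, with memory modelled as a dict; the function names are invented for illustration only):

```python
def gather(memory, addrs):
    """A gather from a vector of addresses behaves like one
    independent scalar load per lane."""
    return [memory[a] for a in addrs]

def scatter(memory, addrs, values):
    """One independent scalar store per lane.  Lanes are applied
    in order here, so on a WAW conflict the last lane wins; which
    lane wins on real hardware is the open question below."""
    for a, v in zip(addrs, values):
        memory[a] = v

mem = {0x10: 1, 0x20: 2, 0x30: 3}
assert gather(mem, [0x30, 0x10]) == [3, 1]
scatter(mem, [0x20, 0x20], [7, 8])   # WAW conflict on 0x20
assert mem[0x20] == 8                # last lane wins in this model
```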

We can also define (and assert) what attributes are valid, and what they mean, as already discussed. I'm inclined to say that the alignment is per-lane (although I'm not sure where that matters).

What happens if the vector sizes don't match is not something I have considered, yet. For GCN, I can conceive of wanting to implement packed vectors using (mem:V128HI (reg:V64DI)), in which case it loads pairs of values, but perhaps that's better expressed as (subreg:V128HI (mem:V64SI (reg:V64DI)) 0), in any case. (Support for the packed vector instructions is not implemented in the backend; this is just hypothetical.)

Scatters that might write to the same address multiple times are an interesting problem. A hook that says which one "wins" would be easy enough (first, last, undefined), although it might depend on context?

For GCN, I believe the result of multiple writes to the same address by the same instruction is undefined -- the vector threads run sufficiently independently that the hardware is not deterministic -- but if the MEM appears in an atomic_add, for example, then that's completely fine: the conflicting writes are serialized in a non-deterministic order, but all the updates are applied.
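A small sketch of why the atomic case is fine (plain Python, invented names): the hardware serializes conflicting lanes in some non-deterministic order, but because addition commutes, every order produces the same final memory.

```python
import itertools

def atomic_scatter_add(memory, addrs, values, order):
    # 'order' models the hardware's non-deterministic serialization
    # of conflicting lanes; each lane's update is applied atomically.
    for i in order:
        memory[addrs[i]] += values[i]

addrs, values = [0x10, 0x10, 0x20], [1, 2, 3]
finals = set()
for order in itertools.permutations(range(len(addrs))):
    mem = {0x10: 0, 0x20: 0}
    atomic_scatter_add(mem, addrs, values, order)
    finals.add((mem[0x10], mem[0x20]))
assert finals == {(3, 3)}   # every serialization order agrees
```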

For me, I don't think it matters if vector MEM is effectively an UNSPEC to the optimizers, if I still get the benefits in the machine description. Once in place, somebody will find something smart to do with it, sooner or later.

What "common" code that handles MEMs at the moment are you
trying to rely on?

The immediate goal is to simplify my machine description and by extension simplify some experimental code transformations that I'm working on. In particular, it's helpful if the scalar code has a similar shape to the vector code that does the same thing (but wider -- note that, on GCN, "scalar" code often looks like "V1" in assembler).

For that to work, I need match_operand to work for all the legitimate addresses, and very little more.

I also think that all those could be fixed (e.g. by giving up).

Furthermore, I think at some point we do need a proper representation of the
concept behind scatter/gather MEMs, where "proper" is not "a RTL
vec_concat of MEMs".  If we went that vec_concat route when vector modes
were introduced and we had represented vector REGs as a vec_concat as
well, instead of the current top-level RTL REG, we would all be mad by
now.

So, IMHO a top-level construct for "N-lane MEM access with N-lane
addresses" is the right thing to do (and was, for a long time).  The only
question is specifics: should it be a MEM, or a new top-level code?
Should the only difference between a MEM as-of-now and the vMEM be the
fact that the address has a vector mode?  Or flags on the MEM?

(IMHO: MEM with vMODE addresses is enough, but see below for a case of
new toplevel code).

Which transformations should be allowed to be represented within the
addresses?  Should it only be a vMODE REG?  Could it be more, like the
scalar offset that's added to all lanes that Andrew's architecture would
have, or a scalar scale by which each lane is multiplied?  How to represent
that?  If the vMEM would be a separate top-level RTL, it could have two
slots, one for the base addresses (vMODE), and one for an arithmetic
scalar transform applied to each lane (word_mode).  With a MEM that's more
complicated and would somehow have to be wrapped in the vMODE address.
But the latter might be convenient in other places as well, for instance
when calculating such address vector without actual memory access.
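To make that concrete, a hypothetical two-slot code might look like this
(the "vec_mem" name and the "(lane)" placeholder are invented here purely
for illustration; nothing like this exists in RTL today):

    ;; Slot 1: the per-lane base addresses (vMODE).
    ;; Slot 2: a scalar transform (word_mode) applied to each lane,
    ;; with (lane:DI) standing for the current lane's base address.
    (set (reg:V4DI dest)
         (vec_mem:V4DI (reg:V4DI addrs)
                       (plus:DI (mult:DI (lane:DI) (const_int 8))
                                (reg:DI base))))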

And so on...

But I think when Andrew wants to put in the work to make this ... well,
work, then it would be good for GCC.

I think the recent discussion on how to represent (len-)masking and else
values also comes into play here given at least we have masked variants
of gathers and scatters.

I've not been following this (the "len" stuff is not usually relevant to
GCN), so I'm not completely sure which issue you're referring to.

I agree that masking is an issue here, because ...

    (set (mem ....)
         (vec_merge
             (src)
             (mem .... "0")
             (mask)))

... is potentially different to ...

    (set (reg 123)
         (mem ...))
    (set (reg 123)
         (vec_merge
             (src)
             (reg 123)
             (mask)))
    (set (mem ...)
         (reg 123))
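A sketch of the difference (plain Python, invented names): the load/merge/store sequence writes every lane back, so a value another agent stores to an inactive lane between the load and the store is clobbered, whereas a true masked store never touches inactive lanes.

```python
def masked_store(memory, values, mask):
    # Writes only the active lanes; inactive lanes are never touched.
    for i, m in enumerate(mask):
        if m:
            memory[i] = values[i]

def load_merge_store(memory, values, mask, between=None):
    tmp = list(memory)                       # load all lanes
    if between:
        between(memory)                      # another agent writes here
    merged = [v if m else t for v, t, m in zip(values, tmp, mask)]
    memory[:] = merged                       # store writes *all* lanes back

def other(m):                                # concurrent write to lane 1
    m[1] = 99

a = [0, 0, 0, 0]
masked_store(a, [1, 2, 3, 4], [1, 0, 1, 0])
other(a)
assert a == [1, 99, 3, 0]                    # the 99 survives

b = [0, 0, 0, 0]
load_merge_store(b, [1, 2, 3, 4], [1, 0, 1, 0], between=other)
assert b == [1, 0, 3, 0]                     # the 99 was clobbered
```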

However, this was already an issue for contiguous vectors, so while it
could be a new problem for GCN (good catch!), surely this is an old
problem on other architectures?

Yes, it's a representational issue (for that RTL abstract machine).

Is this something you're looking to address in the specification, or is it something to bear in mind in the implementation?

(This is not currently a problem on GCN because maskload gives an
unbreakable UNSPEC.)

Likewise on x86, but reportedly not on aarch64.

Richard.


Andrew


