On 18/03/2026 13:52, Michael Matz wrote:
Now, how do scatters, i.e. writes, come into play? Are you worried about
combining two gather-loads plus merge plus scatter-store into a single
gather-scatter instruction? Well, if the backend/architecture does define
a mem-mem (with scatter/gather, no less!) insn, then sure, a combiner
could be tempted to try that. I say: good! If the target does have
such an insn, more power to them. Of course the usual RTL semantics must
match: all uses (here: the gather loads) come before _all_ writes (the
scatter stores). If that's not the case for the target insns, then
early-clobbers must be used on the respective operands.
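
To illustrate why that ordering matters, here is a minimal C sketch (not GCC code) of a vector "add to memory" under two orderings. The function names, the 4-lane width, and the choice that later lanes win when two lanes scatter to the same address are all assumptions made for the illustration. The two versions only differ when lanes share an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4      /* illustrative only; V64SI would have 64 lanes */
#define MEM_WORDS 3  /* size of the toy memory used in this sketch */

/* RTL-style semantics: every lane's load (gather) happens before any
   lane's store (scatter).  Store order among duplicate addresses is an
   assumption of this sketch: later lanes overwrite earlier ones.  */
void
add_gather_then_scatter (uint32_t *mem, const uint32_t *idx,
                         const uint32_t *val)
{
  uint32_t tmp[LANES];

  for (size_t i = 0; i < LANES; i++)
    {
      assert (idx[i] < MEM_WORDS);
      tmp[i] = mem[idx[i]] + val[i];
    }
  for (size_t i = 0; i < LANES; i++)
    mem[idx[i]] = tmp[i];
}

/* A hypothetical target that does read-modify-write one lane at a time:
   a later lane observes the store of an earlier lane.  */
void
add_lane_by_lane (uint32_t *mem, const uint32_t *idx, const uint32_t *val)
{
  for (size_t i = 0; i < LANES; i++)
    {
      assert (idx[i] < MEM_WORDS);
      mem[idx[i]] += val[i];
    }
}
```

With indices {0, 0, 1, 2}, the first version applies only one of the two additions to word 0, while the second accumulates both. An insn that behaves like the second version does not match the plain RTL semantics for such inputs.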
This is actually an excellent point, because GCN *does* have such
instructions (but only when the input and output addresses match).
Hypothetically, we could do this for LDS memory (which uses 32-bit
pointers):
  (set (mem:V64SI (reg:V64SI 123))
       (plus:V64SI (mem:V64SI (reg:V64SI 123))
                   (reg:V64SI 456)))
This is "ds_add_u32", which adds a value to a location in memory without
first loading the memory contents into registers.
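
As a rough reading aid, the RTL above can be modelled lane by lane in C. This is only a sketch of what the RTL expression says, not of the hardware instruction: the function name, the LDS_WORDS size, and the treatment of the 32-bit addresses as byte offsets into a word array are assumptions of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 64       /* V64SI: 64 lanes of 32-bit integers */
#define LDS_WORDS 256  /* hypothetical LDS size for this model */

/* Model of
     (set (mem:V64SI addr) (plus:V64SI (mem:V64SI addr) val))
   where addr holds one 32-bit byte offset per lane.  All lanes read
   before any lane writes, as the RTL semantics require.  */
void
model_vector_mem_add (uint32_t *lds, const uint32_t *addr,
                      const uint32_t *val)
{
  uint32_t sum[LANES];

  for (size_t i = 0; i < LANES; i++)
    {
      /* The model only accepts aligned, in-range addresses.  */
      assert (addr[i] % 4 == 0 && addr[i] / 4 < LDS_WORDS);
      sum[i] = lds[addr[i] / 4] + val[i];
    }
  for (size_t i = 0; i < LANES; i++)
    lds[addr[i] / 4] = sum[i];
}
```

Note that no vector register ever holds the loaded values in the RTL; the sum array exists only to express the read-before-write ordering in C.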
We do not generally implement such instructions because the UNSPECs mean
they're not easily "discoverable" by combine and we'd need a variant of
the UNSPEC for every flavour of address.
But, with vector MEM this would be trivial!
(Although I probably still wouldn't, because nothing in OpenMP or Newlib
could make use of it.)
There are also similar instructions for general memory, but those are
labelled "atomic" and I'm not sure what the performance/caching would be
like if we tried to use them for optimization.
Andrew