Hello,

On Wed, 18 Mar 2026, Richard Biener wrote:
> > > For combining scatters there might be a V4DI scatter pattern, so
> > > presumably two back-to-back scatters could be combined by (vec_concat
> > > ..) on the address vector of the MEM?
> >
> > Now, how do scatters, i.e. writes, come into play? Are you worried
> > about combining two gather-loads plus merge plus scatter-store into a
> > single gather-scatter instruction?
>
> No, I'm thinking of combine combining the defs for the V2DI, and RTX
> simplification merging two concatenated gathers into a single gather.

Then I'm confused. What problems do you foresee in combining two load
instructions into one load instruction, when the constraints that we
currently apply to that case still apply (both need to be non-trapping,
no intervening side effects, and so on)?

> Or a pass seeing two back-to-back scatters and combining them into a
> "larger" scatter if such an insn passes recog.

This is also "possible" today. Two SI writes, four bytes apart, could be
combined into a single DI write. Whatever currently stops us from doing
that when it would be invalid will also stop the scatter combine when it
would be invalid. Yes, it's possible that "whatever" needs amending, of
course.

> > > I think we need to document exactly what a MEM of a vector address is
> > > in terms of an RTL abstract machine, otherwise we cannot work on it
> > > with generic code.
> >
> > Yes, but I don't see the hardship in doing that. Most of it is
> > obvious: (MEM:VxMODE (rtl:VyPTR)) (x and y, i.e. the number of lanes,
> > must match!) represents the obvious blobs in memory. If the blobs
> > overlap, the choices are:
> >
> > a) target defined
> > b) disallowed, aka undefined
> > c) implementation defined (bad choice)
>
> Yes, but for example on x86 the memory order for scatters is defined to
> be left-to-right. For GCN it's undefined, aka it would be
> implementation defined.

No, that's target defined then.
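To make the overlap question concrete, here is a minimal C model of a 4-lane scatter of 64-bit elements in which every lane carries its own address, as in (MEM:VxMODE (rtl:VyPTR)). It is an illustrative sketch, not GCC code: scatter_in_order mimics the left-to-right order described for x86, scatter_reverse stands in for some other target-defined order, and lanes_overlap plays the role of the hypothetical "no overlap" flag on the MEM. All names and the lane count are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lane count: the x in VxMODE must equal the y in VyPTR.  */
#define LANES 4

/* Write each lane's value to its own element, lane 0 first (x86-like
   order): on overlap, the highest-numbered lane's value survives.  */
static void scatter_in_order(uint64_t *base, size_t n,
                             const size_t idx[LANES],
                             const uint64_t val[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    {
      assert(idx[i] < n);
      base[idx[i]] = val[i];
    }
}

/* The same scatter with lanes written last-to-first, standing in for
   a different target-defined order.  */
static void scatter_reverse(uint64_t *base, size_t n,
                            const size_t idx[LANES],
                            const uint64_t val[LANES])
{
  for (size_t i = LANES; i-- > 0;)
    {
      assert(idx[i] < n);
      base[idx[i]] = val[i];
    }
}

/* True if two lanes address the same element, i.e. exactly the case a
   "no overlap" flag on the MEM would promise cannot happen.  */
static bool lanes_overlap(const size_t idx[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    for (size_t j = i + 1; j < LANES; j++)
      if (idx[i] == idx[j])
        return true;
  return false;
}

/* Scatter into two zeroed 8-element buffers using both lane orders and
   report whether memory ends up identical.  */
static bool orders_agree(const size_t idx[LANES], const uint64_t val[LANES])
{
  uint64_t a[8] = {0}, b[8] = {0};
  scatter_in_order(a, 8, idx, val);
  scatter_reverse(b, 8, idx, val);
  return memcmp(a, b, sizeof a) == 0;
}
```

With values {10, 20, 30, 40} and disjoint indices {0, 2, 4, 6}, lanes_overlap is false and both orders leave identical memory. With indices {1, 3, 1, 5}, lanes 0 and 2 collide, so element 1 ends up as 30 in one order and 10 in the other. In this model the lane order only becomes observable when conflicts cannot be ruled out.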
Implementation defined would be "GCC defines it like so-and-so, always",
which is the bad choice because, whatever so-and-so is, some targets
won't be able to implement it without jumping through hoops.

> Or we declare it undefined, but then, as said, _mm_scatter (..) cannot
> directly map to such an operation

Yes, choice (b) is also strictly worse than choice (a).

> and the vectorizer couldn't use it
> either, since we cannot rule out such conflicts.

We can in certain situations, for instance often when vectorizing.

> That also means that lane order (on x86) matters and I guess GCN cannot
> use scatter, and that we cannot change a loop (lane) iteration schedule
> (for instance reversing it, for whatever reason).

Hmm? If conflicts can be ruled out, the choice of what happens on
conflicts doesn't matter. If conflicts might exist: sure, the result is
target dependent under choice (a), and certain transformations can then
be used only conditionally, on some targets or in some situations.

> > always with possibly a flag on the MEM saying "nope, I guarantee no
> > overlap".
> >
> > In a way it's similar to an atomic access straddling a cache line with
> > respect to atomicity guarantees: ultimately it's target dependent, and
> > the compiler cannot willy-nilly invent MEMs with a different structure
> > in such cases.
>
> OK, so MEMs with vectors would be "special" then.

Not in my mind. UNSPECs are special.

> IIRC atomics are UNSPECs(?)

No. Atomic operations have their own optabs, so it's whatever the target
expands them to, and the MEMs for atomic objects are simply
MEM_VOLATILE_P (/v).

Ciao,
Michael.
