https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63483
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
I still don't get it.  What we end up doing is

  reg:DI 72 = /u load from 'a'  (insn 6)
  reg:DI 78 = /u load from 'b'  (insn 15)
  ...
  RMW sequence on (mem:DI (reg:DI 72 ...
  store to (mem:SI (reg:DI 78 ...

thus completely fine (after sched1).  sched2 then moves the /u load from 'b' until after the R of the RMW sequence, but that's fine.

Now I think you seem to say that this isn't about /u or whatever but the common issue that for example for globals

  char c, d;

two RMW sequences to store to c and d may not overlap if 'c' and 'd' are not padded out to DImode.  But this is an issue on all targets (ISTR a bug about this), and if that happens it introduces a store data race that is not allowed with the C++ memory model for example.  Bad luck for early alphas then I'd say.

At least I can see nothing wrong with the code generated for the two testcases you provided nor is the fix you propose any good (I see nothing else but the (and:DI ...) addressing that might possibly serve as "barrier" here, but even that would not be enough for, say

  char c __attribute__((aligned(8)));
  char d;

because then we hopefully optimize that AND away for the load from c).

Can you clarify whether I miss anything with the code generated for the testcase?
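
For reference, the overlap problem mentioned above for `char c, d;` can be modelled in plain C.  This is only a sketch and not the code GCC generates: it assumes c and d sit in bytes 0 and 1 of the same aligned quadword, it replays the interleaving in a single thread instead of using real concurrency, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical): c is byte 0 and d is byte 1 of one
   aligned 64-bit word, as when the two chars are not padded to DImode. */
enum { C_IDX = 0, D_IDX = 1 };

/* The "M" step of an RMW byte store: return w with byte idx set to val. */
static uint64_t insert_byte(uint64_t w, unsigned idx, uint8_t val)
{
    assert(idx < 8);
    uint64_t mask = (uint64_t)0xff << (8 * idx);
    return (w & ~mask) | ((uint64_t)val << (8 * idx));
}

/* Read one byte back out of the word. */
static uint8_t extract_byte(uint64_t w, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(w >> (8 * idx));
}

/* Two RMW sequences that do not overlap: the second R sees the first W,
   so both stores survive.  Returns 1 if c and d both hold their values. */
int sequential_stores_ok(void)
{
    uint64_t mem = 0;
    uint64_t r = mem;                     /* R for c */
    mem = insert_byte(r, C_IDX, 'C');     /* M + W for c */
    r = mem;                              /* R for d */
    mem = insert_byte(r, D_IDX, 'D');     /* M + W for d */
    return extract_byte(mem, C_IDX) == 'C' &&
           extract_byte(mem, D_IDX) == 'D';
}

/* Two RMW sequences that overlap: both R steps happen before either W.
   The store to d writes back the stale byte 0, so the store to c is lost.
   Returns the final value of c. */
uint8_t overlapped_stores_c(void)
{
    uint64_t mem = 0;
    uint64_t r1 = mem;                    /* R for c */
    uint64_t r2 = mem;                    /* R for d, before c's W */
    mem = insert_byte(r1, C_IDX, 'C');    /* W for c */
    mem = insert_byte(r2, D_IDX, 'D');    /* W for d, clobbers c */
    return extract_byte(mem, C_IDX);
}
```

In this model `sequential_stores_ok()` reports that both bytes are intact, while `overlapped_stores_c()` returns 0 rather than 'C', which is the kind of lost store the C++ memory model forbids for distinct objects.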