On Wed, 29 Nov 2017, Daniel Lustig wrote:

> While we're here, let me ask about another test which isn't directly
> about unlock/lock but which is still somewhat related to this
> discussion:
>
> "MP+wmb+xchg-acq" (or some such)
>
> {}
>
> P0(int *x, int *y)
> {
> 	WRITE_ONCE(*x, 1);
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
>
> P1(int *x, int *y)
> {
> 	r1 = atomic_xchg_relaxed(y, 2);
> 	r2 = smp_load_acquire(y);
> 	r3 = READ_ONCE(*x);
> }
>
> exists (1:r1=1 /\ 1:r2=2 /\ 1:r3=0)
>
> C/C++ would call the atomic_xchg_relaxed part of a release sequence
> and hence would forbid this outcome.
>
> x86 and Power would forbid this. ARM forbids this via a special-case
> rule in the memory model, ordering atomics with later load-acquires.
>
> RISC-V, however, wouldn't forbid this by default using RCpc or RCsc
> atomics for smp_load_acquire(). It's an "fri; rfi" type of pattern,
> because xchg doesn't have an inherent internal data dependency.
>
> If the Linux memory model is going to forbid this outcome, then
> RISC-V would either need to use fences instead, or maybe we'd need to
> add a special rule to our memory model similarly. This is one detail
> where RISC-V is still actively deciding what to do.
>
> Have you all thought about this test before? Any idea which way you
> are leaning regarding the outcome above?

Good questions. Currently the LKMM allows this, and I think it should
because xchg doesn't have a dependency from its read to its write.

On the other hand, herd isn't careful enough in the way it implements
internal dependencies for RMW operations. If we change
atomic_xchg_relaxed(y, 2) to atomic_inc(y) and remove r1 from the test:

C MP+wmb+inc-acq

{}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	atomic_inc(y);
	r2 = smp_load_acquire(y);
	r3 = READ_ONCE(*x);
}

exists (1:r2=2 /\ 1:r3=0)

then the test _should_ be forbidden, but it isn't -- herd doesn't
realize that all atomic RMW operations other than xchg must have a
dependency (either data or control) between their internal read and
write. (Although the smp_load_acquire is allowed to execute before the
write part of the atomic_inc, it cannot execute before the read part.
I think a similar argument applies even on ARM.)

Luc, consider this a bug report. :-)

Alan
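A toy C sketch of the read-to-write dependency distinction drawn above. The names (enum rmw_kind, rmw_write, write_depends_on_read) are hypothetical and not taken from the kernel or herd, and the cmpxchg case is an assumed example of a control dependency. The model only describes what each RMW writes as a function of the value it read; it says nothing about memory ordering. atomic_inc computes its written value from the value read (a data dependency), a cmpxchg-style RMW decides whether to write based on the value read (a control dependency), and xchg writes its argument no matter what it read, so it has neither.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel or herd code. */
enum rmw_kind { RMW_XCHG, RMW_INC, RMW_CMPXCHG };

/*
 * Given the value 'old' read by the RMW, decide whether it writes.
 * If it does, store the written value in *store and return true.
 */
static bool rmw_write(enum rmw_kind kind, int old, int arg, int expected,
		      int *store)
{
	switch (kind) {
	case RMW_XCHG:		/* writes arg regardless of old: no dependency */
		*store = arg;
		return true;
	case RMW_INC:		/* written value computed from old: data dependency */
		*store = old + 1;
		return true;
	case RMW_CMPXCHG:	/* whether it writes depends on old: control dependency */
		if (old != expected)
			return false;
		*store = arg;
		return true;
	default:
		assert(!"unknown RMW kind");
		return false;
	}
}

/*
 * The write depends on the read if changing the value read changes
 * whether, or what, the RMW writes. Only a few read values are sampled.
 */
static bool write_depends_on_read(enum rmw_kind kind)
{
	int base_val = 0;
	bool base_did = rmw_write(kind, 0, 2, 0, &base_val);

	for (int old = 1; old <= 3; old++) {
		int val = 0;
		bool did = rmw_write(kind, old, 2, 0, &val);

		if (did != base_did || (did && val != base_val))
			return true;
	}
	return false;
}
```

With this sketch, write_depends_on_read(RMW_XCHG) is false while the RMW_INC and RMW_CMPXCHG cases are true. Sampling a handful of read values is enough for these three cases but is not a general dependency analysis.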