On Thu, Jul 12, 2018 at 01:52:49PM +0200, Andrea Parri wrote:
> On Thu, Jul 12, 2018 at 09:40:40AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 11, 2018 at 02:34:21PM +0200, Andrea Parri wrote:
> > > Simplicity is in the eye of the beholder. From my POV (LKMM maintainer), the
> > > simplest solution would be to get rid of rfi-rel-acq and unlock-rf-lock-po
> > > (or its analogue in v3) altogether:
> >
> > <snip>
> >
> > > Among other things, this would immediately:
> > >
> > > 1) Enable RISC-V to use their .aq/.rl annotations _without_ having to
> > >    "worry" about tso or release/acquire fences; IOW, this will permit
> > >    a partial revert of:
> > >
> > >      0123f4d76ca6 ("riscv/spinlock: Strengthen implementations with
> > >      fences")
> > >      5ce6c1f3535f ("riscv/atomic: Strengthen implementations with
> > >      fences")
> >
> > But I feel this goes in the wrong direction. This weakens the effective
> > memory model, where I feel we should strengthen it.
> >
> > Currently PowerPC is the weakest here, and the above RISC-V changes
> > (reverts) would make RISC-V weaker still.
> >
> > Any effective weakening makes me very uncomfortable -- who knows
> > what will come apart this time. This memory ordering stuff causes
> > horrible subtle bugs at best.
>
> Indeed, what I was suggesting above is a weakening of the current model
> (and I agree: I wouldn't say that bugs in this context are nice ;-).
>
> These changes would affect a specific area: (IMO,) the examples we've
> been considering here aren't for the faint-hearted ;-) and as Daniel
> already suggested, everything would again be "nice and neat", if this
> was all about locking/if every thread used lock-synchronization.
>
> > >
> > > 2) Resolve the above-mentioned controversy (the inconsistency between
> > >    - locking operations and atomic RMWs on one side, and their actual
> > >    implementation in generic code on the other), thus enabling the use
> > >    of LKMM _and_ its tools for the analysis/reviewing of the latter.
> >
> > This is a good point; so let's see if there is something we can do to
> > strengthen the model so it all works again.
> >
> > And I think if we raise atomic*_acquire() to require TSO (but ideally
> > raise it to RCsc) we're there.
> >
> > The TSO archs have RCpc load-acquire and store-release, but fully
> > ordered atomics. Most of the other archs have smp_mb() everything, with
> > the exception of PPC, ARM64 and now RISC-V.
> >
> > PPC has the RCpc TSO fence LWSYNC, ARM64 has the RCsc
> > load-acquire/store-release. And RISC-V has a gazillion options IIRC.
> >
> >
> > So ideally atomic*_acquire() + smp_store_release() will be RCsc, and is
> > with the notable exception of PPC, and ideally RISC-V would be RCsc
> > here. But at the very least it should not be weaker than PPC.
> >
> > By increasing atomic*_acquire() to TSO we also immediately get the
> > proposed:
> >
> >   P0()
> >   {
> >           WRITE_ONCE(X, 1);
> >           spin_unlock(&s);
> >           spin_lock(&s);
> >           WRITE_ONCE(Y, 1);
> >   }
> >
> >   P1()
> >   {
> >           r1 = READ_ONCE(Y);
> >           smp_rmb();
> >           r2 = READ_ONCE(X);
> >   }
> >
> > behaviour under discussion; because the spin_lock() will imply the TSO
> > ordering.
>
> You mean: "when paired with a po-earlier release to the same memory
> location", right? I am afraid that neither arm64 nor riscv current
> implementations would ensure "(r1 == 1 && r2 == 0) forbidden" if we
> removed the po-earlier spin_unlock()...
>
> AFAICT, the current implementation would work with that release: as
> you remarked above, arm64 release->acquire is SC; riscv has an rw,w
> fence in its spin_unlock() (hence a w,w fence between the stores),
> or it could have a .tso fence ...
>
> But again, these are subtle patterns, and my guess is that several/
> most kernel developers really won't care about such guarantees (and
> if some do, they'll have the tools to figure out what they can
> actually rely on ...)
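
For anyone who wants to poke at the P0/P1 example quoted above without
herd7, here is a rough Python sketch. It is a toy interleaving
enumerator and not the LKMM: it assumes P1's loads stay in program order
(the smp_rmb()), ignores the lock operations themselves, and only
toggles whether P0's two stores may become visible out of order. The
names interleavings(), run(), outcomes(), ORDERED and REORDERABLE are
made up for this illustration.

```python
# Toy model of the P0/P1 litmus test: NOT the LKMM, just an interleaving enumerator.
# Assumption: P1's loads stay in program order (the smp_rmb()); the only knob is
# whether P0's two stores may become visible out of program order.

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one schedule against zero-initialised memory; return (r1, r2)."""
    mem = {"X": 0, "Y": 0}
    reads = []
    for op, var in schedule:
        if op == "W":
            mem[var] = 1            # WRITE_ONCE(var, 1)
        else:
            reads.append(mem[var])  # READ_ONCE(var)
    return tuple(reads)

P0_PROGRAM_ORDER = [("W", "X"), ("W", "Y")]  # stores to X, then Y
P1 = [("R", "Y"), ("R", "X")]                # r1 = Y; smp_rmb(); r2 = X

ORDERED = [P0_PROGRAM_ORDER]                              # w,w order preserved
REORDERABLE = [P0_PROGRAM_ORDER, P0_PROGRAM_ORDER[::-1]]  # stores may swap

def outcomes(p0_orders):
    """Collect every (r1, r2) reachable for the allowed P0 store orders."""
    return {run(s) for p0 in p0_orders for s in interleavings(p0, P1)}

if __name__ == "__main__":
    print(sorted(outcomes(ORDERED)))
    print(sorted(outcomes(REORDERABLE)))
```

With the stores kept in order, (1, 0), i.e. r1 == 1 && r2 == 0, does not
show up; once they may be swapped, it does. The sketch says nothing
about which architectures or lock implementations actually provide that
store ordering, which is the point under debate here.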
>
> OTOH (as I pointed out earlier) the strengthening we're configuring
> will prevent some arch. (riscv being just the example of today!) to
> go "full RCsc", and this will inevitably "complicate" both the LKMM

"full RCpc"

  Andrea

> and the reviewing process of related changes (atomics, locking, ...;
> c.f., this debate), apparently, just because you ;-) want to "care"
> about these guarantees.
>
> Not yet convinced ... :/
>
>   Andrea
>
>
> >
> > And note that this retains regular RCpc ACQUIRE for smp_load_acquire()
> > and associated primitives -- as they have had since their introduction
> > not too long ago.