On Thu, Jun 13, 2019 at 12:58:11PM -0400, Alan Stern wrote:
> On Thu, 13 Jun 2019, David Howells wrote:
> 
> > Peter Zijlstra <pet...@infradead.org> wrote:
> > 
> > > Basically we fail for:
> > > 
> > > 	*x = 1;
> > > 	atomic_inc(u);
> > > 	smp_mb__after_atomic();
> > > 	r0 = *y;
> > > 
> > > Because, while the atomic_inc() implies memory order, it
> > > (surprisingly) does not provide a compiler barrier. This then allows
> > > the compiler to re-order like so:
> > 
> > To quote memory-barriers.txt:
> > 
> >  (*) smp_mb__before_atomic();
> >  (*) smp_mb__after_atomic();
> > 
> >      These are for use with atomic (such as add, subtract, increment and
> >      decrement) functions that don't return a value, especially when used
> >      for reference counting.  These functions do not imply memory barriers.
> > 
> > so it's entirely to be expected?
> 
> The text is perhaps ambiguous.  It means that the atomic functions
> which don't return values -- like atomic_inc() -- do not imply memory
> barriers.  It doesn't mean that smp_mb__before_atomic() and
> smp_mb__after_atomic() do not imply memory barriers.
> 
> The behavior Peter described is not to be expected.  The expectation is
> that the smp_mb__after_atomic() in the example should force the "*x = 1"
> store to execute before the "r0 = *y" load.  But on current x86 it
> doesn't force this, for the reason explained in the description.

Indeed, thanks Alan.
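To make the failure mode concrete outside the kernel, here is a minimal
stand-alone sketch (user-space, x86-64, GCC inline asm; my_inc_no_clobber()
and my_inc_clobber() are made-up stand-ins, not the kernel's atomic_inc())
of a non-value-returning atomic without and with the "memory" clobber:

	/* illustration only -- build with: gcc -O2 -c reorder.c */

	static inline void my_inc_no_clobber(int *u)
	{
		/* the lock prefix orders the CPU, but without a "memory"
		 * clobber this is not a compiler barrier */
		asm volatile("lock incl %0" : "+m" (*u));
	}

	static inline void my_inc_clobber(int *u)
	{
		/* same instruction plus the clobber, so it also acts as
		 * a compiler barrier */
		asm volatile("lock incl %0" : "+m" (*u) : : "memory");
	}

	int x, y, u, r0;

	void broken(void)
	{
		x = 1;
		my_inc_no_clobber(&u);
		asm volatile("" : : : "memory");  /* smp_mb__after_atomic() is barrier() on x86 */
		r0 = y;
	}

	void fixed(void)
	{
		x = 1;
		my_inc_clobber(&u);               /* the x store cannot sink below this */
		asm volatile("" : : : "memory");
		r0 = y;
	}

In broken() the compiler is free to sink the "x = 1" store below the
lock incl (it only has to keep it above the barrier()), and once it has
done that, TSO lets the CPU satisfy the "r0 = y" load before the store
becomes visible -- which is exactly the reordering described above.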
The other approach would be to upgrade smp_mb__{before,after}_atomic() to
actual full memory barriers on x86, but that seems quite ridiculous since
atomic_inc() already does all the expensive bits and is only missing the
compiler barrier. That would result in code like:

	mov $1, x
	lock inc u
	lock addl $0, -4(%rsp)	# aka smp_mb()
	mov y, %r

which is really quite silly. And as noted in the Changelog, about half the
non-value returning atomics already implied the compiler barrier anyway.
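For comparison, a sketch of the two options (again a user-space x86-64
approximation with made-up names, not the actual kernel patch): adding the
"memory" clobber to the atomic op itself, versus bolting a second full
barrier onto an already serializing lock-prefixed instruction:

	/* chosen fix: the lock prefix already provides the hardware
	 * ordering, the "memory" clobber supplies the missing compiler
	 * barrier */
	static inline void my_inc_fixed(int *u)
	{
		asm volatile("lock incl %0" : "+m" (*u) : : "memory");
	}

	/* rejected alternative: leave the atomic alone and make the
	 * smp_mb__*_atomic() helpers real fences, here spelled as the
	 * cheap lock-prefixed RmW on the stack rather than mfence */
	static inline void my_full_mb(void)
	{
		asm volatile("lock addl $0, -4(%%rsp)" : : : "memory", "cc");
	}

	int x, y, u, r0;

	void cheap(void)
	{
		x = 1;
		my_inc_fixed(&u);			/* one lock-prefixed op */
		asm volatile("" : : : "memory");	/* barrier() is now enough */
		r0 = y;
	}

	void expensive(void)
	{
		x = 1;
		asm volatile("lock incl %0" : "+m" (u));	/* unpatched atomic */
		my_full_mb();					/* second serializing op */
		r0 = y;
	}

Both variants end up correctly ordered on x86; the second simply pays for
the serialization twice.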