On Sat, Apr 20, 2019 at 11:50:14PM +0900, Akira Yokosawa wrote: > On Fri, 19 Apr 2019 11:06:41 -0700, Paul E. McKenney wrote: > > On Sat, Apr 20, 2019 at 12:06:58AM +0900, Akira Yokosawa wrote: > >> Hi Paul, > >> > [...] > > > >>> + (1) The compiler can reorder the load from a to precede the > >>> + atomic_dec(), (2) Because x86 smp_mb__before_atomic() is only a > >>> + compiler barrier, the CPU can reorder the preceding store to > >>> + obj->dead with the later load from a. > >>> + > >>> + This could be avoided by using READ_ONCE(), which would prevent the > >>> + compiler from reordering due to both atomic_dec() and READ_ONCE() > >>> + being volatile accesses, and is usually preferable for loads from > >>> + shared variables. However, weakly ordered CPUs would still be > >>> + free to reorder the atomic_dec() with the load from a, so a more > >>> + readable option is to also use smp_mb__after_atomic() as follows: > >> > >> The point here is not just "readability", but also the portability of the > >> code, isn't it? > > > > As Andrea noted, in this particular case, the guarantee that the > > store to obj->dead precedes the load from x is portable. Either the > > smp_mb__before_atomic() or the atomic_dec() must provide the ordering. > > I think I understood this. What I wanted to say was the code for x86 implied > in the subjunctive sentence: > > obj->dead = 1; > smp_mb__before_atomic(); > atomic_dec(&obj->ref_count); > r1 = READ_ONCE(x); > > , which was not spelled out, is not portable if we expect the ordering of > atomic_dec() with READ_ONCE().
I now understand that you understood. ;-) > > However, you are right that there is some non-portability. But this > > non-portability involves the order of the atomic_dec() and the store to x. > > Yes, you've guessed it right. Don't worry, it won't happen again! > > So what I did was ... > > > >> Thanks, Akira > >> > >>> + > >>> + WRITE_ONCE(obj->dead, 1); > >>> + smp_mb__before_atomic(); > >>> + atomic_dec(&obj->ref_count); > >>> + smp_mb__after_atomic(); > >>> + r1 = READ_ONCE(a); > >>> + > >>> + This orders all three accesses against each other, and also makes > >>> + the intent quite clear. > > > > ... change the above paragraph to read as follows: > > > > In addition, the example without the smp_mb__after_atomic() does > > not necessarily order the atomic_dec() with the load from x. > > In contrast, the example with both smp_mb__before_atomic() and > > smp_mb__after_atomic() orders all three accesses against each other, > > and also makes the intent quite clear. > > > > Does that help? > > This looks a little bit redundant to me. The original one is clear > enough. > > How about editing the leading sentence above: > > >>> + shared variables. However, weakly ordered CPUs would still be > >>> + free to reorder the atomic_dec() with the load from a, so a more > >>> + readable option is to also use smp_mb__after_atomic() as follows: > > to read as follows? > > shared variables. However, weakly ordered CPUs would still be > free to reorder the atomic_dec() with the load from x, so a > portable and more readable option is to also use > smp_mb__after_atomic() as follows: Adding "portable and", correct? Makes sense, so I applied this change. > Obviously, the interesting discussion going on in another thread will > surely affect this patch. Quite possibly! ;-) Thanx, Paul > >>> See Documentation/atomic_{t,bitops}.txt for more information. > >>> > >>> diff --git a/tools/memory-model/linux-kernel.cat > >>> b/tools/memory-model/linux-kernel.cat > >>> index 8dcb37835b61..b6866f93abb8 100644 > >>> --- a/tools/memory-model/linux-kernel.cat > >>> +++ b/tools/memory-model/linux-kernel.cat > >>> @@ -28,8 +28,8 @@ include "lock.cat" > >>> let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn] > >>> let wmb = [W] ; fencerel(Wmb) ; [W] > >>> let mb = ([M] ; fencerel(Mb) ; [M]) | > >>> - ([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) | > >>> - ([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) | > >>> + ([M] ; fencerel(Before-atomic) ; [RMW]) | > >>> + ([RMW] ; fencerel(After-atomic) ; [M]) | > >>> ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) | > >>> ([M] ; po ; [UL] ; (co | po) ; [LKW] ; > >>> fencerel(After-unlock-lock) ; [M]) > >>> > >> > > >