> > +  (1) The compiler can reorder the load from a to precede the
> > +  atomic_dec(), (2) Because x86 smp_mb__before_atomic() is only a
> > +  compiler barrier, the CPU can reorder the preceding store to
> > +  obj->dead with the later load from a.
> > +
> > +  This could be avoided by using READ_ONCE(), which would prevent the
> > +  compiler from reordering due to both atomic_dec() and READ_ONCE()
> > +  being volatile accesses, and is usually preferable for loads from
> > +  shared variables. However, weakly ordered CPUs would still be
> > +  free to reorder the atomic_dec() with the load from a, so a more
> > +  readable option is to also use smp_mb__after_atomic() as follows:
> > +
> > +    WRITE_ONCE(obj->dead, 1);
> > +    smp_mb__before_atomic();
> > +    atomic_dec(&obj->ref_count);
> > +    smp_mb__after_atomic();
> > +    r1 = READ_ONCE(a);
>
> The point here is not just "readability", but also the portability of the
> code, isn't it?
The implicit assumption was, I guess, that all weakly ordered CPUs which
are free to reorder the atomic_dec() with the READ_ONCE() execute a full
memory barrier in smp_mb__before_atomic() ... This assumption currently
holds, AFAICT, but yes: it may well become "non-portable"! ... ;-)

  Andrea
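
For readers who want to experiment outside the kernel, here is a rough userspace sketch of the quoted sequence using C11 <stdatomic.h>. It is an analogy only, not kernel code. The struct layout and the name mark_dead_and_read() are hypothetical, and the mapping of smp_mb__before_atomic() and smp_mb__after_atomic() onto seq_cst fences is an assumption made for illustration: the kernel macros may expand to something cheaper on a given architecture, for example a compiler barrier on x86. The point the sketch tries to show is the portable form, where the unordered read-modify-write is bracketed by a barrier on each side and nothing depends on what either barrier happens to expand to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the object in the quoted patch. */
struct obj {
    atomic_int dead;
    atomic_int ref_count;
};

/* Stand-in for the shared variable "a" in the quoted text. */
static atomic_int a;

/*
 * Userspace analogue of the quoted sequence. Returns the value read
 * from a (r1 in the quoted text). The seq_cst fences are an assumed,
 * approximate mapping of the kernel barriers, not an exact equivalent.
 */
static int mark_dead_and_read(struct obj *obj)
{
    assert(obj != NULL);

    /* roughly WRITE_ONCE(obj->dead, 1) */
    atomic_store_explicit(&obj->dead, 1, memory_order_relaxed);

    /* roughly smp_mb__before_atomic() */
    atomic_thread_fence(memory_order_seq_cst);

    /* roughly atomic_dec(&obj->ref_count): an RMW with no ordering of its own */
    atomic_fetch_sub_explicit(&obj->ref_count, 1, memory_order_relaxed);

    /* roughly smp_mb__after_atomic(): orders the decrement before the load */
    atomic_thread_fence(memory_order_seq_cst);

    /* roughly r1 = READ_ONCE(a) */
    return atomic_load_explicit(&a, memory_order_relaxed);
}
```

Dropping the second fence in this sketch would correspond to the case discussed above, where the ordering of the decrement against the later load relies on how the first barrier is implemented rather than on anything the code states.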