On Thu, Jun 13, 2019 at 12:58:11PM -0400, Alan Stern wrote:
> On Thu, 13 Jun 2019, David Howells wrote:
> 
> > Peter Zijlstra <pet...@infradead.org> wrote:
> > 
> > > Basically we fail for:
> > > 
> > >   *x = 1;
> > >   atomic_inc(u);
> > >   smp_mb__after_atomic();
> > >   r0 = *y;
> > > 
> > > Because, while the atomic_inc() implies memory order, it
> > > (surprisingly) does not provide a compiler barrier. This then allows
> > > the compiler to re-order like so:
> > 
> > To quote memory-barriers.txt:
> > 
> >  (*) smp_mb__before_atomic();
> >  (*) smp_mb__after_atomic();
> > 
> >      These are for use with atomic (such as add, subtract, increment and
> >      decrement) functions that don't return a value, especially when used
> >      for reference counting.  These functions do not imply memory barriers.
> > 
> > so it's entirely to be expected?
> 
> The text is perhaps ambiguous.  It means that the atomic functions
> which don't return values -- like atomic_inc() -- do not imply memory
> barriers.  It doesn't mean that smp_mb__before_atomic() and
> smp_mb__after_atomic() do not imply memory barriers.
> 
> The behavior Peter described is not to be expected.  The expectation is
> that the smp_mb__after_atomic() in the example should force the
> "*x = 1" store to execute before the "r0 = *y" load.  But on current x86
> it doesn't force this, for the reason explained in the description.

Indeed, thanks Alan.

The other approach would be to upgrade smp_mb__{before,after}_atomic()
to actual full memory barriers on x86, but that seems quite ridiculous
since atomic_inc() already does all the expensive bits and is only
missing the compiler barrier.

That would result in code like:

        mov $1, x
        lock inc u
        lock addl $0, -4(%rsp) # aka smp_mb()
        mov y, %r

which is really quite silly.

And as noted in the Changelog, about half the non-value-returning
atomics already implied the compiler barrier anyway.
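
To make the "missing compiler barrier" concrete, here is a rough
userspace sketch (GCC/Clang), not the kernel's actual code. The names
atomic_inc_no_clobber(), atomic_inc_clobber() and thread0() are made up
for illustration, the non-x86 fallback exists only so the sketch builds
elsewhere, and smp_mb__after_atomic() is assumed to be a plain barrier()
as I understand it was on x86 at the time. Without the "memory" clobber
the compiler may sink the "x = 1" store below the locked increment,
leaving no locked instruction between that store and the load of y.

```c
/* Simplified stand-ins for the kernel definitions; assumptions throughout. */
typedef struct { int counter; } atomic_t;

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Assumed x86 definition for this sketch: compiler barrier only. */
#define smp_mb__after_atomic() barrier()

/* Locked increment WITHOUT a "memory" clobber: the CPU treats the locked
 * instruction as a full barrier, but the compiler only knows that
 * v->counter is touched and may move other accesses across the asm. */
static inline void atomic_inc_no_clobber(atomic_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("lock; incl %0" : "+m" (v->counter));
#else
	/* Fallback so the sketch compiles on other architectures. */
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_RELAXED);
#endif
}

/* Locked increment WITH a "memory" clobber: the asm now also acts as a
 * compiler barrier, so "x = 1" cannot be moved past it. */
static inline void atomic_inc_clobber(atomic_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("lock; incl %0" : "+m" (v->counter) : : "memory");
#else
	/* Fallback: relaxed increment fenced by compiler barriers. */
	barrier();
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_RELAXED);
	barrier();
#endif
}

int x, y;
atomic_t u;

/* The sequence from the example, using the variant with the clobber. */
int thread0(void)
{
	int r0;

	x = 1;
	atomic_inc_clobber(&u);
	smp_mb__after_atomic();
	r0 = y;
	return r0;
}
```

Run single-threaded, both variants behave identically; the difference
only shows in what the compiler is allowed to emit, so comparing the
generated assembly at -O2 is the way to look at it.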
