On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts.
> 
> Nope. The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior loads/stores against all
> later loads/stores.

In fact there is no guaranteed ordering when refcount_dec_and_lock()
returns false; it provides ordering only if the return value is true.
In that case it provides acquire ordering (thanks to the spin_lock),
and both release ordering and a control dependency (thanks to the
refcount_dec_and_test).
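As a rough illustration of where each of those guarantees comes from,
here is a userspace C11 model of refcount_dec_and_lock().  It is only a
sketch under stated assumptions: the model_* names are hypothetical, a
pthread mutex stands in for spinlock_t, there are no saturation or
underflow checks, and C11 has no notion of a control dependency, so that
part of the guarantee cannot be expressed.  The fast path that returns
false is deliberately relaxed, matching the point that callers get no
ordering when the result is false.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcount_t (no saturation checks). */
struct model_refcount {
    atomic_int refs;
};

/*
 * Fast path: decrement unless the count is 1; true if it decremented.
 * Relaxed on purpose: a false result from dec_and_lock promises nothing.
 */
static bool model_dec_not_one(struct model_refcount *r)
{
    int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

    do {
        if (old == 1)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(&r->refs, &old, old - 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Release-ordered decrement; true if the count dropped to zero. */
static bool model_dec_and_test(struct model_refcount *r)
{
    return atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release) == 1;
}

/*
 * True (with *lock held) only if the count dropped to zero.  On that path
 * the caller gets acquire ordering from pthread_mutex_lock() and release
 * ordering from the fetch_sub.  The control dependency the kernel also
 * relies on has no C11 equivalent and is not modeled here.
 */
static bool model_dec_and_lock(struct model_refcount *r, pthread_mutex_t *lock)
{
    if (model_dec_not_one(r))
        return false;            /* fast path: count was > 1 */

    pthread_mutex_lock(lock);
    if (!model_dec_and_test(r)) {
        pthread_mutex_unlock(lock);
        return false;            /* count was raised concurrently */
    }
    return true;                 /* count hit zero; lock is held */
}
```

A caller that gets true holds the lock and must unlock it later; a false
result means the count stayed above zero and no lock is held.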

> The difference is subtle and involves at least 3 CPUs. I can't seem to
> write up anything simple, keeps turning into monsters :/ Will, Paul,
> have you got anything simple around?

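The combination of acquire + release is not the same as smp_mb, because
each one lets memory accesses pass it in one direction: an acquire lets
earlier accesses move after it, and a release lets later accesses move
before it.  Example: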

C C-refcount-vs-atomic-dec-and-lock

{
}

P0(int *x, int *y, refcount_t *r)
{
        refcount_set(r, 1);
        WRITE_ONCE(*x, 1);
        smp_wmb();
        WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, refcount_t *r, spinlock_t *s)
{
        int rx, ry;
        bool r1;

        ry = READ_ONCE(*y);
        r1 = refcount_dec_and_lock(r, s);
        if (r1)
                rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

This outcome is allowed.  The idea is that the CPU can take the sequence:

        Read y
        Acquire
        Release
        Read x

and execute the first read after the Acquire and the second read before 
the Release:

        Acquire
        Read y
        Read x
        Release

and then, since nothing orders the two reads against each other inside
the critical section, the CPU can reorder them:

        Acquire
        Read x
        Read y
        Release

If the program had used atomic_dec_and_lock() instead, which provides a 
full smp_mb barrier, this outcome would not be possible.
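For readers who think in C11 rather than LKMM terms, the same argument
can be sketched as a userspace transliteration of the litmus test.
Assumptions (not part of any kernel API): the struct and function names
are hypothetical, smp_wmb() is approximated by a release fence (which is
stronger), P1 always takes the lock instead of trying a fast path, and
smp_mb() is approximated by seq_cst fences.  With full_mb false, nothing
links P1's relaxed read of y to P0's release fence, so C11 permits
ry=1, r1=1, rx=0; with full_mb true, the fence after the read of y
synchronizes with P0's release fence and forces rx=1.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state mirroring the litmus test's x, y, r and s. */
struct shared {
    atomic_int x, y, r;
    pthread_mutex_t s;
};

struct outcome { int ry, rx; bool r1; };

/* P0: refcount_set(); WRITE_ONCE(x); smp_wmb(); WRITE_ONCE(y). */
static void p0(struct shared *sh)
{
    atomic_store_explicit(&sh->r, 1, memory_order_relaxed);
    atomic_store_explicit(&sh->x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* stands in for smp_wmb() */
    atomic_store_explicit(&sh->y, 1, memory_order_relaxed);
}

/*
 * P1: read y, dec-and-lock, read x if it returned true.
 * full_mb == false: acquire (mutex) + release (fetch_sub) only.
 * full_mb == true:  seq_cst fences added, standing in for smp_mb().
 * The mutex is always unlocked at the end so the state can be reused;
 * the litmus test itself never unlocks.
 */
static void p1(struct shared *sh, bool full_mb, struct outcome *out)
{
    out->rx = 0;
    out->ry = atomic_load_explicit(&sh->y, memory_order_relaxed);
    if (full_mb)
        atomic_thread_fence(memory_order_seq_cst);
    pthread_mutex_lock(&sh->s);
    out->r1 = atomic_fetch_sub_explicit(&sh->r, 1, memory_order_release) == 1;
    if (full_mb)
        atomic_thread_fence(memory_order_seq_cst);
    if (out->r1)
        out->rx = atomic_load_explicit(&sh->x, memory_order_relaxed);
    pthread_mutex_unlock(&sh->s);
}

struct run { struct shared *sh; bool full_mb; struct outcome out; };

static void *p0_thread(void *arg) { p0(((struct run *)arg)->sh); return NULL; }

static void *p1_thread(void *arg)
{
    struct run *run = arg;
    p1(run->sh, run->full_mb, &run->out);
    return NULL;
}

/* Run the test `iters` times and count the "exists" outcome. */
static int count_weak(bool full_mb, int iters)
{
    int weak = 0;

    for (int i = 0; i < iters; i++) {
        struct shared sh;
        atomic_init(&sh.x, 0);
        atomic_init(&sh.y, 0);
        atomic_init(&sh.r, 0);
        pthread_mutex_init(&sh.s, NULL);

        struct run run = { .sh = &sh, .full_mb = full_mb };
        pthread_t t0, t1;
        pthread_create(&t0, NULL, p0_thread, &run);
        pthread_create(&t1, NULL, p1_thread, &run);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        pthread_mutex_destroy(&sh.s);

        if (run.out.ry == 1 && run.out.r1 && run.out.rx == 0)
            weak++;
    }
    return weak;
}
```

A zero from count_weak(false, ...) proves nothing: thread start-up skew
means the racy window is rarely hit, and x86 does not reorder two loads
anyway.  Only count_weak(true, ...) has a guaranteed result, namely zero.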

Alan Stern
