On Mon, Feb 20, 2017 at 12:58:39PM +0800, Boqun Feng wrote:
> > So Waiman, the fact is that in this case, we want the following code
> > sequence:
> > 
> >     CPU 0                                   CPU 1
> >     =================                       ====================
> >     {pn->state = vcpu_running, node->locked = 0}
> > 
> >     smp_store_mb(&pn->state, vcpu_halted):
> >       WRITE_ONCE(pn->state, vcpu_halted);
> >       smp_mb();
> >     r1 = READ_ONCE(node->locked);
> >                                             arch_mcs_spin_unlock_contended();
> >                                               WRITE_ONCE(node->locked, 1)
> > 
> >                                             cmpxchg(&pn->state, vcpu_halted, vcpu_hashed);
> > 
> > never ends up in:
> > 
> >     r1 == 0 && cmpxchg fails (i.e. the read part of cmpxchg reads the
> >     value vcpu_running).
> > 
> > We can have such a guarantee if cmpxchg has a smp_mb() before its load
> > part, which is true for PPC. But semantically, cmpxchg() doesn't provide
> > any order guarantee if it fails, which is true on ARM64, IIUC. (Add Will
> > in Cc for his insight ;-)).

I think you're right. The write to node->locked on CPU1 is not required
to be ordered before the load part of the failing cmpxchg.
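
As a rough illustration only (not kernel code), the sequence above can be
approximated with userspace C11 atomics. The mapping is an assumption:
smp_mb() is modelled as a seq_cst fence, the unlock as a release store, and
cmpxchg() as a compare-exchange that is seq_cst on success but relaxed on
failure, mirroring "no order guarantee if it fails". The struct and function
names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };

/* Hypothetical stand-in for the pn->state / node->locked pair. */
struct litmus_node {
    _Atomic int state;   /* models pn->state */
    _Atomic int locked;  /* models node->locked */
};

/* CPU 0: smp_store_mb(&pn->state, vcpu_halted); r1 = READ_ONCE(node->locked); */
static inline int cpu0_halt_then_check(struct litmus_node *n)
{
    atomic_store_explicit(&n->state, vcpu_halted, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* stands in for smp_mb() */
    return atomic_load_explicit(&n->locked, memory_order_relaxed);
}

/* CPU 1: release store to node->locked, then cmpxchg on pn->state. */
static inline bool cpu1_unlock_then_hash(struct litmus_node *n)
{
    int expected = vcpu_halted;

    /* Assumed model of arch_mcs_spin_unlock_contended(). */
    atomic_store_explicit(&n->locked, 1, memory_order_release);

    /* Fully ordered on success, no ordering on failure. */
    return atomic_compare_exchange_strong_explicit(
        &n->state, &expected, vcpu_hashed,
        memory_order_seq_cst, memory_order_relaxed);
}
```

Run back to back on one thread, CPU 0 sees r1 == 0 and the cmpxchg then
succeeds. The problematic outcome (r1 == 0 together with a failed cmpxchg)
needs the two sides to run concurrently on weakly ordered hardware, so this
sketch only pins down what each side does.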

> > So a possible "fix" (in case ARM64 will use qspinlock some day) would be
> > to replace cmpxchg() with smp_mb() + cmpxchg_relaxed().

Perversely, we could actually get away with cmpxchg_acquire on arm64 because
arch_mcs_spin_unlock_contended is smp_store_release and we order release ->
acquire in the architecture. But that just brings up the age-old unlock/lock
discussion again...
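
For reference, a minimal sketch of the smp_mb() + cmpxchg_relaxed() variant
quoted above, written with userspace C11 atomics rather than kernel
primitives. The helper name is hypothetical, and treating a seq_cst fence as
smp_mb() is an approximation. The cmpxchg_acquire alternative is not sketched
because it leans on the arm64 release -> acquire ordering, which plain C11
acquire/release does not promise for accesses to different locations.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };

/*
 * Hypothetical helper: issue a full barrier first, then a relaxed
 * compare-exchange, so the ordering no longer depends on whether the
 * compare-exchange succeeds.
 */
static inline bool hash_if_halted(_Atomic int *state)
{
    int expected = vcpu_halted;

    atomic_thread_fence(memory_order_seq_cst);   /* stands in for smp_mb() */

    /* Stands in for cmpxchg_relaxed(): no ordering of its own. */
    return atomic_compare_exchange_strong_explicit(
        state, &expected, vcpu_hashed,
        memory_order_relaxed, memory_order_relaxed);
}
```

The helper returns true and moves the state to vcpu_hashed only when it found
vcpu_halted; otherwise the state is left untouched.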

Will
