Hi Diogo,
Thanks for your explanation. As documented in https://developer.arm.com/documentation/ddi0487/fc B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions:

"Between the Load-Exclusive and the Store-Exclusive, there are no explicit memory accesses, preloads, direct or indirect System register writes, address translation instructions, cache or TLB maintenance instructions, exception generating instructions, exception returns, or indirect branches."

[Honnappa] This is a requirement on the software, not on the micro-architecture. We are having a few discussions internally, will get back soon.

So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg operation is atomic.

Thanks,
Phil Yang

> -----Original Message-----
> From: Diogo Behrens <[email protected]>
> Sent: Thursday, August 27, 2020 4:57 PM
> To: Phil Yang <[email protected]>
> Cc: [email protected]; nd <[email protected]>; Honnappa Nagarahalli
> <[email protected]>
> Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory
>
> Hi Phil,
>
> thanks for taking a look at this. Indeed, the cmpxchg will have release
> semantics, but implicit barriers do not provide the same ordering guarantees
> as DMB ISH, i.e., atomic_thread_fence(__ATOMIC_ACQ_REL).
>
> Let me try to explain the scenario.
> Here is a pseudo-ARM assembly with the instructions involved:
>
>   STR me->locked = 1    // 1: initialize the node
>   LDAXR pred = tail     // 2: cmpxchg
>   STLXR tail = me       // 3: cmpxchg
>   STR pred->next = me   // 4: the problematic store
>
> Now, ARM allows instructions 1 and 2 to be reordered, and also instructions
> 3 and 4 to be reordered:
>
>   LDAXR pred = tail     // 2: cmpxchg (acquires can be moved up)
>   STR me->locked = 1    // 1: initialize the node
>   STR pred->next = me   // 4: the problematic store
>   STLXR tail = me       // 3: cmpxchg (releases can be moved down)
>
> And since stores 1 and 4 can be reordered too, the following execution is
> possible:
>
>   LDAXR pred = tail     // 2: cmpxchg
>   STR pred->next = me   // 4: the problematic store
>   STR me->locked = 1    // 1: initialize the node
>   STLXR tail = me       // 3: cmpxchg
>
> In this subtle situation, the locking thread can overwrite the store to
> next->locked of the unlocking thread (if it occurs between 4 and 1), and
> subsequently hang waiting for me->locked to become 0.
>
> We couldn't reproduce this with our available hardware, but we
> "reproduced" it with the RMEM model checker, which precisely models the
> ARM memory model (https://github.com/rems-project/rmem). The GenMC
> model checker (https://github.com/MPI-SWS/genmc) also finds the issue,
> but its intermediate memory model is slightly more general than the ARM
> model.
>
> The release attached to store (4) prevents (1) from reordering with it.
>
> Please, let me know if that makes sense or if I am missing something.
>
> Thanks,
> -Diogo
>
> -----Original Message-----
> From: Phil Yang [mailto:[email protected]]
> Sent: Wednesday, August 26, 2020 12:17 PM
> To: Diogo Behrens <[email protected]>
> Cc: [email protected]; nd <[email protected]>; Honnappa Nagarahalli
> <[email protected]>; nd <[email protected]>
> Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory
>
> Diogo Behrens <[email protected]> writes:
>
> > Subject: [PATCH] librte_eal: fix mcslock hang on weak memory
> >
> > The initialization me->locked=1 in lock() must happen before
> > next->locked=0 in unlock(), otherwise a thread may hang forever,
> > waiting for me->locked to become 0. On weak memory systems (such as
> > ARMv8), the current implementation allows me->locked=1 to be reordered
> > with announcing the node (pred->next=me) and, consequently, to be
> > reordered with next->locked=0 in unlock().
> >
> > This fix adds a release barrier to pred->next=me, forcing
> > me->locked=1 to happen before this operation.
> >
> > Signed-off-by: Diogo Behrens <[email protected]>
> > ---
> >  lib/librte_eal/include/generic/rte_mcslock.h | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/include/generic/rte_mcslock.h
> > b/lib/librte_eal/include/generic/rte_mcslock.h
> > index 2bef28351..ce553f547 100644
> > --- a/lib/librte_eal/include/generic/rte_mcslock.h
> > +++ b/lib/librte_eal/include/generic/rte_mcslock.h
> > @@ -68,7 +68,14 @@ rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
> >  		 */
> >  		return;
> >  	}
> > -	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
> > +	/* The store to me->next above should also complete before the node is
> > +	 * visible to predecessor thread releasing the lock. Hence, the store
> > +	 * prev->next also requires release semantics.
> > +	 * Note that, for example,
> > +	 * on ARM, the release semantics in the exchange operation is not as
> > +	 * strong as a release fence and is not sufficient to enforce the
> > +	 * desired order here.
>
> Hi Diogo,
>
> I didn't quite understand why the exchange operation with ACQ_REL
> memory ordering is not sufficient.
> It emits 'stlxr' on armv8.0-a and 'swpal' on armv8.1-a (with LSE support).
> Both of these instructions carry release semantics.
>
> Please check the URL for the details:
> https://gcc.godbolt.org/z/Mc4373
>
> BTW, if you captured a deadlock issue on your testbed, could you please
> share your test case here?
>
> Thanks,
> Phil
>
> > +	 */
> > +	__atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
> >
> >  	/* The while-load of me->locked should not move above the previous
> >  	 * store to prev->next. Otherwise it will cause a deadlock. Need a
> > --
> > 2.28.0
> >

