> The function __rte_ring_headtail_move_head() assumes that the barrier
> (fence) between the load of the head and the load-acquire of the
> opposing tail guarantees the following: if a first thread reads tail
> and then writes head and a second thread reads the new value of head
> and then reads tail, then it should observe the same (or a later)
> value of tail.
>
> This assumption is incorrect under the C11 memory model. If the barrier
> (fence) is intended to establish a total ordering of ring operations,
> it fails to do so. Instead, the current implementation only enforces a
> partial ordering, which can lead to unsafe interleavings. In particular,
> some partial orders can cause underflows in free slot or available
> element computations, potentially resulting in data corruption.
>
> The issue manifests when a CPU first acts as a producer and later as a
> consumer. In this scenario, the barrier assumption may fail when another
> core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> this violation. The problem has not been widely observed so far because:
> (a) on strong memory models (e.g., x86-64) the assumption holds, and
> (b) on relaxed models with RCsc semantics the ordering is still strong
> enough to prevent hazards.
> The problem becomes visible only on weaker models, when load-acquire is
> implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> the LDAPR and LDAPUR instructions).
>
> Three possible solutions exist:
> 1. Strengthen ordering by upgrading release/acquire semantics to
>    sequential consistency. This requires using seq-cst for stores,
>    loads, and CAS operations. However, this approach introduces a
>    significant performance penalty on relaxed-memory architectures.
>
> 2. Establish a safe partial order by enforcing a pair-wise
>    happens-before relationship between threads of the same role, by
>    converting the CAS to release semantics and the preceding load of
>    the head to acquire. This approach makes the original barrier
>    assumption unnecessary and allows its removal.
>
> 3. Retain partial ordering but ensure only safe partial orders are
>    committed. This can be done by detecting underflow conditions
>    (producer < consumer) and quashing the update in such cases.
>    This approach makes the original barrier assumption unnecessary
>    and allows its removal.
>
> This patch implements solution (2) to preserve the “enqueue always
> succeeds” contract expected by dependent libraries (e.g., mempool).
> While solution (3) offers higher performance, adopting it now would
> break that assumption.
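For reference, the commit message mentions a Herd7 C11 litmus test but does
not reproduce it here. Below is a minimal sketch of the kind of test that
exhibits the hazard with the old code (relaxed head accesses plus an acquire
fence); the test name, variables and registers are illustrative, not the
test from the original discussion:

  C ring_move_head_hazard

  {}

  P0 (atomic_int *head, atomic_int *tail) {
  	/* Old move_head: observes the new tail (load-acquire), then
  	 * publishes a new head with only relaxed ordering. */
  	int r0 = atomic_load_explicit(tail, memory_order_acquire);
  	atomic_store_explicit(head, 1, memory_order_relaxed);
  }

  P1 (atomic_int *tail) {
  	/* Opposing thread releasing a new tail value. */
  	atomic_store_explicit(tail, 1, memory_order_release);
  }

  P2 (atomic_int *head, atomic_int *tail) {
  	/* Another thread of P0's role, old code: relaxed head load,
  	 * acquire fence, then load-acquire of tail. */
  	int r1 = atomic_load_explicit(head, memory_order_relaxed);
  	atomic_thread_fence(memory_order_acquire);
  	int r2 = atomic_load_explicit(tail, memory_order_acquire);
  }

  exists (0:r0=1 /\ 2:r1=1 /\ 2:r2=0)

The "exists" clause asks for the outcome the barrier assumption forbids:
P0 already saw the new tail (r0=1) and P2 saw P0's new head (r1=1), yet P2
still read the stale tail (r2=0). The C11 model allows this with the old
orderings. With the patch applied (acquire head load, acq_rel CAS), P0's
head update carries release semantics, so P1's tail store happens-before
P2's tail load and the stale read is excluded.
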
>
> Signed-off-by: Wathsala Vithanage <[email protected]>
> Signed-off-by: Ola Liljedahl <[email protected]>
> Reviewed-by: Honnappa Nagarahalli <[email protected]>
> Reviewed-by: Dhruv Tripathi <[email protected]>
> ---
>  lib/ring/rte_ring_c11_pvt.h | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index b9388af0da..98c6584edb 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  	unsigned int max = n;
>
>  	*old_head = rte_atomic_load_explicit(&d->head,
> -			rte_memory_order_relaxed);
> +			rte_memory_order_acquire);
>  	do {
>  		/* Reset n to the initial burst count */
>  		n = max;
>
> -		/* Ensure the head is read before tail */
> -		rte_atomic_thread_fence(rte_memory_order_acquire);
> -
>  		/* load-acquire synchronize with store-release of ht->tail
>  		 * in update_tail.
>  		 */
> @@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  		/* on failure, *old_head is updated */
>  		success = rte_atomic_compare_exchange_strong_explicit(
>  				&d->head, old_head, *new_head,
> -				rte_memory_order_relaxed,
> -				rte_memory_order_relaxed);
> +				rte_memory_order_acq_rel,
> +				rte_memory_order_acquire);
>  	} while (unlikely(success == 0));
>  	return n;
>  }
> --
LGTM, though I think that we also need to make similar changes in
rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h: for the CAS, use
'acq_rel' order instead of plain 'acquire'. Let me know if you have the
bandwidth to do that.

Acked-by: Konstantin Ananyev <[email protected]>
Tested-by: Konstantin Ananyev <[email protected]>

> 2.43.0
>
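For anyone picking that up, here is a rough sketch of the kind of
CAS-ordering change being suggested, using simplified, hypothetical types
and names rather than the actual rte_ring_hts/rts structures; only the
memory orders on the compare-exchange are the point:

  #include <stdatomic.h>
  #include <stdint.h>

  /* Simplified stand-in for a combined head/tail word; not the DPDK layout. */
  struct ht_pos { uint32_t head, tail; };
  union ht_word { struct ht_pos pos; uint64_t raw; };

  static inline void
  move_head_sketch(_Atomic uint64_t *ht_raw, uint32_t n)
  {
  	union ht_word old, upd;

  	old.raw = atomic_load_explicit(ht_raw, memory_order_acquire);
  	do {
  		/* Recompute the proposed head from the latest observed value;
  		 * on CAS failure old.raw is refreshed and we retry. */
  		upd.pos = old.pos;
  		upd.pos.head += n;
  		/* Before: memory_order_acquire on success.
  		 * Suggested: memory_order_acq_rel, so a successful head update
  		 * also acts as a release toward other threads of the same role,
  		 * mirroring the rte_ring_c11_pvt.h change above. */
  	} while (!atomic_compare_exchange_strong_explicit(ht_raw,
  			&old.raw, upd.raw,
  			memory_order_acq_rel, memory_order_acquire));
  }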

