> The function __rte_ring_headtail_move_head() assumes that the barrier
> (fence) between the load of the head and the load-acquire of the
> opposing tail guarantees the following: if a first thread reads tail
> and then writes head and a second thread reads the new value of head
> and then reads tail, then it should observe the same (or a later)
> value of tail.
>
> This assumption is incorrect under the C11 memory model. If the barrier
> (fence) is intended to establish a total ordering of ring operations,
> it fails to do so. Instead, the current implementation only enforces a
> partial ordering, which can lead to unsafe interleavings. In particular,
> some partial orders can cause underflows in free slot or available
> element computations, potentially resulting in data corruption.
>
> The issue manifests when a CPU first acts as a producer and later as a
> consumer. In this scenario, the barrier assumption may fail when another
> core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> this violation. The problem has not been widely observed so far because:
> (a) on strong memory models (e.g., x86-64) the assumption holds, and
> (b) on relaxed models with RCsc semantics the ordering is still strong
> enough to prevent hazards.
> The problem becomes visible only on weaker models, when load-acquire is
> implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> the LDAPR and LDAPUR instructions).
>
> Three possible solutions exist:
> 1. Strengthen ordering by upgrading release/acquire semantics to
>    sequential consistency. This requires using seq-cst for stores,
>    loads, and CAS operations. However, this approach introduces a
>    significant performance penalty on relaxed-memory architectures.
>
> 2. Establish a safe partial order by enforcing a pair-wise
>    happens-before relationship between threads of the same role, by
>    converting the CAS to release semantics and the preceding load of
>    the head to acquire. This approach makes the original barrier
>    assumption unnecessary and allows its removal.
>
> 3. Retain partial ordering but ensure only safe partial orders are
>    committed. This can be done by detecting underflow conditions
>    (producer < consumer) and quashing the update in such cases.
>    This approach makes the original barrier assumption unnecessary
>    and allows its removal.
>
> This patch implements solution (2) to preserve the “enqueue always
> succeeds” contract expected by dependent libraries (e.g., mempool).
> While solution (3) offers higher performance, adopting it now would
> break that assumption.
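For reference, the commit message mentions a Herd7 C11 litmus test but does
not reproduce it here. Below is a minimal sketch of the kind of test that
exhibits the hazard with the old code (relaxed head accesses plus an acquire
fence); the test name, variables and registers are illustrative, not the
test from the original discussion:

  C ring_move_head_hazard

  {}

  P0 (atomic_int *head, atomic_int *tail) {
  	/* Old move_head: observes the new tail (load-acquire), then
  	 * publishes a new head with only relaxed ordering. */
  	int r0 = atomic_load_explicit(tail, memory_order_acquire);
  	atomic_store_explicit(head, 1, memory_order_relaxed);
  }

  P1 (atomic_int *tail) {
  	/* Opposing thread releasing a new tail value. */
  	atomic_store_explicit(tail, 1, memory_order_release);
  }

  P2 (atomic_int *head, atomic_int *tail) {
  	/* Another thread of P0's role, old code: relaxed head load,
  	 * acquire fence, then load-acquire of tail. */
  	int r1 = atomic_load_explicit(head, memory_order_relaxed);
  	atomic_thread_fence(memory_order_acquire);
  	int r2 = atomic_load_explicit(tail, memory_order_acquire);
  }

  exists (0:r0=1 /\ 2:r1=1 /\ 2:r2=0)

The "exists" clause asks for the outcome the barrier assumption forbids:
P0 already saw the new tail (r0=1) and P2 saw P0's new head (r1=1), yet P2
still read the stale tail (r2=0). The C11 model allows this with the old
orderings. With the patch applied (acquire head load, acq_rel CAS), P0's
head update carries release semantics, so P1's tail store happens-before
P2's tail load and the stale read is excluded.
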
>
> Signed-off-by: Wathsala Vithanage <[email protected]>
> Signed-off-by: Ola Liljedahl <[email protected]>
> Reviewed-by: Honnappa Nagarahalli <[email protected]>
> Reviewed-by: Dhruv Tripathi <[email protected]>
> ---
>  lib/ring/rte_ring_c11_pvt.h | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index b9388af0da..98c6584edb 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  	unsigned int max = n;
>
>  	*old_head = rte_atomic_load_explicit(&d->head,
> -			rte_memory_order_relaxed);
> +			rte_memory_order_acquire);
>  	do {
>  		/* Reset n to the initial burst count */
>  		n = max;
>
> -		/* Ensure the head is read before tail */
> -		rte_atomic_thread_fence(rte_memory_order_acquire);
> -
>  		/* load-acquire synchronize with store-release of ht->tail
>  		 * in update_tail.
>  		 */
> @@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
>  		/* on failure, *old_head is updated */
>  		success = rte_atomic_compare_exchange_strong_explicit(
>  				&d->head, old_head, *new_head,
> -				rte_memory_order_relaxed,
> -				rte_memory_order_relaxed);
> +				rte_memory_order_acq_rel,
> +				rte_memory_order_acquire);
>  	} while (unlikely(success == 0));
>  	return n;
>  }
> --
LGTM, though I think that we also need to make similar changes in
rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h: for the CAS, use
'acq_rel' order instead of plain 'acquire'. Let me know if you have the
bandwidth to do that.

Acked-by: Konstantin Ananyev <[email protected]>
Tested-by: Konstantin Ananyev <[email protected]>

> 2.43.0
>
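For anyone picking that up, here is a rough sketch of the kind of
CAS-ordering change being suggested, using simplified, hypothetical types
and names rather than the actual rte_ring_hts/rts structures; only the
memory orders on the compare-exchange are the point:

  #include <stdatomic.h>
  #include <stdint.h>

  /* Simplified stand-in for a combined head/tail word; not the DPDK layout. */
  struct ht_pos { uint32_t head, tail; };
  union ht_word { struct ht_pos pos; uint64_t raw; };

  static inline void
  move_head_sketch(_Atomic uint64_t *ht_raw, uint32_t n)
  {
  	union ht_word old, upd;

  	old.raw = atomic_load_explicit(ht_raw, memory_order_acquire);
  	do {
  		/* Recompute the proposed head from the latest observed value;
  		 * on CAS failure old.raw is refreshed and we retry. */
  		upd.pos = old.pos;
  		upd.pos.head += n;
  		/* Before: memory_order_acquire on success.
  		 * Suggested: memory_order_acq_rel, so a successful head update
  		 * also acts as a release toward other threads of the same role,
  		 * mirroring the rte_ring_c11_pvt.h change above. */
  	} while (!atomic_compare_exchange_strong_explicit(ht_raw,
  			&old.raw, upd.raw,
  			memory_order_acq_rel, memory_order_acquire));
  }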

