On Mon, 25 May 2026 07:41:13 +0000 Konstantin Ananyev <[email protected]> wrote:
> Hi Stephen,
>
> > The rte_atomic32_cmpset is deprecated. Initial attempts at
> > changing this with direct conversion to
> > rte_atomic_compare_exchange_weak_explicit()
> > regressed MP/MC contended performance on x86 by 10-30%,
> > because the C11 builtin's failure-writeback semantic forces
> > GCC to emit extra instructions on the CAS critical path.
> >
> > Add an internal __rte_ring_compare_and_swap() wrapper that calls
> > __sync_bool_compare_and_swap() directly, which keeps the original
> > instruction sequence. Add equivalent function for MSVC.
>
> In fact, in rte_ring we do have 2 implementations of the same core functions:
> lib/ring/rte_ring_c11_pvt.h - uses C11 atomics
> lib/ring/rte_ring_generic_pvt.h - uses legacy instructions (smp_mb, extra),
> If we going remove these legacy instructions anyway (or reimplementing them
> using C11 atomics),
> then there is probably no point to keep rte_ring_generic_pvt.h.
> Konstantin

I have been deep diving into why the C11 atomics give a 20-30% performance
drop versus the atomic32 version. So far it comes down to the GCC optimizer
not doing as well with C11 as with assembly. The C11 form, combined with
the excessive use of always_inline, consumes more registers.
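For anyone following along, here is a minimal sketch of the two
compare-and-swap shapes being compared. It is not the ring code: the
demo_* names and the counter loops are hypothetical and only illustrate
that the legacy builtin takes the expected value by value, while the C11
call takes a pointer and stores the observed value back on failure. It
assumes GCC or Clang, and it makes no claim about the code either
compiler actually generates.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not DPDK code: legacy-style CAS.
 * 'expected' is passed by value, so nothing is written back on failure.
 */
static inline bool
demo_cas_sync(volatile uint32_t *dst, uint32_t expected, uint32_t desired)
{
	return __sync_bool_compare_and_swap(dst, expected, desired);
}

/*
 * Hypothetical helper, not DPDK code: C11-style CAS.
 * On failure, the value currently in *dst is stored into *expected.
 */
static inline bool
demo_cas_c11(_Atomic uint32_t *dst, uint32_t *expected, uint32_t desired)
{
	return atomic_compare_exchange_weak_explicit(dst, expected, desired,
			memory_order_acquire, memory_order_relaxed);
}

/* Advance a shared counter by n, legacy form: reload explicitly per retry. */
static inline uint32_t
demo_advance_sync(volatile uint32_t *head, uint32_t n)
{
	uint32_t old;

	do {
		old = *head;
	} while (!demo_cas_sync(head, old, old + n));

	return old;
}

/* Same update, C11 form: the failure writeback refreshes 'old' for the retry. */
static inline uint32_t
demo_advance_c11(_Atomic uint32_t *head, uint32_t n)
{
	uint32_t old = atomic_load_explicit(head, memory_order_relaxed);

	while (!demo_cas_c11(head, &old, old + n))
		;

	return old;
}
```

Both loops return the value the counter held before the update. The only
difference shown is where the refreshed value comes from on a failed
attempt: an explicit reload in the first loop, the writeback through the
pointer in the second.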

