> From: Stephen Hemminger [mailto:[email protected]]
> Sent: Monday, 25 May 2026 17.35
> 
> On Mon, 25 May 2026 07:41:13 +0000
> Konstantin Ananyev <[email protected]> wrote:
> 
> > Hi Stephen,
> >
> > > The rte_atomic32_cmpset is deprecated. Initial attempts at
> > > changing this with direct conversion to
> > > rte_atomic_compare_exchange_weak_explicit()
> > > regressed MP/MC contended performance on x86 by 10-30%,
> > > because the C11 builtin's failure-writeback semantic forces
> > > GCC to emit extra instructions on the CAS critical path.
> > >
> > > Add an internal __rte_ring_compare_and_swap() wrapper that calls
> > > __sync_bool_compare_and_swap() directly, which keeps the original
> > > instruction sequence. Add equivalent function for MSVC.
> >
> > In fact, in rte_ring we have two implementations of the same core
> > functions:
> > lib/ring/rte_ring_c11_pvt.h - uses C11 atomics
> > lib/ring/rte_ring_generic_pvt.h - uses legacy instructions (smp_mb,
> > etc.).
> > If we are going to remove these legacy instructions anyway (or
> > reimplement them using C11 atomics), then there is probably no
> > point in keeping rte_ring_generic_pvt.h.
> > Konstantin
> 
> Have been deep diving into why C11 atomics give 20-30% performance
> drop versus atomic32 version. So far it comes down to GCC optimizer
> not doing as well with C11 versus assembly. The C11 form with the
> excessive use of always_inline consumes more registers.

Just an idea:
Perhaps adding "const" and/or "restrict" to relevant parameters will give the 
optimizer the information it needs?
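
For illustration, a minimal sketch of the two CAS forms discussed above, with
the qualifiers applied. The function names are hypothetical (they are not the
ones in the patch), the relaxed memory ordering is chosen only to keep the
sketch short, and whether "restrict" changes what GCC emits here is untested:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: C11-style CAS through the
 * GCC/Clang builtin. On failure the builtin stores the value it observed
 * in *dst back into *expected (the failure-writeback semantic).
 * "restrict" promises the compiler that dst and expected do not alias.
 */
static inline bool
sketch_cas_c11(volatile uint32_t *restrict dst, uint32_t *restrict expected,
	       const uint32_t desired)
{
	return __atomic_compare_exchange_n(dst, expected, desired,
					   true /* weak */,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/*
 * Hypothetical helper: legacy __sync builtin. The expected value is
 * passed by value, so there is nothing to write back on failure.
 */
static inline bool
sketch_cas_sync(volatile uint32_t *dst, const uint32_t expected,
		const uint32_t desired)
{
	return __sync_bool_compare_and_swap(dst, expected, desired);
}
```

Comparing the generated assembly of the two helpers inside the ring's
head-update loop would show whether the qualifiers make any difference.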
