On Fri, 22 May 2026 15:19:00 +0100 Bruce Richardson <[email protected]> wrote:
> I decided to test this patchset with the ring_perf_autotest (using only two
> cores on same socket) to see how performance may be affected on x86 with
> this change. On an initial once-off test to compare performance
> with/without this patchset for MP/MC cases, it looks like smaller enq/deq
> burst e.g. 8/32 are slower after this set, while larger bursts e.g. 128/256
> are slightly faster.
>
> I then ran two more tests with the patches applied and again without, and
> got AI to analyse the set of 6 results to come up with more meaningful
> conclusions after a little bit more numeric analysis. Below is some of the
> summary.
>
> While not necessarily a deal-breaker, the regressions seen are cause for
> pause. We probably want to benchmark on a few other x86 (both Intel and
> AMD) systems to see if this is a consistent picture.
>
> /Bruce

Could you see if the problem is the use of intrinsics on x86 or the changes
to rte_ring_pvt?

I am not convinced that deprecation of these functions is a hard
requirement. This patchset is more of a what-if experiment. The other
alternative is to remove the deprecation notice and just leave well enough
alone.

But some of the places actually benefit from the changeover because they
are using flags as locks, and using other memory orders should be faster
on Arm.
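
To illustrate the point about flags used as locks, here is a minimal sketch
in plain C11 atomics. It is not taken from the patchset: the struct and
function names (flag_lock, flag_trylock, flag_unlock) are hypothetical, and
it only assumes that a lock-style flag needs acquire ordering when taken and
release ordering when dropped, rather than a full barrier on both sides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag-as-lock type for illustration; not a DPDK structure. */
struct flag_lock {
	atomic_int busy; /* 0 = free, 1 = held */
};

/* Try to take the flag. Returns true if the caller now holds it. */
static inline bool
flag_trylock(struct flag_lock *l)
{
	int expected = 0;

	/*
	 * Acquire on success: accesses in the critical section cannot be
	 * reordered before the flag is taken. Relaxed on failure, since
	 * nothing was acquired.
	 */
	return atomic_compare_exchange_strong_explicit(&l->busy, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Drop the flag. Must only be called by the current holder. */
static inline void
flag_unlock(struct flag_lock *l)
{
	/*
	 * Release: writes made while holding the flag become visible to
	 * the next thread that takes it with acquire ordering.
	 */
	atomic_store_explicit(&l->busy, 0, memory_order_release);
}
```

On x86 these orderings usually cost little either way, while on Arm the
acquire/release forms can typically avoid the full barriers that a
sequentially consistent version would need. Whether that matters for a given
call site would still have to be measured.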

