On Tue, 17 Sep 2024 13:09:41 +0100 Konstantin Ananyev <konstantin.v.anan...@yandex.ru> wrote:
> From: Konstantin Ananyev <konstantin.anan...@huawei.com>
>
> v2 -> v3:
> - fix compilation/doxygen complaints (attempt #2)
> - updated release notes
>
> v2 -> v3:
> - fix compilation/doxygen complaints
> - dropped patch:
>   "examples/l3fwd: make ACL work in pipeline and eventdev modes": [2]
>   As was mentioned in the patch description it was way too big,
>   controversial and incomplete. If the community is ok to introduce
>   pipeline model into the l3fwd, then it is probably worth to be
>   a separate patch series.
>
> v1 -> v2:
> - rename 'elmst/objst' to 'meta' (Morten)
> - introduce new data-path APIs set: one with both meta{} and objs[],
>   second with just objs[] (Morten)
> - split data-path APIs into burst/bulk flavours (same as rte_ring)
> - added dump function for rte_soring and improved dump() for rte_ring.
> - dropped patch:
>   " ring: minimize reads of the counterpart cache-line"
>   - no performance gain observed
>   - actually it does change behavior of conventional rte_ring
>     enqueue/dequeue APIs -
>     it could return available/free less than actually exist in the ring.
>     As in some other libs we rely on that information - it will
>     introduce problems.
>
> The main aim of this series is to extend ring library with
> new API that allows user to create/use Staged-Ordered-Ring (SORING)
> abstraction. In addition to that there are few other patches that serve
> different purposes:
> - first two patches are just code reordering to de-duplicate
>   and generalize existing rte_ring code.
> - patch #3 extends rte_ring_dump() to correctly print head/tail metadata
>   for different sync modes.
> - next two patches introduce SORING API into the ring library and
>   provide UT for it.
>
> SORING overview
> ===============
> Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
> with multiple processing 'stages'. It is based on conventional DPDK
> rte_ring, re-uses many of its concepts, and even substantial part of
> its code.
> It can be viewed as an 'extension' of rte_ring functionality.
> In particular, main SORING properties:
> - circular ring buffer with fixed size objects
> - producer, consumer plus multiple processing stages in between.
> - allows to split objects processing into multiple stages.
> - objects remain in the same ring while moving from one stage to the other,
>   initial order is preserved, no extra copying needed.
> - preserves the ingress order of objects within the queue across multiple
>   stages
> - each stage (and producer/consumer) can be served by single and/or
>   multiple threads.
>
> - number of stages, size and number of objects in the ring are
>   configurable at ring initialization time.
>
> Data-path API provides four main operations:
> - enqueue/dequeue works in the same manner as for conventional rte_ring,
>   all rte_ring synchronization types are supported.
> - acquire/release - for each stage there is an acquire (start) and
>   release (finish) operation. After some objects are 'acquired' -
>   given thread can safely assume that it has exclusive ownership of
>   these objects till it will invoke 'release' for them.
>   After 'release', objects can be 'acquired' by next stage and/or dequeued
>   by the consumer (in case of last stage).
>
> Expected use-case: applications that use pipeline model
> (probably with multiple stages) for packet processing, when preserving
> incoming packet order is important.
>
> The concept of ‘ring with stages’ is similar to DPDK OPDL eventdev PMD [1],
> but the internals are different.
> In particular, SORING maintains internal array of 'states' for each element
> in the ring that is shared by all threads/processes that access the ring.
> That allows 'release' to avoid excessive waits on the tail value and helps
> to improve performance and scalability.
> In terms of performance, with our measurements rte_soring and
> conventional rte_ring provide nearly identical numbers.
> As an example, on our SUT: Intel ICX CPU @ 2.00GHz,
> l3fwd (--lookup=acl) in pipeline mode [2] both
> rte_ring and rte_soring reach ~20Mpps for single I/O lcore and same
> number of worker lcores.
>
> [1]
> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/DPDK-China2017-Ma-OPDL.pdf
> [2]
> https://patchwork.dpdk.org/project/dpdk/patch/20240906131348.804-7-konstantin.v.anan...@yandex.ru/
>
> Eimear Morrissey (1):
>   ring: make dump function more verbose
>
> Konstantin Ananyev (4):
>   ring: common functions for 'move head' ops
>   ring: make copying functions generic
>   ring/soring: introduce Staged Ordered Ring
>   app/test: add unit tests for soring API
>
>  .mailmap                               |   1 +
>  app/test/meson.build                   |   3 +
>  app/test/test_ring_stress_impl.h       |   1 +
>  app/test/test_soring.c                 | 442 +++++++++++++
>  app/test/test_soring_mt_stress.c       |  40 ++
>  app/test/test_soring_stress.c          |  48 ++
>  app/test/test_soring_stress.h          |  35 ++
>  app/test/test_soring_stress_impl.h     | 827 +++++++++++++++++++++++++
>  doc/guides/rel_notes/release_24_11.rst |   8 +
>  lib/ring/meson.build                   |   4 +-
>  lib/ring/rte_ring.c                    |  87 ++-
>  lib/ring/rte_ring.h                    |  15 +
>  lib/ring/rte_ring_c11_pvt.h            | 134 +---
>  lib/ring/rte_ring_elem_pvt.h           | 181 ++++--
>  lib/ring/rte_ring_generic_pvt.h        | 121 +---
>  lib/ring/rte_ring_hts_elem_pvt.h       |  85 +--
>  lib/ring/rte_ring_rts_elem_pvt.h       |  85 +--
>  lib/ring/rte_soring.c                  | 182 ++++++
>  lib/ring/rte_soring.h                  | 543 ++++++++++++++++
>  lib/ring/soring.c                      | 548 ++++++++++++++++
>  lib/ring/soring.h                      | 124 ++++
>  lib/ring/version.map                   |  26 +
>  22 files changed, 3144 insertions(+), 396 deletions(-)
>  create mode 100644 app/test/test_soring.c
>  create mode 100644 app/test/test_soring_mt_stress.c
>  create mode 100644 app/test/test_soring_stress.c
>  create mode 100644 app/test/test_soring_stress.h
>  create mode 100644 app/test/test_soring_stress_impl.h
>  create mode 100644 lib/ring/rte_soring.c
>  create mode 100644 lib/ring/rte_soring.h
>  create mode 100644 lib/ring/soring.c
>  create mode 100644 lib/ring/soring.h
>

Makes sense, fix the review comments please.
Also, to keep the checkpatch spell checker from generating lots of false
positives, I recommend updating build-dict.sh to elide 'soring' from the
generated dictionary.