> From: Konstantin Ananyev [mailto:[email protected]]
> Sent: Thursday, 23 April 2026 11.16
>
> SORING internal finalize() function is MT-safe and can be called from
> multiple places: from its own stage release(), also from 'acquire()'
> for next stage or even from consumer's 'dequeue()'.
> But calling finalize() from not its own stage release() function
> creates extra contention and might slow down ring operations,
> especially for the cases when we have multiple threads doing
> acquire/release for the same stage.
> We can't completely avoid calling finalize() from all these multiple
> places, as it can in some rare cases break soring behavior.
> But we can make release() for given stage to invoke it always.
> That increases number of 'finalize()' operations done from 'release()'
> for current stage, and helps to minimize number of finalize() calls
> from other stages, which in turn helps to reduce the contention.
> According to the soring_stress_autotest, for multiple workers (8+)
> it reduces number of cycles spent by 1.5x-1.8x factor.
> For l3fwd-like workload it improves things by ~20%.
> For small number of workers, I didn't observe any serious change.
> Note that it doesn't introduce any changes in functionality provided.
>
> Signed-off-by: Konstantin Ananyev <[email protected]>

Good idea.
Not peeking into the tail to determine if finalize() might be omitted also makes release() cleaner.
Removing an optimization that was not really an optimization. :-)

Acked-by: Morten Brørup <[email protected]>

> ---
>  lib/ring/soring.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/lib/ring/soring.c b/lib/ring/soring.c
> index 3b90521bdb..4bc2321fb5 100644
> --- a/lib/ring/soring.c
> +++ b/lib/ring/soring.c
> @@ -37,24 +37,24 @@
>   * plus current stage index).
>   * 'release()' extracts old head value from provided ftoken and checks that
>   * corresponding 'state[]' contains expected values(mostly for sanity
> - * purposes).
> - * Then it marks this state[] with 'SORING_ST_FINISH' flag to indicate
> - * that given subset of objects was released.
> - * After that, it checks does old head value equals to current tail value?
> - * If yes, then it performs 'finalize()' operation, otherwise 'release()'
> - * just returns (without spinning on stage tail value).
> - * As updated state[] is shared by all threads, some other thread can do
> - * 'finalize()' for given stage.
> - * That allows 'release()' to avoid excessive waits on the tail value.
> + * purposes). Then it marks this state[] with 'SORING_ST_FINISH' flag to
> + * indicate that given subset of objects was released.
> + * After that, it calls 'finalize()'.
>   * Main purpose of 'finalize()' operation is to walk through 'state[]'
>   * from current stage tail up to its head, check state[] and move stage tail
>   * through elements that already are in SORING_ST_FINISH state.
>   * Along with that, corresponding state[] values are reset to zero.
> - * Note that 'finalize()' for given stage can be done from multiple places:
> + * Note that updated state[] is shared by all threads, so
> + * 'finalize()' for given stage can be done from multiple places:
>   * 'release()' for that stage or from 'acquire()' for next stage
>   * even from consumer's 'dequeue()' - in case given stage is the last one.
>   * So 'finalize()' has to be MT-safe and inside it we have to
> - * guarantee that only one thread will update state[] and stage's tail values.
> + * guarantee that only one thread at a time will update state[] and
> + * stage's tail values (sort of critical-section).
> + * When multiple threads trying to do finalize() for the same stage,
> + * simultaneously one thread will win the race and do all the pending
> + * updates, while others will simply return (kind of try-lock scenario).
> + * That allows 'release()' to avoid excessive waits on the tail value.
>   */
>
>  #include "soring.h"
> @@ -442,7 +442,7 @@ static __rte_always_inline void
>  soring_release(struct rte_soring *r, const void *objs,
>  	const void *meta, uint32_t stage, uint32_t n, uint32_t ftoken)
>  {
> -	uint32_t idx, pos, tail;
> +	uint32_t idx, pos;
>  	struct soring_stage *stg;
>  	union soring_state st;
>
> @@ -479,12 +479,9 @@ soring_release(struct rte_soring *r, const void *objs,
>  	rte_atomic_store_explicit(&r->state[idx].raw, st.raw,
>  			rte_memory_order_relaxed);
>
> -	/* try to do finalize(), if appropriate */
> -	tail = rte_atomic_load_explicit(&stg->sht.tail.pos,
> -			rte_memory_order_relaxed);
> -	if (tail == pos)
> -		__rte_soring_stage_finalize(&stg->sht, stage, r->state, r->mask,
> -				r->capacity);
> +	/* now, try to do finalize() */
> +	__rte_soring_stage_finalize(&stg->sht, stage, r->state, r->mask,
> +			r->capacity);
>  }
>
>  /*
> --
> 2.51.0
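
For anyone following the thread without the source at hand, the "release() always calls finalize(), finalize() behaves like a try-lock" idea from the updated comment can be modelled roughly as below. This is only a simplified sketch in plain C11 atomics, not the code in lib/ring/soring.c: the sk_* names, the fixed ring size, the 'busy' flag and the per-slot batch counter are all assumptions made up for illustration, and the real finalize() is implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_SIZE 8u              /* hypothetical ring size (power of two) */
#define SK_MASK (SK_SIZE - 1)

/* Hypothetical single-stage model; not the DPDK soring layout. */
struct sk_stage {
	_Atomic uint32_t state[SK_SIZE]; /* 0, or size of a finished batch */
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
	atomic_flag busy;                /* stand-in for the try-lock */
};

static void sk_init(struct sk_stage *s)
{
	for (uint32_t i = 0; i < SK_SIZE; i++)
		atomic_init(&s->state[i], 0);
	atomic_init(&s->head, 0);
	atomic_init(&s->tail, 0);
	atomic_flag_clear(&s->busy);
}

/* Claim n slots, return the old head (plays the role of the ftoken).
 * No capacity check here, to keep the sketch short. */
static uint32_t sk_acquire(struct sk_stage *s, uint32_t n)
{
	return atomic_fetch_add_explicit(&s->head, n, memory_order_relaxed);
}

/* Try-lock style finalize: returns how far the tail was moved. */
static uint32_t sk_finalize(struct sk_stage *s)
{
	uint32_t moved = 0;

	/* another thread is finalizing this stage: simply return */
	if (atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire))
		return 0;

	uint32_t head = atomic_load_explicit(&s->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&s->tail, memory_order_relaxed);

	/* walk from tail towards head over batches already marked finished */
	while (tail != head) {
		uint32_t idx = tail & SK_MASK;
		uint32_t n = atomic_load_explicit(&s->state[idx],
				memory_order_acquire);
		if (n == 0)
			break; /* oldest batch is not released yet */
		atomic_store_explicit(&s->state[idx], 0, memory_order_relaxed);
		tail += n;
		moved += n;
	}

	atomic_store_explicit(&s->tail, tail, memory_order_release);
	atomic_flag_clear_explicit(&s->busy, memory_order_release);
	return moved;
}

/* Mark the batch starting at pos as finished, then always try finalize(). */
static uint32_t sk_release(struct sk_stage *s, uint32_t pos, uint32_t n)
{
	assert(n != 0 && n <= SK_SIZE);
	atomic_store_explicit(&s->state[pos & SK_MASK], n,
			memory_order_release);
	return sk_finalize(s);
}
```

In this model, releasing a batch out of order leaves the tail where it is, and the later release of the oldest batch moves the tail over both batches in one go. A thread that loses the race on 'busy' may leave its finished batch for a later finalize() call, which seems consistent with the commit message's point that the calls from the other places cannot be dropped completely.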

