On Mon, 23 Oct 2023 11:43:04 -0400 Steven Rostedt <rost...@goodmis.org> wrote:
> On Mon, 23 Oct 2023 19:24:52 +0800 > "wuqiang.matt" <wuqiang.m...@bytedance.com> wrote: > > > The objpool_push can only happen on local cpu node, so only the local > > cpu can touch slot->tail and slot->last, which ensures the correctness > > of using cmpxchg without lock prefix (using try_cmpxchg_local instead > > of try_cmpxchg_acquire). > > > > Testing with IACA found the lock version of pop/push pair costs 16.46 > > cycles and local-push version costs 15.63 cycles. Kretprobe throughput > > is improved to 1.019 times of the lock version for x86_64 systems. > > > > OS: Debian 10 X86_64, Linux 6.6rc6 with freelist > > HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s > > > > 1T 2T 4T 8T 16T > > lock: 29909085 59865637 119692073 239750369 478005250 > > local: 30297523 60532376 121147338 242598499 484620355 > > 32T 48T 64T 96T 128T > > lock: 957553042 1435814086 1680872925 2043126796 2165424198 > > local: 968526317 1454991286 1861053557 2059530343 2171732306 > > > > Signed-off-by: wuqiang.matt <wuqiang.m...@bytedance.com> > > --- > > lib/objpool.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/objpool.c b/lib/objpool.c > > index ce0087f64400..a032701beccb 100644 > > --- a/lib/objpool.c > > +++ b/lib/objpool.c > > @@ -166,7 +166,7 @@ objpool_try_add_slot(void *obj, struct objpool_head > > *pool, int cpu) > > head = READ_ONCE(slot->head); > > /* fault caught: something must be wrong */ > > WARN_ON_ONCE(tail - head > pool->nr_objs); > > - } while (!try_cmpxchg_acquire(&slot->tail, &tail, tail + 1)); > > + } while (!try_cmpxchg_local(&slot->tail, &tail, tail + 1)); > > > > /* now the tail position is reserved for the given obj */ > > WRITE_ONCE(slot->entries[tail & slot->mask], obj); > > I'm good with the change, but I don't like how "cpu" is passed to this > function. It currently is only used in one location, which does: > > rc = objpool_try_add_slot(obj, pool, raw_smp_processor_id()); > > Which makes this change fine. But there's nothing here to prevent someone > for some reason passing another CPU to that function. > > If we are to make that change, I would be much more comfortable with > removing "int cpu" as a parameter to objpool_try_add_slot() and adding: > > int cpu = raw_smp_processor_id(); > > Which now shows that this function *only* deals with the current CPU. Oh indeed. It used to search all CPUs to push the object, but I asked him to stop that because there should be enough space to push it in the local ring. This is a remnant of that time. Wuqiang, can you make another patch to fix it? Thank you, > > -- Steve -- Masami Hiramatsu (Google) <mhira...@kernel.org>