On 4/29/25 09:36, Harry Yoo wrote: > On Fri, Apr 25, 2025 at 10:27:22AM +0200, Vlastimil Babka wrote: >> Extend the sheaf infrastructure for more efficient kfree_rcu() handling. >> For caches with sheaves, on each cpu maintain a rcu_free sheaf in >> addition to main and spare sheaves. >> >> kfree_rcu() operations will try to put objects on this sheaf. Once full, >> the sheaf is detached and submitted to call_rcu() with a handler that >> will try to put it in the barn, or flush to slab pages using bulk free, >> when the barn is full. Then a new empty sheaf must be obtained to put >> more objects there. >> >> It's possible that no free sheaves are available to use for a new >> rcu_free sheaf, and the allocation in kfree_rcu() context can only use >> GFP_NOWAIT and thus may fail. In that case, fall back to the existing >> kfree_rcu() implementation. >> >> Expected advantages: >> - batching the kfree_rcu() operations, that could eventually replace the >> existing batching >> - sheaves can be reused for allocations via barn instead of being >> flushed to slabs, which is more efficient >> - this includes cases where only some cpus are allowed to process rcu >> callbacks (Android) >> >> Possible disadvantage: >> - objects might be waiting for more than their grace period (it is >> determined by the last object freed into the sheaf), increasing memory >> usage - but the existing batching does that too. >> >> Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny >> implementation favors smaller memory footprint over performance. >> >> Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to >> count how many kfree_rcu() used the rcu_free sheaf successfully and how >> many had to fall back to the existing implementation. >> >> Signed-off-by: Vlastimil Babka <vba...@suse.cz> >> --- > > Looks good to me, > Reviewed-by: Harry Yoo <harry....@oracle.com>
Thanks! > >> +/* Legal flag mask for kmem_cache_create(), for various configurations */ > > nit: I think now this line should be removed? Yeah looks like rebasing mistake. Removed. >> #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ >> SLAB_CACHE_DMA32 | SLAB_PANIC | \ >> SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \ >> diff --git a/mm/slab_common.c b/mm/slab_common.c >> index >> 4f295bdd2d42355af6311a799955301005f8a532..6c3b90f03cb79b57f426824450f576a977d85c53 >> 100644 >> --- a/mm/slab_common.c >> +++ b/mm/slab_common.c >> diff --git a/mm/slub.c b/mm/slub.c >> index >> ae3e80ad9926ca15601eef2f2aa016ca059498f8..6f31a27b5d47fa6621fa8af6d6842564077d4b60 >> 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -5304,6 +5340,140 @@ bool free_to_pcs(struct kmem_cache *s, void *object) >> return true; >> } >> >> +bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj) >> +{ >> + struct slub_percpu_sheaves *pcs; >> + struct slab_sheaf *rcu_sheaf; >> + >> + if (!local_trylock(&s->cpu_sheaves->lock)) >> + goto fail; >> + >> + pcs = this_cpu_ptr(s->cpu_sheaves); >> + >> + if (unlikely(!pcs->rcu_free)) { >> + >> + struct slab_sheaf *empty; > > nit: should we grab the spare sheaf here if it's empty? Hmm yeah why not. But only completely empty. Done, thanks! >> + >> + empty = barn_get_empty_sheaf(pcs->barn); >> + >> + if (empty) { >> + pcs->rcu_free = empty; >> + goto do_free; >> + } >> + >> + local_unlock(&s->cpu_sheaves->lock); >> + >> + empty = alloc_empty_sheaf(s, GFP_NOWAIT); >> + >> + if (!empty) >> + goto fail; >> + >> /* >> * Bulk free objects to the percpu sheaves. >> * Unlike free_to_pcs() this includes the calls to all necessary hooks >