On 31/10/2025 21:32, Daniel Gomez wrote:


On 10/09/2025 10.01, Vlastimil Babka wrote:
Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
For caches with sheaves, on each cpu maintain a rcu_free sheaf in
addition to main and spare sheaves.

kfree_rcu() operations will try to put objects on this sheaf. Once the
sheaf is full, it is detached and submitted to call_rcu() with a handler
that will try to put it in the barn, or, when the barn is full, flush it
to slab pages using bulk free. A new empty sheaf must then be obtained
for further objects.

It's possible that no free sheaves are available to use for a new
rcu_free sheaf, and the allocation in kfree_rcu() context can only use
GFP_NOWAIT and thus may fail. In that case, fall back to the existing
kfree_rcu() implementation.

Expected advantages:
- batching the kfree_rcu() operations, that could eventually replace the
   existing batching
- sheaves can be reused for allocations via barn instead of being
   flushed to slabs, which is more efficient
   - this includes cases where only some cpus are allowed to process rcu
     callbacks (Android)

Possible disadvantage:
- objects might wait longer than their own grace period (the sheaf's grace
   period is determined by the last object freed into it), increasing memory
   usage - but the existing batching does that too.

Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny
implementation favors smaller memory footprint over performance.

Also, for now, skip using the rcu_free sheaf for CONFIG_PREEMPT_RT, as the
contexts where kfree_rcu() is called might not be compatible with taking a
barn spinlock, or with a GFP_NOWAIT allocation of a new sheaf that takes a
spinlock - the current kfree_rcu() implementation avoids doing that.

Teach kvfree_rcu_barrier() to flush all rcu_free sheaves from all caches
that have them. This is not a cheap operation, but the barrier usage is
rare - currently kmem_cache_destroy() or on module unload.

Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to
count how many kfree_rcu() calls used the rcu_free sheaf successfully and
how many had to fall back to the existing implementation.

Signed-off-by: Vlastimil Babka <[email protected]>

Hi Vlastimil,

This patch increases the kmod selftest (stress module loader) runtime by
about 50-60%, from ~200s to ~300s total execution time. My tested kernel has
CONFIG_KVFREE_RCU_BATCHED enabled. Any ideas or suggestions on what might be
causing this, or how to address it?


I have been looking into a regression in Linux v6.18-rc where the time taken
to run some internal graphics tests on our Tegra234 device has increased by
around 35%, causing the tests to time out. Bisect points to this commit, and
I also see that we have CONFIG_KVFREE_RCU_BATCHED=y.

I have not tried disabling CONFIG_KVFREE_RCU_BATCHED yet, but I can. I am
not sure whether there are any downsides to disabling it?

Thanks
Jon

--
nvpublic

