[RFC PATCH 0/7] k[v]free_rcu() improvements

Harry Yoo Fri, 06 Feb 2026 01:39:21 -0800

These are a few improvements to the k[v]free_rcu() API, suggested
by Alexei Starovoitov.


[ To kmemleak folks: I'm going to teach delete_object_full() and
  paint_ptr() to ignore cases when the object does not exist.
  Could you please let me know if the way it's done in patch 3
  looks good? Only part 2 is relevant to you. ]

Although I've put some effort into providing a decent-quality
implementation, please consider this a proof of concept.
Let's discuss how best to tackle these problems:

  1) Allow an 8-byte field to be used as an alternative to
     struct rcu_head (16-byte) for 2-argument kvfree_rcu()
  2) kmalloc_nolock() -> kfree[_rcu]() support
  3) Add kfree_rcu_nolock() for NMI context

# Part 1. Allow an 8-byte field to be used as an alternative to
  struct rcu_head for 2-argument kvfree_rcu()
  
  Technically, objects that are freed with k[v]free_rcu() need
  only one pointer to link objects, because we already know that
  the callback function is always kvfree(). For this purpose,
  struct rcu_head is unnecessarily large (16 bytes on 64-bit).

  Allow a smaller, 8-byte field (of struct rcu_ptr type) to be used
  with k[v]free_rcu(). Let's save one pointer per slab object.
  
  I have to admit that my naming skill isn't great; hopefully
  we'll come up with a better name than `struct rcu_ptr`.

  With this feature, either a struct rcu_ptr or rcu_head field
  can be used as the second argument of the k[v]free_rcu() API.

  Users that only use k[v]free_rcu() are highly encouraged to use
  struct rcu_ptr; otherwise you're wasting memory. However, some users,
  such as maple tree, may use call_rcu() or k[v]free_rcu() depending on
  the situation for objects of the same type. For such users,
  struct rcu_head remains the only option.

  Patch 1 implements this feature, and patch 2 adds a few users in mm/.
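
  As a rough illustration of the saving, here is a minimal userspace
  sketch. The struct rcu_head layout mirrors the kernel's two-pointer
  callback head. The struct rcu_ptr layout is an assumption (the cover
  letter only says it is an 8-byte field), and the item_with_* types and
  rcu_field_saving() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's struct rcu_head: a link plus a callback. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* ASSUMPTION: hypothetical layout; the series only states the size. */
struct rcu_ptr {
	struct rcu_ptr *next;
};

/* Hypothetical object that is only ever freed with k[v]free_rcu(). */
struct item_with_head {
	long key;
	struct rcu_head rcu;
};

/* The same object using the smaller field. */
struct item_with_ptr {
	long key;
	struct rcu_ptr rcu;
};

/* Bytes saved per object by switching the field type. */
static inline size_t rcu_field_saving(void)
{
	return sizeof(struct item_with_head) - sizeof(struct item_with_ptr);
}

static_assert(sizeof(struct rcu_ptr) == sizeof(void *),
	      "rcu_ptr is one pointer");
static_assert(sizeof(struct rcu_head) == 2 * sizeof(void *),
	      "rcu_head is two pointers");
```

  The callback pointer is the part that becomes redundant when the
  free function is always kvfree(), which is why one pointer per object
  is saved.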

# Part 2. kmalloc_nolock() -> kfree() or kfree_rcu() path support
  
  Allow objects allocated with kmalloc_nolock() to be freed with
  kfree[_rcu](). Without this support, users are forced to call
  call_rcu() with kfree_nolock() to free objects after a grace period.
  This is inefficient and, because it bypasses the kfree_rcu batching
  layer, can create more grace periods than necessary.

  It was not supported before because some alloc hooks are not called
  in kmalloc_nolock(), while all free hooks are called in kfree().

  Patch 3 adds support for this by teaching kmemleak to ignore cases
  when free hooks are called without prior alloc hooks. Patch 4 frees
  a bit in enum objexts_flags, since we no longer have to remember
  whether the array was allocated using kmalloc_nolock() or kmalloc().

  Note that the free hooks fall into these categories:

  - Its alloc hook is called in kmalloc_nolock(), no problem!
    (kmsan_slab_alloc(), kasan_slab_alloc(),
     memcg_slab_post_alloc_hook(), alloc_tagging_slab_alloc_hook())

  - Its alloc hook isn't called in kmalloc_nolock(); free hooks
    must handle asymmetric hook calls. (kfence_free(),
    kmemleak_free_recursive())

  - There is no matching alloc hook for the free hook; it's safe to
    call. (debug_check_no_{locks,obj}_freed, __kcsan_check_access())
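
  The second category (asymmetric hook calls) can be sketched in
  userspace as a tracker whose free hook tolerates objects it has never
  seen. This is not kmemleak code: the table, track_alloc() and
  track_free() are hypothetical stand-ins that only illustrate the idea
  of ignoring a free without a prior alloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 16

/* Hypothetical stand-in for a debug tracker's object table. */
static const void *tracked[MAX_TRACKED];

/* Alloc hook: record the object. Returns false if the table is full. */
static bool track_alloc(const void *obj)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!tracked[i]) {
			tracked[i] = obj;
			return true;
		}
	}
	return false;
}

/*
 * Free hook: must tolerate objects whose alloc hook never ran.
 * Returns true if an entry was removed, false if the object was unknown,
 * in which case it is silently ignored instead of being reported.
 */
static bool track_free(const void *obj)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == obj) {
			tracked[i] = NULL;
			return true;
		}
	}
	return false;
}
```

  With this shape, an object that skipped the alloc hook can still go
  through the common free path without tripping the tracker.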

  Note that kmalloc() -> kfree_nolock() or kfree_rcu_nolock() is still
  not supported! That's much trickier :)

# Part 3. Add kfree_rcu_nolock() for NMI context

  Add a new 2-argument kfree_rcu_nolock() variant that is safe to
  call in NMI context. In NMI context, calling kfree_rcu() or
  call_rcu() is not legal, and thus users are forced to implement some
  sort of deferred freeing. Let's make users' lives easier with the new
  variant.

  Note that 1-argument kfree_rcu_nolock() is not supported, since there
  is not much we can do when trylock & memory allocation fails.
  (You can't call synchronize_rcu() in NMI context!)

  When spinning on a lock is not allowed, the spinlock is taken with a
  trylock instead. If the trylock succeeds, do one of the following:

  1) Use the rcu sheaf to free the object. Note that call_rcu() cannot
     be called in NMI context! If freeing the object would fill the rcu
     sheaf, the object cannot be freed to the sheaf and has to fall back.
  
  2) Use struct rcu_ptr field to link objects. Consuming a bnode
     (of struct kvfree_rcu_bulk_data) and queueing work to maintain
     a number of cached bnodes is avoided in NMI context.

  Note that scheduling delayed monitor work to drain objects after
  KFREE_DRAIN_JIFFIES is done using a lazy irq_work to avoid raising
  self-IPIs. That means scheduling the delayed monitor work can itself
  be delayed by up to the length of a time slice.

  In rare cases where trylock fails, a non-lazy irq_work is used to
  defer calling kvfree_rcu_call().
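
  The trylock-or-defer flow above can be sketched in userspace with C11
  atomics. Everything here is an assumption made for illustration:
  struct free_queue, free_queue_init() and queue_free_nolock() are
  hypothetical names, an atomic flag stands in for the spinlock, and a
  lock-free list stands in for the work that the series defers through
  irq_work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: hypothetical one-pointer link, as described in Part 1. */
struct rcu_ptr {
	struct rcu_ptr *next;
};

/* Hypothetical queue: a lock-protected list plus a lockless fallback. */
struct free_queue {
	atomic_bool locked;                 /* stand-in for the spinlock */
	struct rcu_ptr *head;               /* protected by 'locked' */
	_Atomic(struct rcu_ptr *) deferred; /* lock-free pushes only */
};

static void free_queue_init(struct free_queue *q)
{
	atomic_init(&q->locked, false);
	q->head = NULL;
	atomic_init(&q->deferred, NULL);
}

/*
 * Queue an object without ever spinning. Returns true if it was linked
 * onto the locked list, false if the lock was busy and it was deferred.
 */
static bool queue_free_nolock(struct free_queue *q, struct rcu_ptr *p)
{
	struct rcu_ptr *old;

	/* Trylock: a single exchange, no retry loop. */
	if (!atomic_exchange_explicit(&q->locked, true, memory_order_acquire)) {
		p->next = q->head;
		q->head = p;
		atomic_store_explicit(&q->locked, false, memory_order_release);
		return true;
	}

	/* Trylock failed: lock-free push; a later worker would drain it. */
	old = atomic_load_explicit(&q->deferred, memory_order_relaxed);
	do {
		p->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&q->deferred, &old, p,
							memory_order_release,
							memory_order_relaxed));
	return false;
}
```

  The point of the design is that the caller never waits: it either
  gets the lock immediately or hands the object to a context where
  taking the lock is allowed.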

  When certain debug features (kmemleak, debugobjects) are enabled,
  freeing in NMI context is always deferred because they use spinlocks.

  Patch 6 implements kfree_rcu_nolock() support, and patch 7 adds
  sheaves support for the new API.

Harry Yoo (7):
  mm/slab: introduce k[v]free_rcu() with struct rcu_ptr
  mm: use rcu_ptr instead of rcu_head
  mm/slab: allow freeing kmalloc_nolock()'d objects using kfree[_rcu]()
  mm/slab: free a bit in enum objexts_flags
  mm/slab: move kfree_rcu_cpu[_work] definitions
  mm/slab: introduce kfree_rcu_nolock()
  mm/slab: make kfree_rcu_nolock() work with sheaves

 include/linux/list_lru.h   |   2 +-
 include/linux/memcontrol.h |   3 +-
 include/linux/rcupdate.h   |  68 +++++---
 include/linux/shrinker.h   |   2 +-
 include/linux/types.h      |   9 ++
 mm/kmemleak.c              |  11 +-
 mm/slab.h                  |   2 +-
 mm/slab_common.c           | 309 +++++++++++++++++++++++++------------
 mm/slub.c                  |  47 ++++--
 mm/vmalloc.c               |   4 +-
 10 files changed, 310 insertions(+), 147 deletions(-)

-- 
2.43.0
