I'm now investigating and fixing two pre-existing kmalloc_nolock() bugs
and (hopefully) plan to post the fixes later this week.

I will later rebase this series onto Vlastimil's slab_alloc_flags v3
and the kmalloc_nolock() fixes, and address comments from human
reviewers and sashiko.

But this should be enough for review.

Thanks!

On 6/15/26 8:05 PM, Harry Yoo (Oracle) wrote:
> Not the best time to post a series, but I didn't want to delay posting
> it for too long. No pressure ;) This is intended to be queued
> for review and testing after the merge window closes.
> 
> This series is based on next-20260612, and is also available on
> git.kernel.org [3].
> 
> To RCU folks: It would be great if you could kindly take a quick look at
> patch 4 and either ack or nack the patch ;)
> 
> To BPF folks: Ulad asked me to share workloads for measuring the
> performance of kfree_rcu_nolock(). Unfortunately, I focused more on
> correctness and have not spent much effort on that. It would be nice
> if BPF folks could help evaluate it on their relevant workloads.
> 
> To PREEMPT_RT folks: The most relevant part is allowing
> kfree_rcu_sheaf() on PREEMPT_RT (patch 6). It acquires locks only via
> local_trylock() or spin_trylock_irqsave(), so it never sleeps while a
> raw spinlock is held. When trylock or unlock is unsafe,
> kmalloc_nolock() always fails.
> 
> Changes since RFC v2
> ====================
> 
> Reduced complexity and intrusiveness (Uladzislau Rezki)
> -------------------------------------------------------
> 
> While discussing concerns about the complexity of adding allow_spin
> handling with Ulad (Thanks!), I realized that adding complexity to the
> kvfree_rcu batching is not strictly necessary: only slab objects need to
> be batched, they are already batched by rcu sheaves, and slab already
> supports unknown context. So it is enough to implement only a minimal
> fallback for the sheaves path.
> 
> I tried to avoid making intrusive changes to the existing kvfree_rcu
> path as much as possible. struct rcu_ptr is renamed to kfree_rcu_head
> following Vlastimil's suggestion, and it is used only in the
> kfree_rcu_nolock() path for now.
> 
> As a result, the complexity is significantly reduced and the series
> became much less intrusive. This is also reflected well in the diffstat
> below.
> 
> RFC v2 diffstat:
>   8 files changed, 514 insertions(+), 163 deletions(-)
> 
> v3 diffstat:
>   6 files changed, 370 insertions(+), 105 deletions(-)
> 
> v3 diffstat (slub_kunit improvements in patches 1, 2 and 9 excluded):
>   5 files changed, 199 insertions(+), 66 deletions(-)
> 
> kfree_rcu_sheaf() PREEMPT_RT support (Vlastimil Babka)
> ------------------------------------------------------
> 
> As suggested by Vlastimil (Thanks!), kfree_rcu_sheaf() can now be used
> on PREEMPT_RT as well, by always assuming allow_spin is false on
> PREEMPT_RT.
> 
> slub_kunit enhancements
> -----------------------
> 
> - Currently the test is skipped when there is no hardware PMU. This can
>   happen on machines without a PMU, or in virtualized environments
>   (e.g., automated testing or virtme). Implement a fallback based on SW
>   perf events so that the test can still run in such environments, even
>   though the coverage is slightly smaller.
> 
> - While testing on PREEMPT_RT, I found that kmalloc_nolock() fails every
>   time, so the fallback path is not properly tested. This is a limitation
>   of perf events: the handler is called in NMI (HW perf events) or
>   interrupt context (SW perf events), where kmalloc_nolock() cannot
>   succeed.
> 
>   slub_kunit now registers a kprobe pre-handler at the points in the slab
>   allocator where lockdep_assert_held() is invoked. The pre-handler calls
>   kmalloc_nolock() and friends, to improve coverage on PREEMPT_RT instead
>   of relying on perf events.
> 
> One thing that needs to be further explored
> -------------------------------------------
> 
> The global deferred_free_by_rcu list (introduced by patch 8) used for
> the fallback should probably be per-CPU [5].
> 
> Actual Cover Letter
> ===================
> 
> This series improves kmalloc_nolock() and kfree_nolock() coverage
> in slub_kunit (patches 1 and 2) and introduces kfree_rcu_nolock() for
> an unknown context as suggested by Alexei Starovoitov.
> 
> Unknown context means the caller does not know whether spinning on a lock
> is safe (e.g., a BPF program attached to an arbitrary kernel function or
> in NMI context).
> 
> The slab allocator already supports unknown context via kmalloc_nolock()
> and kfree_nolock(), but it does not support freeing objects by RCU in
> unknown context.
> 
> It is not ideal to have completely separate batching for unknown context
> because the worst scenario where spinning on a lock would lead to
> deadlock is very rare, and in most cases, it is safe to use the
> existing mechanism (kfree_rcu_sheaf()).
> 
> Since most of the slab allocator already supports unknown context
> and sheaves support batching kvfree_rcu() calls for slab objects,
> implement kfree_rcu_nolock() with minimal changes by teaching
> kfree_rcu_sheaf() how to support unknown context and making
> it a little bit harder to allocate an empty sheaf, instead of making
> intrusive changes to the existing kvfree_rcu batching logic.
> 
> kfree_rcu_nolock() tries to free the object to the rcu sheaf if
> trylock succeeds. Once the rcu sheaf becomes full, it is submitted to
> RCU via call_rcu() if spinning is allowed or IRQs are enabled (to avoid
> calling call_rcu() in the middle of call_rcu()). Otherwise, call_rcu()
> is deferred via irq work.
> 
> In unknown context, when there is no sheaf available, kfree_rcu_sheaf()
> falls back to defer_kfree_rcu(), which inserts the object into a global
> lockless list [5]. Those objects are freed after synchronize_rcu() in
> a workqueue.
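The flow described in the two paragraphs above can be modeled roughly in
user space as below. This is only a sketch, not the kernel code: the enum,
the struct and pick_free_path() are hypothetical names made up for
illustration, and treating a failed trylock the same way as "no sheaf
available" is an assumption, since the text above only spells out the
latter case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels; none of these names exist in the kernel. */
enum free_path {
	FREE_TO_SHEAF,          /* object stored in the rcu sheaf, not yet full */
	SUBMIT_CALL_RCU,        /* full sheaf handed to call_rcu() directly */
	SUBMIT_VIA_IRQ_WORK,    /* full sheaf, call_rcu() deferred via irq work */
	FALLBACK_DEFERRED_LIST, /* global lockless list, freed from a workqueue */
};

/* Hypothetical snapshot of the conditions the cover letter mentions. */
struct free_ctx {
	bool allow_spin;      /* caller knows spinning on a lock is safe */
	bool irqs_enabled;    /* IRQs are enabled at the call site */
	bool trylock_ok;      /* the trylock succeeded */
	bool sheaf_available; /* an rcu sheaf could be obtained */
	bool sheaf_full;      /* sheaf became full after adding the object */
};

static enum free_path pick_free_path(const struct free_ctx *c)
{
	/* ASSUMPTION: trylock failure also takes the fallback path. */
	if (!c->trylock_ok || !c->sheaf_available)
		return FALLBACK_DEFERRED_LIST;

	if (!c->sheaf_full)
		return FREE_TO_SHEAF;

	/* Calling call_rcu() directly is fine unless we might be inside it. */
	if (c->allow_spin || c->irqs_enabled)
		return SUBMIT_CALL_RCU;

	return SUBMIT_VIA_IRQ_WORK;
}
```

The point of the model is that only the last two branches depend on
allow_spin and the IRQ state; everything before that is decided by whether
a sheaf could be locked and obtained at all.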
> 
> Unlike kfree_rcu(), only the 2-argument variant is supported.
> This is because the last resort of the 1-arg variant is
> synchronize_rcu(), which cannot be used in an unknown context.
> 
> As suggested by Alexei Starovoitov, kfree_rcu_nolock() can be used with
> struct kfree_rcu_head (8 bytes), which is smaller than struct rcu_head
> (16 bytes).
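To make the size difference concrete, here is a small user-space sketch.
The two-pointer struct mirrors the general shape of struct rcu_head (a
next pointer plus a callback pointer). The single-pointer layout of the
second struct is purely an assumption for illustration; the actual
definition of struct kfree_rcu_head is in the patch, not in this mail.
Both *_model names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the two-pointer shape of struct rcu_head. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

/*
 * ASSUMPTION: a single link pointer is enough when no callback needs
 * to be stored. The real struct kfree_rcu_head may look different.
 */
struct kfree_rcu_head_model {
	struct kfree_rcu_head_model *next;
};

/* One pointer wide: 8 bytes on a 64-bit build. */
static_assert(sizeof(struct kfree_rcu_head_model) == sizeof(void *),
	      "model head should be exactly one pointer wide");

/* Bytes saved per object by embedding the smaller head. */
static inline size_t head_bytes_saved(void)
{
	return sizeof(struct rcu_head_model) -
	       sizeof(struct kfree_rcu_head_model);
}
```

On a typical 64-bit build the two models come out at 16 and 8 bytes, which
matches the numbers quoted above.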
> 
> For more background and future plans, please see [4].
> 
> [1] RFC v1: https://lore.kernel.org/linux-mm/[email protected]
> 
> [2] RFC v2: https://lore.kernel.org/linux-mm/[email protected]
> 
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=kfree_rcu_nolock-v3r3
> 
> [4] kmalloc_nolock() follow-ups, including kfree_rcu_nolock(),
>     https://lore.kernel.org/linux-mm/esepccfhqg7m6jo76ns2znj2cnuaepx2xvw5zaygtwohq4psma@563ypprp6rr3
> 
> [5] However, we should probably make the list percpu because,
>     unlike RFC v2, it can be triggered more frequently under memory
>     pressure.
> 
>     https://lore.kernel.org/linux-mm/805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop
> 
> Signed-off-by: Harry Yoo (Oracle) <[email protected]>
> ---
> Harry Yoo (Oracle) (9):
>       slub_kunit: fall back to SW perf events when HW PMU is not available
>       mm/slab, slub_kunit: register kprobe to trigger _nolock APIs
>       mm/slab: handle the !allow_spin case in kfree_rcu_sheaf()
>       mm/slab: use call_rcu() in unknown context if irqs are enabled
>       mm/slab: extend deferred free mechanism to handle rcu sheaves
>       mm/slab: allow kfree_rcu_sheaf() on PREEMPT_RT
>       mm/slab: introduce kfree_rcu_nolock()
>       mm/slab: introduce struct kfree_rcu_head and use in kfree_rcu_nolock()
>       slub_kunit: extend the test for kfree_rcu_nolock()
> 
>  include/linux/rcupdate.h |  12 +++
>  include/linux/types.h    |   4 +
>  lib/tests/slub_kunit.c   | 174 ++++++++++++++++++++++++++++------
>  mm/slab.h                |   5 +-
>  mm/slab_common.c         |  38 ++++++--
>  mm/slub.c                | 242 ++++++++++++++++++++++++++++++++++-------------
>  6 files changed, 370 insertions(+), 105 deletions(-)
> ---
> base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
> change-id: 20260615-kfree_rcu_nolock-e5502555992f
> 
> Best regards,

-- 
Cheers,
Harry / Hyeonggon
