Now I'm investigating and fixing two pre-existing kmalloc_nolock() bugs and am (hopefully) planning to post the fixes later this week.
I will later rebase this series onto Vlastimil's slab_alloc_flags v3 and
the kmalloc_nolock() fixes, and address comments from humans and sashiko.
But this should be enough for review. Thanks!

On 6/15/26 8:05 PM, Harry Yoo (Oracle) wrote:
> Not the best time to post a series, but didn't want to delay posting
> the series for too long. No pressure ;) This is aimed to be queued
> for review and testing after the merge window closes.
>
> This series is based on next-20260612, and is also available on
> git.kernel.org [3].
>
> To RCU folks: It would be great if you could kindly take a quick look at
> patch 4 and either ack or nack the patch ;)
>
> To BPF folks: Ulad asked me to share workloads to measure the performance
> of kfree_rcu_nolock(). Unfortunately, I focused more on correctness
> and have not spent much effort on that. It would be nice if BPF folks
> could help evaluate it on their relevant workloads.
>
> To PREEMPT_RT folks: The most relevant part is allowing
> kfree_rcu_sheaf() on PREEMPT_RT (patch 6). It avoids sleeping within
> a raw spinlock by acquiring the locks only via local_trylock() or
> spin_trylock_irqsave(). When trylock or unlock is unsafe,
> kmalloc_nolock() always fails.
>
> Changes since RFC v2
> ====================
>
> Reduced complexity and intrusiveness (Uladzislau Rezki)
> -------------------------------------------------------
>
> While discussing concerns about the complexity of adding allow_spin
> handling with Ulad (Thanks!), I realized that adding complexity to the
> kvfree_rcu batching is not strictly necessary: only slab objects need to
> be batched, they are already batched by rcu sheaves, and slab already
> supports unknown context. So it is enough to implement only a minimal
> fallback for the sheaves path.
>
> I tried to avoid making intrusive changes to the existing kvfree_rcu
> path as much as possible.
> struct rcu_ptr is renamed to struct kfree_rcu_head, following
> Vlastimil's suggestion, and it is used only in the kfree_rcu_nolock()
> path for now.
>
> As a result, the complexity is significantly reduced and the series
> became much less intrusive. This is also reflected well in the diffstat
> below.
>
> RFC v2 diffstat:
>   8 files changed, 514 insertions(+), 163 deletions(-)
>
> v3 diffstat:
>   6 files changed, 370 insertions(+), 105 deletions(-)
>
> v3 diffstat (excluding the slub_kunit improvements in patches 1, 2, and 9):
>   5 files changed, 199 insertions(+), 66 deletions(-)
>
> kfree_rcu_sheaf() PREEMPT_RT support (Vlastimil Babka)
> ------------------------------------------------------
>
> As suggested by Vlastimil (Thanks!), kfree_rcu_sheaf() can now be used
> on PREEMPT_RT as well, by always assuming allow_spin is false on
> PREEMPT_RT.
>
> slub_kunit enhancements
> -----------------------
>
> - Currently the test is skipped when there is no hardware PMU. This can
>   happen on machines without a PMU, or in virtualized environments
>   (e.g., automated testing or virtme). Implement a fallback based on SW
>   perf events so that the test can still run in such environments, even
>   though the coverage is slightly smaller.
>
> - While testing on PREEMPT_RT, I found that kmalloc_nolock() fails every
>   time, so the fallback path is not properly tested. This is a limitation
>   of perf events: the handler is called in NMI (HW perf events) or
>   interrupt context (SW perf events), where kmalloc_nolock() cannot
>   succeed.
>
>   slub_kunit now registers a kprobe pre-handler at the points in the slab
>   allocator where lockdep_assert_held() is invoked. The pre-handler calls
>   kmalloc_nolock() and friends to improve coverage on PREEMPT_RT instead
>   of relying on perf events.
>
> One thing that needs to be further explored
> -------------------------------------------
>
> The global deferred_free_by_rcu list (introduced by patch 8) for the
> fallback should probably be per-CPU [5].
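
For readers less familiar with lockless lists, here is a rough userspace model of why a single global list is a contention point. It assumes the deferred list behaves like the kernel's llist_add()/llist_del_all(); the names lnode, lhead, lpush() and ldel_all() are hypothetical, and C11 atomics stand in for the kernel primitives. Every producer retries a compare-and-swap on the same head pointer, while the consumer detaches the whole list with a single exchange:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct llist_node. */
struct lnode {
	struct lnode *next;
};

/* Hypothetical stand-in for struct llist_head: one shared head pointer. */
struct lhead {
	_Atomic(struct lnode *) first;
};

/*
 * Lock-free push, modeled on llist_add(): retry the CAS until no other
 * producer changed the head in between. Returns 1 if the list was empty.
 */
static int lpush(struct lnode *n, struct lhead *h)
{
	struct lnode *first = atomic_load_explicit(&h->first, memory_order_relaxed);

	do {
		/* On CAS failure, 'first' is reloaded with the current head. */
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &first, n,
							memory_order_release,
							memory_order_relaxed));
	return first == NULL;
}

/*
 * Detach every queued node at once, modeled on llist_del_all(). In the
 * series, the consumer frees these objects after synchronize_rcu().
 */
static struct lnode *ldel_all(struct lhead *h)
{
	return atomic_exchange_explicit(&h->first, NULL, memory_order_acquire);
}
```

Returning whether the list was empty mirrors llist_add() and is a common way to let only the first producer schedule the worker. With per-CPU heads, producers on different CPUs would stop retrying against each other's updates to one shared pointer.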
>
> Actual Cover Letter
> ===================
>
> This series improves kmalloc_nolock() and kfree_nolock() coverage
> in slub_kunit (patches 1 and 2) and introduces kfree_rcu_nolock() for
> an unknown context as suggested by Alexei Starovoitov.
>
> Unknown context means the caller does not know whether spinning on a lock
> is safe (e.g., a BPF program attached to an arbitrary kernel function or
> in NMI context).
>
> The slab allocator already supports unknown context via kmalloc_nolock()
> and kfree_nolock(), but it does not support freeing objects by RCU in
> unknown context.
>
> It is not ideal to have completely separate batching for unknown context
> because the worst scenario where spinning on a lock would lead to
> deadlock is very rare, and in most cases, it is safe to use the
> existing mechanism (kfree_rcu_sheaf()).
>
> Since most of the slab allocator already supports unknown context
> and sheaves support batching kvfree_rcu() calls for slab objects,
> implement kfree_rcu_nolock() with minimal changes by teaching
> kfree_rcu_sheaf() how to support unknown context and making
> it a little bit harder to allocate an empty sheaf, instead of making
> intrusive changes to the existing kvfree_rcu batching logic.
>
> kfree_rcu_nolock() tries to free the object to the rcu sheaf if
> trylock succeeds. Once the rcu sheaf becomes full, it is submitted to
> RCU via call_rcu() if spinning is allowed or IRQs are enabled (to avoid
> calling call_rcu() in the middle of call_rcu()). Otherwise, call_rcu()
> is deferred via irq work.
>
> In unknown context, when there is no sheaf available, kfree_rcu_sheaf()
> falls back to defer_kfree_rcu(), which inserts the object into a global
> lockless list [5], and those objects are freed after synchronize_rcu() in
> a workqueue.
>
> Unlike kfree_rcu(), only the 2-argument variant is supported.
> This is because the last resort of the 1-argument variant is
> synchronize_rcu(), which cannot be used in an unknown context.
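
The paths described above can be summarized as a small decision function. This is a hypothetical userspace model, not code from the series: struct ctx, enum free_path and model_kfree_rcu_sheaf() are made-up names. Two behaviors are assumptions that the text does not spell out: a failed trylock is treated like having no sheaf, and when spinning is allowed (plain kfree_rcu()), a missing sheaf is left to the regular kvfree_rcu batching. On PREEMPT_RT, allow_spin is forced to false, as described in the changes since RFC v2.

```c
#include <stdbool.h>

/* Where a freed object ends up in this model (hypothetical names). */
enum free_path {
	PATH_SHEAF,          /* added to the per-CPU rcu sheaf */
	PATH_SHEAF_CALL_RCU, /* sheaf became full and was passed to call_rcu() */
	PATH_SHEAF_IRQ_WORK, /* sheaf became full; call_rcu() deferred via irq work */
	PATH_DEFERRED_LIST,  /* no sheaf: queued on the global lockless list */
	PATH_NOT_HANDLED,    /* assumption: regular kvfree_rcu batching takes over */
};

/* Inputs to the model; each field stands for a condition in the text. */
struct ctx {
	bool unknown_ctx;  /* called via kfree_rcu_nolock() */
	bool preempt_rt;   /* CONFIG_PREEMPT_RT=y */
	bool irqs_enabled; /* IRQs enabled at the call site */
	bool trylock_ok;   /* whether the per-CPU trylock would succeed */
	bool have_sheaf;   /* whether an rcu sheaf is available */
	int sheaf_used;    /* objects already in the rcu sheaf */
	int sheaf_cap;     /* rcu sheaf capacity */
};

static enum free_path model_kfree_rcu_sheaf(const struct ctx *c)
{
	/* Unknown context never spins; PREEMPT_RT always assumes !allow_spin. */
	bool allow_spin = !c->unknown_ctx && !c->preempt_rt;
	/* With allow_spin the lock is simply taken; otherwise only a trylock. */
	bool locked = allow_spin || c->trylock_ok;

	/* Assumption: a failed trylock is handled like a missing sheaf. */
	if (!locked || !c->have_sheaf)
		return c->unknown_ctx ? PATH_DEFERRED_LIST : PATH_NOT_HANDLED;

	if (c->sheaf_used + 1 < c->sheaf_cap)
		return PATH_SHEAF;

	/*
	 * The sheaf is now full. Call call_rcu() directly only if spinning is
	 * allowed or IRQs are enabled, to avoid calling call_rcu() in the
	 * middle of call_rcu(); otherwise defer it via irq work.
	 */
	if (allow_spin || c->irqs_enabled)
		return PATH_SHEAF_CALL_RCU;
	return PATH_SHEAF_IRQ_WORK;
}
```

In this model, turning on preempt_rt for a plain kfree_rcu() caller changes two things: the lock is only ever trylocked, and a full sheaf with IRQs disabled goes through irq work instead of a direct call_rcu().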
>
> As suggested by Alexei Starovoitov, kfree_rcu_nolock() can be used with
> struct kfree_rcu_head (8 bytes), which is smaller than struct rcu_head
> (16 bytes).
>
> For more background and future plans, please see [4].
>
> [1] RFC v1:
> https://lore.kernel.org/linux-mm/[email protected]
>
> [2] RFC v2:
> https://lore.kernel.org/linux-mm/[email protected]
>
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=kfree_rcu_nolock-v3r3
>
> [4] kmalloc_nolock() follow-ups, including kfree_rcu_nolock(),
> https://lore.kernel.org/linux-mm/esepccfhqg7m6jo76ns2znj2cnuaepx2xvw5zaygtwohq4psma@563ypprp6rr3
>
> [5] However, we should probably make the list percpu because,
> unlike RFC v2, it can be triggered more frequently under memory
> pressure.
> https://lore.kernel.org/linux-mm/805c33d7-3a7b-470c-bd9d-065717a3e3e2@paulmck-laptop
>
> Signed-off-by: Harry Yoo (Oracle) <[email protected]>
> ---
> Harry Yoo (Oracle) (9):
>       slub_kunit: fall back to SW perf events when HW PMU is not available
>       mm/slab, slub_kunit: register kprobe to trigger _nolock APIs
>       mm/slab: handle the !allow_spin case in kfree_rcu_sheaf()
>       mm/slab: use call_rcu() in unknown context if irqs are enabled
>       mm/slab: extend deferred free mechanism to handle rcu sheaves
>       mm/slab: allow kfree_rcu_sheaf() on PREEMPT_RT
>       mm/slab: introduce kfree_rcu_nolock()
>       mm/slab: introduce struct kfree_rcu_head and use in kfree_rcu_nolock()
>       slub_kunit: extend the test for kfree_rcu_nolock()
>
>  include/linux/rcupdate.h |  12 +++
>  include/linux/types.h    |   4 +
>  lib/tests/slub_kunit.c   | 174 ++++++++++++++++++++++++++++------
>  mm/slab.h                |   5 +-
>  mm/slab_common.c         |  38 ++++++--
>  mm/slub.c                | 242 ++++++++++++++++++++++++++++++++++-------------
>  6 files changed, 370 insertions(+), 105 deletions(-)
> ---
> base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
> change-id: 20260615-kfree_rcu_nolock-e5502555992f
>
> Best regards,

--
Cheers,
Harry / Hyeonggon

