On 6/16/26 5:28 AM, Alexei Starovoitov wrote:
> On Mon Jun 15, 2026 at 4:05 AM PDT, Harry Yoo (Oracle) wrote:
>> Not the best time to post a series, but didn't want to delay posting
>> the series for too long. no pressures ;)  This is aimed to be queued
>> for review and testing after the merge window closes.
>>
>> This series is based on next-20260612, and is also available on
>> git.kernel.org [3].
>>
>> To RCU folks: It would be great if you could kindly take a quick look at
>> patch 4 and either ack or nack the patch ;)
>>
>> To BPF folks: Ulad asked to share workloads to measure performance
>> of kfree_rcu_nolock(). Unfortunately, I focused more on correctness
>> and have not spent much effort on that. It would be nice if BPF folks
>> could help evaluate it on their relevant workloads.
> 
> kfree_rcu_nolock() needs to replace bpf_mem_alloc which is backbone
> of bpf maps and bpf local storage.

> So all of the selftests/bpf/benchs/run_bench_*.sh

Didn't notice it was a thing, thanks!

> will exercise it one way or the other the replacement is complete.
> In other words performance is absolutely critical.
>
>> To PREEMPT_RT folks: The most relevant part is allowing
>> kfree_rcu_sheaf() on PREEMPT_RT (patch 6). It carefully avoids sleeping
>> by acquiring the locks via local_trylock() or spin_trylock_irqsave()
>> to avoid sleeping within a raw spinlock. When trylock or unlock is
>> unsafe, kmalloc_nolock() always fails.
>>
>> Changes since RFC v2
>> ====================
>>
>> Reduced complexity and intrusiveness (Uladzislau Rezki)
>> -------------------------------------------------------
>>
>> While discussing concerns about the complexity of adding allow_spin
>> handling with Ulad (Thanks!), I realized that adding complexity to the
>> kvfree_rcu batching is not strictly necessary: only slab objects need to
>> be batched, they are already batched by rcu sheaves, and slab already
>> supports unknown context. So it is enough to implement only a minimal
>> fallback for the sheaves path.
>>
>> I tried to avoid making intrusive changes to the existing kvfree_rcu
>> path as much as possible. struct rcu_ptr is renamed to kfree_rcu_head
>> following Vlastimil's suggestion, and it is used only in the
>> kfree_rcu_nolock() path for now.
>>
>> As a result, the complexity is significantly reduced and the series
>> became much less intrusive. This is also reflected well in the diffstat
>> below.
> 
> Overall looks good to me.

Thanks!

> btw sashiko was confused in few cases.
> Not everything that it flags needs a fix. Sometimes it's not an issue at all.
> It only sounds like one.

Right, most of the comments are false positives.
But there is one comment that looks like a real bug...

Sashiko wrote:
> This is a pre-existing issue, but can concurrent lockless calls to
> deferred_work_barrier() cause an rcuwait race on PREEMPT_RT, leading to
> permanent task hangs?
>
> The function iterates over all CPUs, invoking irq_work_sync() on each
> CPU's deferred work object. On PREEMPT_RT, irq_work_sync() relies on
> rcuwait_wait_event() to block until completion,

Yes. If CONFIG_PREEPMT_RT is enabled and irq_work doesn't have
IRQ_WORK_HARD_IRQ, irq_work_sync() calls rcuwait_wait_event().

> and the rcuwait
> synchronization primitive strictly allows only one waiter at a time.

Hmm yeah, include/linux/rcuwait.h says:
| The caller is responsible for locking around rcuwait_wait_event(),
| and [prepare_to/finish]_rcuwait() such that writes to @task are
| properly serialized.

> Because deferred_work_barrier() is called without any global
> serialization (for instance, in kmem_cache_destroy() and
> kvfree_rcu_barrier_on_cache(), and now in flush_all_rcu_sheaves()),

Yes, without slab_mutex held.

> multiple threads can enter irq_work_sync() for the same work object
> concurrently.
> 
> This overwrites the waiter task pointer, meaning only one task will be
> woken up when the work completes, leaving the other tasks hanging
> permanently in an uninterruptible sleep.

I think this is indeed a pre-existing issue that needs to be resolved.
Looks like we now have one more bug to fix ;-)

-- 
Cheers,
Harry / Hyeonggon

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to