On 20/1/26 10:01, Alexei Starovoitov wrote:
> On Mon, Jan 19, 2026 at 5:57 PM Leon Hwang <[email protected]> wrote:
>>
>>
>>
>> On 20/1/26 02:46, [email protected] wrote:
>>>> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
>>>> index c091f3232cc5..03d37f72731a 100644
>>>> --- a/kernel/bpf/bpf_lru_list.c
>>>> +++ b/kernel/bpf/bpf_lru_list.c
>>>
>>> [ ... ]
>>>
>>>> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
>>>>
>>>> loc_l = per_cpu_ptr(clru->local_list, cpu);
>>>>
>>>> - raw_spin_lock_irqsave(&loc_l->lock, flags);
>>>> + if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
>>>> + return NULL;
>>>>
>>>> node = __local_list_pop_free(loc_l);
>>>> - if (!node) {
>>>> - bpf_lru_list_pop_free_to_local(lru, loc_l);
>>>> + if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
>>>> node = __local_list_pop_free(loc_l);
>>>> - }
>>>>
>>>> if (node)
>>>> __local_list_add_pending(lru, loc_l, cpu, node, hash);
>>>
>>> The trylock conversion here looks correct, but what about the steal path
>>> that follows later in bpf_common_lru_pop_free()? Looking at the existing
>>> code (not shown in this diff), the steal loop still uses unconditional
>>> raw_spin_lock_irqsave():
>>>
>>> do {
>>> steal_loc_l = per_cpu_ptr(clru->local_list, steal);
>>>
>>> raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
>>> ...
>>> } while (...);
>>>
>>> If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
>>> can the same deadlock occur when an NMI arrives while a CPU holds
>>> steal_loc_l->lock in this loop and the NMI handler then tries to acquire
>>> the same lock?
>>>
>>> Similarly, after a successful steal, there is another unconditional lock:
>>>
>>> if (node) {
>>> raw_spin_lock_irqsave(&loc_l->lock, flags);
>>> __local_list_add_pending(lru, loc_l, cpu, node, hash);
>>> raw_spin_unlock_irqrestore(&loc_l->lock, flags);
>>> }
>>>
>>> Should these also use trylock to maintain consistency with the stated goal
>>> of avoiding NMI-context deadlocks?
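>>>
>>> For illustration, a consistent conversion might skip a contended CPU
>>> instead of spinning on it. This is only an untested sketch, assuming the
>>> loop body keeps its existing __local_list_pop_free() /
>>> __local_list_pop_pending() calls:
>>>
>>>     do {
>>>             steal_loc_l = per_cpu_ptr(clru->local_list, steal);
>>>
>>>             /* Skip this CPU if its lock is already held, so an NMI
>>>              * that interrupted the lock holder cannot deadlock here.
>>>              */
>>>             if (raw_spin_trylock_irqsave(&steal_loc_l->lock, flags)) {
>>>                     node = __local_list_pop_free(steal_loc_l);
>>>                     if (!node)
>>>                             node = __local_list_pop_pending(lru, steal_loc_l);
>>>                     raw_spin_unlock_irqrestore(&steal_loc_l->lock, flags);
>>>             }
>>>             ...
>>>     } while (...);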
>>>
>>
>> This patch is not intended to eliminate every possible deadlock
>> scenario. Its goal is to avoid deadlocks caused by the long-lived
>> critical sections in the free-node pop paths, where lock contention
>> can persist long enough that a re-entrant acquisition from NMI context
>> deadlocks.
>>
>> The steal path and the post-steal update are both short-lived critical
>> sections. They do not exhibit the same contention characteristics and
>> have not been observed to trigger the reported deadlocks. Converting
>> them to trylock would add complexity without a clear benefit, so it is
>> unnecessary for the stated goal of this change.
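>>
>> To illustrate that complexity: after a successful steal, the node has
>> already been removed from another CPU's list, so a failed trylock on
>> loc_l->lock could not simply return NULL; the node would first have to
>> be handed back somewhere. A rough sketch (hypothetical; the give-back
>> helper named below does not exist today):
>>
>>      if (node) {
>>              if (!raw_spin_trylock_irqsave(&loc_l->lock, flags)) {
>>                      /* The stolen node cannot be dropped on the
>>                       * floor; it would need a new give-back path,
>>                       * e.g. a hypothetical helper such as
>>                       * __bpf_lru_node_return_free().
>>                       */
>>                      __bpf_lru_node_return_free(lru, node);
>>                      return NULL;
>>              }
>>              __local_list_add_pending(lru, loc_l, cpu, node, hash);
>>              raw_spin_unlock_irqrestore(&loc_l->lock, flags);
>>      }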
>
> AI is correct. Either everything needs to be converted or none of it.
> Adding trylock in a few places because syzbot found them is not fixing
> anything; it just silences one (or a few?) syzbot reports.
> As I said in the other email, trylock is not an option.
> rqspinlock is the only true way of addressing these potential deadlocks.
> If that is too hard, then leave it as-is. Do not hack things halfway.
Understood, I'll leave the locking as-is.
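
For future reference, my understanding is that the rqspinlock route
would mean switching these locks from raw_spinlock_t to rqspinlock_t
and checking the return value at every acquisition site. An untested
sketch, assuming the raw_res_spin_lock_irqsave() API from
include/asm-generic/rqspinlock.h:

        loc_l = per_cpu_ptr(clru->local_list, cpu);

        /* rqspinlock reports deadlock or timeout instead of spinning
         * forever, so every caller must handle acquisition failure.
         */
        if (raw_res_spin_lock_irqsave(&loc_l->lock, flags))
                return NULL;

        node = __local_list_pop_free(loc_l);
        ...
        raw_res_spin_unlock_irqrestore(&loc_l->lock, flags);

Applied uniformly to the pop, steal, and push paths, that would be a
complete fix rather than a partial one.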
Thanks,
Leon