On Mon, Jan 19, 2026 at 5:50 PM Leon Hwang <[email protected]> wrote:
>
>
>
> On 20/1/26 03:47, Daniel Borkmann wrote:
> > On 1/19/26 3:21 PM, Leon Hwang wrote:
> >> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
> >> on contended LRU locks.
> >>
> >> If the global or per-CPU LRU lock is unavailable, refuse to refill the
> >> local free list and return NULL instead. This allows callers to back
> >> off safely rather than blocking or re-entering the same lock context.
> >>
> >> This change avoids lockdep warnings and potential deadlocks caused by
> >> re-entrant LRU lock acquisition from NMI context, as shown below:
> >>
> >> [  418.260323] bpf_testmod: oh no, recursing into test_1,
> >> recursion_misses 1
> >> [  424.982207] ================================
> >> [  424.982216] WARNING: inconsistent lock state
> >> [  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> >> [  424.982314]  *** DEADLOCK ***
> >> [...]
> >>
> >> Signed-off-by: Leon Hwang <[email protected]>
> >> ---
> >>   kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
> >>   1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > Documentation/bpf/map_lru_hash_update.dot needs update?
> >
>
> Yes, it needs update.
>
> >> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> >> index c091f3232cc5..03d37f72731a 100644
> >> --- a/kernel/bpf/bpf_lru_list.c
> >> +++ b/kernel/bpf/bpf_lru_list.c
> >> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct
> >> bpf_lru_list *l,
> >>       raw_spin_unlock_irqrestore(&l->lock, flags);
> >>   }
> >>   -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> >> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> >>                          struct bpf_lru_locallist *loc_l)
> >>   {
> >>       struct bpf_lru_list *l = &lru->common_lru.lru_list;
> >>       struct bpf_lru_node *node, *tmp_node;
> >>       unsigned int nfree = 0;
> >>   -    raw_spin_lock(&l->lock);
> >> +    if (!raw_spin_trylock(&l->lock))
> >> +        return false;
> >>
> >
> > Could you provide some more analysis, and the effect this has on real-world
> > programs? Presumably they'll unexpectedly encounter a lot more frequent
> > -ENOMEM as an error on bpf_map_update_elem even though memory might be
> > available just that locks are contended?
> >
> > Also, have you considered rqspinlock as a potential candidate to discover
> > deadlocks?
>
> Thanks for the questions.
>
> While I haven’t encountered this issue in production systems myself, the
> deadlock has been observed repeatedly in practice, including the cases
> shown in the cover letter. It can also be reproduced reliably when
> running the LRU tests locally, so this is a real and recurring problem.
>
> I agree that returning -ENOMEM when locks are contended is not ideal.
> Using -EBUSY would better reflect the situation where memory is
> available but forward progress is temporarily blocked by lock
> contention. I can update the patch accordingly.
>
> Regarding rqspinlock: as mentioned in the cover letter, Menglong
> previously explored using rqspinlock to address these deadlocks but was
> unable to arrive at a complete solution. After further off-list
> discussion, we agreed that using trylock is a more practical approach
> here. In most observed cases, the lock contention leading to deadlock
> occurs in bpf_common_lru_pop_free(), and trylock allows callers to back
> off safely rather than risking re-entrancy and deadlock.

Sorry, trylock is not an option here.

We are not going to sacrifice LRU map reliability for the sake of syzbot.

Reply via email to