On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote:
> On Tue, 2026-05-12 at 06:41 +0000, [email protected] wrote:
>
> [...]
>
>> When a BPF program holds an owning or refcount-acquired reference to
>> one of these nodes (node X), which is structurally supported because
>> __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
>> refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
>> corruption:
>> 
>> CPU 0 (bpf_list_head_free, lock released)  CPU 1 (BPF prog, refcount X)
>> -----------------------------------------   ----------------------------
>> (owner of X == NULL, X linked in drain)
>>                                             bpf_list_push_back(other, X)
>>                                               __bpf_list_add: spin_lock()
>>                                               cmpxchg(X->owner, NULL,
>>                                                       POISON) -> OK
>>                                               list_add_tail(&X->list_head,
>>                                                             other_head)
>>                                                 -> overwrites X->next,
>>                                                    X->prev, corrupts
>>                                                    other_head's chain
>>                                                    because X is still
>>                                                    stitched into drain
>> pos = drain.next;      (may be X or neighbor using X's stale next)
>> list_del_init(pos);    reads X->next/prev now pointing into other_head,
>>                        corrupts other_head's list and/or drain
>
>
> Kaitao, this scenario seems plausible; could you please comment on it?

I think the bot is correct; this patch looks buggy.
It seems to be an optimization that breaks the concurrency logic.
Maybe just drop this patch and reorder the other one, so that the bot
sees the nonown suffix logic first.

