On Thu, Apr 17, 2025 at 3:29 PM Bui Quang Minh <minhquangbu...@gmail.com> wrote:
>
> When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call
> napi_disable() on the receive queue's napi. In delayed refill_work, it
> also calls napi_disable() on the receive queue's napi. When
> napi_disable() is called on an already disabled napi, it will sleep in
> napi_disable_locked while still holding the netdev_lock. As a result,
> later napi_enable gets stuck too as it cannot acquire the netdev_lock.
> This leads to refill_work and the pause-then-resume tx are stuck
> altogether.
>
> This scenario can be reproducible by binding a XDP socket to virtio-net
> interface without setting up the fill ring. As a result, try_fill_recv
> will fail until the fill ring is set up and refill_work is scheduled.
>
> This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up the
> virtnet_rx_resume to disable future and cancel all inflights delayed
> refill_work before calling napi_disable() to pause the rx.
>
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Acked-by: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Bui Quang Minh <minhquangbu...@gmail.com>

Acked-by: Jason Wang <jasow...@redhat.com>

(In the future, we may consider switching to per-virtqueue refill work
instead.)

Thanks
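
For anyone following the thread, the ordering the commit message describes can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name below (struct model_rq, the model_* functions) is hypothetical and is not the driver's code, and the counter merely stands in for the point where the real napi_disable() would sleep on an already disabled napi.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_rq {
	bool napi_enabled;    /* models the receive queue's napi state */
	bool refill_enabled;  /* models "delayed refill may be scheduled" */
	bool refill_pending;  /* models a queued delayed refill_work */
	int double_disable;   /* disable calls made on an already disabled napi */
};

static void model_napi_disable(struct model_rq *rq)
{
	if (!rq->napi_enabled)
		rq->double_disable++; /* the real call would sleep here */
	rq->napi_enabled = false;
}

static void model_napi_enable(struct model_rq *rq)
{
	rq->napi_enabled = true;
}

/* Queue a refill only while refill is allowed. */
static void model_schedule_refill(struct model_rq *rq)
{
	if (rq->refill_enabled)
		rq->refill_pending = true;
}

/* Models refill_work: disable napi, refill, enable napi again. */
static void model_refill_work(struct model_rq *rq)
{
	if (!rq->refill_pending)
		return;
	rq->refill_pending = false;
	model_napi_disable(rq);
	/* the receive ring refill would happen here */
	model_napi_enable(rq);
}

/* Ordering before the fix: napi is disabled while a refill is still queued. */
static void model_rx_pause_unfixed(struct model_rq *rq)
{
	model_napi_disable(rq);
}

/* Ordering after the fix: stop future refills, cancel the queued one,
 * and only then disable napi. */
static void model_rx_pause(struct model_rq *rq)
{
	rq->refill_enabled = false;
	rq->refill_pending = false;
	model_napi_disable(rq);
}

static void model_rx_resume(struct model_rq *rq)
{
	model_napi_enable(rq);
	rq->refill_enabled = true;
}
```

In this model, running model_refill_work() after model_rx_pause_unfixed() bumps double_disable, while running it after model_rx_pause() leaves the counter at zero because the queued work was cancelled first.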