On Thu, Apr 17, 2025 at 3:29 PM Bui Quang Minh <minhquangbu...@gmail.com> wrote:
>
> When pausing rx (e.g. when setting up XDP or an XSK pool, or resizing
> rx), we call napi_disable() on the receive queue's napi. The delayed
> refill_work also calls napi_disable() on the receive queue's napi. When
> napi_disable() is called on an already disabled napi, it sleeps in
> napi_disable_locked() while still holding the netdev_lock. As a result,
> a later napi_enable() also gets stuck because it cannot acquire the
> netdev_lock. This leaves both refill_work and the pause-then-resume rx
> path stuck.
>
> This scenario can be reproduced by binding an XDP socket to a virtio-net
> interface without setting up the fill ring. As a result, try_fill_recv()
> keeps failing until the fill ring is set up, so refill_work is scheduled.
>
> This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up
> virtnet_rx_resume to disable future delayed refill_work and cancel any
> in-flight instance before calling napi_disable() to pause the rx.
>
> Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()")
> Acked-by: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Bui Quang Minh <minhquangbu...@gmail.com>

Acked-by: Jason Wang <jasow...@redhat.com>

(In the future, we may consider switching to per-virtqueue refill work instead.)

Thanks

