vhost checked the counter within the refcnt before decrementing. It really wanted to know that there aren't too many references, as a way to batch freeing resources a bit more efficiently.
This works well but it we now access the ref counter twice so there's a race: all users might see a high count and decide to defer freeing resources. In the end no one initiates freeing resources until the last reference is gone (which is on VM shotdown so might happen after a looooong time). Let's do what we should have done straight away: add a kref API to return the kref value atomically, and use that to avoid the deadlock. Reported-by: Qin Chuanyu <qinchua...@huawei.com> Signed-off-by: Michael S. Tsirkin <m...@redhat.com> --- drivers/vhost/net.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 831eb4f..7eaf2de 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -140,9 +140,9 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy) return ubufs; } -static void vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs) +static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs) { - kref_put(&ubufs->kref, vhost_net_zerocopy_done_signal); + return kref_sub_return(&ubufs->kref, 1, vhost_net_zerocopy_done_signal); } static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs) @@ -306,22 +306,21 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) { struct vhost_net_ubuf_ref *ubufs = ubuf->ctx; struct vhost_virtqueue *vq = ubufs->vq; - int cnt = atomic_read(&ubufs->kref.refcount); + int cnt; /* set len to mark this desc buffers done DMA */ vq->heads[ubuf->desc].len = success ? VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN; - vhost_net_ubuf_put(ubufs); + cnt = vhost_net_ubuf_put(ubufs); /* * Trigger polling thread if guest stopped submitting new buffers: - * in this case, the refcount after decrement will eventually reach 1 - * so here it is 2. + * in this case, the refcount after decrement will eventually reach 1. * We also trigger polling periodically after each 16 packets * (the value 16 here is more or less arbitrary, it's tuned to trigger * less than 10% of times). */ - if (cnt <= 2 || !(cnt % 16)) + if (cnt <= 1 || !(cnt % 16)) vhost_poll_queue(&vq->poll); } -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/