vhost checked the counter within the refcnt before decrementing.  It
really wanted to know that there aren't too many references, as a way to
batch freeing resources a bit more efficiently.

This works well, but we now access the
ref counter twice, so there's a race:
all users might see a high count and decide
to defer freeing resources.
In the end no one initiates freeing resources
until the last reference is gone (which is on VM shutdown,
so it might happen after a very long time).

Let's do what we should have done straight away:
add a kref API to return the kref value atomically,
and use that to avoid the deadlock.

Reported-by: Qin Chuanyu <qinchua...@huawei.com>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/vhost/net.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831eb4f..7eaf2de 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -140,9 +140,9 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
        return ubufs;
 }
 
-static void vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
+static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
 {
-       kref_put(&ubufs->kref, vhost_net_zerocopy_done_signal);
+       return kref_sub_return(&ubufs->kref, 1, vhost_net_zerocopy_done_signal);
 }
 
 static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
@@ -306,22 +306,21 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
        struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
        struct vhost_virtqueue *vq = ubufs->vq;
-       int cnt = atomic_read(&ubufs->kref.refcount);
+       int cnt;
 
        /* set len to mark this desc buffers done DMA */
        vq->heads[ubuf->desc].len = success ?
                VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
-       vhost_net_ubuf_put(ubufs);
+       cnt = vhost_net_ubuf_put(ubufs);
 
        /*
         * Trigger polling thread if guest stopped submitting new buffers:
-        * in this case, the refcount after decrement will eventually reach 1
-        * so here it is 2.
+        * in this case, the refcount after decrement will eventually reach 1.
         * We also trigger polling periodically after each 16 packets
         * (the value 16 here is more or less arbitrary, it's tuned to trigger
         * less than 10% of times).
         */
-       if (cnt <= 2 || !(cnt % 16))
+       if (cnt <= 1 || !(cnt % 16))
                vhost_poll_queue(&vq->poll);
 }
 
-- 
MST