On 5/29/26 21:35, Andrey Drobyshev wrote:
> A vhost-blk guest that is saved or migrated under disk load can fail to
> load on the destination:
>
> qemu-kvm: VQ 0 size 0x100 < last_avail_idx 0xb8ab - used_idx 0xb934
> qemu-kvm: Failed to load vhost-blk:virtio
> qemu-kvm: error while loading state for instance 0x0 of device vhost-blk
> load of migration failed: Operation not permitted
>
> virtio_load() rejects the device because the saved used_idx is ahead of
> last_avail_idx, which is impossible for a coherent vring.
>
> The root cause is that vhost-blk has no "stop fetching" step before the
> device is stopped. On stop, QEMU's vhost_dev_stop() reads last_avail_idx
> via VHOST_GET_VRING_BASE, but the vhost worker is still running: it keeps
> pulling the avail-ring backlog and completing those requests, advancing
> the guest used->idx past the last_avail_idx that was just sampled. The
> saved state is therefore incoherent.
>
> vhost-net does not hit this because it detaches the backend
> (VHOST_NET_SET_BACKEND, fd == -1) before VHOST_GET_VRING_BASE, so its
> worker stops fetching. vhost-blk had no equivalent operation.
>
> Teach VHOST_BLK_SET_BACKEND to treat a negative fd as "stop the device":
> detach the backend from every vq (vhost_blk_handle_guest_kick() bails on a
> NULL backend), drain in-flight requests with vhost_blk_flush(), and release
> the backing file. After this the worker no longer advances the rings, so
> the subsequent VHOST_GET_VRING_BASE reports a final, coherent
> last_avail_idx. The unconsumed avail backlog stays in the ring and is
> reprocessed once the device is restarted. The companion QEMU change issues
> this stop before vhost_dev_stop().
>
> https://virtuozzo.atlassian.net/browse/VSTOR-133464
> Fixes: 40a5928ec730 ("drivers/vhost: vhost-blk accelerator for virtio-blk guests")
> Signed-off-by: Andrey Drobyshev <[email protected]>
Reviewed-by: Pavel Tikhomirov <[email protected]>
> ---
> drivers/vhost/blk.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> index b11f08f878f4..1b073011c445 100644
> --- a/drivers/vhost/blk.c
> +++ b/drivers/vhost/blk.c
> @@ -744,6 +744,24 @@ static long vhost_blk_set_backend(struct vhost_blk *blk, int fd)
>  	if (ret)
>  		goto out_dev;
>
> +	/*
> +	 * fd < 0 means "stop the device". Detach the backend from every vq so
> +	 * vhost_blk_handle_guest_kick() stops fetching descriptors, drain the
> +	 * in-flight requests, and release the backing file.
> +	 */
> +	if (fd < 0) {
> +		if (!blk->backend) {
> +			mutex_unlock(&blk->dev.mutex);
> +			return 0; /* already stopped */
> +		}
> +		vhost_blk_drop_backends(blk);
> +		vhost_blk_flush(blk);
> +		fput(blk->backend);
> +		blk->backend = NULL;
> +		mutex_unlock(&blk->dev.mutex);
> +		return 0;
> +	}
> +
>  	if (blk->backend) {
>  		ret = -EBUSY;
>  		goto out_dev;
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.