On Mon, Jul 28, 2025 at 6:24 PM Jonah Palmer <[email protected]> wrote:
>
>
>
> On 7/28/25 11:30 AM, Eugenio Perez Martin wrote:
> > On Tue, Jul 22, 2025 at 2:41 PM Jonah Palmer <[email protected]>
> > wrote:
> >>
> >> Iterative live migration for virtio-net sends an initial
> >> VMStateDescription while the source is still active. Because data
> >> continues to flow for virtio-net, the guest's avail index continues to
> >> increment after last_avail_idx had already been sent. This causes the
> >> destination to often see something like this from virtio_error():
> >>
> >> VQ 0 size 0x100 Guest index 0x0 inconsistent with Host index 0xc: delta
> >> 0xfff4
> >>
> >> This patch suppresses this consistency check if we're loading the
> >> initial VMStateDescriptions via iterative migration and unsuppresses
> >> it for the stop-and-copy phase when the final VMStateDescriptions
> >> (carrying the correct indices) are loaded.
> >>
> >> A temporary VirtIODevMigration migration data structure is introduced here
> >> to
> >> represent the iterative migration process for a VirtIODevice. For now it
> >> just holds a flag to indicate whether or not the initial
> >> VMStateDescription was sent during the iterative live migration process.
> >>
> >> Signed-off-by: Jonah Palmer <[email protected]>
> >> ---
> >> hw/net/virtio-net.c | 13 +++++++++++++
> >> hw/virtio/virtio.c | 32 ++++++++++++++++++++++++--------
> >> include/hw/virtio/virtio.h | 6 ++++++
> >> 3 files changed, 43 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 86a6fe5b91..b7ac5e8278 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -3843,12 +3843,19 @@ static void virtio_net_save_cleanup(void *opaque)
> >>
> >> static int virtio_net_load_setup(QEMUFile *f, void *opaque, Error **errp)
> >> {
> >> + VirtIONet *n = opaque;
> >> + VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> + vdev->migration = g_new0(VirtIODevMigration, 1);
> >> + vdev->migration->iterative_vmstate_loaded = false;
> >> +
> >> return 0;
> >> }
> >>
> >> static int virtio_net_load_state(QEMUFile *f, void *opaque, int
> >> version_id)
> >> {
> >> VirtIONet *n = opaque;
> >> + VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> + VirtIODevMigration *mig = vdev->migration;
> >> uint64_t flag;
> >>
> >> flag = qemu_get_be64(f);
> >> @@ -3861,6 +3868,7 @@ static int virtio_net_load_state(QEMUFile *f, void
> >> *opaque, int version_id)
> >> case VNET_MIG_F_INIT_STATE:
> >> {
> >> vmstate_load_state(f, &vmstate_virtio_net, n,
> >> VIRTIO_NET_VM_VERSION);
> >> + mig->iterative_vmstate_loaded = true;
> >
> > This code will need to change if we send the status iteratively more
> > than once. For example, if the guest changes the mac address, the
> > number of vqs, etc.
> >
>
> Hopefully we can reach a solution where we'd only need to call the full
> vmstate_load_state(f, &vmstate_virtio_net, ...) for a virtio-net device
> once and then handle any changes afterwards individually.
>
> Perhaps, maybe for simplicity, we could just send the
> sub-states/subsections (instead of the whole state again) iteratively if
> there were any changes in the fields that those sub-states/subsections
> govern.
>
> Definitely something I'll keep in mind as this series develops.
>
> > In my opinion, we should set a flag named "in_iterative_migration" (or
> > equivalent) in virtio_net_load_setup and clear it in
> > virtio_net_load_cleanup. That's enough to tell in virtio_load if we
> > should perform actions like checking for inconsistent indices.
> >
>
> I did actually try something like this but I realized that the
> .load_cleanup and .save_cleanup hooks actually fire at the very end of
> live migration (e.g. during the stop-and-copy phase). I thought they
> fired at the end of the iterative portion of live migration, but this
> didn't appear to be the case.
>
Ok that makes a lot of sense. What about .switchover_start ? We need
the switchover capability though, not sure if it is a good idea to
mandate it as a requirement. So yes, maybe this patch is the most
reliable way to do so.
> >> break;
> >> }
> >> default:
> >> @@ -3875,6 +3883,11 @@ static int virtio_net_load_state(QEMUFile *f, void
> >> *opaque, int version_id)
> >>
> >> static int virtio_net_load_cleanup(void *opaque)
> >> {
> >> + VirtIONet *n = opaque;
> >> + VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> + g_free(vdev->migration);
> >> + vdev->migration = NULL;
> >> +
> >> return 0;
> >> }
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 5534251e01..68957ee7d1 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -3222,6 +3222,7 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int
> >> version_id)
> >> int32_t config_len;
> >> uint32_t num;
> >> uint32_t features;
> >> + bool inconsistent_indices;
> >> BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> >> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> >> VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
> >> @@ -3365,6 +3366,16 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int
> >> version_id)
> >> if (vdev->vq[i].vring.desc) {
> >> uint16_t nheads;
> >>
> >> + /*
> >> + * Ring indices will be inconsistent during iterative
> >> migration. The actual
> >> + * indices will be sent later during the stop-and-copy phase.
> >> + */
> >> + if (vdev->migration) {
> >> + inconsistent_indices =
> >> !vdev->migration->iterative_vmstate_loaded;
> >> + } else {
> >> + inconsistent_indices = false;
> >> + }
> >
> > Nit, "inconsistent_indices = vdev->migration &&
> > !vdev->migration->iterative_vmstate_loaded" ? I'm happy with the
> > current "if else" too, but I think the one line is clearer. Your call
> > :).
> >
>
> Ah, nice catch! I like the one-liner more :) Will change this for next
> series.
>
> >> +
> >> /*
> >> * VIRTIO-1 devices migrate desc, used, and avail ring
> >> addresses so
> >> * only the region cache needs to be set up. Legacy devices
> >> need
> >> @@ -3384,14 +3395,19 @@ virtio_load(VirtIODevice *vdev, QEMUFile *f, int
> >> version_id)
> >> continue;
> >> }
> >>
> >> - nheads = vring_avail_idx(&vdev->vq[i]) -
> >> vdev->vq[i].last_avail_idx;
> >> - /* Check it isn't doing strange things with descriptor
> >> numbers. */
> >> - if (nheads > vdev->vq[i].vring.num) {
> >> - virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
> >> - "inconsistent with Host index 0x%x: delta
> >> 0x%x",
> >> - i, vdev->vq[i].vring.num,
> >> - vring_avail_idx(&vdev->vq[i]),
> >> - vdev->vq[i].last_avail_idx, nheads);
> >> + if (!inconsistent_indices) {
> >> + nheads = vring_avail_idx(&vdev->vq[i]) -
> >> vdev->vq[i].last_avail_idx;
> >> + /* Check it isn't doing strange things with descriptor
> >> numbers. */
> >> + if (nheads > vdev->vq[i].vring.num) {
> >> + virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
> >> + "inconsistent with Host index 0x%x:
> >> delta 0x%x",
> >> + i, vdev->vq[i].vring.num,
> >> + vring_avail_idx(&vdev->vq[i]),
> >> + vdev->vq[i].last_avail_idx, nheads);
> >> + inconsistent_indices = true;
> >> + }
> >> + }
> >> + if (inconsistent_indices) {
> >> vdev->vq[i].used_idx = 0;
> >> vdev->vq[i].shadow_avail_idx = 0;
> >> vdev->vq[i].inuse = 0;
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index 214d4a77e9..06b6e6ba65 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -98,6 +98,11 @@ enum virtio_device_endian {
> >> VIRTIO_DEVICE_ENDIAN_BIG,
> >> };
> >>
> >> +/* VirtIODevice iterative live migration data structure */
> >> +typedef struct VirtIODevMigration {
> >> + bool iterative_vmstate_loaded;
> >> +} VirtIODevMigration;
> >> +
> >> /**
> >> * struct VirtIODevice - common VirtIO structure
> >> * @name: name of the device
> >> @@ -151,6 +156,7 @@ struct VirtIODevice
> >> bool disable_legacy_check;
> >> bool vhost_started;
> >> VMChangeStateEntry *vmstate;
> >> + VirtIODevMigration *migration;
> >> char *bus_name;
> >> uint8_t device_endian;
> >> /**
> >> --
> >> 2.47.1
> >>
> >
>