On Wed, Feb 25, 2026 at 11:04:40AM +0500, Alexandr Moshkov wrote:
> 
> On 2/24/26 21:13, Peter Xu wrote:
> > On Tue, Feb 24, 2026 at 12:10:20PM +0500, Alexandr Moshkov wrote:
> > > On 2/23/26 23:58, Peter Xu wrote:
> > > > On Fri, Feb 20, 2026 at 03:26:29PM +0500, Alexandr Moshkov wrote:
> > > > > > When loading a subsection, its name is checked for the parent prefix. The
> > > > > following bug may occur here:
> > > > > 
> > > > > Let's say there is a vmstate named "virtio-blk", it has a subsection
> > > > > named "virtio-blk/subsection", and it also has another vmstate named
> > > > > "virtio" in the fields.
> > > > > > Then, during the migration, when trying to load this subsection for
> > > > > > "virtio", the prefix condition will pass for "virtio-blk/subsection"
> > > > > > and then the migration will break, because this vmstate does not have
> > > > > > such a subsection.
> > > > > 
> > > > > > In other words, if a field inside vmstate1 is set via vmstate2 with a
> > > > > > name that is a prefix of the parent vmstate, then the field can
> > > > > > "steal" a subsection belonging to the parent state.
> > > > > 
> > > > > Fix it by checking `/` at the end of idstr.
> > > > > Checking versus '/' looks reasonable, however I'm still confused by the
> > > > > example given, and what problem you hit.
> > > > 
> > > > > Here, your concern seems to be that vmstate_subsection_load() can
> > > > > accidentally load a FIELD of the parent VMSD whose name is exactly the
> > > > > name of the parent VMSD (which will be the prefix of all subsections).
> > > > Thanks for the reply,
> > > > 
> > > > vmstate_subsection_load(), while trying to load a FIELD of the parent
> > > > VMSD (whose name is a prefix of the parent name), will try to load the
> > > > parent's subsection.
> > > 
> > > > > Now my question is, when reaching the line you modified below, it needs
> > > > > to be prefixed with QEMU_VM_SUBSECTION.  It means the src QEMU is dumping
> > > > > a subsection rather than a field.  OTOH, when dumping a field, we never
> > > > > dump any name; I don't think we name FIELD at all..
> > > > 
> > > > Could you share the failure you hit in real life?  That might help to
> > > > understand the problem on its own.
> > > Here is code example:
> > > 
> > > ```
> > > 
> > > > static const VMStateDescription vmstate_vhost_user_virtio_blk_inflight = {
> > > >     .name = "virtio-blk/inflight",
> > > >     .version_id = 2,
> > > >     .needed = vhost_user_blk_inflight_needed,
> > > >     .fields = (const VMStateField[]) {
> > > >         VMSTATE_VHOST_INFLIGHT_REGION(inflight, VHostUserBlk),
> > > >         VMSTATE_END_OF_LIST()
> > > >     }
> > > > };
> > > 
> > > 
> > > > static const VMStateDescription vmstate_vhost_user_virtio_blk = {
> > > >     .name = "virtio-blk",
> > > >     .minimum_version_id = 2,
> > > >     .version_id = 2,
> > > >     .fields = (VMStateField[]) {
> > > >         VMSTATE_VIRTIO_DEVICE,
> > > >         VMSTATE_END_OF_LIST()
> > > >     },
> > > >     .subsections = (const VMStateDescription * []) {
> > > >         &vmstate_vhost_user_virtio_blk_inflight,
> > > >         NULL
> > > >     }
> > > > };
> > > 
> > > ```
> > > 
> > > > VMSTATE_VIRTIO_DEVICE is
> > > 
> > > ```
> > > 
> > > > #define VMSTATE_VIRTIO_DEVICE \
> > > >     { \
> > > >         .name = "virtio", \
> > > >         .info = &virtio_vmstate_info, \
> > > >         .flags = VMS_SINGLE, \
> > > >     }
> > > ```
> > > > So here is a "virtio-blk" vmsd that has a "virtio" vmsd field and a
> > > > "virtio-blk/inflight" subsection. With this configuration the inflight
> > > > vmsd will not be loaded at all (assuming that it met all the
> > > > requirements). QEMU logs will contain a trace_vmstate_subsection_load_bad
> > > > (lookup) error for the "virtio" vmstate when loading the
> > > > "virtio-blk/inflight" subsection.
> > Thanks for the details.  However it didn't resolve my confusion.  Let me
> > ask more explicitly.
> > 
> > In this case, VMSTATE_VIRTIO_DEVICE will be a single field in the parent
> > vmsd of "virtio-blk".  Meanwhile, "virtio-blk/inflight" will be the only
> > subsection.
> > 
> > Now, if you hit a LOOKUP error of trace_vmstate_subsection_load_bad(), it
> > means the dst QEMU hits this:
> > 
> > vmstate_subsection_load():
> >          sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr);
> >          if (sub_vmsd == NULL) {
> >              trace_vmstate_subsection_load_bad(vmsd->name, idstr, "(lookup)");
> >              error_setg(errp, "VM subsection '%s' in '%s' does not exist",
> >                         idstr, vmsd->name);
> >              return -ENOENT;
> >          }
> 
> Yes, and it will return -ENOENT. This causes an error in the upper function.
> The error in the destination QEMU logs will look like this:
> 
> [... load virtio field ...]
> vmstate_subsection_load_bad virtio: virtio-blk/inflight/(lookup)

Yes, this implies vmstate_get_subsection() failure.

> qemu-system-x86_64: Failed to load virtio-blk:virtio

This implies the above failure happened in a nested load of
"virtio-blk:virtio", rather than the top "virtio-blk".

Here virtio_load() invokes vmstate_load_state() twice: once on vdc->vmsd and
once on vmstate_virtio.

If the loader side was trying to lookup the "/inflight" subsection within
e.g. vmstate_virtio then it will fail indeed, but I don't understand why it
sees the "/inflight" subsection.  Can you help explain?
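
To make the question concrete, here is a small standalone sketch of the kind
of prefix comparison the patch description talks about.  It is not the QEMU
code: the helper names are made up for illustration, and it only assumes
that the loader compares the vmsd name against the start of the incoming
idstr, with or without the extra '/' check the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: accept idstr if it merely starts with vmsd_name. */
bool prefix_only_match(const char *vmsd_name, const char *idstr)
{
    return strncmp(vmsd_name, idstr, strlen(vmsd_name)) == 0;
}

/*
 * Hypothetical helper: additionally require '/' right after the prefix,
 * which is the check the patch proposes.
 */
bool prefix_and_slash_match(const char *vmsd_name, const char *idstr)
{
    size_t n = strlen(vmsd_name);

    /* idstr[n] is safe to read: the first n bytes matched, so none is NUL. */
    return strncmp(vmsd_name, idstr, n) == 0 && idstr[n] == '/';
}

/* Names taken from the example earlier in this thread. */
void run_examples(void)
{
    /* A nested "virtio" vmsd would accept the parent's subsection name. */
    assert(prefix_only_match("virtio", "virtio-blk/inflight"));

    /* With the '/' check, "virtio" no longer matches it. */
    assert(!prefix_and_slash_match("virtio", "virtio-blk/inflight"));

    /* The real parent still matches its own subsection. */
    assert(prefix_and_slash_match("virtio-blk", "virtio-blk/inflight"));
}
```

If the nested load does reach such a comparison with the parent's subsection
as the next thing in the stream, that would be consistent with the trace
above, but that is exactly the part I'd like to have confirmed.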

Are you using different versions of QEMU on src/dst when testing?  If
they're different (or with different patches applied), please spell them
out.

Or, if you could help answer the questions I raised below, that might also
help me figure out what I missed.  So far I'm still a bit lost and do not
yet understand why this patch helps.

Thanks,

> vmstate_load_field_error field "virtio" load failed, ret = -2
> qemu-system-x86_64: error while loading state for instance 0x0 of device
> '0000:80:03.0:00.0:00.0/virtio-blk'
> qemu-system-x86_64: load of migration failed: No such file or directory
> 
> > IIUC, when reaching here, the load of VMSTATE_VIRTIO_DEVICE should have
> > been completed, because we always load fields before subsections.
> > 
> > Meanwhile, when reaching here, "idstr" should be "virtio-blk/inflight",
> > because that's essentially the only subsection this parent VMSD has.
> > 
> > vmstate_get_subsection() should try to look up all subsections matching it,
> > and it should find it.
> > 
> > I do not yet get why the VMSTATE_VIRTIO_DEVICE can get involved in the
> > lookup (e.g. it is not on vmsd->subsections), and why it caused failure.
> > 
> > Could you help me find what I missed?
> > 
> > Thanks,
> > 
> 

-- 
Peter Xu

