On Wed, Feb 25, 2026 at 11:04:40AM +0500, Alexandr Moshkov wrote:
>
> On 2/24/26 21:13, Peter Xu wrote:
> > On Tue, Feb 24, 2026 at 12:10:20PM +0500, Alexandr Moshkov wrote:
> > > On 2/23/26 23:58, Peter Xu wrote:
> > > > On Fri, Feb 20, 2026 at 03:26:29PM +0500, Alexandr Moshkov wrote:
> > > > > > When loading a subsection, its name is checked for the parent prefix. The
> > > > > following bug may occur here:
> > > > >
> > > > > Let's say there is a vmstate named "virtio-blk", it has a subsection
> > > > > named "virtio-blk/subsection", and it also has another vmstate named
> > > > > "virtio" in the fields.
> > > > > > Then, during the migration, when trying to load this subsection for
> > > > > > "virtio", the prefix condition will pass for "virtio-blk/subsection"
> > > > > > and then the migration will break, because this vmstate does not have
> > > > > > such a subsection.
> > > > >
> > > > > > In other words, if a field inside vmstate1 is set via vmstate2 with a
> > > > > > name that is a prefix of the parent vmstate, then the field can
> > > > > > "steal" a subsection belonging to the parent state.
> > > > >
> > > > > Fix it by checking `/` at the end of idstr.
> > > > > Checking versus '/' looks reasonable, however I'm still confused on the
> > > > > example given, and what problem you hit.
> > > >
> > > > > Here, your concern seems to be that vmstate_subsection_load() can
> > > > > accidentally load a FIELD of the parent VMSD whose name is exactly the
> > > > > name of the parent VMSD (which will be the prefix of all subsections).
> > > > Thanks for the reply,
> > > >
> > > > While loading a FIELD of the parent VMSD (one whose name is a prefix of
> > > > the parent's name), vmstate_subsection_load() will try to load the
> > > > parent's subsection.
> > >
> > > > > Now my question is, when reaching the line you modified below, it needs
> > > > > to be prefixed with QEMU_VM_SUBSECTION. It means the src QEMU is dumping
> > > > > a subsection rather than a field. OTOH, when dumping a field, we never
> > > > > dump any name; I don't think we name FIELD at all..
> > > >
> > > > Could you share the failure you hit in real life? That might help to
> > > > understand the problem on its own.
> > > > Here is a code example:
> > >
> > > ```
> > >
> > > > static const VMStateDescription vmstate_vhost_user_virtio_blk_inflight = {
> > > >     .name = "virtio-blk/inflight",
> > > >     .version_id = 2,
> > > >     .needed = vhost_user_blk_inflight_needed,
> > > >     .fields = (const VMStateField[]) {
> > > >         VMSTATE_VHOST_INFLIGHT_REGION(inflight, VHostUserBlk),
> > > >         VMSTATE_END_OF_LIST()
> > > >     }
> > > > };
> > >
> > >
> > > > static const VMStateDescription vmstate_vhost_user_virtio_blk = {
> > > >     .name = "virtio-blk",
> > > >     .minimum_version_id = 2,
> > > >     .version_id = 2,
> > > >     .fields = (VMStateField[]) {
> > > >         VMSTATE_VIRTIO_DEVICE,
> > > >         VMSTATE_END_OF_LIST()
> > > >     },
> > > >     .subsections = (const VMStateDescription * []) {
> > > >         &vmstate_vhost_user_virtio_blk_inflight,
> > > >         NULL
> > > >     }
> > > > };
> > >
> > > ```
> > >
> > > > VMSTATE_VIRTIO_DEVICE is
> > >
> > > ```
> > >
> > > > #define VMSTATE_VIRTIO_DEVICE \
> > > >     { \
> > > >         .name = "virtio", \
> > > >         .info = &virtio_vmstate_info, \
> > > >         .flags = VMS_SINGLE, \
> > > >     }
> > > ```
> > > > So here is a "virtio-blk" vmsd that has a "virtio" vmsd field and a
> > > > "virtio-blk/inflight" subsection. With this configuration the inflight
> > > > vmsd will not be loaded at all (assuming that it met all the
> > > > requirements). QEMU logs will contain a trace_vmstate_subsection_load_bad
> > > > (lookup) error for the "virtio" vmstate when loading the
> > > > "virtio-blk/inflight" subsection.
> > Thanks for the details. However it didn't resolve my confusion. Let me
> > ask more explicitly.
> >
> > In this case, VMSTATE_VIRTIO_DEVICE will be a single field in the parent
> > vmsd of "virtio-blk". Meanwhile, "virtio-blk/inflight" will be the only
> > subsection.
> >
> > Now, if you hit a LOOKUP error of trace_vmstate_subsection_load_bad(), it
> > means the dst QEMU hits this:
> >
> > > vmstate_subsection_load():
> > >     sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr);
> > >     if (sub_vmsd == NULL) {
> > >         trace_vmstate_subsection_load_bad(vmsd->name, idstr,
> > >                                           "(lookup)");
> > >         error_setg(errp, "VM subsection '%s' in '%s' does not exist",
> > >                    idstr, vmsd->name);
> > >         return -ENOENT;
> > >     }
>
> > Yes, and it will return -ENOENT. This causes an error in the calling
> > function. The error in the destination QEMU logs will look like this:
>
> [... load virtio field ...]
> vmstate_subsection_load_bad virtio: virtio-blk/inflight/(lookup)

Yes, this implies vmstate_get_subsection() failure.

> > qemu-system-x86_64: Failed to load virtio-blk:virtio

This implies the above failure happened in a nested load of
"virtio-blk:virtio", rather than the top "virtio-blk".

Here virtio_load invokes vmstate_load_state twice: on vdc->vmsd and on
vmstate_virtio.

If the loader side was trying to look up the "/inflight" subsection within
e.g. vmstate_virtio then it will fail indeed, but I don't understand why it
sees the "/inflight" subsection. Can you help explain?

Are you using different versions of QEMU on src/dst when testing? If
they're different (or with different patches applied), please spell them
out.

Or if you could help me to answer what is missing in what I stated below, it
might also help me to figure out what I missed. So far I'm still a bit
lost and do not yet understand why this patch helps..

Thanks,

> vmstate_load_field_error field "virtio" load failed, ret = -2
> qemu-system-x86_64: error while loading state for instance 0x0 of device
> '0000:80:03.0:00.0:00.0/virtio-blk'
> qemu-system-x86_64: load of migration failed: No such file or directory
>
> > IIUC, when reaching here, the load of VMSTATE_VIRTIO_DEVICE should have
> > been completed, because we always load fields before subsections.
> >
> > > Meanwhile, when reaching here, "idstr" should be "virtio-blk/inflight",
> > > because that's essentially the only subsection this parent VMSD has.
> >
> > > vmstate_get_subsection() should try to look up all subsections matching
> > > it, and it should find it.
> >
> > I do not yet get why the VMSTATE_VIRTIO_DEVICE can get involved in the
> > lookup (e.g. it is not on vmsd->subsections), and why it caused failure.
> >
> > Could you help me to find where I missed?
> >
> > Thanks,
> >
>
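
For reference, the prefix condition described in the patch text can be
sketched in isolation. This is only a minimal illustration, not the QEMU
code: the helper names prefix_only_match() and prefix_and_slash_match() are
hypothetical, and it assumes the check is a plain strncmp() of the vmsd name
against idstr, with the proposed fix additionally requiring '/' right after
the name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU function): plain prefix test of the
 * vmsd name against the subsection idstr, as the patch text describes.
 */
static bool prefix_only_match(const char *vmsd_name, const char *idstr)
{
    return strncmp(vmsd_name, idstr, strlen(vmsd_name)) == 0;
}

/*
 * Hypothetical helper (not a QEMU function): same prefix test, but the
 * character following the name in idstr must be '/'.  Reading idstr[len]
 * is safe here because a successful strncmp() over len non-NUL bytes
 * means idstr holds at least len characters plus its terminator.
 */
static bool prefix_and_slash_match(const char *vmsd_name, const char *idstr)
{
    size_t len = strlen(vmsd_name);

    return strncmp(vmsd_name, idstr, len) == 0 && idstr[len] == '/';
}
```

With idstr "virtio-blk/inflight", the first helper accepts both "virtio" and
"virtio-blk" as owners, while the second accepts only "virtio-blk". Whether
the nested "virtio" load actually reaches such a check during migration is
the open question above.
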
--
Peter Xu