On Wed, Mar 11, 2026 at 09:53:07AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 10.03.26 21:00, Peter Xu wrote:
> > On Tue, Mar 10, 2026 at 01:39:04PM +0000, Peter Maydell wrote:
> > > On Tue, 10 Mar 2026 at 13:28, Alexandr Moshkov
> > > <[email protected]> wrote:
> > > > Hi, this breaks migration for s390x and ppc64:
> > > > 
> > > > qemu-system-s390x: Missing section footer for qemu-s390-flic
> > > > 
> > > > Thanks for reply! It happens because "qemu-s390-flic" vmsd has 
> > > > "qemu-s390-flic-full" subsection. I'm not sure if we can change the 
> > > > names of the existing subsections. Peter, what do you think?
> > > > 
> > > > It seems to be a bigger problem that the migration framework does not 
> > > > document the names for subsections in any way. At the same time, the 
> > > > code implies that at least the names should be a substring with its 
> > > > parent. There is even such a code comment `subsection name has to be 
> > > > "section_name/a"`, that seems to imply the presence of /.
> > > > 
> > > > Also I tried to put together a list of devices whose separator in 
> > > > subsections is not /:
> > > 
> > > > or-irq
> > > 
> > > This one and probably some of the others are my mistake, I think.
> > > I think this is because:
> > >   * the migration code imposes a constraint on the subsection names
> > >   * the migration documentation does not mention this constraint
> > >   * the migration code does not effectively enforce this constraint
> > >     (e.g. by asserting when the vmstate with a bad name is registered)
> > > 
> > > My assumption when writing that code was very likely that the
> > > name of the subsection didn't have to have any relation to
> > > the name of its parent subsection (after all, the migration
> > > code knows it is a subsection, so if it needed to have the
> > > name on the wire be "parentname/subsectionname" it could construct
> > > that itself), and since it all just worked I never noticed the
> > > mistake...
> > 
> > Indeed, we can do better on the 2nd/3rd bullets.. so we should document it
> > in struct VMStateDescription definition, and enforce it when registering
> > new VMSDs.
> > 
> > One thing to double check with Alexandr on this one:
> > 
> > > They using `_`, `-` or `.` as separator
> > 
> > Does it mean we also can't simply whitelist all these characters (including
> > "/"), because it won't always work?
> > 
> > For example in your case, it's "virtio-blk" being the parent VMSD, "virtio"
> > being the child VMSD, followed with a subsection belongs to virtio-blk.
> > We want to make that subsection be recognizable as virtio-blk's.
> > 
> > Then if we treat "-" also to be a separator, then "virtio" will still
> > mis-recognize virtio-blk's as its own (because its name is "virtio-blk/..."
> > hence it also satisfies the "virtio" check)?
> > 
> 
> I have an idea. What if we simply change the order, instead of
> 
> 1. check prefix (with '/' at the end). and stop the loop if it doesn't match
> 2. call vmstate_get_subsection(vmsd->subsections, idstr)
> 
> do
> 
> 0. Document, and check, that starting from 11.0 any new subsections must 
> follow <section name>/<subsection name> notation
> 
> in vmstate_subsection_load():
> 
> 1. call vmstate_get_subsection, if it succeeded we are done
> 2. check prefix with '/' at the end. If match - it's a new subsection, added 
> after 11.0 point, in correct notation. If does not match - it's subsection of 
> another state, stop the loop.

Yep, this sounds working.

But I wonder do we need (2) at all?  Say, can we sololy rely on
vmstate_get_subsection() to identify if we should take it or just leave it
for upper layers?  So we will never fail a subsection lookup, only either
(1) load it if owned, or (2) ignore it and pop the VMSD stack.

It means in code we don't restrict subsection name to be "$PARENT_NAME/*".

However we should likely still suggest that in doc of subsections, so we at
least avoid one parent VMSD having subsection S1, then its field having
subsection S2, and by accident S1 and S2 has the same name..

> 
> 
> Possible problems:
> 
> 1. may be some subsections (with wrong notation), which were added before 
> 11.0, but then removed (we break migration from old version to new).

If a subsection with wrong notation got removed, then it should require a
machine compat property otherwise migration should be broken anyway?

> 
> This can be checked by git-grepping for subsections through qemu releases.
> 
> 2. may be some downstream subsections (we may break migration from some 
> downstream versions to upstream).
> 
> Should we care of it? Probably not.

Not sure if this is an issue if we drop (2) directly above.

There seems to have a 3rd possible "issue": the failure will be more
obscure on a subsection that nobody recognizes; we used to try our best say
"VM subsection '...' in '...' does not exist", but then it'll always pop
the VMSD stack then the upper layer may read the SUBSECTION byte anywhere.
So debugging a stream with an extra subsection can be slightly more
awkward, but I'm not sure how much.  Meanwhile for a compatible / normal
migration it'll be all fine.  It just means when seeing a subsection will
try to pick it up from the inner VMSD and fallback to outter VMSD every
time.

-- 
Peter Xu


Reply via email to