Fabiano Rosas <faro...@suse.de> writes:

> Markus Armbruster <arm...@redhat.com> writes:
>
>> Peter Xu <pet...@redhat.com> writes:
>>
>>> On Tue, Jan 09, 2024 at 10:22:31PM +0100, Philippe Mathieu-Daudé wrote:
>>>> Hi Fabiano,
>>>>
>>>> On 9/1/24 21:21, Fabiano Rosas wrote:
>>>> > Cédric Le Goater <c...@kaod.org> writes:
>>>> >
>>>> > > On 1/9/24 18:40, Fabiano Rosas wrote:
>>>> > > > Cédric Le Goater <c...@kaod.org> writes:
>>>> > > >
>>>> > > > > On 1/3/24 20:53, Fabiano Rosas wrote:
>>>> > > > > > Philippe Mathieu-Daudé <phi...@linaro.org> writes:
>>>> > > > > >
>>>> > > > > > > +Peter/Fabiano
>>>> > > > > > >
>>>> > > > > > > On 2/1/24 17:41, Cédric Le Goater wrote:
>>>> > > > > > > > On 1/2/24 17:15, Philippe Mathieu-Daudé wrote:
>>>> > > > > > > > > Hi Cédric,
>>>> > > > > > > > >
>>>> > > > > > > > > On 2/1/24 15:55, Cédric Le Goater wrote:
>>>> > > > > > > > > > On 12/12/23 17:29, Philippe Mathieu-Daudé wrote:
>>>> > > > > > > > > > > Hi,
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > When a MPCore cluster is used, the Cortex-A cores belong to the
>>>> > > > > > > > > > > cluster container, not to the board/soc layer. This series moves
>>>> > > > > > > > > > > the creation of vCPUs to the MPCore private container.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > Doing so we consolidate the QOM model, moving common code in a
>>>> > > > > > > > > > > central place (abstract MPCore parent).
>>>> > > > > > > > > >
>>>> > > > > > > > > > Changing the QOM hierarchy has an impact on the state of the machine
>>>> > > > > > > > > > and some fixups are then required to maintain migration compatibility.
>>>> > > > > > > > > > This can become a real headache for KVM machines like virt for which
>>>> > > > > > > > > > migration compatibility is a feature, less for emulated ones.
>>>> > > > > > > > >
>>>> > > > > > > > > All changes are either moving properties (which are not migrated)
>>>> > > > > > > > > or moving non-migrated QOM members (i.e. pointers of ARMCPU, which
>>>> > > > > > > > > is still migrated elsewhere). So I don't see any obvious migration
>>>> > > > > > > > > problem, but I might be missing something, so I Cc'ed Juan :>
>>>> > > > > >
>>>> > > > > > FWIW, I didn't spot anything problematic either.
>>>> > > > > >
>>>> > > > > > I've run this through my migration compatibility series [1] and it
>>>> > > > > > doesn't regress aarch64 migration from/to 8.2. The tests use '-M
>>>> > > > > > virt -cpu max', so the cortex-a7 and cortex-a15 are not covered. I don't
>>>> > > > > > think we even support migration of anything non-KVM on arm.
>>>> > > > >
>>>> > > > > it happens we do.
>>>> > > > >
>>>> > > >
>>>> > > > Oh, sorry, I didn't mean TCG here. Probably meant to say something like
>>>> > > > non-KVM-capable cpus, as in 32-bit. Never mind.
>>>> > >
>>>> > > Theoretically, we should be able to migrate to a TCG guest. Well, this
>>>> > > worked in the past for PPC. When I was doing more KVM related changes,
>>>> > > this was very useful for dev. Also, some machines are partially emulated.
>>>> > > Anyhow I agree this is not a strong requirement and we often break it.
>>>> > > Let's focus on KVM only.
>>>> > >
>>>> > > > > > 1- https://gitlab.com/farosas/qemu/-/jobs/5853599533
>>>> > > > >
>>>> > > > > yes it depends on the QOM hierarchy and virt seems immune to the changes.
>>>> > > > > Good.
>>>> > > > >
>>>> > > > > However, changing the QOM topology clearly breaks migration compat,
>>>> > > >
>>>> > > > Well, "clearly" is relative =) You've mentioned pseries and aspeed
>>>> > > > already, do you have a pointer to one of those cases where we broke
>>>> > > > migration
>>>> > >
>>>> > > Regarding pseries, migration compat broke because of 5bc8d26de20c
>>>> > > ("spapr: allocate the ICPState object from under sPAPRCPUCore") which
>>>> > > is similar to the changes proposed by this series, it impacts the QOM
>>>> > > hierarchy. Here is the workaround/fix from Greg : 46f7afa37096
>>>> > > ("spapr: fix migration of ICPState objects from/to older QEMU") which
>>>> > > is quite a headache and this turned out to raise another problem some
>>>> > > months ago ... :/ That's why I sent [1] to prepare removal of old
>>>> > > machines and workarounds becoming a burden.
>>>> >
>>>> > This feels like something that could be handled by the vmstate code
>>>> > somehow. The state is there, just under a different path.
>>>>
>>>> What, the QOM path is used in migration? ...
>>>
>>> Hopefully not..
>
> Unfortunately the original fix doesn't mention _what_ actually broke
> with migration. I assumed the QOM path was needed because otherwise I
> don't think the fix makes sense. The thread discussing that patch also
> directly mentions the QOM path:
>
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg450912.html
>
> But I probably misunderstood something while reading that thread.
>
>>>
>>>>
>>>> See recent discussions on "QOM path stability":
>>>> https://lore.kernel.org/qemu-devel/zzfyvlmcxbcia...@redhat.com/
>>>> https://lore.kernel.org/qemu-devel/87jzojbxt7....@pond.sub.org/
>>>> https://lore.kernel.org/qemu-devel/87v883by34....@pond.sub.org/
>>>
>>> If I read it right, the commit 46f7afa37096 example is pretty special that
>>> the QOM path more or less decided more than the hierarchy itself but changes
>>> the existence of objects.
>>
>> Let's see whether I got this...
>>
>> We removed some useless objects, moved the useful ones to another home.
>> The move changed their QOM path.
>>
>> The problem was the removal of useless objects, because this also
>> removed their vmstate.
>
> If you check out the removal commit (5bc8d26de20c), the vmstate has
> been kept untouched.
>
>>
>> The fix was adding the vmstate back as a dummy.
>
> Since the vmstate was kept I don't see why we would need a dummy. The
> incoming migration stream would still have the state, only at a
> different point in the stream. It's surprising to me that that would
> cause an issue, but I'm not well versed in that code.

Alright, I understand neither the problem nor the fix :)

>> The QOM path changes are *not* part of the problem.
>
> The only explanation I can come up with is that after the patch
> migration has broken after a hotplug or similar operation. In such a
> situation, the preallocated state would always be present before the
> patch, but sometimes not present after the patch in case, say, a
> hot-unplug has taken away a cpu + ICPState.

My head hurts...

Oh, we're talking migration! Perfectly normal then.