On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <[email protected]> wrote:
> On Fri, Sep 19, 2025 at 17:09:07 +0800, [email protected] wrote: > > From: Hyman Huang <[email protected]> > > > > When saving the domain status in the path of the virDomainObjSave > > function, Libvirtd will omit the private data if the format > > interface is not provided. > > > > When the Libvirtd service is starting up and the driver has not > > yet been initialized, this behavior is justified. > > > > However, when the service is shutting down, the driver is being > > finalized and the interface is being released, a migration job > > or another job may call the qemuDomainSaveStatus and save the > > domain status at the same time. For the latter, this behavior > > I had another look and `virQEMUDriverPrivateDataCallbacks` is a global > variable in the qemu driver which is just populated and never changed. > > This struct is then copied (by assignment) into the virDomainXMLOption > instance that is used inside the qemu driver. Thus while this does have > a private copy of the function pointers. The virDomainXMLOption > instance lives the whole time the virt drivers live, so it's only ever > cleared in virStateCleanup, which in the daemons happens right before > exit. > > Now looking at the formatter code (let's use your patch as an example): > > --- > src/conf/domain_conf.c | 21 ++++++++++++++++++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > index 281846dfbe..74af08e584 100644 > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, > obj->deprecations[i]); > } > =20 > - if (xmlopt->privateData.format && > - xmlopt->privateData.format(&buf, obj) < 0) > - return NULL; > + if (xmlopt->privateData.format) { > + if (xmlopt->privateData.format(&buf, obj) < 0) > > Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside > of XMLopt you have only pointers that don't change. > > As of such the 'format' callback being NULL is only if something > overwrote the data inside 'xmlopt', which we don't do anywhere because > it doesn't make sense. > > The other possibility is that 'xmlopt' is used after free, thus was > randomly overwritten, but that would be a much different kind of bug and > your fix wouldn't fix that at all. > > > > causes the XML to be saved without private information (such as > > monitor path and qemuCaps), which is required for the Libvirtd > > service to manage a VM during startup. As a result, after restarting > > the Libvirtd service, a running VM that is being migrated previously > > might escape management. > > > > Thus, when formatting the status XML for a running virtual machine, > > we need to presume that the "privateData.format" interface is present. > > > > Add a thorough check in the virDomainObjSave path to make sure that > > private data in the status XML file always exists for the running VM > > so that we won't lose it after restart the Libvirtd service. > > Now this outlines the implications of losing the private data but > doesn't really give any information how this happened. > > Can you please clarify: > 1) What did you do to trigger this bug? What was the state of the host? > 2) How did you figure out that what you claim happened? > 3) Can you reproduce the bug? > > If you can reproduce the bug please make sure to collect debug logs of > libvirtd, and potentially consider running the daemon inside e.g. > valgrind to see if it reproduces with such heavy instrumentation. > > 1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it: --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj, xmlopt->privateData.format(&buf, obj) < 0) return NULL; + if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + } + 2. Add a private patch(Not the complete code): Stop process if malformed config found Malformed material may prevent the virtual machine from loading in its live state, according to the configuration XML. If such a scenario was found by the Libvirt service during startup, report an error and terminate. diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ba5eb7ad1b..9a97fa63fd 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -951,7 +951,7 @@ qemuStateInitialize(bool privileged, /* Get all the running persistent or transient configs first */ if (virDomainObjListLoadAllConfigs(qemu_driver->domains, cfg->stateDir, - NULL, true, false, + NULL, true, true, qemu_driver->xmlopt, NULL, NULL) < 0) --- a/src/bhyve/bhyve_driver.c +++ b/src/bhyve/bhyve_driver.c @@ -1286,14 +1286,14 @@ bhyveStateInitialize(bool privileged, if (virDomainObjListLoadAllConfigs(bhyve_driver->domains, BHYVE_STATE_DIR, - NULL, true, + NULL, true, false, bhyve_driver->xmlopt, NULL, NULL) < 0) goto cleanup; if (virDomainObjListLoadAllConfigs(bhyve_driver->domains, BHYVE_CONFIG_DIR, - BHYVE_AUTOSTART_DIR, false, + BHYVE_AUTOSTART_DIR, false, false, bhyve_driver->xmlopt, NULL, NULL) < 0) --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -584,6 +584,7 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, const char *autostartDir, bool liveStatus, + bool liveStatusErrorFatal, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque) @@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainObjEndAPI(&dom); } else { VIR_ERROR(_("Failed to load config for domain '%s'"), entry->d_name); + if (liveStatus && liveStatusErrorFatal) { + ret = -1; + goto end; + } } } +end: VIR_DIR_CLOSE(dir); virObjectRWUnlock(doms); diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h index 6150e13aa4..1ff3679c50 100644 --- a/src/conf/virdomainobjlist.h +++ b/src/conf/virdomainobjlist.h @@ -66,6 +66,7 @@ int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, const char *autostartDir, bool liveStatus, + bool liveStatusErrorFatal, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque); ... 3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase. 4. Search the log message: $ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params 2025-09-22 03:06:12.517+0000: 1124258: debug : remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 : Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1 Migration rpc-worker routine print the message 5. Here we see the status XML: <domstatus state='running' reason='migrated' pid='1092202'> <taint flag='custom-argv'/> <taint flag='custom-ga-command'/> <domain type='kvm' id='1' xmlns:qemu=' http://libvirt.org/schemas/domain/qemu/1.0'> <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name> <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid> <description>SmartX</description> <maxMemory slots='255' unit='KiB'>4194304000</maxMemory> ... It does not contain private data. 6. Finally, in our case, Libvirtd service can not startup; And in the upstream case, the behavior is that VM escapes from managing after startup logically. -- Best regards
