On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <[email protected]> wrote:

> On Fri, Sep 19, 2025 at 17:09:07 +0800, [email protected] wrote:
> > From: Hyman Huang <[email protected]>
> >
> > When saving the domain status in the path of the virDomainObjSave
> > function, Libvirtd will omit the private data if the format
> > interface is not provided.
> >
> > When the Libvirtd service is starting up and the driver has not
> > yet been initialized, this behavior is justified.
> >
> > However, when the service is shutting down, the driver is being
> > finalized and the interface is being released, a migration job
> > or another job may call the qemuDomainSaveStatus and save the
> > domain status at the same time. For the latter, this behavior
>
> I had another look and `virQEMUDriverPrivateDataCallbacks` is a global
> variable in the qemu driver which is just populated and never changed.
>
> This struct is then copied (by assignment)  into the virDomainXMLOption
> instance that is used inside the qemu driver. Thus while this does have
> a private copy of the function pointers. The virDomainXMLOption
> instance lives the whole time the virt drivers live, so it's only ever
> cleared in virStateCleanup, which in the daemons happens right before
> exit.
>
> Now looking at the formatter code (let's use your patch as an example):
>
> ---
>  src/conf/domain_conf.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 281846dfbe..74af08e584 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj,
>                                obj->deprecations[i]);
>      }
> =20
> -    if (xmlopt->privateData.format &&
> -        xmlopt->privateData.format(&buf, obj) < 0)
> -        return NULL;
> +    if (xmlopt->privateData.format) {
> +        if (xmlopt->privateData.format(&buf, obj) < 0)
>
> Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside
> of XMLopt you have only pointers that don't change.
>
> As of such the 'format' callback being NULL is only if something
> overwrote the data inside 'xmlopt', which we don't do anywhere because
> it doesn't make sense.
>
> The other possibility is that 'xmlopt' is used after free, thus was
> randomly overwritten, but that would be a much different kind of bug and
> your fix wouldn't fix that at all.
>
>
> > causes the XML to be saved without private information (such as
> > monitor path and qemuCaps), which is required for the Libvirtd
> > service to manage a VM during startup. As a result, after restarting
> > the Libvirtd service, a running VM that is being migrated previously
> > might escape management.
> >
> > Thus, when formatting the status XML for a running virtual machine,
> > we need to presume that the "privateData.format" interface is present.
> >
> > Add a thorough check in the virDomainObjSave path to make sure that
> > private data in the status XML file always exists for the running VM
> > so that we won't lose it after restart the Libvirtd service.
>
> Now this outlines the implications of losing the private data but
> doesn't really give any information how this happened.
>
> Can you please clarify:
>  1) What did you do to trigger this bug? What was the state of the host?
>  2) How did you figure out that what you claim happened?
>  3) Can you reproduce the bug?
>
> If you can reproduce the bug please make sure to collect debug logs of
> libvirtd, and potentially consider running the daemon inside e.g.
> valgrind to see if it reproduces with such heavy instrumentation.
>
>
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not
exist, log it:

--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj,
         xmlopt->privateData.format(&buf, obj) < 0)
         return NULL;

+    if (!xmlopt->privateData.format) {
+        VIR_WARN("PrivateData formatter driver does not exist");
+    }
+

2. Add a private patch(Not the complete code):

    Stop process if malformed config found

    Malformed material may prevent the virtual machine from
    loading in its live state, according to the configuration
    XML. If such a scenario was found by the Libvirt service
    during startup, report an error and terminate.

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ba5eb7ad1b..9a97fa63fd 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -951,7 +951,7 @@ qemuStateInitialize(bool privileged,
     /* Get all the running persistent or transient configs first */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->stateDir,
-                                       NULL, true, false,
+                                       NULL, true, true,
                                        qemu_driver->xmlopt,
                                        NULL, NULL) < 0)

--- a/src/bhyve/bhyve_driver.c
+++ b/src/bhyve/bhyve_driver.c
@@ -1286,14 +1286,14 @@ bhyveStateInitialize(bool privileged,

     if (virDomainObjListLoadAllConfigs(bhyve_driver->domains,
                                        BHYVE_STATE_DIR,
-                                       NULL, true,
+                                       NULL, true, false,
                                        bhyve_driver->xmlopt,
                                        NULL, NULL) < 0)
         goto cleanup;

     if (virDomainObjListLoadAllConfigs(bhyve_driver->domains,
                                        BHYVE_CONFIG_DIR,
-                                       BHYVE_AUTOSTART_DIR, false,
+                                       BHYVE_AUTOSTART_DIR, false, false,
                                        bhyve_driver->xmlopt,
                                        NULL, NULL) < 0)

--- a/src/conf/virdomainobjlist.c
+++ b/src/conf/virdomainobjlist.c
@@ -584,6 +584,7 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms,
                                const char *configDir,
                                const char *autostartDir,
                                bool liveStatus,
+                               bool liveStatusErrorFatal,
                                virDomainXMLOptionPtr xmlopt,
                                virDomainLoadConfigNotify notify,
                                void *opaque)
@@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr
doms,
             virDomainObjEndAPI(&dom);
         } else {
             VIR_ERROR(_("Failed to load config for domain '%s'"),
entry->d_name);
+            if (liveStatus && liveStatusErrorFatal) {
+                ret = -1;
+                goto end;
+            }
         }
     }

+end:
     VIR_DIR_CLOSE(dir);
     virObjectRWUnlock(doms);

diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h
index 6150e13aa4..1ff3679c50 100644
--- a/src/conf/virdomainobjlist.h
+++ b/src/conf/virdomainobjlist.h
@@ -66,6 +66,7 @@ int virDomainObjListLoadAllConfigs(virDomainObjListPtr
doms,
                                    const char *configDir,
                                    const char *autostartDir,
                                    bool liveStatus,
+                                   bool liveStatusErrorFatal,
                                    virDomainXMLOptionPtr xmlopt,
                                    virDomainLoadConfigNotify notify,
                                    void *opaque);

...

3. Launch the migration and use "systemctl restart libvirt" to restart
Libvirtd
once after migration enters the perform phase.


4. Search the log message:

$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does
not exist|remoteDispatchDomainMigratePerform3Params"
2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread
1124258 (rpc-worker) is now running job
remoteDispatchDomainMigratePerform3Params
2025-09-22 03:06:12.517+0000: 1124258: debug :
remoteDispatchDomainMigratePerform3ParamsHelper:8804 :
server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010
rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0
2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 :
PrivateData formatter driver does not exist
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 :
PrivateData formatter driver does not exist
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 :
PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 :
PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 :
Thread 1124258 (rpc-worker) finished job
remoteDispatchDomainMigratePerform3Params with ret=-1

Migration rpc-worker routine print the message

5. Here we see the status XML:

<domstatus state='running' reason='migrated' pid='1092202'>
  <taint flag='custom-argv'/>
  <taint flag='custom-ga-command'/>
  <domain type='kvm' id='1' xmlns:qemu='
http://libvirt.org/schemas/domain/qemu/1.0'>
    <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name>
    <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid>
    <description>SmartX</description>
    <maxMemory slots='255' unit='KiB'>4194304000</maxMemory>
...
It does not contain private data.

6. Finally, in our case, Libvirtd service can not startup; And in the
upstream case, the behavior
    is that VM escapes from managing after startup logically.


-- 
Best regards

Reply via email to