On Mon, Sep 22, 2025 at 4:28 PM Peter Krempa <[email protected]> wrote:
> On Mon, Sep 22, 2025 at 16:15:47 +0800, Yong Huang wrote:
> > On Mon, Sep 22, 2025 at 2:59 PM Peter Krempa <[email protected]> wrote:
> >
> > > On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
> > > > On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <[email protected]> wrote:
> > > >
> > > > > On Fri, Sep 19, 2025 at 17:09:07 +0800, [email protected] wrote:
> > > > > > From: Hyman Huang <[email protected]>
>
> [...]
>
> > > > 3. Launch the migration and use "systemctl restart libvirt" to restart
> > > > Libvirtd once after migration enters the perform phase.
> > >
> > > [...]
> > >
> > > Okay so my understanding from your description is that an (early
> > > startup) failure in virDomainObjListLoadAllConfigs() (and surrounding
> > > code) can result in the daemon shutting down before the threads handling
> > > the already loaded(? ... impossible to tell with the abbreviated log
> > > below) domains terminate? Right?
> >
> > Yes, in our productized Libvirt, an early failure in
> > virDomainObjListLoadAllConfigs() can result in the daemon shutting down.
> >
> > In the upstream Libvirt, the daemon started up successfully but failed to
> > manage the VM (The virDomainObjListLoadAllConfigs returns an error since
> > the missing private data in status XML).
>
> So I assume a non-upstream version. What is it based on? What else did
> you change?
>
> > > Thus the other threads trigger a use-after-free on the driver object?
> > >
> > > Anyways I think it's clear now that just checking if the callbacks are
> > > present doesn't make sense.
> > >
> > > Additionally there's now an upstream issue
> > > https://gitlab.com/libvirt/libvirt/-/issues/814
> > > which seems to claim a use-after-free on a different code path but still
> > > triggered by the cleanup code freeing private data.
> > >
> > > Unfortunately I didn't get any logs or backtrace there either.
> > >
> > > I'll look into the shutdown code path and see if I can figure it out.
> > >
> > > > 4. Search the log message:
> > > >
> > > > $ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params"
> > > > 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params
> >
> > This log indicate that 1124258 thread now execute
> > the remoteDispatchDomainMigratePerform3Params
> >
> > > > 2025-09-22 03:06:12.517+0000: 1124258: debug : remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0
> > > > 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
> >
> > In the execution path of remoteDispatchDomainMigratePerform3Params, it
> > enters the code and the warning message is logged, while the following
> > warning message is never logged in a successful migration:
> >
> > +    if (!xmlopt->privateData.format) {
> > +        VIR_WARN("PrivateData formatter driver does not exist");
> > +    }
> >
> > The following info shows the backtrace of virDomainObjFormat in an
> > successful migration:
>
> Successful, meaning you didn't hit the bug?
>
> > #0  virDomainObjFormat (obj=obj@entry=0x7fa3342598e0, xmlopt=0x7fa3341c54b0, flags=flags@entry=313) at ../../src/conf/domain_conf.c:30166
> > #1  0x00007fa395ae8684 in virDomainObjSave (obj=obj@entry=0x7fa3342598e0, xmlopt=<optimized out>, statusDir=0x7fa33412aec0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:30375
>
> [...]
>
> I asked for a backtrace of all threads as I want to see what the other
> threads are doing during the shutdown.
>
> > > > 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
> > > > 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
> > > > 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
> > > > 2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 : Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
> > >
> > > This log is so abbreviated that it's useless. Please post the full thing
> > > somewhere.
> >
> > :( Since we focus on the shutdown code of Libvirtd, getting the backtrace
> > is not easy, so I added the debug patch.
> >
> > > Additionally if you can reproduce this without the patch I'd be
> > > interested in that log as well.
> >
> > Yes, I reproduce this with Libvirt 6.2.0, the latest version in the
> > upstream uses the same logic and I assume that it also has this issue
> > and reproducing is not that hard.
>
> So, can you hit this problem with current upstream code?
>
> While the logic for formatting data is the same, it's not actually the
> problem. The problem is in the shutdown logic and I do remember some
> changes in the code. Especially if you're claiming to use libvirt-6.2
> which is 5 years old at this point.
>

OK. To avoid chasing an issue that may not exist in the upstream code,
I'll try to reproduce this with current upstream and reply once I have
the result.

Thanks for the reply.

--
Best regards
