On Tue, Jan 25, 2011 at 02:43:43PM +0800, Wen Congyang wrote: > The reason of libvirtd cores dump is that: > We add vm->refs when we alloc the memory, and decrease it > in the function qemuHandleMonitorEOF() in other thread. > > We add vm->refs in the function qemuConnectMonitor() and > decrease it when the vm is inactive. > > The libvirtd will block in the function qemuMonitorSetCapabilities() > because the vm is stopped by signal SIGSTOP. Now the vm->refs is 2. > > Then we kill the vm by signal SIGKILL. The function > qemuMonitorSetCapabilities() failed, and then we will decrease vm->refs > in the function qemuMonitorClose(). > In another thread, mon->fd is broken and the function > qemuHandleMonitorEOF() is called. > > If qemuHandleMonitorEOF() decreases vm->refs before qemuConnectMonitor() > returns, vm->refs will be decrease to 0 and the memory is freed. > > We will call qemudShutdownVMDaemon() as qemuConnectMonitor() failed. > The memory has been freed, so qemudShutdownVMDaemon() is too dangerous. > > We will reference NULL pointer in the function > virDomainConfVMNWFilterTeardown(): > ============= > void > virDomainConfVMNWFilterTeardown(virDomainObjPtr vm) { > int i; > > if (nwfilterDriver != NULL) { > for (i = 0; i < vm->def->nnets; i++) > virDomainConfNWFilterTeardown(vm->def->nets[i]); > } > } > ============ > vm->def->nnets is not 0 but vm->def->nets is NULL(We don't set vm->def->nnets > to 0 when we free vm). > > We should add an extra reference of vm to avoid vm to be deleted if > qemuConnectMonitor() failed. > > Signed-off-by: Wen Congyang <we...@cn.fujitsu.com> > > --- > src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++--------- > 1 files changed, 22 insertions(+), 9 deletions(-) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 34cc29f..613c2d4 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -930,6 +930,10 @@ qemuReconnectDomain(void *payload, const char *name > ATTRIBUTE_UNUSED, void *opaq > > priv = obj->privateData; > > + /* Hold an extra reference because we can't allow 'vm' to be > + * deleted if qemuConnectMonitor() failed */ > + virDomainObjRef(obj); > + > /* XXX check PID liveliness & EXE path */ > if (qemuConnectMonitor(driver, obj) < 0) > goto error; > @@ -961,18 +965,27 @@ qemuReconnectDomain(void *payload, const char *name > ATTRIBUTE_UNUSED, void *opaq > if (obj->def->id >= driver->nextvmid) > driver->nextvmid = obj->def->id + 1; > > - virDomainObjUnlock(obj); > + if (virDomainObjUnref(obj) > 0) > + virDomainObjUnlock(obj); > return; > > error: > - /* We can't get the monitor back, so must kill the VM > - * to remove danger of it ending up running twice if > - * user tries to start it again later */ > - qemudShutdownVMDaemon(driver, obj, 0); > - if (!obj->persistent) > - virDomainRemoveInactive(&driver->domains, obj); > - else > - virDomainObjUnlock(obj); > + if (!virDomainObjIsActive(obj)) { > + if (virDomainObjUnref(obj) > 0) > + virDomainObjUnlock(obj); > + return; > + } > + > + if (virDomainObjUnref(obj) > 0) { > + /* We can't get the monitor back, so must kill the VM > + * to remove danger of it ending up running twice if > + * user tries to start it again later */ > + qemudShutdownVMDaemon(driver, obj, 0); > + if (!obj->persistent) > + virDomainRemoveInactive(&driver->domains, obj); > + else > + virDomainObjUnlock(obj); > + } > }
On closer examination I see why this change is required. Normally we would be doing qemuDomainObjBeginJob before doing anything with the monitor and that grabs an extra reference. ACK Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list