Re: [libvirt] [PATCH] libxl: rework reference counting
On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote: Similar to commit 540c339a for the QEMU driver, rework reference counting in the libxl driver to make it more deterministic and the code a bit cleaner. Signed-off-by: Jim Fehlig --- I've been testing this patch on and off for a few weeks now using libvirt-tck domain tests, local test scripts, and some manual tests for good measure. I sent the patch to Anthony Perard (cc'd) nearly two weeks ago for testing in his OpenStack+Xen+libvirt CI loop, although I haven't received any feedback thus far. Also included Martin in the cc since he encouraged this patch I'm sorry I didn't get to it earlier. I'm not familiar with the libxl driver, but I'll at least give it a try since this is on the list for some time. https://www.redhat.com/archives/libvir-list/2015-April/msg00014.html src/libxl/libxl_domain.c| 32 ++ src/libxl/libxl_domain.h| 5 +- src/libxl/libxl_driver.c| 274 ++-- src/libxl/libxl_migration.c | 6 +- 4 files changed, 128 insertions(+), 189 deletions(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index 0652270..ce188ee 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -294,6 +294,25 @@ libxlDomObjFromDomain(virDomainPtr dom) return vm; } +/* + * Finish working with a domain object in an API. This function + * clears whatever was left of a domain that was gathered using + * libxlDomObjFromDomain(). Currently that means only unlocking and + * decrementing the reference counter of that domain. And in order to + * make sure the caller does not access the domain, the pointer is + * cleared. + */ +static void +libxlDomObjEndAPI(virDomainObjPtr *vm) +{ This function already exists as virDomainObjEndAPI(). +if (!*vm) +return; + +virObjectUnlock(*vm); +virObjectUnref(*vm); +*vm = NULL; +} + static int libxlAutostartDomain(virDomainObjPtr vm, void *opaque) @@ -2760,8 +2729,7 @@ libxlDomainUndefineFlags(virDomainPtr dom, cleanup: VIR_FREE(name); -if (vm) -virObjectUnlock(vm); +libxlDomObjEndAPI(&vm); In this function the domain can leak because the reference you get by libxlDomObjFromDomain() is not returned anywhere. You might get to this cleanup phase with a domain that has a reference count, but vm is set to NULL. virDomainObjListRemove() leaves the domain unlocked, Unless I missed something this will leak. Stream of Random thoughts follows. Not setting 'vm = NULL' will help a bit, but in that case the domain will be unlocked twice. At first I thought maybe something similar to how QEMU driver does that in qemuDomainRemoveInactive() might work. But after looking at it for a while, I see this bug is in QEMU, the domain will be unlocked twice there. I need to send a fix for that because now we are resetting a job when the domain is unlocked. That leads me to an idea. Could we somehow make most of the things that are so similar abstracted somehow? That's not related to this patch though. But for example that would fix stuff like the fact that libxl doesn't hold a job when removing the domain. @@ -4963,23 +4915,20 @@ libxlDomainMigrateFinish3Params(virConnectPtr dconn, NULLSTR(dname)); return NULL; } +virObjectRef(vm); -if (virDomainMigrateFinish3ParamsEnsureACL(dconn, vm->def) < 0) { -virDomainObjEndAPI(&vm); -return NULL; -} +if (virDomainMigrateFinish3ParamsEnsureACL(dconn, vm->def) < 0) +goto cleanup; -if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) { -virDomainObjEndAPI(&vm); -return NULL; -} +if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) +goto cleanup; ret = libxlDomainMigrationFinish(dconn, vm, flags, cancelled); -if (!libxlDomainObjEndJob(driver, vm)) -vm = NULL; +libxlDomainObjEndJob(driver, vm); -virDomainObjEndAPI(&vm); + cleanup: +libxlDomObjEndAPI(&vm); Wow, and it was called here, how funny :) Although that might mean something is wrong since you're not adding unref, but added ref on top of this hunk. Martin P.S.: Now 'make check' fails for me due to false-positives about mismatched ACL functions from migrate APIs, but that might be due to poor rebase on top of master which I did. signature.asc Description: PGP signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] libxl: rework reference counting
Anthony PERARD wrote: > On Tue, Jun 16, 2015 at 10:53:15AM +0100, Anthony PERARD wrote: > >> On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote: >> >>> Similar to commit 540c339a for the QEMU driver, rework reference >>> counting in the libxl driver to make it more deterministic and >>> the code a bit cleaner. >>> >>> Signed-off-by: Jim Fehlig >>> --- >>> >>> I've been testing this patch on and off for a few weeks now using >>> libvirt-tck domain tests, local test scripts, and some manual tests >>> for good measure. I sent the patch to Anthony Perard (cc'd) nearly >>> two weeks ago for testing in his OpenStack+Xen+libvirt CI loop, >>> although I haven't received any feedback thus far. Also included >>> Martin in the cc since he encouraged this patch >>> >> But I should be able to test the patch now. >> > > Hi, I gave this patch a try with OpenStack. I have applied the patch on top > of libvirt 1.4.14 + (f86ae40 894d2ff 6dfec1e), which we are using for our CI. > > With the patch, libvirt works fine, Thanks for testing it! > but I can still have the rare error > "domain 205 device model: startup timed out". > I would have been surprised if the patch fixed that mysterious problem. Regards, Jim -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] libxl: rework reference counting
On Tue, Jun 16, 2015 at 10:53:15AM +0100, Anthony PERARD wrote: > On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote: > > Similar to commit 540c339a for the QEMU driver, rework reference > > counting in the libxl driver to make it more deterministic and > > the code a bit cleaner. > > > > Signed-off-by: Jim Fehlig > > --- > > > > I've been testing this patch on and off for a few weeks now using > > libvirt-tck domain tests, local test scripts, and some manual tests > > for good measure. I sent the patch to Anthony Perard (cc'd) nearly > > two weeks ago for testing in his OpenStack+Xen+libvirt CI loop, > > although I haven't received any feedback thus far. Also included > > Martin in the cc since he encouraged this patch > > But I should be able to test the patch now. Hi, I gave this patch a try with OpenStack. I have applied the patch on top of libvirt 1.4.14 + (f86ae40 894d2ff 6dfec1e), which we are using for our CI. With the patch, libvirt works fine, but I can still have the rare error "domain 205 device model: startup timed out". Thanks, -- Anthony PERARD -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] libxl: rework reference counting
On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote: > Similar to commit 540c339a for the QEMU driver, rework reference > counting in the libxl driver to make it more deterministic and > the code a bit cleaner. > > Signed-off-by: Jim Fehlig > --- > > I've been testing this patch on and off for a few weeks now using > libvirt-tck domain tests, local test scripts, and some manual tests > for good measure. I sent the patch to Anthony Perard (cc'd) nearly > two weeks ago for testing in his OpenStack+Xen+libvirt CI loop, > although I haven't received any feedback thus far. Also included > Martin in the cc since he encouraged this patch Sorry, I could not really test the patch, since OpenStack master was broken (deploying OpenStack with devstack). Also I don't want to apply patches to the Xen Project OpenStack CI loop just to test them. But I should be able to test the patch now. -- Anthony PERARD -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] libxl: rework reference counting
Similar to commit 540c339a for the QEMU driver, rework reference counting in the libxl driver to make it more deterministic and the code a bit cleaner. Signed-off-by: Jim Fehlig --- I've been testing this patch on and off for a few weeks now using libvirt-tck domain tests, local test scripts, and some manual tests for good measure. I sent the patch to Anthony Perard (cc'd) nearly two weeks ago for testing in his OpenStack+Xen+libvirt CI loop, although I haven't received any feedback thus far. Also included Martin in the cc since he encouraged this patch https://www.redhat.com/archives/libvir-list/2015-April/msg00014.html src/libxl/libxl_domain.c| 32 ++ src/libxl/libxl_domain.h| 5 +- src/libxl/libxl_driver.c| 274 ++-- src/libxl/libxl_migration.c | 6 +- 4 files changed, 128 insertions(+), 189 deletions(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index 0652270..ce188ee 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -96,13 +96,12 @@ libxlDomainObjFreeJob(libxlDomainObjPrivatePtr priv) #define LIBXL_JOB_WAIT_TIME (1000ull * 30) /* - * obj must be locked before calling, libxlDriverPrivatePtr must NOT be locked + * obj must be locked before calling * * This must be called by anything that will change the VM state - * in any way + * in any way. * - * Upon successful return, the object will have its ref count increased, - * successful calls must be followed by EndJob eventually + * Successful calls must eventually result in a call to EndJob. */ int libxlDomainObjBeginJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED, @@ -117,8 +116,6 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED, return -1; then = now + LIBXL_JOB_WAIT_TIME; -virObjectRef(obj); - while (priv->job.active) { VIR_DEBUG("Wait normal job condition for starting job: %s", libxlDomainJobTypeToString(job)); @@ -149,21 +146,16 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED, virReportSystemError(errno, "%s", _("cannot acquire job mutex")); -virObjectUnref(obj); return -1; } /* - * obj must be locked before calling + * obj must be locked and have a reference before calling * * To be called after completing the work associated with the * earlier libxlDomainBeginJob() call - * - * Returns true if the remaining reference count on obj is - * non-zero, false if the reference count has dropped to zero - * and obj is disposed. */ -bool +void libxlDomainObjEndJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED, virDomainObjPtr obj) { @@ -175,8 +167,6 @@ libxlDomainObjEndJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED, libxlDomainObjResetJob(priv); virCondSignal(&priv->job.cond); - -return virObjectUnref(obj); } static void * @@ -485,12 +475,11 @@ libxlDomainShutdownThread(void *opaque) } endjob: -if (!libxlDomainObjEndJob(driver, vm)) -vm = NULL; +libxlDomainObjEndJob(driver, vm); cleanup: -if (vm) -virObjectUnlock(vm); +virObjectUnlock(vm); +virObjectUnref(vm); if (dom_event) libxlDomainEventQueue(driver, dom_event); libxl_event_free(cfg->ctx, ev); @@ -528,6 +517,7 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST libxl_event *event) VIR_INFO("Received event for unknown domain ID %d", event->domid); goto error; } +virObjectRef(vm); /* * Start a thread to handle shutdown. We don't want to be tying up @@ -558,8 +548,10 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST libxl_event *event) /* Cast away any const */ libxl_event_free(cfg->ctx, (libxl_event *)event); virObjectUnref(cfg); -if (vm) +if (vm) { virObjectUnlock(vm); +virObjectUnref(vm); +} VIR_FREE(shutdown_info); } diff --git a/src/libxl/libxl_domain.h b/src/libxl/libxl_domain.h index 8c73cc4..714ed91 100644 --- a/src/libxl/libxl_domain.h +++ b/src/libxl/libxl_domain.h @@ -83,10 +83,9 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver, enum libxlDomainJob job) ATTRIBUTE_RETURN_CHECK; -bool +void libxlDomainObjEndJob(libxlDriverPrivatePtr driver, - virDomainObjPtr obj) -ATTRIBUTE_RETURN_CHECK; + virDomainObjPtr obj); void libxlDomainEventQueue(libxlDriverPrivatePtr driver, diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a7be745..c0061b3 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -282,7 +282,7 @@ libxlDomObjFromDomain(virDomainPtr dom) libxlDriverPrivatePtr driver = dom->conn->privateData; char uuidstr[VIR_UUID_STRING_BUFLEN]; -vm = virDomainObjListFindByUUID(driver->domains, dom->uuid); +vm = virDomainObjListFindByUUIDRef(driver->domai