Re: [libvirt] [PATCH] libxl: rework reference counting

2015-07-14 Thread Martin Kletzander

On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote:

Similar to commit 540c339a for the QEMU driver, rework reference
counting in the libxl driver to make it more deterministic and
the code a bit cleaner.

Signed-off-by: Jim Fehlig 
---

I've been testing this patch on and off for a few weeks now using
libvirt-tck domain tests, local test scripts, and some manual tests
for good measure.  I sent the patch to Anthony Perard (cc'd) nearly
two weeks ago for testing in his OpenStack+Xen+libvirt CI loop,
although I haven't received any feedback thus far.  Also included
Martin in the cc since he encouraged this patch



I'm sorry I didn't get to it earlier.  I'm not familiar with the libxl
driver, but I'll at least give it a try since this is on the list for
some time.


https://www.redhat.com/archives/libvir-list/2015-April/msg00014.html

src/libxl/libxl_domain.c|  32 ++
src/libxl/libxl_domain.h|   5 +-
src/libxl/libxl_driver.c| 274 ++--
src/libxl/libxl_migration.c |   6 +-
4 files changed, 128 insertions(+), 189 deletions(-)

diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c
index 0652270..ce188ee 100644
--- a/src/libxl/libxl_domain.c
+++ b/src/libxl/libxl_domain.c
@@ -294,6 +294,25 @@ libxlDomObjFromDomain(virDomainPtr dom)
return vm;
}

+/*
+ * Finish working with a domain object in an API.  This function
+ * clears whatever was left of a domain that was gathered using
+ * libxlDomObjFromDomain().  Currently that means only unlocking and
+ * decrementing the reference counter of that domain.  And in order to
+ * make sure the caller does not access the domain, the pointer is
+ * cleared.
+ */
+static void
+libxlDomObjEndAPI(virDomainObjPtr *vm)
+{


This function already exists as virDomainObjEndAPI().


+if (!*vm)
+return;
+
+virObjectUnlock(*vm);
+virObjectUnref(*vm);
+*vm = NULL;
+}
+
static int
libxlAutostartDomain(virDomainObjPtr vm,
 void *opaque)
@@ -2760,8 +2729,7 @@ libxlDomainUndefineFlags(virDomainPtr dom,

 cleanup:
VIR_FREE(name);
-if (vm)
-virObjectUnlock(vm);
+libxlDomObjEndAPI(&vm);


In this function the domain can leak because the reference you get by
libxlDomObjFromDomain() is not returned anywhere.  You might get to
this cleanup phase with a domain that has a reference count, but vm is
set to NULL.  virDomainObjListRemove() leaves the domain unlocked,
Unless I missed something this will leak.

Stream of Random thoughts follows.

Not setting 'vm = NULL' will help a bit, but in that case the domain
will be unlocked twice.  At first I thought maybe something similar to
how QEMU driver does that in qemuDomainRemoveInactive() might work.
But after looking at it for a while, I see this bug is in QEMU, the
domain will be unlocked twice there.  I need to send a fix for that
because now we are resetting a job when the domain is unlocked.  That
leads me to an idea.  Could we somehow make most of the things that
are so similar abstracted somehow?  That's not related to this patch
though.  But for example that would fix stuff like the fact that libxl
doesn't hold a job when removing the domain.


@@ -4963,23 +4915,20 @@ libxlDomainMigrateFinish3Params(virConnectPtr dconn,
   NULLSTR(dname));
return NULL;
}
+virObjectRef(vm);

-if (virDomainMigrateFinish3ParamsEnsureACL(dconn, vm->def) < 0) {
-virDomainObjEndAPI(&vm);
-return NULL;
-}
+if (virDomainMigrateFinish3ParamsEnsureACL(dconn, vm->def) < 0)
+goto cleanup;

-if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) {
-virDomainObjEndAPI(&vm);
-return NULL;
-}
+if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
+goto cleanup;

ret = libxlDomainMigrationFinish(dconn, vm, flags, cancelled);

-if (!libxlDomainObjEndJob(driver, vm))
-vm = NULL;
+libxlDomainObjEndJob(driver, vm);

-virDomainObjEndAPI(&vm);
+ cleanup:
+libxlDomObjEndAPI(&vm);



Wow, and it was called here, how funny :)  Although that might mean
something is wrong since you're not adding unref, but added ref on top
of this hunk.

Martin

P.S.: Now 'make check' fails for me due to false-positives about
mismatched ACL functions from migrate APIs, but that might be due to
poor rebase on top of master which I did.


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] libxl: rework reference counting

2015-06-17 Thread Jim Fehlig
Anthony PERARD wrote:
> On Tue, Jun 16, 2015 at 10:53:15AM +0100, Anthony PERARD wrote:
>   
>> On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote:
>> 
>>> Similar to commit 540c339a for the QEMU driver, rework reference
>>> counting in the libxl driver to make it more deterministic and
>>> the code a bit cleaner.
>>>
>>> Signed-off-by: Jim Fehlig 
>>> ---
>>>
>>> I've been testing this patch on and off for a few weeks now using
>>> libvirt-tck domain tests, local test scripts, and some manual tests
>>> for good measure.  I sent the patch to Anthony Perard (cc'd) nearly
>>> two weeks ago for testing in his OpenStack+Xen+libvirt CI loop,
>>> although I haven't received any feedback thus far.  Also included
>>> Martin in the cc since he encouraged this patch
>>>   
>> But I should be able to test the patch now.
>> 
>
> Hi, I gave this patch a try with OpenStack. I have applied the patch on top
> of libvirt 1.4.14 + (f86ae40 894d2ff 6dfec1e), which we are using for our CI.
>
> With the patch, libvirt works fine,

Thanks for testing it!

>  but I can still have the rare error
> "domain 205 device model: startup timed out".
>   

I would have been surprised if the patch fixed that mysterious problem.

Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] libxl: rework reference counting

2015-06-17 Thread Anthony PERARD
On Tue, Jun 16, 2015 at 10:53:15AM +0100, Anthony PERARD wrote:
> On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote:
> > Similar to commit 540c339a for the QEMU driver, rework reference
> > counting in the libxl driver to make it more deterministic and
> > the code a bit cleaner.
> > 
> > Signed-off-by: Jim Fehlig 
> > ---
> > 
> > I've been testing this patch on and off for a few weeks now using
> > libvirt-tck domain tests, local test scripts, and some manual tests
> > for good measure.  I sent the patch to Anthony Perard (cc'd) nearly
> > two weeks ago for testing in his OpenStack+Xen+libvirt CI loop,
> > although I haven't received any feedback thus far.  Also included
> > Martin in the cc since he encouraged this patch
> 
> But I should be able to test the patch now.

Hi, I gave this patch a try with OpenStack. I have applied the patch on top
of libvirt 1.4.14 + (f86ae40 894d2ff 6dfec1e), which we are using for our CI.

With the patch, libvirt works fine, but I can still have the rare error
"domain 205 device model: startup timed out".

Thanks,

-- 
Anthony PERARD

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] libxl: rework reference counting

2015-06-16 Thread Anthony PERARD
On Mon, Jun 15, 2015 at 08:36:47PM -0600, Jim Fehlig wrote:
> Similar to commit 540c339a for the QEMU driver, rework reference
> counting in the libxl driver to make it more deterministic and
> the code a bit cleaner.
> 
> Signed-off-by: Jim Fehlig 
> ---
> 
> I've been testing this patch on and off for a few weeks now using
> libvirt-tck domain tests, local test scripts, and some manual tests
> for good measure.  I sent the patch to Anthony Perard (cc'd) nearly
> two weeks ago for testing in his OpenStack+Xen+libvirt CI loop,
> although I haven't received any feedback thus far.  Also included
> Martin in the cc since he encouraged this patch

Sorry, I could not really test the patch, since OpenStack master was broken
(deploying OpenStack with devstack). Also I don't want to apply patches
to the Xen Project OpenStack CI loop just to test them.

But I should be able to test the patch now.

-- 
Anthony PERARD

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] libxl: rework reference counting

2015-06-15 Thread Jim Fehlig
Similar to commit 540c339a for the QEMU driver, rework reference
counting in the libxl driver to make it more deterministic and
the code a bit cleaner.

Signed-off-by: Jim Fehlig 
---

I've been testing this patch on and off for a few weeks now using
libvirt-tck domain tests, local test scripts, and some manual tests
for good measure.  I sent the patch to Anthony Perard (cc'd) nearly
two weeks ago for testing in his OpenStack+Xen+libvirt CI loop,
although I haven't received any feedback thus far.  Also included
Martin in the cc since he encouraged this patch

https://www.redhat.com/archives/libvir-list/2015-April/msg00014.html

 src/libxl/libxl_domain.c|  32 ++
 src/libxl/libxl_domain.h|   5 +-
 src/libxl/libxl_driver.c| 274 ++--
 src/libxl/libxl_migration.c |   6 +-
 4 files changed, 128 insertions(+), 189 deletions(-)

diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c
index 0652270..ce188ee 100644
--- a/src/libxl/libxl_domain.c
+++ b/src/libxl/libxl_domain.c
@@ -96,13 +96,12 @@ libxlDomainObjFreeJob(libxlDomainObjPrivatePtr priv)
 #define LIBXL_JOB_WAIT_TIME (1000ull * 30)
 
 /*
- * obj must be locked before calling, libxlDriverPrivatePtr must NOT be locked
+ * obj must be locked before calling
  *
  * This must be called by anything that will change the VM state
- * in any way
+ * in any way.
  *
- * Upon successful return, the object will have its ref count increased,
- * successful calls must be followed by EndJob eventually
+ * Successful calls must eventually result in a call to EndJob.
  */
 int
 libxlDomainObjBeginJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED,
@@ -117,8 +116,6 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver 
ATTRIBUTE_UNUSED,
 return -1;
 then = now + LIBXL_JOB_WAIT_TIME;
 
-virObjectRef(obj);
-
 while (priv->job.active) {
 VIR_DEBUG("Wait normal job condition for starting job: %s",
   libxlDomainJobTypeToString(job));
@@ -149,21 +146,16 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver 
ATTRIBUTE_UNUSED,
 virReportSystemError(errno,
  "%s", _("cannot acquire job mutex"));
 
-virObjectUnref(obj);
 return -1;
 }
 
 /*
- * obj must be locked before calling
+ * obj must be locked and have a reference before calling
  *
  * To be called after completing the work associated with the
  * earlier libxlDomainBeginJob() call
- *
- * Returns true if the remaining reference count on obj is
- * non-zero, false if the reference count has dropped to zero
- * and obj is disposed.
  */
-bool
+void
 libxlDomainObjEndJob(libxlDriverPrivatePtr driver ATTRIBUTE_UNUSED,
  virDomainObjPtr obj)
 {
@@ -175,8 +167,6 @@ libxlDomainObjEndJob(libxlDriverPrivatePtr driver 
ATTRIBUTE_UNUSED,
 
 libxlDomainObjResetJob(priv);
 virCondSignal(&priv->job.cond);
-
-return virObjectUnref(obj);
 }
 
 static void *
@@ -485,12 +475,11 @@ libxlDomainShutdownThread(void *opaque)
 }
 
  endjob:
-if (!libxlDomainObjEndJob(driver, vm))
-vm = NULL;
+libxlDomainObjEndJob(driver, vm);
 
  cleanup:
-if (vm)
-virObjectUnlock(vm);
+virObjectUnlock(vm);
+virObjectUnref(vm);
 if (dom_event)
 libxlDomainEventQueue(driver, dom_event);
 libxl_event_free(cfg->ctx, ev);
@@ -528,6 +517,7 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST 
libxl_event *event)
 VIR_INFO("Received event for unknown domain ID %d", event->domid);
 goto error;
 }
+virObjectRef(vm);
 
 /*
  * Start a thread to handle shutdown.  We don't want to be tying up
@@ -558,8 +548,10 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST 
libxl_event *event)
 /* Cast away any const */
 libxl_event_free(cfg->ctx, (libxl_event *)event);
 virObjectUnref(cfg);
-if (vm)
+if (vm) {
 virObjectUnlock(vm);
+virObjectUnref(vm);
+}
 VIR_FREE(shutdown_info);
 }
 
diff --git a/src/libxl/libxl_domain.h b/src/libxl/libxl_domain.h
index 8c73cc4..714ed91 100644
--- a/src/libxl/libxl_domain.h
+++ b/src/libxl/libxl_domain.h
@@ -83,10 +83,9 @@ libxlDomainObjBeginJob(libxlDriverPrivatePtr driver,
enum libxlDomainJob job)
 ATTRIBUTE_RETURN_CHECK;
 
-bool
+void
 libxlDomainObjEndJob(libxlDriverPrivatePtr driver,
- virDomainObjPtr obj)
-ATTRIBUTE_RETURN_CHECK;
+ virDomainObjPtr obj);
 
 void
 libxlDomainEventQueue(libxlDriverPrivatePtr driver,
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index a7be745..c0061b3 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -282,7 +282,7 @@ libxlDomObjFromDomain(virDomainPtr dom)
 libxlDriverPrivatePtr driver = dom->conn->privateData;
 char uuidstr[VIR_UUID_STRING_BUFLEN];
 
-vm = virDomainObjListFindByUUID(driver->domains, dom->uuid);
+vm = virDomainObjListFindByUUIDRef(driver->domai