Michal Privoznik wrote: > On 07.02.2014 04:53, Jim Fehlig wrote: >> Large balloon operation can be time consuming. Use the recently >> added job functions and unlock the virDomainObj while ballooning. >> >> Signed-off-by: Jim Fehlig <jfeh...@suse.com> >> --- >> src/libxl/libxl_driver.c | 32 ++++++++++++++++++++++---------- >> 1 file changed, 22 insertions(+), 10 deletions(-) >> >> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c >> index 7c964c5..4f333bd 100644 >> --- a/src/libxl/libxl_driver.c >> +++ b/src/libxl/libxl_driver.c >> @@ -1646,6 +1646,9 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> if (virDomainSetMemoryFlagsEnsureACL(dom->conn, vm->def, flags) >> < 0) >> goto cleanup; >> >> + if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) >> + goto cleanup; >> + >> isActive = virDomainObjIsActive(vm); > > Correct, domain needs to be checked if still alive after job was > successfully started. > >> >> if (flags == VIR_DOMAIN_MEM_CURRENT) { >> @@ -1664,19 +1667,19 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> if (!isActive && (flags & VIR_DOMAIN_MEM_LIVE)) { >> virReportError(VIR_ERR_OPERATION_INVALID, "%s", >> _("cannot set memory on an inactive domain")); >> - goto cleanup; >> + goto endjob; >> } >> >> if (flags & VIR_DOMAIN_MEM_CONFIG) { >> if (!vm->persistent) { >> virReportError(VIR_ERR_OPERATION_INVALID, "%s", >> _("cannot change persistent config of a >> transient domain")); >> - goto cleanup; >> + goto endjob; >> } >> if (!(persistentDef = virDomainObjGetPersistentDef(cfg->caps, >> >> driver->xmlopt, >> vm))) >> - goto cleanup; >> + goto endjob; >> } >> >> if (flags & VIR_DOMAIN_MEM_MAXIMUM) { >> @@ -1688,7 +1691,7 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("Failed to set maximum memory for >> domain '%d'" >> " with libxenlight"), dom->id); >> - goto cleanup; >> + goto endjob; >> } >> } >> >> @@ -1699,7 +1702,7 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> if (persistentDef->mem.cur_balloon > newmem) >> persistentDef->mem.cur_balloon = newmem; >> ret = virDomainSaveConfig(cfg->configDir, persistentDef); >> - goto cleanup; >> + goto endjob; >> } >> >> } else { >> @@ -1708,17 +1711,23 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> if (newmem > vm->def->mem.max_balloon) { >> virReportError(VIR_ERR_INVALID_ARG, "%s", >> _("cannot set memory higher than max >> memory")); >> - goto cleanup; >> + goto endjob; >> } >> >> if (flags & VIR_DOMAIN_MEM_LIVE) { >> + int res; >> + >> priv = vm->privateData; >> - if (libxl_set_memory_target(priv->ctx, dom->id, newmem, 0, >> - /* force */ 1) < 0) { >> + /* Unlock virDomainObj while ballooning memory */ >> + virObjectUnlock(vm); > > 1: ^^ > >> + res = libxl_set_memory_target(priv->ctx, dom->id, >> newmem, 0, >> + /* force */ 1); >> + virObjectLock(vm); > > 2: ^^ > >> + if (res < 0) { >> virReportError(VIR_ERR_INTERNAL_ERROR, >> _("Failed to set memory for domain >> '%d'" >> " with libxenlight"), dom->id); >> - goto cleanup; >> + goto endjob; >> } >> } >> >> @@ -1726,12 +1735,15 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, >> unsigned long newmem, >> sa_assert(persistentDef); >> persistentDef->mem.cur_balloon = newmem; >> ret = virDomainSaveConfig(cfg->configDir, persistentDef); >> - goto cleanup; >> + goto endjob; >> } >> } >> >> ret = 0; >> >> +endjob: >> + libxlDomainObjEndJob(driver, vm); >> + > > Hm. I wonder if we should rather check for libxlDomainObjEndJob return > value. Currently, as you unlock the domain at [1] another thread > waiting on the domain lock may come and lock it for itself. Then we > will wait at [2] until the domain is unlocked again. This is possibly > dangerous if the other thread was executing libxlDomainDestroyFlags(). > If that's the case libxlDomainObjEndJob shall return false, as we held > the last reference to @vm. But the memory @vm points to has been > already freed (in the EndJob()) ...
Thanks for the review and finding a bug! I've managed to cause a crash by attempting to concurrently save the same transient domain. In this case, one thread starts the job, saves the domain, and it is destroyed. The other thread then runs the job, fails, ends the job, and kaboom when unlocking the vm. Another twist of what you describe. > >> cleanup: >> if (vm) >> virObjectUnlock(vm); >> > > ... what means we are playing with fire here (SIGSEGV). > > Therefore I think we should check for libxlDomainObjEndJob() return > value: > > endjob: > if (!libxlDomainObjEndJob(driver, vm)) > vm = NULL; Yes, agreed. I'll rework the cleanup code in this series. > > BTW: the domain is unlocked and locked in libxlDomainObjBeginJob() > too. Yes, it's for a very short time, but it may be sufficient to > another thread hop in and do something bad. This reveals even more > interesting bug: > > libxlVmCleanup() sets vm->dom->id = -1 only for persistent domains. So > if the SetMemoryFlags() is running over a transient domain, after job > was successfully acquired, virDomainObjIsActive() will return true. > And since libxlVmCleanup() replaces vm->def with vm->newDef, the > SetMemoryFlags() will continue working on (in general) completely > different domain definition than the one when the API started. Sigh. Hrm, seems the domid should be unconditionally set to -1 once it is shutdown. But I'm not sure if that solves the general case you describe. Perhaps it will with fixes to patch5. Regards, Jim -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list