Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
On Tue, Jan 26, 2016 at 01:31:25PM +0300, Maxim Nestratov wrote: 26.01.2016 12:29, Martin Kletzander пишет: On Mon, Jan 25, 2016 at 12:16:12PM +0300, Maxim Nestratov wrote: A pretty nasty deadlock occurs while trying to rename a VM in parallel with virDomainObjListNumOfDomains. The short description of the problem is as follows: Thread #1: qemuDomainRename: --> aquires domain lock by qemuDomObjFromDomain -> waits for domain list lock in any of the listed functions: - virDomainObjListFindByName - virDomainObjListRenameAddNew - virDomainObjListRenameRemove Thread #2: virDomainObjListNumOfDomains: --> aquires domain list lock -> waits for domain lock in virDomainObjListCount Ouch, the rename API should hold the list lock all the time it's doing something and not acquire it when any domain is locked. But you are duplicating a lot of code, Not that much I meant to edit that out after reading the whole e-mail, my apologies. The patch establishes a single order of taking locks: driver->domains list first, then a particular VM. This is done by implementing a set of virDomainObjListXxxLocked functions working with driver->domains that assume that list lock is already aquired by calling functions. Signed-off-by: Maxim Nestratov--- src/conf/virdomainobjlist.c | 74 + src/conf/virdomainobjlist.h | 9 ++ src/libvirt_private.syms| 4 +++ src/qemu/qemu_driver.c | 40 +--- 4 files changed, 110 insertions(+), 17 deletions(-) diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c index 7568f93..89e28a8 100644 --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -132,21 +132,19 @@ virDomainObjPtr virDomainObjListFindByID(virDomainObjListPtr doms, static virDomainObjPtr -virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, +virDomainObjListFindByUUIDInternalLocked(virDomainObjListPtr doms, const unsigned char *uuid, bool ref) Indentation is off for renamed functions. Ok. I'll fix it if we decide to go on with my approach. 
{ char uuidstr[VIR_UUID_STRING_BUFLEN]; virDomainObjPtr obj; -virObjectLock(doms); virUUIDFormat(uuid, uuidstr); obj = virHashLookup(doms->objs, uuidstr); -if (ref) { +if (ref) virObjectRef(obj); -virObjectUnlock(doms); -} + if (obj) { virObjectLock(obj); if (obj->removing) { @@ -156,8 +154,19 @@ virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, obj = NULL; } } -if (!ref) -virObjectUnlock(doms); +return obj; +} + +static virDomainObjPtr +virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, + const unsigned char *uuid, + bool ref) +{ +virDomainObjPtr obj; + +virObjectLock(doms); +obj = virDomainObjListFindByUUIDInternalLocked(doms, uuid, ref); +virObjectUnlock(doms); This won't work for ref == true as in that case the unlock is not the last thing done in virDomainObjListFindByUUIDInternalLocked(). I still think that this won't cause any problem as far as list lock is aquired. Otherwise we would have seen them in virDomainObjListNumOfDomains That's kind of why there's the difference between ref=true and ref=false. The thing is that ref=false is prone to deadlocks even though they are not as easy to reproduce. We'll see what the decision will be. Martin signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] virsh: improve waiting for block job readiness
On Mon, 25 Jan 2016, John Ferlan wrote: Since this has been sitting for a bit, I'll poke the bear, but also CC Peter since he reviewed V1 and had some specific concerns... Thanks for picking up this review. The concern before was how the code would handle the user passing a different, equivalent path when querying the block job. I'm pretty sure we came to the conclusion that that wouldn't ever have worked: the block job wouldn't be able to be identified if the path was different. On 01/06/2016 10:03 PM, Michael Chapman wrote: After a block job hits 100%, we only need to apply a timeout waiting for a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2 callbacks were able to be registered. If neither callback could be registered, there's clearly no need for a timeout. If both callbacks were registered, then we're guaranteed to eventually get one of the events. The path being used by virsh must be exactly the source path or target device in the domain's disk definition, and these are the respective strings sent back in these two events. Signed-off-by: Michael Chapman--- v1 discussion at: http://www.redhat.com/archives/libvir-list/2016-January/msg00031.html Changes since v1: - Fixed bugs in cb_id/cb_id2 conditionals - Consistently used break to exit loop, dropped cleanup label - Clarified the logic and behaviour in comments - Improved commit message tools/virsh-domain.c | 60 1 file changed, 32 insertions(+), 28 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index edbbc34..853416c 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1875,14 +1875,17 @@ virshBlockJobWaitFree(virshBlockJobWaitDataPtr data) * virshBlockJobWait: * @data: private data initialized by virshBlockJobWaitInit * - * Waits for the block job to complete. This function prefers to get an event - * from libvirt but still has fallback means if the device name can't be matched + * Waits for the block job to complete. 
This function prefers to wait for a + * matching VIR_DOMAIN_EVENT_ID_BLOCK_JOB or VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 + * event from libvirt, however it has a fallback mode should either of these nit: It's usually "...; however, ..." + * events not be available. * - * This function returns values from the virConnectDomainEventBlockJobStatus enum - * or -1 in case of a internal error. Fallback states if a block job vanishes - * without triggering the event is VIR_DOMAIN_BLOCK_JOB_COMPLETED. For two phase - * jobs after the retry count for waiting for the event expires is - * VIR_DOMAIN_BLOCK_JOB_READY. + * This function returns values from the virConnectDomainEventBlockJobStatus + * enum or -1 in case of a internal error. nit: "...of an internal..." (usage of an instead of a) + * + * If the fallback mode is activated the returned event is + * VIR_DOMAIN_BLOCK_JOB_COMPLETED if the block job vanishes, or nit: "...block job vanishes or..." (no need for a comma) + * VIR_DOMAIN_BLOCK_JOB_READY if the block job reaches 100%. */ static int virshBlockJobWait(virshBlockJobWaitDataPtr data) @@ -1932,28 +1935,32 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data) if (result < 0) { vshError(data->ctl, _("failed to query job for disk %s"), data->dev); -goto cleanup; +break; Usage of 'goto' is a libvirt coding convention, even though perhaps not a convention of all programming styles. I think use of goto here should be kept (here and other places that follow in this function). If you feel strongly about usage of break vs. goto, then this should be a two patch series - the first to convert "goto" usage into "break" usage and the second to resolve the issue of continuing with the timeout logic even if both callbacks could not be registered. I am inclined to change the gotos to breaks. My rationale is that the code really has two different success cases, ret = READY and ret = COMPLETED, but having only one of these handled by a normal break out of the loop seems odd. 
With my changes, both success cases are handled in the same way. Perhaps just the error cases should jump to cleanup? That could mean the label could be moved after the "print 100% completed" block (and the ret tests could be dropped there). } -/* if we've got an event for the device we are waiting for we can end - * the waiting loop */ +/* If either callback could be registered and we've got an event, we can + * can end the waiting loop */ if ((data->cb_id >= 0 || data->cb_id2 >= 0) && data->status != -1) { ret = data->status; -goto cleanup; +break; } -/* since virsh can't guarantee that the path provided by the user will - * later be matched in the event we will need to keep the fallback - * approach and claim success if the block
Re: [libvirt] [PATCH] migration: add option to set target nbd server port

I'm looking forward to be reviewed) Sincerely yours, Patch Series On 13.01.2016 14:01, Nikolay Shirokovskiy wrote: > Current libvirt + qemu pair lacks secure migrations in case of > VMs with non-shared disks. The only option to migrate securely > natively is to use tunneled mode and some kind of secure > destination URI. But tunelled mode does not support non-shared > disks. > > The other way to make migration secure is to organize a tunnel > by external means. This is possible in case of shared disks > migration thru use of proper combination of destination URI, > migration URI and VIR_MIGRATE_PARAM_LISTEN_ADDRESS migration > param. But again this is not possible in case of non shared disks > migration as we have no option to control target nbd server port. > But fixing this much more simplier that supporting non-shared > disks in tunneled mode. > > So this patch adds option to set target ndb port. > > Finally all qemu migration connections will be secured AFAIK but > even in this case this patch could be convinient if one wants > all migration traffic be put in a single connection. > --- > include/libvirt/libvirt-domain.h | 10 +++ > src/qemu/qemu_driver.c | 25 --- > src/qemu/qemu_migration.c| 138 > +++ > src/qemu/qemu_migration.h| 3 + > tools/virsh-domain.c | 12 > tools/virsh.pod | 5 +- > 6 files changed, 141 insertions(+), 52 deletions(-) > > diff --git a/include/libvirt/libvirt-domain.h > b/include/libvirt/libvirt-domain.h > index d26faa5..e577f26 100644 > --- a/include/libvirt/libvirt-domain.h > +++ b/include/libvirt/libvirt-domain.h > @@ -757,6 +757,16 @@ typedef enum { > */ > # define VIR_MIGRATE_PARAM_MIGRATE_DISKS"migrate_disks" > > +/** > + * VIR_MIGRATE_PARAM_NBD_PORT: > + * > + * virDomainMigrate* params field: port that destination nbd server should > use > + * for incoming disks migration. Type is VIR_TYPED_PARAM_INT. If set to 0 or > + * omitted, libvirt will choose a suitable default. At the moment this is > only > + * supported by the QEMU driver. 
> + */ > +# define VIR_MIGRATE_PARAM_NBD_PORT"nbd_port" > + > /* Domain migration. */ > virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn, > unsigned long flags, const char *dname, > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 8ccf68b..4d00ba4 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -12082,7 +12082,7 @@ qemuDomainMigratePrepare2(virConnectPtr dconn, > ret = qemuMigrationPrepareDirect(driver, dconn, > NULL, 0, NULL, NULL, /* No cookies */ > uri_in, uri_out, > - , origname, NULL, 0, NULL, flags); > + , origname, NULL, 0, NULL, 0, > flags); > > cleanup: > VIR_FREE(origname); > @@ -12135,7 +12135,7 @@ qemuDomainMigratePerform(virDomainPtr dom, > * Consume any cookie we were able to decode though > */ > ret = qemuMigrationPerform(driver, dom->conn, vm, > - NULL, dconnuri, uri, NULL, NULL, 0, NULL, > + NULL, dconnuri, uri, NULL, NULL, 0, NULL, 0, > cookie, cookielen, > NULL, NULL, /* No output cookies in v2 */ > flags, dname, resource, false); > @@ -12308,7 +12308,7 @@ qemuDomainMigratePrepare3(virConnectPtr dconn, > cookiein, cookieinlen, > cookieout, cookieoutlen, > uri_in, uri_out, > - , origname, NULL, 0, NULL, flags); > + , origname, NULL, 0, NULL, 0, > flags); > > cleanup: > VIR_FREE(origname); > @@ -12334,6 +12334,7 @@ qemuDomainMigratePrepare3Params(virConnectPtr dconn, > const char *dname = NULL; > const char *uri_in = NULL; > const char *listenAddress = cfg->migrationAddress; > +int nbdPort = 0; > int nmigrate_disks; > const char **migrate_disks = NULL; > char *origname = NULL; > @@ -12354,7 +12355,10 @@ qemuDomainMigratePrepare3Params(virConnectPtr dconn, > _in) < 0 || > virTypedParamsGetString(params, nparams, > VIR_MIGRATE_PARAM_LISTEN_ADDRESS, > -) < 0) > +) < 0 || > +virTypedParamsGetInt(params, nparams, > +VIR_MIGRATE_PARAM_NBD_PORT, > +) < 0) > goto cleanup; > > nmigrate_disks = virTypedParamsGetStringList(params, nparams, > @@ -12385,7 +12389,8 @@
Re: [libvirt] [PATCH] virsh: fix cpu-stats command output format issue
On 01/26/2016 04:24 PM, Peter Krempa wrote: On Tue, Jan 26, 2016 at 10:25:05 +0800, Luyao Huang wrote: After commit 57177f1, the cpu-stats command format changed to: CPU0: cpu_time 14401.507878990 seconds vcpu_time 14378732785511 vcpu_time is not user friendly. After this patch, it will change back to: CPU0: cpu_time 14401.507878990 seconds vcpu_time 14378.732785511 seconds https://bugzilla.redhat.com/show_bug.cgi?id=1301807 Signed-off-by: Luyao Huang --- tools/virsh-domain.c | 1 + 1 file changed, 1 insertion(+) ACK, I'll push it shortly. Peter Thanks a lot for your quick review, Luyao
[libvirt] [PATCH] vz: remove unused struct field
In commit 7039bb3c we have removed code that saves uuid to vzDomObj.uuid, so this field is no longer needed. --- src/vz/vz_sdk.c | 5 - src/vz/vz_utils.h | 1 - 2 files changed, 6 deletions(-) diff --git a/src/vz/vz_sdk.c b/src/vz/vz_sdk.c index d610979..5e4c4ac 100644 --- a/src/vz/vz_sdk.c +++ b/src/vz/vz_sdk.c @@ -413,7 +413,6 @@ prlsdkDomObjFreePrivate(void *p) PrlHandle_Free(pdom->sdkdom); PrlHandle_Free(pdom->cache.stats); virCondDestroy(>cache.cond); -VIR_FREE(pdom->uuid); VIR_FREE(pdom->home); VIR_FREE(p); }; @@ -1281,10 +1280,6 @@ prlsdkLoadDomain(vzConnPtr privconn, def->id = -1; -/* we will remove this field in the near future, so let's set it - * to NULL temporarily */ -pdom->uuid = NULL; - pdom->cache.stats = PRL_INVALID_HANDLE; pdom->cache.count = -1; if (virCondInit(>cache.cond) < 0) { diff --git a/src/vz/vz_utils.h b/src/vz/vz_utils.h index b7a4c81..417f821 100644 --- a/src/vz/vz_utils.h +++ b/src/vz/vz_utils.h @@ -76,7 +76,6 @@ typedef struct _vzCountersCache vzCountersCache; struct vzDomObj { int id; -char *uuid; char *home; PRL_HANDLE sdkdom; vzCountersCache cache; -- 1.8.3.1
[libvirt] [PATCH v2] qemu: add reporting of vCPU wait time
The VIR_DOMAIN_STATS_VCPU flag to virDomainListGetStats enables reporting of stats about vCPUs. Currently we only report the cumulative CPU running time and the execution state. This adds reporting of the wait time - time the vCPU wants to run, but the host schedular has something else running ahead of it. The data is reported per-vCPU eg $ virsh domstats --vcpu demo Domain: 'demo' vcpu.current=4 vcpu.maximum=4 vcpu.0.state=1 vcpu.0.time=142000 vcpu.0.wait=18403928 vcpu.1.state=1 vcpu.1.time=13000 vcpu.1.wait=10612111 vcpu.2.state=1 vcpu.2.time=11000 vcpu.2.wait=12759501 vcpu.3.state=1 vcpu.3.time=9000 vcpu.3.wait=21825087 In implementing this I notice our reporting of CPU execute time has very poor granularity, since we are getting it from /proc/$PID/stat. As a future enhancement we should prefer to get CPU execute time from /proc/$PID/schedstat or /proc/$PID/sched (if either exist on the running kernel) Signed-off-by: Daniel P. Berrange--- src/qemu/qemu_driver.c | 100 +++-- 1 file changed, 96 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 36fb047..40c52d4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1361,6 +1361,80 @@ static char *qemuConnectGetCapabilities(virConnectPtr conn) { static int +qemuGetSchedInfo(unsigned long long *cpuWait, + pid_t pid, pid_t tid) +{ +char *proc = NULL; +char *data = NULL; +char **lines = NULL; +size_t i; +int ret = -1; +double val; + +*cpuWait = 0; + +/* In general, we cannot assume pid_t fits in int; but /proc parsing + * is specific to Linux where int works fine. 
*/ +if (tid) +ret = virAsprintf(, "/proc/%d/task/%d/sched", (int)pid, (int)tid); +else +ret = virAsprintf(, "/proc/%d/sched", (int)pid); +if (ret < 0) +goto cleanup; +ret = -1; + +/* The file is not guaranteed to exist (needs CONFIG_SCHED_DEBUG) */ +if (access(proc, R_OK) < 0) +return 0; + +if (virFileReadAll(proc, (1<<16), ) < 0) +goto cleanup; + +lines = virStringSplit(data, "\n", 0); +if (!lines) +goto cleanup; + +for (i = 0; lines[i] != NULL; i++) { +const char *line = lines[i]; + +/* Needs CONFIG_SCHEDSTATS. The second check + * is the old name the kernel used in past */ +if (STRPREFIX(line, "se.statistics.wait_sum") || +STRPREFIX(line, "se.wait_sum")) { +line = strchr(line, ':'); +if (!line) { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing separate in sched info '%s'"), + lines[i]); +goto cleanup; +} +line++; +while (*line == ' ') +line++; + +if (virStrToDouble(line, NULL, ) < 0) { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to parse sched info value '%s'"), + line); +goto cleanup; +} + +*cpuWait = (unsigned long long)(val * 100); +break; +} +} + +ret = 0; + + cleanup: +VIR_FREE(data); +VIR_FREE(proc); +VIR_FREE(lines); +return ret; +} + + +static int qemuGetProcessInfo(unsigned long long *cpuTime, int *lastCpu, long *vm_rss, pid_t pid, int tid) { @@ -1424,6 +1498,7 @@ qemuGetProcessInfo(unsigned long long *cpuTime, int *lastCpu, long *vm_rss, static int qemuDomainHelperGetVcpus(virDomainObjPtr vm, virVcpuInfoPtr info, + unsigned long long *cpuwait, int maxinfo, unsigned char *cpumaps, int maplen) @@ -1476,6 +1551,11 @@ qemuDomainHelperGetVcpus(virDomainObjPtr vm, virBitmapFree(map); } +if (cpuwait) { +if (qemuGetSchedInfo(&(cpuwait[i]), vm->pid, vcpupid) < 0) +return -1; +} + ncpuinfo++; } @@ -5490,7 +5570,7 @@ qemuDomainGetVcpus(virDomainPtr dom, goto cleanup; } -ret = qemuDomainHelperGetVcpus(vm, info, maxinfo, cpumaps, maplen); +ret = qemuDomainHelperGetVcpus(vm, info, NULL, maxinfo, cpumaps, maplen); cleanup: virDomainObjEndAPI(); @@ 
-19036,6 +19116,7 @@ qemuDomainGetStatsVcpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, int ret = -1; char param_name[VIR_TYPED_PARAM_FIELD_LENGTH]; virVcpuInfoPtr cpuinfo = NULL; +unsigned long long *cpuwait = NULL; if (virTypedParamsAddUInt(>params, >nparams, @@ -19051,10 +19132,12 @@ qemuDomainGetStatsVcpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
On 25.01.2016 10:16, Maxim Nestratov wrote: > A pretty nasty deadlock occurs while trying to rename a VM in parallel > with virDomainObjListNumOfDomains. > The short description of the problem is as follows: > > Thread #1: > > qemuDomainRename: > --> aquires domain lock by qemuDomObjFromDomain >-> waits for domain list lock in any of the listed functions: > - virDomainObjListFindByName > - virDomainObjListRenameAddNew > - virDomainObjListRenameRemove > > Thread #2: > > virDomainObjListNumOfDomains: > --> aquires domain list lock >-> waits for domain lock in virDomainObjListCount > > The patch establishes a single order of taking locks: driver->domains list > first, then a particular VM. This is done by implementing a set of > virDomainObjListXxxLocked functions working with driver->domains that assume > that list lock is already aquired by calling functions. > > Signed-off-by: Maxim Nestratov> --- > src/conf/virdomainobjlist.c | 74 > + > src/conf/virdomainobjlist.h | 9 ++ > src/libvirt_private.syms| 4 +++ > src/qemu/qemu_driver.c | 40 +--- > 4 files changed, 110 insertions(+), 17 deletions(-) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index e46404b..b012516 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -206,6 +206,36 @@ struct qemuAutostartData { > virConnectPtr conn; > }; > > +/** > + * qemuDomObjFromDomainLocked: > + * @domain: Domain pointer that has to be looked up > + * > + * This function looks up @domain and returns the appropriate virDomainObjPtr > + * that has to be released by calling virDomainObjEndAPI(). > + * It assumes that driver->domains list is locked already, thus it uses > Locked > + * variant of virDomainObjListFindByUUIDRef function. > + * > + * Returns the domain object with incremented reference counter which is > locked > + * on success, NULL otherwise. 
> + */ > +static virDomainObjPtr > +qemuDomObjFromDomainLocked(virDomainPtr domain) > +{ > +virDomainObjPtr vm; > +virQEMUDriverPtr driver = domain->conn->privateData; > +char uuidstr[VIR_UUID_STRING_BUFLEN]; > + > +vm = virDomainObjListFindByUUIDRefLocked(driver->domains, domain->uuid); > +if (!vm) { > +virUUIDFormat(domain->uuid, uuidstr); > +virReportError(VIR_ERR_NO_DOMAIN, > + _("no domain with matching uuid '%s' (%s)"), > + uuidstr, domain->name); > +return NULL; > +} > + > +return vm; > +} > > /** > * qemuDomObjFromDomain: > @@ -19892,7 +19922,8 @@ static int qemuDomainRename(virDomainPtr dom, > > virCheckFlags(0, ret); > > -if (!(vm = qemuDomObjFromDomain(dom))) > +virObjectLock(driver->domains); This is rather ugly. While driver->domains is a lockable object, it's internals are intentionally hidden from the rest of the code so that nobody from outside should touch them. That's why we have locking APIs over virDomainObjList object. I know this will work, I just find it hackish. However, I find your approach understandable and kind of agree with it. What we should do is: 1) lock list 2) lookup domain 3) begin MODIFY job on the domain (*) 4) rename the domain (here are hidden all the checks, moving the XML file, etc. - basically everything from the qemuDomainRename()) 5) end job 6) unlock domain 7) unlock list * - here we will need BeginJob() without timeout - we don't want to keep the list locked longer than needed. Either setting the job is done instantly or not at all. What if, instead of introducing bunch of Locked() functions we introduce a new internal API to virDomainObjList that will look like this: int virDomainObjListRename(virDomainObjListPtr domains, virDomainObjPtr dom, const char *new_name, int (*driverRename)(virDomainObjPtr, ...), int (*rollBack)(virDomainObjPtr, ...)); This function will encapsulate the renaming and at the correct spot it will call driverRename() so that driver can adjust its internal state. 
If the driver is successful, just remove the old entry. If it is not, call rollBack() callback which will cancel partially performed operation and restore original state of the driver. Moreover, in the virDomainObjListRename() we can ensure the locking order and we don't need to introduce new BeginJob() API with no timeout. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
On Mon, Jan 25, 2016 at 12:16:12PM +0300, Maxim Nestratov wrote: A pretty nasty deadlock occurs while trying to rename a VM in parallel with virDomainObjListNumOfDomains. The short description of the problem is as follows: Thread #1: qemuDomainRename: --> aquires domain lock by qemuDomObjFromDomain -> waits for domain list lock in any of the listed functions: - virDomainObjListFindByName - virDomainObjListRenameAddNew - virDomainObjListRenameRemove Thread #2: virDomainObjListNumOfDomains: --> aquires domain list lock -> waits for domain lock in virDomainObjListCount Ouch, the rename API should hold the list lock all the time it's doing something and not acquire it when any domain is locked. But you are duplicating a lot of code, The patch establishes a single order of taking locks: driver->domains list first, then a particular VM. This is done by implementing a set of virDomainObjListXxxLocked functions working with driver->domains that assume that list lock is already aquired by calling functions. Signed-off-by: Maxim Nestratov--- src/conf/virdomainobjlist.c | 74 + src/conf/virdomainobjlist.h | 9 ++ src/libvirt_private.syms| 4 +++ src/qemu/qemu_driver.c | 40 +--- 4 files changed, 110 insertions(+), 17 deletions(-) diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c index 7568f93..89e28a8 100644 --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -132,21 +132,19 @@ virDomainObjPtr virDomainObjListFindByID(virDomainObjListPtr doms, static virDomainObjPtr -virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, +virDomainObjListFindByUUIDInternalLocked(virDomainObjListPtr doms, const unsigned char *uuid, bool ref) Indentation is off for renamed functions. 
{ char uuidstr[VIR_UUID_STRING_BUFLEN]; virDomainObjPtr obj; -virObjectLock(doms); virUUIDFormat(uuid, uuidstr); obj = virHashLookup(doms->objs, uuidstr); -if (ref) { +if (ref) virObjectRef(obj); -virObjectUnlock(doms); -} + if (obj) { virObjectLock(obj); if (obj->removing) { @@ -156,8 +154,19 @@ virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, obj = NULL; } } -if (!ref) -virObjectUnlock(doms); +return obj; +} + +static virDomainObjPtr +virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, + const unsigned char *uuid, + bool ref) +{ +virDomainObjPtr obj; + +virObjectLock(doms); +obj = virDomainObjListFindByUUIDInternalLocked(doms, uuid, ref); +virObjectUnlock(doms); This won't work for ref == true as in that case the unlock is not the last thing done in virDomainObjListFindByUUIDInternalLocked(). I would rather we didn't invent bunch of new functions whose names aren't readable, but mayb even preferred open-coding some of them (or their parts) in the rename itself. The reasoning behind it is that rename is special API and it touches code that's not exposed normally. Or the other way would be moving parts of the rename code into this file to make it available for other drivers as well. Adding Michal to Cc as the original idea-haver =) to see if we can agree on that or I am mistaken and your approach is the way to go. Martin signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCHv2] util: keep/use a bitmap of in-use macvtap devices
On Mon, Jan 25, 2016 at 09:00:27PM -0500, Laine Stump wrote: > >> +retries = MACVLAN_MAX_ID + 1; > >> +while (!ifnameCreated && retries) { > >> virMutexLock(); > >> -for (c = 0; c < 8192; c++) { > >> -snprintf(ifname, sizeof(ifname), pattern, c); > >> -if ((ret = virNetDevExists(ifname)) < 0) { > >> -virMutexUnlock(); > >> +reservedID = virNetDevMacVLanReserveID(reservedID, flags, false, > >> true); > >> +if (reservedID < 0) { > >> +virMutexUnlock(); > >> +return -1; > >> +} > >> +snprintf(ifname, sizeof(ifname), pattern, reservedID); > > Since you're changing this, snprintf returns -1 in case of error. > > Well, the man page says -1 is returned by the *printf functions "if an > output error is encountered", which would include an I/O error (moot for > snprintf() since it doesn't do any I/O) or a bad format string (but > we're calling it with a literal format string that only includes a > single %d, so it is known to be good). Checking the returned length is > pointless too, since by definition the longest possible result of > macvtap%d where the %d is guaranteed between 0 and 8192 is > "macvtap8192", which is 12 characters - safely below the limit of > IFNAMSIZ (16). Sure, I've looked at the man page and also into gnulib code and just wanted to point it out, that we may want to somehow handle -1, but if it happens probably the whole libvirt is going to die with OOM. Let's leave it as it is. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v5 6/7] Implement qemuSetupGlobalCpuCgroup
18.01.2016 13:08, Alexander Burluka пишет: This functions setups per-domain cpu bandwidth parameters Signed-off-by: Alexander Burluka--- src/qemu/qemu_cgroup.c | 54 + src/qemu/qemu_cgroup.h | 1 + src/qemu/qemu_process.c | 4 3 files changed, 59 insertions(+) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 452d71d..1638bfe 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -1002,6 +1002,60 @@ qemuSetupCgroupCpusetCpus(virCgroupPtr cgroup, } int +qemuSetupGlobalCpuCgroup(virDomainObjPtr vm) +{ +qemuDomainObjPrivatePtr priv = vm->privateData; +unsigned long long period = vm->def->cputune.global_period; +long long quota = vm->def->cputune.global_quota; +char *mem_mask = NULL; +virDomainNumatuneMemMode mem_mode; + +if ((period || quota) && +!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) { +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cgroup cpu is required for scheduler tuning")); +return -1; +} + +/* + * If CPU cgroup controller is not initialized here, then we need + * neither period nor quota settings. And if CPUSET controller is + * not initialized either, then there's nothing to do anyway. + */ +if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU) && +!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) +return 0; + +/* We are trying to setup cgroups for CPU pinning, which can also be done + * with virProcessSetAffinity, thus the lack of cgroups is not fatal here. + */ +if (priv->cgroup == NULL) +return 0; This check is unnecessary because virCgroupHasController already has it. + +if (virDomainNumatuneGetMode(vm->def->numa, -1, _mode) == 0 && +mem_mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && +virDomainNumatuneMaybeFormatNodeset(vm->def->numa, +priv->autoNodeset, +_mask, -1) < 0) +goto cleanup; I'd like to see the reason in error message here similar to error path at the beginning of the function. 
+ +if (period || quota) { +if (qemuSetupBandwidthCgroup(priv->cgroup, period, quota) < 0) +goto cleanup; +} + +VIR_FREE(mem_mask); + +return 0; + + cleanup: +VIR_FREE(mem_mask); + +return -1; +} + + +int qemuSetupCgroupForVcpu(virDomainObjPtr vm) { virCgroupPtr cgroup_vcpu = NULL; diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h index 17da920..75f9eb7 100644 --- a/src/qemu/qemu_cgroup.h +++ b/src/qemu/qemu_cgroup.h @@ -54,6 +54,7 @@ int qemuSetupBandwidthCgroup(virCgroupPtr cgroup, long long quota); int qemuSetupCgroupCpusetCpus(virCgroupPtr cgroup, virBitmapPtr cpumask); int qemuSetupCgroupForVcpu(virDomainObjPtr vm); +int qemuSetupGlobalCpuCgroup(virDomainObjPtr vm); int qemuSetupCgroupForIOThreads(virDomainObjPtr vm); int qemuSetupCgroupForEmulator(virDomainObjPtr vm); int qemuRemoveCgroup(virQEMUDriverPtr driver, virDomainObjPtr vm); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05cbda2..96c7a6a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4982,6 +4982,10 @@ qemuProcessLaunch(virConnectPtr conn, if (qemuProcessDetectIOThreadPIDs(driver, vm, asyncJob) < 0) goto cleanup; +VIR_DEBUG("Setting global CPU cgroup (if required)"); +if (qemuSetupGlobalCpuCgroup(vm) < 0) +goto cleanup; + VIR_DEBUG("Setting cgroup for each VCPU (if required)"); if (qemuSetupCgroupForVcpu(vm) < 0) goto cleanup; -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Call for mentors and project ideas for Google Summer of Code 2016
On 25.01.2016 18:28, Stefan Hajnoczi wrote: > The QEMU wiki page for Google Summer of Code 2016 is now available here: > > http://qemu-project.org/Google_Summer_of_Code_2016 > > QEMU will apply for Google Summer of Code 2016 (https://g.co/gsoc/). > If QEMU is accepted there will be funding for students to work on > 12-week full-time open source projects remotely from May to August > 2016. QEMU provides a mentor for each student who gives advice and > evaluates their progress. > > If you have a project idea, especially if you are a regular > contributor to QEMU and are willing to mentor this summer, please go > to this wiki page and fill out the project idea template: > > http://qemu-project.org/Google_Summer_of_Code_2016 > > The project ideas list is part of the application so that QEMU can > participate in GSoC. It's useful to have your project ideas on the > wiki by February 8th 2016. > > If you have any questions about project ideas or QEMU applying to > GSoC, please reply to this thread. Hey Stefan, so as we spoke earlier in person, I think it's time for libvirt to try and apply as a separate organization. I went ahead and created similar GSoC ideas page for libvirt: http://wiki.libvirt.org/page/Google_Summer_of_Code_2016 My question is, is qemu willing to back libvirt in case we don't get selected and if so, should we duplicate the idea list into qemu wiki too? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
26.01.2016 12:29, Martin Kletzander пишет: On Mon, Jan 25, 2016 at 12:16:12PM +0300, Maxim Nestratov wrote: A pretty nasty deadlock occurs while trying to rename a VM in parallel with virDomainObjListNumOfDomains. The short description of the problem is as follows: Thread #1: qemuDomainRename: --> aquires domain lock by qemuDomObjFromDomain -> waits for domain list lock in any of the listed functions: - virDomainObjListFindByName - virDomainObjListRenameAddNew - virDomainObjListRenameRemove Thread #2: virDomainObjListNumOfDomains: --> aquires domain list lock -> waits for domain lock in virDomainObjListCount Ouch, the rename API should hold the list lock all the time it's doing something and not acquire it when any domain is locked. But you are duplicating a lot of code, Not that much The patch establishes a single order of taking locks: driver->domains list first, then a particular VM. This is done by implementing a set of virDomainObjListXxxLocked functions working with driver->domains that assume that list lock is already aquired by calling functions. Signed-off-by: Maxim Nestratov--- src/conf/virdomainobjlist.c | 74 + src/conf/virdomainobjlist.h | 9 ++ src/libvirt_private.syms| 4 +++ src/qemu/qemu_driver.c | 40 +--- 4 files changed, 110 insertions(+), 17 deletions(-) diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c index 7568f93..89e28a8 100644 --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -132,21 +132,19 @@ virDomainObjPtr virDomainObjListFindByID(virDomainObjListPtr doms, static virDomainObjPtr -virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, +virDomainObjListFindByUUIDInternalLocked(virDomainObjListPtr doms, const unsigned char *uuid, bool ref) Indentation is off for renamed functions. Ok. I'll fix it if we decide to go on with my approach. 
{ char uuidstr[VIR_UUID_STRING_BUFLEN]; virDomainObjPtr obj; -virObjectLock(doms); virUUIDFormat(uuid, uuidstr); obj = virHashLookup(doms->objs, uuidstr); -if (ref) { +if (ref) virObjectRef(obj); -virObjectUnlock(doms); -} + if (obj) { virObjectLock(obj); if (obj->removing) { @@ -156,8 +154,19 @@ virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, obj = NULL; } } -if (!ref) -virObjectUnlock(doms); +return obj; +} + +static virDomainObjPtr +virDomainObjListFindByUUIDInternal(virDomainObjListPtr doms, + const unsigned char *uuid, + bool ref) +{ +virDomainObjPtr obj; + +virObjectLock(doms); +obj = virDomainObjListFindByUUIDInternalLocked(doms, uuid, ref); +virObjectUnlock(doms); This won't work for ref == true as in that case the unlock is not the last thing done in virDomainObjListFindByUUIDInternalLocked(). I still think that this won't cause any problem as far as list lock is aquired. Otherwise we would have seen them in virDomainObjListNumOfDomains I would rather we didn't invent bunch of new functions whose names aren't readable, but mayb even preferred open-coding some of them (or their parts) in the rename itself. The reasoning behind it is that rename is special API and it touches code that's not exposed normally. Or the other way would be moving parts of the rename code into this file to make it available for other drivers as well. I don't insist on the way I'm proposing, let the patch be the reference of the problem. Adding Michal to Cc as the original idea-haver =) to see if we can agree on that or I am mistaken and your approach is the way to go. Martin -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Ability to add arbitrary data to snapshots
On Tue, Jan 26, 2016 at 12:59:00 +0300, Alexander Burluka wrote: > On 01/25/2016 04:16 PM, Daniel P. Berrange wrote: > > On Mon, Jan 25, 2016 at 02:55:30PM +0300, Alexander Burluka wrote: > >> For example, we want to store suspended state of VM. > >> I'm aware that some other careless application dealing with libvirt > >> may erase metadata section and info about additional snapshot data will be > >> lost. > > What do you mean by suspended state of the VM ? Are you referring to > > the VM memory & device state ? If so, I think that something that > > should be explicitly represented in the API, not a opaque blob. > We distinguish paused and suspended to disk domain states. For suspend > to disk we are using virDomainSaveFlags API call and stop domain. So you are saving the domain to a file which is out of libvirt's control and your management app (I guess) tracks the file for each domain. > This API call stores VM memory and device states to some file. If user > then calls virDomainSnapshotCreateXML, the resulted snapshot lacks > VM memory and device states because it looks like domain is stopped and > it is not aware about state file. Thus, switch to this snapshot does not > work as expected. It is the responsibility of your management app to associate the state files with snapshots and to properly restore them when switching between snapshots. I don't think libvirt should behave as a general purpose database for management apps. The situation would be quite different if you used virDomainManagedSave to save its state. In this case, it's libvirt's responsibility to be aware of a saved state when creating/reverting snapshots. And currently libvirt doesn't handle this correctly. Jirka -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] Don't set ABI update flag during migration cookie parsing for persistent copy
The migration would fail if the checks in virDomainDefPostParseMemory() fail after an ABI update. The ABI update is normally done during guest start, not during define. If someone uses --persistent with virsh migrate, it should be treated as equivalent to a domain define, not a start. A future restart would/should do the ABI updates anyway and flag the correct errors. Signed-off-by: Shivaprasad G Bhat--- src/qemu/qemu_migration.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 51e7125..39e259a 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1284,8 +1284,7 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, } mig->persistent = virDomainDefParseNode(doc, nodes[0], caps, driver->xmlopt, -VIR_DOMAIN_DEF_PARSE_INACTIVE | - VIR_DOMAIN_DEF_PARSE_ABI_UPDATE); +VIR_DOMAIN_DEF_PARSE_INACTIVE); if (!mig->persistent) { /* virDomainDefParseNode already reported * an error for us */ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Don't set ABI update flag during migration cookie parsing for persistent copy
Please drop this patch. I sent it too early. I think the existing migration behavior with --persistent is expected. Thanks, Shiva On Tue, Jan 26, 2016 at 1:31 PM, Shivaprasad G Bhat < shivaprasadb...@gmail.com> wrote: > The migration would fail if the checks in virDomainDefPostParseMemory() > fail > after ABI update. The ABI update normally done during the guest start and > not > during define. If someone is using --persistent with the virsh > migrate it should be treated equivalent to domain define and not start. The > furture restart would/should anyway do the ABI updates and flag correct > errors. > > Signed-off-by: Shivaprasad G Bhat> --- > src/qemu/qemu_migration.c |3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c > index 51e7125..39e259a 100644 > --- a/src/qemu/qemu_migration.c > +++ b/src/qemu/qemu_migration.c > @@ -1284,8 +1284,7 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr > mig, > } > mig->persistent = virDomainDefParseNode(doc, nodes[0], > caps, driver->xmlopt, > - > VIR_DOMAIN_DEF_PARSE_INACTIVE | > - > VIR_DOMAIN_DEF_PARSE_ABI_UPDATE); > + > VIR_DOMAIN_DEF_PARSE_INACTIVE); > if (!mig->persistent) { > /* virDomainDefParseNode already reported > * an error for us */ > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] virsh: fix cpu-stats command output format issue
On Tue, Jan 26, 2016 at 10:25:05 +0800, Luyao Huang wrote: > After commit 57177f1, the cpu-stats command format changed to: > > CPU0: > cpu_time 14401.507878990 seconds > vcpu_time 14378732785511 > > vcpu_time is not user friendly. After this patch, it will > change back to: > CPU0: > cpu_time 14401.507878990 seconds > vcpu_time 14378.732785511 seconds > > https://bugzilla.redhat.com/show_bug.cgi?id=1301807 > > Signed-off-by: Luyao Huang> --- > tools/virsh-domain.c | 1 + > 1 file changed, 1 insertion(+) ACK, I'll push it shortly. Peter -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
On Tue, Jan 26, 2016 at 13:26:32 +0100, Michal Privoznik wrote: > On 25.01.2016 10:16, Maxim Nestratov wrote: ... > > > > virCheckFlags(0, ret); > > > > -if (!(vm = qemuDomObjFromDomain(dom))) > > +virObjectLock(driver->domains); > > This is rather ugly. While driver->domains is a lockable object, it's > internals are intentionally hidden from the rest of the code so that > nobody from outside should touch them. That's why we have locking APIs > over virDomainObjList object. I know this will work, I just find it hackish. Yeah, I don't like this either. > However, I find your approach understandable and kind of agree with it. > What we should do is: > > 1) lock list > 2) lookup domain > 3) begin MODIFY job on the domain (*) > > 4) rename the domain (here are hidden all the checks, moving the XML > file, etc. - basically everything from the qemuDomainRename()) > > 5) end job > 6) unlock domain > 7) unlock list > > * - here we will need BeginJob() without timeout - we don't want to keep > the list locked longer than needed. Either setting the job is done > instantly or not at all. This sounds very ugly and fragile too. > What if, instead of introducing bunch of Locked() functions we introduce > a new internal API to virDomainObjList that will look like this: > > int virDomainObjListRename(virDomainObjListPtr domains, >virDomainObjPtr dom, >const char *new_name, >int (*driverRename)(virDomainObjPtr, ...), >int (*rollBack)(virDomainObjPtr, ...)); > > This function will encapsulate the renaming and at the correct spot it > will call driverRename() so that driver can adjust its internal state. > If the driver is successful, just remove the old entry. If it is not, > call rollBack() callback which will cancel partially performed operation > and restore original state of the driver. > Moreover, in the virDomainObjListRename() we can ensure the locking > order and we don't need to introduce new BeginJob() API with no timeout. 
But very much this approach I like :-) Jirka -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: turn on virtlockd by default
On Fri, Jan 22, 2016 at 03:56:08PM +, Daniel P. Berrange wrote: > We have had virtlockd available for a long time now but > have always defaulted to the 'nop' lock driver which does > no locking. This gives users an unsafe deployment by > default unless they know to turn on lockd. Does the default setup of virtlockd offer any safety? After looking at: https://bugzilla.redhat.com/show_bug.cgi?id=1191802 It seems that with dynamic_ownership enabled we first apply the security labels, then check the disk locks by calling virDomainLockProcessResume right before starting the CPUs in virDomainLockProcessResume. After that fails, resetting the uid:gid back to root:root cuts off the access to the disk for the already running domain. Is there a reason why the locks are checked so late? I assume this only ever worked for the case of migration with shared storage (and shared lockspace). Jan > virtlockd will > auto-activate via systemd when guests launch, so setting > it on by default will only cause upgrade pain for people > on non-systemd distros. > > Signed-off-by: Daniel P. Berrange> --- > src/qemu/qemu.conf | 3 ++- > src/qemu/qemu_driver.c | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
On Tue, Jan 26, 2016 at 02:14:23PM +0100, Jiri Denemark wrote: On Tue, Jan 26, 2016 at 13:26:32 +0100, Michal Privoznik wrote: On 25.01.2016 10:16, Maxim Nestratov wrote: ... > > virCheckFlags(0, ret); > > -if (!(vm = qemuDomObjFromDomain(dom))) > +virObjectLock(driver->domains); This is rather ugly. While driver->domains is a lockable object, it's internals are intentionally hidden from the rest of the code so that nobody from outside should touch them. That's why we have locking APIs over virDomainObjList object. I know this will work, I just find it hackish. Yeah, I don't like this either. However, I find your approach understandable and kind of agree with it. What we should do is: 1) lock list 2) lookup domain 3) begin MODIFY job on the domain (*) 4) rename the domain (here are hidden all the checks, moving the XML file, etc. - basically everything from the qemuDomainRename()) 5) end job 6) unlock domain 7) unlock list * - here we will need BeginJob() without timeout - we don't want to keep the list locked longer than needed. Either setting the job is done instantly or not at all. This sounds very ugly and fragile too. What if, instead of introducing bunch of Locked() functions we introduce a new internal API to virDomainObjList that will look like this: int virDomainObjListRename(virDomainObjListPtr domains, virDomainObjPtr dom, const char *new_name, int (*driverRename)(virDomainObjPtr, ...), int (*rollBack)(virDomainObjPtr, ...)); This function will encapsulate the renaming and at the correct spot it will call driverRename() so that driver can adjust its internal state. If the driver is successful, just remove the old entry. If it is not, call rollBack() callback which will cancel partially performed operation and restore original state of the driver. Moreover, in the virDomainObjListRename() we can ensure the locking order and we don't need to introduce new BeginJob() API with no timeout. 
But very much this approach I like :-) That's what I meant with: "Or the other way would be moving parts of the rename code into this file to make it available for other drivers as well." but you explained it in more detail and understandably. ACK for that idea as well. Jirka -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/4] logical: Fix comment examples for virStorageBackendLogicalFindLVs
On Fri, Jan 22, 2016 at 05:21:03PM -0500, John Ferlan wrote: > When commit id '82c1740a' made changes to the output format (changing from > using a ',' separator to '#'), the examples in the lvs output from the > comments weren't changed. > > Additionally, the two new fields added ('segtype' and 'stripes') were > not included in the output, leaving it well confusing. > > This patch fixes the sample output, adds a 'striped' example, and makes > other comment related adjuments for long line and spacing between followup s/adjuments/adjustments/ > 'NB' remarks (while I'm there). > > Signed-off-by: John Ferlan> --- > src/storage/storage_backend_logical.c | 19 +++ > 1 file changed, 11 insertions(+), 8 deletions(-) ACK -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: turn on virtlockd by default
On Tue, Jan 26, 2016 at 02:28:56PM +0100, Ján Tomko wrote: > On Fri, Jan 22, 2016 at 03:56:08PM +, Daniel P. Berrange wrote: > > We have had virtlockd available for a long time now but > > have always defaulted to the 'nop' lock driver which does > > no locking. This gives users an unsafe deployment by > > default unless they know to turn on lockd. > > Does the default setup of virtlockd offer any safety? > > After looking at: > https://bugzilla.redhat.com/show_bug.cgi?id=1191802 > It seems that with dynamic_ownership enabled we first apply > the security labels, then check the disk locks by calling > virDomainLockProcessResume right before starting the CPUs > in virDomainLockProcessResume. After that fails, resetting > the uid:gid back to root:root cuts off the access to the disk > for the already running domain. > > Is there a reason why the locks are checked so late? > > I assume this only ever worked for the case of migration with > shared storage (and shared lockspace). NB, the virtlockd integration is intended to protect the *content* of the disk image from being written to by two processes at once. Protecting the image metadata not a goal, since that is something that should be dealt with by the security drivers. ie the security driver should be acquiring suitable locks of its own so that it does not block away in-use labels for other VMs. We've had patches floating around for the security drivers for a while, but never got as far as merging them yet. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
26.01.2016 16:33, Martin Kletzander пишет: On Tue, Jan 26, 2016 at 02:14:23PM +0100, Jiri Denemark wrote: On Tue, Jan 26, 2016 at 13:26:32 +0100, Michal Privoznik wrote: On 25.01.2016 10:16, Maxim Nestratov wrote: ... [snip] What if, instead of introducing bunch of Locked() functions we introduce a new internal API to virDomainObjList that will look like this: int virDomainObjListRename(virDomainObjListPtr domains, virDomainObjPtr dom, const char *new_name, int (*driverRename)(virDomainObjPtr, ...), int (*rollBack)(virDomainObjPtr, ...)); This function will encapsulate the renaming and at the correct spot it will call driverRename() so that driver can adjust its internal state. If the driver is successful, just remove the old entry. If it is not, call rollBack() callback which will cancel partially performed operation and restore original state of the driver. Moreover, in the virDomainObjListRename() we can ensure the locking order and we don't need to introduce new BeginJob() API with no timeout. But very much this approach I like :-) That's what I meant with: "Or the other way would be moving parts of the rename code into this file to make it available for other drivers as well." but you explained it in more detail and understandably. ACK for that idea as well. I like it too. I can rework the patch following this way if no one wants to volunteer. Jirka -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: turn on virtlockd by default
On Fri, Jan 22, 2016 at 03:56:08PM +, Daniel P. Berrange wrote: > We have had virtlockd available for a long time now but > have always defaulted to the 'nop' lock driver which does > no locking. This gives users an unsafe deployment by > default unless they know to turn on lockd. virtlockd will > auto-activate via systemd when guests launch, so setting > it on by default will only cause upgrade pain for people > on non-systemd distros. > > Signed-off-by: Daniel P. Berrange> --- > src/qemu/qemu.conf | 3 ++- > src/qemu/qemu_driver.c | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > ACK Jan -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] qemu: add reporting of vCPU wait time
On Tue, Jan 26, 2016 at 11:20:59AM +, Daniel P. Berrange wrote: > The VIR_DOMAIN_STATS_VCPU flag to virDomainListGetStats > enables reporting of stats about vCPUs. Currently we > only report the cumulative CPU running time and the > execution state. > > This adds reporting of the wait time - time the vCPU > wants to run, but the host schedular has something else *scheduler > running ahead of it. > > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 36fb047..40c52d4 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > +if (tid) > +ret = virAsprintf(, "/proc/%d/task/%d/sched", (int)pid, > (int)tid); > +else > +ret = virAsprintf(, "/proc/%d/sched", (int)pid); > +if (ret < 0) > +goto cleanup; > +ret = -1; > + > +/* The file is not guaranteed to exist (needs CONFIG_SCHED_DEBUG) */ > +if (access(proc, R_OK) < 0) ret = 0; goto cleanup; to also free proc > +return 0; > + > +if (virFileReadAll(proc, (1<<16), ) < 0) > +goto cleanup; > + > +lines = virStringSplit(data, "\n", 0); > +if (!lines) > +goto cleanup; > + > +for (i = 0; lines[i] != NULL; i++) { > +const char *line = lines[i]; > + > +/* Needs CONFIG_SCHEDSTATS. The second check > + * is the old name the kernel used in past */ > +if (STRPREFIX(line, "se.statistics.wait_sum") || > +STRPREFIX(line, "se.wait_sum")) { > +line = strchr(line, ':'); > +if (!line) { > +virReportError(VIR_ERR_INTERNAL_ERROR, > + _("Missing separate in sched info '%s'"), The error message is either missing a word, or it should be separator. > + lines[i]); > +goto cleanup; > +} ACK Jan signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Compiling libvirt on Cygwin
Hello, It would seem this mailing list is pretty busy - however I have not seen any responses to my email. Is there anybody who can take a peek at this and give me some suggestions? Thanks, Maarten Jacobs From: maarten...@hotmail.com To: libvir-list@redhat.com Date: Sun, 24 Jan 2016 12:57:51 -0500 Subject: [libvirt] Compiling libvirt on Cygwin Hello, I have been trying to get libvirt compiled on Cygwin but I have run into a rather perplexing issue. I am running a 32-bit installation of Cygwin on Windows 10: CYGWIN_NT-10.0-WOW dell-desktop 2.4.0(0.293/5/3) 2016-01-15 16:14 i686 Cygwin I have had to "fix" a number of problems in the Makefile that prevented the code from linking properly. I am not sure if the fixes are entirely correct - but as long as the linker was ok I pressed on. After applying those changes to the Makefile, I ran into the following issue when creating libvirtd.exe:
  CCLD     libvirtd.exe
../src/.libs/libvirt_driver_remote.a(libvirt_net_rpc_la-virnetmessage.o): In function `virNetMessageNew':
/home/maart/source/libvirt/src/rpc/virnetmessage.c:39: multiple definition of `virNetMessageNew'
/home/maart/source/libvirt/src/.libs/libvirt.dll.a(d002143.o):(.text+0x0): first defined here
...
collect2: error: ld returned 1 exit status
Makefile:2152: recipe for target 'libvirtd.exe' failed
make[3]: *** [libvirtd.exe] Error 1
make[3]: Leaving directory '/cygdrive/c/cygwin64/home/maart/source/libvirt/daemon'
Makefile:2050: recipe for target 'all' failed
make[2]: *** [all] Error 2
make[2]: Leaving directory '/cygdrive/c/cygwin64/home/maart/source/libvirt/daemon'
Makefile:1993: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/cygdrive/c/cygwin64/home/maart/source/libvirt'
Makefile:1887: recipe for target 'all' failed
make: *** [all] Error 2
It appears to me that somehow when the executable is linked together, it finds two separate definitions of the virNetMessageNew method, however I cannot figure out how that could be. 
If anybody has any suggestions on how to debug this I'd appreciate it. For clarity: my aim is to get the code to compile first, whether or not it will actually work under Cygwin. I do not believe there should be a specific reason why this code could not be made to compile/link - even if the end result may not work as intended... But I hope to find that out once I successfully create the resulting executables. Thanks, Maarten Jacobs -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: turn on virtlockd by default
On Tue, Jan 26, 2016 at 01:47:02PM +, Daniel P. Berrange wrote: > On Tue, Jan 26, 2016 at 02:28:56PM +0100, Ján Tomko wrote: > > On Fri, Jan 22, 2016 at 03:56:08PM +, Daniel P. Berrange wrote: > > > We have had virtlockd available for a long time now but > > > have always defaulted to the 'nop' lock driver which does > > > no locking. This gives users an unsafe deployment by > > > default unless they know to turn on lockd. > > > > Does the default setup of virtlockd offer any safety? > > > > After looking at: > > https://bugzilla.redhat.com/show_bug.cgi?id=1191802 > > It seems that with dynamic_ownership enabled we first apply > > the security labels, then check the disk locks by calling > > virDomainLockProcessResume right before starting the CPUs > > in virDomainLockProcessResume. After that fails, resetting > > the uid:gid back to root:root cuts off the access to the disk > > for the already running domain. > > > > Is there a reason why the locks are checked so late? > > > > I assume this only ever worked for the case of migration with > > shared storage (and shared lockspace). > > NB, the virtlockd integration is intended to protect the > *content* of the disk image from being written to by two > processes at once. > > Protecting the image metadata not a goal, since that is > something that should be dealt with by the security drivers. > ie the security driver should be acquiring suitable locks of > its own so that it does not block away in-use labels for other > VMs. We've had patches floating around for the security drivers > for a while, but never got as far as merging them yet. Having said all that, I wonder if we should actually *not* turn on virtlockd, until we have addressed the problem in the security driver. Without the latter fix, it seems we'll get a never ending stream of bug reports about this issue. 
Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] lxc: don't try to hide parent cgroups inside container
On the host when we start a container, it will be placed in a cgroup path of /machine.slice/machine-lxc\x2ddemo.scope under /sys/fs/cgroup/*. Inside the container's namespace we need to set up the /sys/fs/cgroup mounts, and currently we bind mount /machine.slice/machine-lxc\x2ddemo.scope on the host to appear as / in the container. While this may sound nice, it confuses applications dealing with cgroups, because /proc/$PID/cgroup no longer matches the directory in /sys/fs/cgroup. This particularly causes problems for systemd and will make it create repeated path components in the cgroup for apps run in the container, eg /machine.slice/machine-lxc\x2ddemo.scope/machine.slice/machine-lxc\x2ddemo.scope/user.slice/user-0.slice/session-61.scope This also causes any systemd service that uses sd-notify to fail to start, because when systemd receives the notification it won't be able to identify the corresponding unit it came from. In particular this breaks rabbitmq-server startup. Future kernels will provide proper cgroup namespacing which will handle this problem, but until that time we should not try to play games with hiding parent cgroups. Signed-off-by: Daniel P. 
Berrange--- src/libvirt_private.syms | 2 +- src/lxc/lxc_container.c | 2 +- src/util/vircgroup.c | 9 - src/util/vircgroup.h | 6 +++--- 4 files changed, 9 insertions(+), 10 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5e05a98..1732421 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1194,6 +1194,7 @@ virCgroupAllowDevice; virCgroupAllowDeviceMajor; virCgroupAllowDevicePath; virCgroupAvailable; +virCgroupBindMount; virCgroupControllerAvailable; virCgroupControllerTypeFromString; virCgroupControllerTypeToString; @@ -1231,7 +1232,6 @@ virCgroupGetMemSwapUsage; virCgroupGetPercpuStats; virCgroupHasController; virCgroupHasEmptyTasks; -virCgroupIsolateMount; virCgroupKill; virCgroupKillPainfully; virCgroupKillRecursive; diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index c5a70a1..a6805ac 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -1827,7 +1827,7 @@ static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef, /* Now we can re-mount the cgroups controllers in the * same configuration as before */ -if (virCgroupIsolateMount(cgroup, "/.oldroot/", sec_mount_options) < 0) +if (virCgroupBindMount(cgroup, "/.oldroot/", sec_mount_options) < 0) goto cleanup; /* Mounts /dev */ diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 7584ee4..d7f4065 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -3917,8 +3917,8 @@ virCgroupGetFreezerState(virCgroupPtr group, char **state) int -virCgroupIsolateMount(virCgroupPtr group, const char *oldroot, - const char *mountopts) +virCgroupBindMount(virCgroupPtr group, const char *oldroot, + const char *mountopts) { int ret = -1; size_t i; @@ -3954,10 +3954,9 @@ virCgroupIsolateMount(virCgroupPtr group, const char *oldroot, if (!virFileExists(group->controllers[i].mountPoint)) { char *src; -if (virAsprintf(, "%s%s%s", +if (virAsprintf(, "%s%s", oldroot, -group->controllers[i].mountPoint, -group->controllers[i].placement) < 
0) +group->controllers[i].mountPoint) < 0) goto cleanup; VIR_DEBUG("Create mount point '%s'", diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index 63a9e1c..d754b1f 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -286,9 +286,9 @@ int virCgroupKill(virCgroupPtr group, int signum); int virCgroupKillRecursive(virCgroupPtr group, int signum); int virCgroupKillPainfully(virCgroupPtr group); -int virCgroupIsolateMount(virCgroupPtr group, - const char *oldroot, - const char *mountopts); +int virCgroupBindMount(virCgroupPtr group, + const char *oldroot, + const char *mountopts); bool virCgroupSupportsCpuBW(virCgroupPtr cgroup); -- 2.5.0 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Compiling libvirt on Cygwin
On Sun, Jan 24, 2016 at 12:57:51PM -0500, Maarten Jacobs wrote: > Hello, > I have been trying to get libvirt compiled on Cygwin but I have run into a > rather perplexing issue. > I am running a 32-bit installation of Cygwin on Windows 10: > CYGWIN_NT-10.0-WOW dell-desktop 2.4.0(0.293/5/3) 2016-01-15 16:14 i686 Cygwin > I have had to "fix" a number of problems in the Makefile that prevented the > code from linking properly. I am not sure if the fixes are entirely correct - > but as long as the linker was ok I pressed on. > After applying those changes to the Makefile, I ran into the following issue > when creating libvirtd.exe: > CCLD > libvirtd.exe../src/.libs/libvirt_driver_remote.a(libvirt_net_rpc_la-virnetmessage.o): > In function > `virNetMessageNew':/home/maart/source/libvirt/src/rpc/virnetmessage.c:39: > multiple definition of > `virNetMessageNew'/home/maart/source/libvirt/src/.libs/libvirt.dll.a(d002143.o):(.text+0x0): > first defined here...collect2: error: ld returned 1 exit > statusMakefile:2152: recipe for target 'libvirtd.exe' failedmake[3]: *** > [libvirtd.exe] Error 1make[3]: Leaving directory > '/cygdrive/c/cygwin64/home/maart/source/libvirt/daemon'Makefile:2050: recipe > for target 'all' failedmake[2]: *** [all] Error 2make[2]: Leaving directory > '/cygdrive/c/cygwin64/home/maart/source/libvirt/daemon'Makefile:1993: recipe > for target 'all-recursive' failedmake[1]: *** [all-recursive] Error 1make[1]: > Leaving directory > '/cygdrive/c/cygwin64/home/maart/source/libvirt'Makefile:1887: recipe for > target 'all' failedmake: *** [all] Error 2 > It appears to me that somehow when the executable is linked together, it > finds two separate definitions of the virNetMessageNew method, however I > cannot figure out how that could be. > If anybody has any suggestions on how to debug this I'd appreciate it. > For clarity: my aim is to get the code to compile first, whether or not > it will actually work under Cygwin. 
I do not believe there should be a > specific reason why this code could not be made to compile/link - even > if the end result may not work as intended... But I hope to find that > out once I successfully create the resulting executables. We don't do any testing on the Cygwin platform, so that you see errors is not entirely surprising. Our supported Win32 builds are cross compiled with the Mingw32 toolchain, so I'd really suggest using the latter unless you're interested in digging into the build system to figure out why Cygwin is failing. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [PATCH 2/4] logical: Create helper virStorageBackendLogicalParseVolDevice
On Fri, Jan 22, 2016 at 05:21:04PM -0500, John Ferlan wrote: > Create a helper routine in order to parse the 'device' string contained > within the generated 'lvs' output string. > > A future patch would then be able to avoid the code more cleanly > > Signed-off-by: John Ferlan> --- > src/storage/storage_backend_logical.c | 186 > +++--- > 1 file changed, 104 insertions(+), 82 deletions(-) > > diff --git a/src/storage/storage_backend_logical.c > b/src/storage/storage_backend_logical.c > index 76ea00a..bf67faf 100644 > --- a/src/storage/storage_backend_logical.c > +++ b/src/storage/storage_backend_logical.c > @@ -72,21 +72,115 @@ struct virStorageBackendLogicalPoolVolData { > }; > > static int > -virStorageBackendLogicalMakeVol(char **const groups, > -void *opaque) > +virStorageBackendLogicalParseVolDevice(virStorageVolDefPtr vol, > + char **const groups, > + int nextents, > + unsigned long long size, > + unsigned long long length) > { You've called the new helper *ParseVolDevice, but it actually does more than that. I would rather see it like ParseExtents and also move all the extents related code into this function (parsing length and size) [...] 
> +/* vars[0] is skipped */ > +for (i = 0; i < nextents; i++) { > +size_t j; > +int len; > +char *offset_str = NULL; > + > +j = (i * 2) + 1; > +len = vars[j].rm_eo - vars[j].rm_so; > +p[vars[j].rm_eo] = '\0'; > + > +if (VIR_STRNDUP(vol->source.extents[vol->source.nextent].path, > +p + vars[j].rm_so, len) < 0) > +goto cleanup; > + > +len = vars[j + 1].rm_eo - vars[j + 1].rm_so; > +if (VIR_STRNDUP(offset_str, p + vars[j + 1].rm_so, len) < 0) > +goto cleanup; > + > +if (virStrToLong_ull(offset_str, NULL, 10, &offset) < 0) { > +virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("malformed volume extent offset value")); > +VIR_FREE(offset_str); > +goto cleanup; > +} > + > +VIR_FREE(offset_str); > + > +vol->source.extents[vol->source.nextent].start = offset * size; > +vol->source.extents[vol->source.nextent].end = (offset * size) + > length; > +vol->source.nextent++; This would be much nicer to be done using VIR_APPEND_ELEMENT(). I know that this commit is just a code movement, so this should be done in a separate commit.
Re: [libvirt] [PATCH 4/4] logical: Adjust regex for devices
On Fri, Jan 22, 2016 at 05:21:06PM -0500, John Ferlan wrote: > From: Joe Harvell> > Since our 'devices' parsing logic now will use the 'nextents' (or > lvs 'stripes' output) to decide whether or not to parse the field, > use the regex of "(\\S*)" (e.g. zero or more) instead of "(\\S+)" > (1 or more) when grabbing the 'groups[3]' or 'devices' field. > > Signed-off-by: John Ferlan > --- > src/storage/storage_backend_logical.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/storage/storage_backend_logical.c > b/src/storage/storage_backend_logical.c > index 3010f58..2af3e69 100644 > --- a/src/storage/storage_backend_logical.c > +++ b/src/storage/storage_backend_logical.c > @@ -333,7 +333,8 @@ virStorageBackendLogicalFindLVs(virStoragePoolObjPtr pool, > *striped, so "," is not a suitable separator either (rhbz 727474). > */ > const char *regexes[] = { > - > "^\\s*(\\S+)#(\\S*)#(\\S+)#(\\S+)#([0-9]+)#(\\S+)#([0-9]+)#([0-9]+)#(\\S+)#?\\s*$" > +/* name orig uuid devs stripes segsz vgextsz sz > lvattr */ > + > "^\\s*(\\S+)#(\\S*)#(\\S+)#(\\S*)#([0-9]+)#(\\S+)#([0-9]+)#([0-9]+)#(\\S+)#?\\s*$" > }; > int vars[] = { > 9 This could be squashed into the previous commit and if you are creating a comment with labels, please use the exact names that lvs uses.
Re: [libvirt] [PATCH v2 0/4] various test fixes and handling of input devices
ping
Re: [libvirt] [PATCH] lxc: don't try to hide parent cgroups inside container
On 26.01.2016 15:37, Daniel P. Berrange wrote: > On the host when we start a container, it will be > placed in a cgroup path of > >/machine.slice/machine-lxc\x2ddemo.scope > > under /sys/fs/cgroup/* > > Inside the containers' namespace we need to setup > /sys/fs/cgroup mounts, and currently will bind > mount /machine.slice/machine-lxc\x2ddemo.scope on > the host to appear as / in the container. > > While this may sound nice, it confuses applications > dealing with cgroups, because /proc/$PID/cgroup > now does not match the directory in /sys/fs/cgroup > > This particularly causes problems for systemd and > will make it create repeated path components in > the cgroup for apps run in the container eg > > > /machine.slice/machine-lxc\x2ddemo.scope/machine.slice/machine-lxc\x2ddemo.scope/user.slice/user-0.slice/session-61.scope > > This also causes any systemd service that uses > sd-notify to fail to start, because when systemd > receives the notification it won't be able to > identify the corresponding unit it came from. > In particular this breaks rabbitmq-server startup > > Future kernels will provide proper cgroup namespacing > which will handle this problem, but until that time > we should not try to play games with hiding parent > cgroups. > > Signed-off-by: Daniel P. Berrange> --- > src/libvirt_private.syms | 2 +- > src/lxc/lxc_container.c | 2 +- > src/util/vircgroup.c | 9 - > src/util/vircgroup.h | 6 +++--- > 4 files changed, 9 insertions(+), 10 deletions(-) ACK Michal
Re: [libvirt] [PATCH v2 4/4] tests: update test XML files to follow input changes
On 12.01.2016 13:06, Pavel Hrdina wrote: > Ok, so consider this squashed in, I've forgot to regenerate the tests after > rebase to current HEAD and there seems to be new tests. > > --- > Not only that. This needs to be squashed in too: diff --git a/tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.xml b/tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.xml index f4ab9e6..fdf84c3 100644 --- a/tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.xml +++ b/tests/xlconfigdata/test-paravirt-cmdline-bogus-extra-root.xml @@ -28,5 +28,7 @@ + + diff --git a/tests/xlconfigdata/test-paravirt-cmdline-extra-root.xml b/tests/xlconfigdata/test-paravirt-cmdline-extra-root.xml index f4ab9e6..fdf84c3 100644 --- a/tests/xlconfigdata/test-paravirt-cmdline-extra-root.xml +++ b/tests/xlconfigdata/test-paravirt-cmdline-extra-root.xml @@ -28,5 +28,7 @@ + + diff --git a/tests/xlconfigdata/test-paravirt-cmdline.xml b/tests/xlconfigdata/test-paravirt-cmdline.xml index f4ab9e6..fdf84c3 100644 --- a/tests/xlconfigdata/test-paravirt-cmdline.xml +++ b/tests/xlconfigdata/test-paravirt-cmdline.xml @@ -28,5 +28,7 @@ + + Moreover, this patch MUST be merged into the previous one as otherwise it would introduce a build breakage. Michal
Re: [libvirt] [PATCH v2 0/4] various test fixes and handling of input devices
On 12.01.2016 12:59, Pavel Hrdina wrote: > Some of the patches from v1 series are already pushed, only those 4 left for > review. > > Pavel Hrdina (4): > tests: use virtTestDifferenceFull in tests where we have output file > tests: add some missing tests to qemuxml2xmltest > device: cleanup input device code > tests: update test XML files to follow input changes > > src/Makefile.am| 4 +- > src/conf/domain_conf.c | 82 > +++--- > src/libxl/libxl_domain.c | 5 ++ > src/qemu/qemu_domain.c | 23 ++ > src/xen/xen_driver.c | 5 ++ > src/xenapi/xenapi_driver.c | 5 ++ > src/xenconfig/xen_common.c | 22 ++ > src/xenconfig/xen_common.h | 2 + > .../disk_snapshot_redefine.xml | 2 + > .../external_vm_redefine.xml | 2 + > tests/domainsnapshotxml2xmlout/full_domain.xml | 2 + > tests/domainsnapshotxml2xmlout/metadata.xml| 2 + > tests/domainsnapshotxml2xmltest.c | 2 +- > tests/interfacexml2xmltest.c | 2 +- > tests/lxcconf2xmltest.c| 2 +- > tests/nodedevxml2xmltest.c | 2 +- > tests/qemuhotplugtest.c| 11 ++- > .../qemuhotplug-hotplug-base+disk-scsi.xml | 2 + > .../qemuhotplug-hotplug-base+disk-usb.xml | 2 + > .../qemuhotplug-hotplug-base+disk-virtio.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-bios-nvram.xml | 2 + > .../qemuxml2argvdata/qemuxml2argv-blkdeviotune.xml | 2 + > .../qemuxml2argv-blkiotune-device.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-blkiotune.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-boot-cdrom.xml | 2 + > .../qemuxml2argvdata/qemuxml2argv-boot-floppy.xml | 2 + > .../qemuxml2argv-boot-menu-disable.xml | 2 + > .../qemuxml2argv-boot-menu-enable-with-timeout.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-boot-multi.xml | 2 + > .../qemuxml2argvdata/qemuxml2argv-boot-network.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-boot-order.xml | 2 + > .../qemuxml2argv-channel-guestfwd.xml | 2 + > .../qemuxml2argv-channel-virtio.xml| 2 + > .../qemuxml2argv-chardev-label.xml | 2 + > .../qemuxml2argv-clock-catchup.xml | 2 + > .../qemuxml2argv-clock-localtime.xml | 2 + > 
.../qemuxml2argv-clock-timer-hyperv-rtc.xml| 2 + > tests/qemuxml2argvdata/qemuxml2argv-clock-utc.xml | 2 + > .../qemuxml2argv-console-compat.xml| 2 + > .../qemuxml2argv-console-virtio-many.xml | 2 + > .../qemuxml2argv-cpu-eoi-disabled.xml | 2 + > .../qemuxml2argv-cpu-eoi-enabled.xml | 2 + > .../qemuxml2argv-cpu-host-kvmclock.xml | 2 + > .../qemuxml2argv-cpu-host-model-features.xml | 2 + > .../qemuxml2argv-cpu-host-passthrough-features.xml | 2 + > .../qemuxml2argvdata/qemuxml2argv-cpu-kvmclock.xml | 2 + > .../qemuxml2argv-cpu-numa-disjoint.xml | 2 + > .../qemuxml2argv-cpu-numa-memshared.xml| 2 + > ...xml2argv-cputune-iothreadsched-zeropriority.xml | 2 + > .../qemuxml2argv-cputune-numatune.xml | 2 + > .../qemuxml2argv-cputune-zero-shares.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-cputune.xml| 2 + > .../qemuxml2argv-disk-active-commit.xml| 2 + > tests/qemuxml2argvdata/qemuxml2argv-disk-aio.xml | 2 + > .../qemuxml2argv-disk-cdrom-empty.xml | 2 + > tests/qemuxml2argvdata/qemuxml2argv-disk-cdrom.xml | 2 + > .../qemuxml2argv-disk-copy_on_read.xml | 2 + > .../qemuxml2argv-disk-drive-boot-cdrom.xml | 3 + > .../qemuxml2argv-disk-drive-boot-disk.xml | 3 + > .../qemuxml2argv-disk-drive-cache-directsync.xml | 2 + > .../qemuxml2argv-disk-drive-cache-unsafe.xml | 2 + > .../qemuxml2argv-disk-drive-cache-v2-none.xml | 2 + > .../qemuxml2argv-disk-drive-cache-v2-wb.xml| 2 + > .../qemuxml2argv-disk-drive-cache-v2-wt.xml| 2 + > .../qemuxml2argv-disk-drive-copy-on-read.xml | 2 + > ...muxml2argv-disk-drive-error-policy-enospace.xml | 2 + > .../qemuxml2argv-disk-drive-error-policy-stop.xml | 2 + > ...rgv-disk-drive-error-policy-wreport-rignore.xml | 2 + > .../qemuxml2argv-disk-drive-fat.xml| 2 + > .../qemuxml2argv-disk-drive-fmt-qcow.xml | 2 + > .../qemuxml2argv-disk-drive-network-gluster.xml| 2 + > .../qemuxml2argv-disk-drive-network-iscsi-auth.xml | 2 + > .../qemuxml2argv-disk-drive-network-iscsi.xml | 2 + > .../qemuxml2argv-disk-drive-network-nbd-export.xml | 2 + > 
...xml2argv-disk-drive-network-nbd-ipv6-export.xml | 2 + >
Re: [libvirt] [PATCH v2 3/4] device: cleanup input device code
On 12.01.2016 12:59, Pavel Hrdina wrote: > The current code was a little bit odd. At first we've removed all > possible implicit input devices from domain definition to add them later > back if there was any graphics device defined while parsing XML > description. That's not all, while formatting domain definition to XML > description we at first ignore any input devices with bus different to > USB and VIRTIO and few lines later we add implicit input devices to XML. > > This seems to me as a lot of code for nothing. This patch may look > to be more complicated than original approach, but this is a preferred > way to modify/add driver specific stuff only in those drivers and not > deal with them in common parsing/formatting functions. > > The update is to add those implicit input devices into config XML to > follow the real HW configuration visible by guest OS. > > There was also an inconsistency between our behavior and QEMU's, in that > in QEMU there is no way to disable those implicit input devices > for x86 architecture and they are always available, even without graphics > device. This applies also to XEN hypervisor. VZ driver already does its > part by putting correct implicit devices into live XML.
> > Signed-off-by: Pavel Hrdina> --- > src/Makefile.am| 4 +-- > src/conf/domain_conf.c | 82 > +- > src/libxl/libxl_domain.c | 5 +++ > src/qemu/qemu_domain.c | 23 + > src/xen/xen_driver.c | 5 +++ > src/xenapi/xenapi_driver.c | 5 +++ > src/xenconfig/xen_common.c | 22 + > src/xenconfig/xen_common.h | 2 ++ > 8 files changed, 72 insertions(+), 76 deletions(-) > > diff --git a/src/Makefile.am b/src/Makefile.am > index aa5ab69..522c56d 100644 > --- a/src/Makefile.am > +++ b/src/Makefile.am > @@ -1211,7 +1211,7 @@ libvirt_driver_xen_impl_la_CFLAGS = > \ > -I$(srcdir)/xenconfig \ > $(AM_CFLAGS) > libvirt_driver_xen_impl_la_LDFLAGS = $(AM_LDFLAGS) > -libvirt_driver_xen_impl_la_LIBADD = $(XEN_LIBS) > +libvirt_driver_xen_impl_la_LIBADD = $(XEN_LIBS) libvirt_xenconfig.la > libvirt_driver_xen_impl_la_SOURCES = $(XEN_DRIVER_SOURCES) > endif WITH_XEN > > @@ -1272,7 +1272,7 @@ if WITH_XENAPI > noinst_LTLIBRARIES += libvirt_driver_xenapi.la > libvirt_la_BUILT_LIBADD += libvirt_driver_xenapi.la > libvirt_driver_xenapi_la_CFLAGS = $(LIBXENSERVER_CFLAGS) $(CURL_CFLAGS) \ > - -I$(srcdir)/conf $(AM_CFLAGS) > + -I$(srcdir)/conf -I$(srcdir)/xenconfig $(AM_CFLAGS) > libvirt_driver_xenapi_la_LDFLAGS = $(AM_LDFLAGS) > libvirt_driver_xenapi_la_LIBADD = $(LIBXENSERVER_LIBS) $(CURL_LIBS) > libvirt_driver_xenapi_la_SOURCES = $(XENAPI_DRIVER_SOURCES) > diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c > index cf5c9f6..ee49ba7 100644 > --- a/src/libxl/libxl_domain.c > +++ b/src/libxl/libxl_domain.c > @@ -35,6 +35,7 @@ > #include "virstring.h" > #include "virtime.h" > #include "locking/domain_lock.h" > +#include "xenconfig/xen_common.h" You don't need to do this crossdir include. -Ixenconfig is already on the compiler's command line. 
> > #define VIR_FROM_THIS VIR_FROM_LIBXL > > @@ -396,6 +397,10 @@ libxlDomainDefPostParse(virDomainDefPtr def, > def->consoles[0] = chrdef; > } > > +/* add implicit input devices */ > +if (xenDomainDefAddImplicitInputDevice(def) < 0) > +return -1; > + > /* memory hotplug tunables are not supported by this driver */ > if (virDomainDefCheckUnsupportedMemoryHotplug(def) < 0) > return -1; Michal
Re: [libvirt] [PATCH v7 00/13] qemu: Add quorum support to libvirt
On Wed, Jan 20, 2016 at 16:47:56 +0200, Alberto Garcia wrote: > Hi Peter, Hi Alberto, > > I'm the current maintainer of Quorum in QEMU and I'd like to try to > answer some of your comments. > > On Fri, Jan 08, 2016 at 06:20:04PM +0100, Peter Krempa wrote: > > > So I have a few comments/observations regarding the quorum block > > driver in qemu and its usability. > > > > At first I'd like to ask you to describe your use case a bit > > more. I'm currently lacking the motivation to do anything about > > this, as the series is just partial and I don't really see any > > advantage of using the quorum driver at all and can't come up with > > any useful use case. > > > > Also a good use case is usually a good reason to drive development > > of a feature and I'm afraid that this could become abandoned without > > any real use. > > The original use case for which Quorum was designed was a data center > doing redundancy with storage in multiple separate rooms shared using > NFS. There are quite a few existing networked storage cluster solutions; wouldn't that be a more reasonable option? > One of the issues that the customer was facing was not only problems > in the file servers themselves but -mainly- data corruption across > the network. Quorum can correct this on the fly and is able to Whoah. Data corruption across the network? I'm not quite sure whether I'd use this to cover up a problem with the storage technology or network rather than just fix the root cause. If you have 3 copies, and manage to have a sector where all 3 differ, then the quorum driver won't help. And it will make it even harder to find any possible problems. > identify which one of the file servers is causing the problem without > having to rebuild a whole array (like it would be the case with RAID). Libvirt tries to stay out of doing any usage policy, so this might be considered a feature.
The series then needs polishing to add the rebuild capability and quorum event handling so that sub-quorate failed operations are properly reported. I think the rebuild is actually useful in most cases, since it ensures that all copies are the same. > > Quorum is also used for the COLO block replication functionality > currently being discussed in QEMU: > >http://wiki.qemu.org/Features/BlockReplication Oh, so it actually uses the FIFO mode of quorum which I didn't know about. So basically the quorum driver for COLO serves as a block duplicator so that one write is sent to the "primary disk" and a second write is sent using nbd to the arbiter rather than using a blockdev-mirror job. Interesting approach, but COLO stuff has not really been considered in libvirt yet. Btw, this series explicitly forbids using less than 2 as vote threshold. > > > 1) No tracking of integrity > > As the quorum members don't have headers, failed quorum members > > are not recorded and remembered. The user or management app then > > has to do this externally for given storage devices. > > > > 2) No internal tracking of quorum members > > Members of the quorum don't have any header marking them > > as such and thus any images may be mixed together with > > unforeseen/catastrophic results. Higher level management then > > needs to take the role of remembering which images belong > > together. Reimplementing this looks like reimplementing a > > distributed storage system to me. > > That's right, Quorum does not have its own file format and was > designed to work with any driver or protocol that QEMU supports, so > I'm not sure if there's much that can be done about this. > > > 3) Lack of auto-resync: > > Once the quorum gets a few inconsistencies it does not > > automatically resync like the linux MD driver. With the current > > implementation the only way to resync this would be to issue a > > block-mirror (blockCopy) to /dev/null so that all blocks are > > read and rewritten to the identical copy.
This also requires a > > user action. > > > > Additionally the member of the quorum is not ignored if it was > > out of sync at any previous time without being resynced, allowing > > for split-brain/corruption scenarios. > > Quorum can fix errors on the fly (there's the 'rewrite-corrupted' flag > for that), so in those cases no manual intervention is required. > > If we want a way to auto-resync a complete image, that should be > doable; I believe it's relatively simple to implement in QEMU > (depending on the semantics). > > For the manual resync I also agree that it would be good to have a > simple API to do that in case the user wants to do it manually. That > can be done. This would be beneficial to have if you don't have 'rewrite-corrupted' enabled. In that case you want a way to enable it and then perhaps initiate a full read so that every block gets checked. > > > 4) Necessity for at least 3 copies > > Since a majority needs to win in a vote, you need at least 3 > member disks for this
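For readers following the thread, the voting logic under discussion can be sketched as a toy model. This is purely illustrative and is not QEMU's quorum driver code: with a vote threshold of 2 out of 3, a single divergent copy is outvoted, but when all three copies differ there is no winner, which is exactly the all-three-differ case raised earlier in the thread.

```c
#include <string.h>

/* Toy quorum vote over three replica reads of one sector: return the
 * index of a replica that at least `threshold` replicas agree with
 * byte-for-byte, or -1 when no such majority exists (split-brain).
 * Illustrative only -- QEMU's real driver works on I/O requests. */
static int
vote(const char *replica[3], size_t len, int threshold)
{
    for (int i = 0; i < 3; i++) {
        int votes = 0;
        for (int j = 0; j < 3; j++) {
            if (memcmp(replica[i], replica[j], len) == 0)
                votes++;  /* a replica always agrees with itself */
        }
        if (votes >= threshold)
            return i;
    }
    return -1;
}
```

The -1 case is why a threshold of 2 needs at least 3 members, and why corruption in two of three copies cannot be silently repaired.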
[libvirt] high outage times for qemu virtio network links during live migration, trying to debug
Hi, I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack. If I live-migrate a guest with virtio network interfaces, I see a ~1200msec delay in processing the network packets, and several hundred of them get dropped. I get the dropped packets, but I'm not sure why the delay is there. I instrumented qemu and libvirt, and the strange thing is that this delay seems to happen before qemu actually starts doing any migration-related work. (i.e. before qmp_migrate() is called) Looking at my timestamps, the start of the glitch seems to coincide with libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of the glitch occurs when the migration is complete and we're up and running on the destination. My question is, why doesn't qemu continue processing virtio packets while the dirty page scanning and memory transfer over the network is proceeding? Thanks, Chris (Please CC me on responses, I'm not subscribed to the lists.)
Re: [libvirt] [PATCH v2 7/8] qemu: hotplug: Drop !QEMU_CAPS_DEVICE code
On 01/22/2016 02:11 PM, Cole Robinson wrote: On 01/22/2016 02:09 PM, Cole Robinson wrote: Nowadays we only support qemu 0.12.0+ which provides QEMU_CAPS_DEVICE, so this is all dead code. --- src/qemu/qemu_hotplug.c | 480 +++- 1 file changed, 144 insertions(+), 336 deletions(-) git show -w attached Just nitpicking, but seeing it this way allowed me to randomly notice that the changes have created several instances of code like this: if (!detach->info.alias) { if (qemuAssignDeviceControllerAlias(vm->def, priv->qemuCaps, detach) < 0) goto cleanup; } Maybe combine those like this: if (!detach->info.alias && (qemuAssignDeviceControllerAlias(vm->def, priv->qemuCaps, detach) < 0)) goto cleanup;
[libvirt] [PATCH] vircgroup: Finish renaming of virCgroupIsolateMount
In dc576025c360 we renamed the virCgroupIsolateMount function to virCgroupBindMount. However, we forgot about one occurrence in the section of the code which provides stubs for platforms without support for CGroups, like *BSD for instance. Signed-off-by: Michal Privoznik --- Pushed under build-breaker and trivial rules. src/util/vircgroup.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index d7f4065..9b56e27 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -4882,9 +4882,9 @@ virCgroupGetFreezerState(virCgroupPtr group ATTRIBUTE_UNUSED, int -virCgroupIsolateMount(virCgroupPtr group ATTRIBUTE_UNUSED, - const char *oldroot ATTRIBUTE_UNUSED, - const char *mountopts ATTRIBUTE_UNUSED) +virCgroupBindMount(virCgroupPtr group ATTRIBUTE_UNUSED, + const char *oldroot ATTRIBUTE_UNUSED, + const char *mountopts ATTRIBUTE_UNUSED) { virReportSystemError(ENOSYS, "%s", _("Control groups not supported on this platform")); -- 2.4.10
Re: [libvirt] [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
On Tue, Jan 26, 2016 at 10:41:12AM -0600, Chris Friesen wrote: > Hi, > > I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack. > > If I live-migrate a guest with virtio network interfaces, I see a ~1200msec > delay in processing the network packets, and several hundred of them get > dropped. I get the dropped packets, but I'm not sure why the delay is > there. > > I instrumented qemu and libvirt, and the strange thing is that this delay > seems to happen before qemu actually starts doing any migration-related > work. (i.e. before qmp_migrate() is called) > > Looking at my timestamps, the start of the glitch seems to coincide with > libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of the > glitch occurs when the migration is complete and we're up and running on the > destination. > > My question is, why doesn't qemu continue processing virtio packets while > the dirty page scanning and memory transfer over the network is proceeding? The qemuDomainMigratePrepareTunnel3Params() method is responsible for starting the QEMU process on the target host. This should not normally have any impact on host networking connectivity, since the CPUs on that target QEMU wouldn't be running at that point. Perhaps the mere act of starting QEMU and plugging the TAP dev into the network on the target host causes some issue though ? eg are you using a bridge that is doing STP or something like that. Regards, Daniel
Re: [libvirt] high outage times for qemu virtio network links during live migration, trying to debug
On 26/01/2016 17:41, Chris Friesen wrote: > I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack. > > If I live-migrate a guest with virtio network interfaces, I see a > ~1200msec delay in processing the network packets, and several hundred > of them get dropped. I get the dropped packets, but I'm not sure why > the delay is there. > > I instrumented qemu and libvirt, and the strange thing is that this > delay seems to happen before qemu actually starts doing any > migration-related work. (i.e. before qmp_migrate() is called) > > Looking at my timestamps, the start of the glitch seems to coincide with > libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of > the glitch occurs when the migration is complete and we're up and > running on the destination. > > My question is, why doesn't qemu continue processing virtio packets > while the dirty page scanning and memory transfer over the network is > proceeding? QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd have no delay---only dropped packets. Or am I missing something? Paolo
Re: [libvirt] high outage times for qemu virtio network links during live migration, trying to debug
On 01/26/2016 10:50 AM, Paolo Bonzini wrote: On 26/01/2016 17:41, Chris Friesen wrote: I'm using libvirt (1.2.12) with qemu (2.2.0) in the context of OpenStack. If I live-migrate a guest with virtio network interfaces, I see a ~1200msec delay in processing the network packets, and several hundred of them get dropped. I get the dropped packets, but I'm not sure why the delay is there. I instrumented qemu and libvirt, and the strange thing is that this delay seems to happen before qemu actually starts doing any migration-related work. (i.e. before qmp_migrate() is called) Looking at my timestamps, the start of the glitch seems to coincide with libvirtd calling qemuDomainMigratePrepareTunnel3Params(), and the end of the glitch occurs when the migration is complete and we're up and running on the destination. My question is, why doesn't qemu continue processing virtio packets while the dirty page scanning and memory transfer over the network is proceeding? QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd have no delay---only dropped packets. Or am I missing something? I have separate timestamps embedded in the packet for when it was sent and when it was echoed back by the target (which is the one being migrated). What I'm seeing is that packets to the guest are being sent every msec, but they get delayed somewhere for over a second on the way to the destination VM while the migration is in progress. Once the migration is over, a bunch of packets get delivered to the app in the guest and are then processed all at once and echoed back to the sender in a big burst (and a bunch of packets are dropped, presumably due to a buffer overflowing somewhere). For comparison, we have a DPDK-based fastpath NIC type that we added (sort of like vhost-net), and it continues to process packets while the dirty page scanning is going on. Only the actual cutover affects it. 
Chris
Re: [libvirt] high outage times for qemu virtio network links during live migration, trying to debug
On 26/01/2016 18:21, Chris Friesen wrote: >>> >>> My question is, why doesn't qemu continue processing virtio packets >>> while the dirty page scanning and memory transfer over the network is >>> proceeding? >> >> QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd >> have no delay---only dropped packets. Or am I missing something? > > I have separate timestamps embedded in the packet for when it was sent > and when it was echoed back by the target (which is the one being > migrated). What I'm seeing is that packets to the guest are being sent > every msec, but they get delayed somewhere for over a second on the way > to the destination VM while the migration is in progress. Once the > migration is over, a bunch of packets get delivered to the app in the > guest and are then processed all at once and echoed back to the sender > in a big burst (and a bunch of packets are dropped, presumably due to a > buffer overflowing somewhere). That doesn't exclude a bug somewhere in net/ code. It doesn't pinpoint it to QEMU or vhost-net. In any case, what I would do is to use tracing at all levels (guest kernel, QEMU, host kernel) for packet rx and tx, and find out at which layer the hiccup appears. Paolo > For comparison, we have a DPDK-based fastpath NIC type that we added > (sort of like vhost-net), and it continues to process packets while the > dirty page scanning is going on. Only the actual cutover affects it.
Re: [libvirt] high outage times for qemu virtio network links during live migration, trying to debug
On 01/26/2016 11:31 AM, Paolo Bonzini wrote: On 26/01/2016 18:21, Chris Friesen wrote: My question is, why doesn't qemu continue processing virtio packets while the dirty page scanning and memory transfer over the network is proceeding? QEMU (or vhost) _are_ processing virtio traffic, because otherwise you'd have no delay---only dropped packets. Or am I missing something? I have separate timestamps embedded in the packet for when it was sent and when it was echoed back by the target (which is the one being migrated). What I'm seeing is that packets to the guest are being sent every msec, but they get delayed somewhere for over a second on the way to the destination VM while the migration is in progress. Once the migration is over, a bunch of packets get delivered to the app in the guest and are then processed all at once and echoed back to the sender in a big burst (and a bunch of packets are dropped, presumably due to a buffer overflowing somewhere). That doesn't exclude a bug somewhere in net/ code. It doesn't pinpoint it to QEMU or vhost-net. In any case, what I would do is to use tracing at all levels (guest kernel, QEMU, host kernel) for packet rx and tx, and find out at which layer the hiccup appears. Is there a straightforward way to trace packet processing in qemu (preferably with millisecond-accurate timestamps)? Chris
Re: [libvirt] high outage times for qemu virtio network links during live migration, trying to debug
On 26/01/2016 18:49, Chris Friesen wrote: >> >> That doesn't exclude a bug somewhere in net/ code. It doesn't pinpoint >> it to QEMU or vhost-net. >> >> In any case, what I would do is to use tracing at all levels (guest >> kernel, QEMU, host kernel) for packet rx and tx, and find out at which >> layer the hiccup appears. > > Is there a straightforward way to trace packet processing in qemu > (preferably with millisecond-accurate timestamps)? You can use tracing (docs/tracing.txt). There are two possibilities: 1) use existing low-level virtio tracepoints: virtqueue_fill (end of tx and rx operation) and virtqueue_pop (beginning of tx operation). 2) add tracepoints to hw/net/virtio-net.c (virtio_net_flush_tx, virtio_net_tx_complete, virtio_net_receive) or net/tap.c (tap_receive, tap_receive_iov, tap_send). Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
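Paolo's first option can be tried without modifying QEMU at all. The sketch below is illustrative rather than a tested recipe: it assumes a QEMU built with a trace backend that honours the `-trace events=FILE` syntax described in docs/tracing.txt, and the guest command line is a placeholder.

```shell
# Enable the low-level virtio tracepoints Paolo mentions:
# virtqueue_pop marks the beginning of a tx operation,
# virtqueue_fill the end of a tx/rx operation.
cat > /tmp/virtio-events <<EOF
virtqueue_pop
virtqueue_fill
EOF

# Placeholder invocation -- only the -trace option is the point here:
#   qemu-system-x86_64 -trace events=/tmp/virtio-events [guest options...]
cat /tmp/virtio-events
```

With the "simple" trace backend the resulting binary trace can be pretty-printed with scripts/simpletrace.py, whose timestamps are finer than the millisecond resolution Chris asked for; correlating them with the timestamps embedded in the test packets should show at which layer the hiccup appears.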
[libvirt] [PATCH] Fix libvirtd free() segfault when migrating guest with deleted open vswitch port
libvirtd crashes on free()ing portData for an open vswitch port if that port was deleted. To reproduce: ovs-vsctl del-port vnet0 virsh migrate --live kvm1 qemu+ssh://dstHost/system Error message: libvirtd: *** Error in `/usr/sbin/libvirtd': free(): invalid pointer: 0x03ff90001e20 *** The problem is that virCommandRun can return an empty string in the event that the port being queried does not exist. When this happens, we are unconditionally overwriting a newline character at position strlen()-1. When strlen() is 0, we overwrite memory that does not belong to the string. The fix: only overwrite the newline if the string is not empty. Reviewed-by: Bjoern Walk Signed-off-by: Jason J. Herne --- src/util/virnetdevopenvswitch.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/util/virnetdevopenvswitch.c b/src/util/virnetdevopenvswitch.c index 6780fb5..0f640d0 100644 --- a/src/util/virnetdevopenvswitch.c +++ b/src/util/virnetdevopenvswitch.c @@ -222,8 +222,10 @@ int virNetDevOpenvswitchGetMigrateData(char **migrate, const char *ifname) goto cleanup; } -/* Wipeout the newline */ -(*migrate)[strlen(*migrate) - 1] = '\0'; +/* Wipeout the newline, if it exists */ +if (strlen(*migrate) > 0) { +(*migrate)[strlen(*migrate) - 1] = '\0'; +} ret = 0; cleanup: virCommandFree(cmd); -- 1.9.1
Re: [libvirt] [Qemu-devel] high outage times for qemu virtio network links during live migration, trying to debug
On 01/26/2016 10:45 AM, Daniel P. Berrange wrote: On Tue, Jan 26, 2016 at 10:41:12AM -0600, Chris Friesen wrote: My question is, why doesn't qemu continue processing virtio packets while the dirty page scanning and memory transfer over the network is proceeding? The qemuDomainMigratePrepareTunnel3Params() method is responsible for starting the QEMU process on the target host. This should not normally have any impact on host networking connectivity, since the CPUs on that target QEMU wouldn't be running at that point. Perhaps the mere act of starting QEMU and plugging the TAP dev into the network on the target host causes some issue though ? eg are you using a bridge that is doing STP or something like that. Well, looks like your suspicions were correct. Our fast-path backend was mistakenly sending out a GARP when the backend was initialized as part of creating the qemu process on the target host. Oops. Thanks for your help. Chris -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2 2/9] hostdev: Use common reattach code to rollback prepare errors
Need something here... Perhaps to the effect of instead of inlining most of virHostdevOnlyReattachPCIDevice, call virHostdevOnlyReattachPCIDevice. One extra "call" or check being made by doing this is the virPCIDeviceGetStubDriver check and virPCIDeviceWaitForCleanup. Are there any "timing" considerations from this path? Now for every PCI hostdev found, we'll call *WaitForCleanup. That should perhaps be noted. This looks reasonable otherwise. John On 01/25/2016 11:20 AM, Andrea Bolognani wrote: > --- > src/util/virhostdev.c | 14 +- > 1 file changed, 1 insertion(+), 13 deletions(-) > > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index 66629b4..586937e 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c > @@ -799,19 +799,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > > -if (virPCIDeviceGetManaged(dev)) { > -/* NB: This doesn't actually re-bind to original driver, just > - * unbinds from the stub driver > - */ > -VIR_DEBUG("Reattaching managed PCI device %s", > - virPCIDeviceGetName(dev)); > -ignore_value(virPCIDeviceReattach(dev, > - hostdev_mgr->activePCIHostdevs, > - > hostdev_mgr->inactivePCIHostdevs)); > -} else { > -VIR_DEBUG("Not reattaching unmanaged PCI device %s", > - virPCIDeviceGetName(dev)); > -} > +ignore_value(virHostdevOnlyReattachPCIDevice(hostdev_mgr, dev, > true)); > } > > cleanup: > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2 1/9] hostdev: Add virHostdevOnlyReattachPCIDevice()
w/r/t: your [0/7] from initial series... As much as you don't want to keep living Groundhog Day - resolution of bugs like this is job security :-)... The subtleties of hostdev device Attach, Reattach, ReAttach, and Detach are such a tangled morass of shinola (ref: Steve Martin in "The Jerk")... w/r/t suggestions for daemon restart issues... Seems like we need a way to save/restore the hostdev_mgr active/inactive lists using XML/JSON similar to how domains, storage, etc. handle it. Guess I just assumed that was already there ;-) since /var/run/libvirt/hostdevmgr exists. It seems that network stuff can be restored - virHostdevNetConfigRestore. Do you really think this series should be "held up" waiting to create some sort of status tracking? On 01/25/2016 11:20 AM, Andrea Bolognani wrote: > This function replaces virHostdevReattachPCIDevice() and, unlike it, > does not perform list manipulation, leaving it to the calling function. > > This means virHostdevReAttachPCIDevices() had to be updated to cope > with the updated requirements. > --- > src/util/virhostdev.c | 136 > +- > 1 file changed, 90 insertions(+), 46 deletions(-) > Since I reviewed them all... I think the comment changes from 7/9 should just be inlined here and patch 4 instead of a separate patch > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index f31ad41..66629b4 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c > @@ -526,6 +526,74 @@ virHostdevNetConfigRestore(virDomainHostdevDefPtr > hostdev, > return ret; > } > > +/** > + * virHostdevOnlyReattachPCIDevice: Why not just reuse the function name you deleted? IOW: Is there a reason for "Only"? (not that I'm one that can complain about naming functions, but this just seems strange). > + * @mgr: hostdev manager > + * @pci: PCI device to be reattached Interesting ... In 2 instances, this will be a pointer to the "copy" element, while in the third instance this will be the "actual" on inactive list element.
For a copy element, we'd *have* to search inactive; however, for an 'actual' we don't "need" to. > + * @skipUnmanaged: whether to skip unmanaged devices > + * > + * Reattach a PCI device to the host. > + * > + * This function only performs the base reattach steps that are required > + * regardless of whether the device is being detached from a domain or > + * had been simply detached from the host earlier. > + * > + * @pci must have already been marked as inactive, and the PCI related > + * parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) must have been > + * locked beforehand using virObjectLock(). > + * > + * Returns: 0 on success, <0 on failure > + */ > +static int > +virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr, > +virPCIDevicePtr pci, > +bool skipUnmanaged) > +{ > +virPCIDevicePtr actual; > +int ret = -1; > + > +/* Retrieve the actual device from the inactive list */ > +if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) { > +VIR_DEBUG("PCI device %s is not marked as inactive", > + virPCIDeviceGetName(pci)); This is tricky - the only time we care about the return status (now) is when skipUnmanaged == false, a/k/a the path where we pass the onlist element.. In my first pass through the changes I thought - oh no if we return -1, then this would be a path that could get that generic libvirt failed for some reason message; however, closer inspection noted that we guaranteed it was on the inactive list before calling here. 
> +goto out; > +} > + > +/* Skip unmanaged devices if asked to do so */ > +if (!virPCIDeviceGetManaged(actual) && skipUnmanaged) { , unrelated and existing - virPCIDeviceGetManaged probably should return bool instead of unsigned int > +VIR_DEBUG("Not reattaching unmanaged PCI device %s", > + virPCIDeviceGetName(actual)); > +ret = 0; > +goto out; > +} > + > +/* Wait for device cleanup if it is qemu/kvm */ > +if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) { > +int retries = 100; > +while (virPCIDeviceWaitForCleanup(actual, "kvm_assigned_device") > + && retries) { > +usleep(100*1000); > +retries--; > +} > +} Existing, but if retries == 0, then cleanup never succeeded; however, perhaps one can assume that the subsequent call would fall over and play dead? Not that it really gets checked... > + > +VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(actual)); > +if (virPCIDeviceReattach(actual, mgr->activePCIHostdevs, > + mgr->inactivePCIHostdevs) < 0) { > +virErrorPtr err = virGetLastError(); > +VIR_ERROR(_("Failed to reattach PCI device %s: %s"), > + virPCIDeviceGetName(actual), > +
Re: [libvirt] [PATCH v2 3/9] hostdev: Use common reattach code in virHostdevPCINodeDeviceReAttach()
On 01/25/2016 11:20 AM, Andrea Bolognani wrote: > This ensures the behavior for reattach is consistent, no matter how it > was triggered (eg. 'virsh nodedev-reattach', 'virsh detach-device' or > shutdown of a domain that is configured to use hostdevs). Similar to 2/9 - using virHostdevOnlyReattachPCIDevice will result in a call to virPCIDeviceGetStubDriver and virPCIDeviceWaitForCleanup - for this path I assume this is thus desired. > --- > src/util/virhostdev.c | 33 ++--- > 1 file changed, 30 insertions(+), 3 deletions(-) > > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index 586937e..f40d636 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c > @@ -1637,10 +1637,24 @@ virHostdevPCINodeDeviceDetach(virHostdevManagerPtr > hostdev_mgr, > return ret; > } > > +/** > + * virHostdevPCINodeDeviceReAttach: ^^ Oy ReAttach vs. Reattach is an eye test ;-) > + * @hostdev_mgr: hostdev manager > + * @pci: PCI device Perhaps better to indicate a "new"ly generated PCI device that does not track the internal reattach states and other state information such as the stub driver. IOW: this is not a copy of an [in]activePCIHostdevs element > + * > + * Reattach a specific PCI device to the host. > + * > + * This function makes sure the device is not in use before reattaching it > + * to the host; if the device has already been reattached to the host, the > + * operation is considered successful. 
> + * > + * Returns: 0 on success, <0 on failure > + */ > int > virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, > virPCIDevicePtr pci) > { > +virPCIDevicePtr actual; > struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL, > false}; > int ret = -1; > @@ -1651,10 +1665,23 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr > hostdev_mgr, > if (virHostdevIsPCINodeDeviceUsed(virPCIDeviceGetAddress(pci), &data)) > goto out; > > -virPCIDeviceReattachInit(pci); > +/* We want this function to be idempotent, so if the device has already > + * been removed from the inactive list (and is not active either, as > + * per the check above) just return right away. We also need to retrieve > + * the actual device from the inactive list because that's the one that > + * contains state information such as the stub driver */ > +if (!(actual = virPCIDeviceListFind(hostdev_mgr->inactivePCIHostdevs, > +pci))) { As noted in my 1/9 review - this code path passes the 'actual' dev onlist... The call we're about to make is going to repeat this search. Something we could avoid if performance of list searches were a concern. No other issues - John > +VIR_DEBUG("PCI device %s is already attached to the host", > + virPCIDeviceGetName(pci)); > +ret = 0; > +goto out; > +} > > -if (virPCIDeviceReattach(pci, hostdev_mgr->activePCIHostdevs, > - hostdev_mgr->inactivePCIHostdevs) < 0) > +/* Reattach the device. We don't want to skip unmanaged devices in > + * this case, because the user explicitly asked for the device to > + * be reattached to the host driver */ > +if (virHostdevOnlyReattachPCIDevice(hostdev_mgr, actual, false) < 0) > goto out; > > ret = 0;
Re: [libvirt] [PATCH v2 4/9] hostdev: Add virHostdevOnlyDetachPCIDevice()
On 01/25/2016 11:20 AM, Andrea Bolognani wrote: > This function mirrors virHostdevOnlyReattachPCIDevice(). > > The handling of active and inactive devices is updated and made more > explicit, which means virHostdevPreparePCIDevices() has to be > updated as well. > --- > src/util/virhostdev.c | 87 > --- > 1 file changed, 61 insertions(+), 26 deletions(-) > I think parts of patch 7 to adjust comments should just be inlined here > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index f40d636..6f14574 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c > @@ -594,6 +594,56 @@ virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr, > return ret; > } > > +/** > + * virHostdevOnlyDetachPCIDevice: Similar to 1/9 comments, you could drop the "Only" > + * @mgr: hostdev manager > + * @pci: PCI device to be detached ... Copy of a 'hostdev' - expected to be added to inactive list... > + * @skipUnmanaged: whether to skip unmanaged devices > + * > + * Detach a PCI device from the host. > + * > + * This function only performs the base detach steps that are required > + * regardless of whether the device is being attached to a domain or > + * is simply being detached from the host for later use. > + * > + * The PCI related parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) > + * must have been locked beforehand using virObjectLock().
> + * > + * Returns: 0 on success, <0 on failure > + */ > +static int > +virHostdevOnlyDetachPCIDevice(virHostdevManagerPtr mgr, > + virPCIDevicePtr pci, > + bool skipUnmanaged) > +{ > +int ret = -1; > + > +/* Skip unmanaged devices if asked to do so */ > +if (!virPCIDeviceGetManaged(pci) && skipUnmanaged) { > +VIR_DEBUG("Not detaching unmanaged PCI device %s", > + virPCIDeviceGetName(pci)); > +ret = 0; > +goto out; > +} > + > +VIR_DEBUG("Detaching PCI device %s", virPCIDeviceGetName(pci)); > +if (virPCIDeviceDetach(pci, > + mgr->activePCIHostdevs, > + mgr->inactivePCIHostdevs) < 0) { > +virErrorPtr err = virGetLastError(); > +VIR_ERROR(_("Failed to detach PCI device %s: %s"), > + virPCIDeviceGetName(pci), > + err ? err->message : _("unknown error")); > +virResetError(err); > +goto out; > +} > + > +ret = 0; > + > + out: > +return ret; > +} > + > int > virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, > const char *drv_name, > @@ -664,17 +714,8 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > > -if (virPCIDeviceGetManaged(dev)) { > -VIR_DEBUG("Detaching managed PCI device %s", > - virPCIDeviceGetName(dev)); > -if (virPCIDeviceDetach(dev, > - hostdev_mgr->activePCIHostdevs, > - hostdev_mgr->inactivePCIHostdevs) < 0) > -goto reattachdevs; > -} else { > -VIR_DEBUG("Not detaching unmanaged PCI device %s", > - virPCIDeviceGetName(dev)); > -} > +if (virHostdevOnlyDetachPCIDevice(hostdev_mgr, dev, true) < 0) > +goto reattachdevs; > } > > /* At this point, all devices are attached to the stub driver and have > @@ -704,23 +745,21 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > last_processed_hostdev_vf = i; > } > > -/* Loop 5: Now mark all the devices as active */ > +/* Step 5: move all devices from the inactive list to the active list */ I know patch 7 adjusts other comments, but things are still described as 
loops in other comments - probably should follow... Or of course, make the comment changes now because [1a & 1b] > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > +virPCIDevicePtr actual; > > -VIR_DEBUG("Adding PCI device %s to active list", > +VIR_DEBUG("Removing PCI device %s from inactive list", >virPCIDeviceGetName(dev)); > -if (virPCIDeviceListAdd(hostdev_mgr->activePCIHostdevs, dev) < 0) > +if (!(actual = > virPCIDeviceListSteal(hostdev_mgr->inactivePCIHostdevs, > + dev))) > goto inactivedevs; > -} > - > -/* Loop 6: Now remove the devices from inactive list. */ > -for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { [1a] This removes Loop 6 (or Step 6), but does not renumber the following loops/steps nor does it address the comment much earlier "We have to use 9 loops here." > -
Re: [libvirt] [PATCH v2 5/9] hostdev: Use common detach code in virHostdevPCINodeDeviceDetach()
On 01/25/2016 11:21 AM, Andrea Bolognani wrote: > This ensures the behavior for detach is consistent, no matter how it > was triggered (eg. 'virsh nodedev-detach', 'virsh attach-device' or > startup of a domain that is configured to use hostdevs). > --- > src/util/virhostdev.c | 28 ++-- > 1 file changed, 26 insertions(+), 2 deletions(-) > > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index 6f14574..5ded1e9 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c > @@ -1646,6 +1646,19 @@ virHostdevReAttachSCSIDevices(virHostdevManagerPtr > hostdev_mgr, > virObjectUnlock(hostdev_mgr->activeSCSIHostdevs); > } > > +/** > + * virHostdevPCINodeDeviceDetach: > + * @hostdev_mgr: hostdev manager > + * @pci: PCI device A new PCI device to be detached from the host IOW: Not currently on some active/inactive list, right? Seems reasonable otherwise though. John > + * > + * Detach a specific PCI device from the host. > + * > + * This function makes sure the device is not in use before detaching it > + * from the host; if the device has already been detached from the host, > + * the operation is considered successful. > + * > + * Returns: 0 on success, <0 on failure > + */ > int > virHostdevPCINodeDeviceDetach(virHostdevManagerPtr hostdev_mgr, >virPCIDevicePtr pci) > @@ -1660,11 +1673,22 @@ virHostdevPCINodeDeviceDetach(virHostdevManagerPtr > hostdev_mgr, > if (virHostdevIsPCINodeDeviceUsed(virPCIDeviceGetAddress(pci), &data)) > goto out; > -if (virPCIDeviceDetach(pci, hostdev_mgr->activePCIHostdevs, > - hostdev_mgr->inactivePCIHostdevs) < 0) { > +/* We want this function to be idempotent, so if the device has already > + * been added to the inactive list (and is not active, as per the check > + * above) just return right away */ > +if (virPCIDeviceListFind(hostdev_mgr->inactivePCIHostdevs, pci)) { > +VIR_DEBUG("PCI device %s is already detached from the host", > + virPCIDeviceGetName(pci)); > +ret = 0; > goto out; > } > +/* Detach the device.
We don't want to skip unmanaged devices in > + * this case, because the user explicitly asked for the device to > + * be detached from the host driver */ > +if (virHostdevOnlyDetachPCIDevice(hostdev_mgr, pci, false) < 0) > +goto out; > + > ret = 0; > out: > virObjectUnlock(hostdev_mgr->inactivePCIHostdevs);
Re: [libvirt] [PATCH v2 8/9] pci: Phase out virPCIDeviceReattachInit()
On 01/25/2016 11:21 AM, Andrea Bolognani wrote: > The name is confusing, and the only use left is in a test case. > --- > src/libvirt_private.syms | 1 - > src/util/virpci.c| 8 > src/util/virpci.h| 1 - > tests/virpcitest.c | 5 - > 4 files changed, 4 insertions(+), 11 deletions(-) > Seems reasonable - ACK John
Re: [libvirt] [PATCH v2 9/9] pci: Add debug messages when unbinding from stub driver
On 01/25/2016 11:21 AM, Andrea Bolognani wrote: > Unbinding a PCI device from the stub driver can require several steps, > and it can be useful for debugging to be able to trace which of these > steps are performed and which are skipped for each device. > --- > src/util/virpci.c | 22 ++ > 1 file changed, 18 insertions(+), 4 deletions(-) > > diff --git a/src/util/virpci.c b/src/util/virpci.c > index f56dbe6..51a3f88 100644 > --- a/src/util/virpci.c > +++ b/src/util/virpci.c > @@ -1106,26 +1106,37 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) > > if (!driver) { > /* The device is not bound to any driver and we are almost done. */ > +VIR_DEBUG("PCI device %s is not bound to any driver", dev->name); > goto reprobe; > } > > -if (!dev->unbind_from_stub) > +if (!dev->unbind_from_stub) { > +VIR_DEBUG("Unbind from stub skipped for PCI device %s", dev->name); > goto remove_slot; > +} > > /* If the device isn't bound to a known stub, skip the unbind. */ > if (virPCIStubDriverTypeFromString(driver) < 0 || > -virPCIStubDriverTypeFromString(driver) == VIR_PCI_STUB_DRIVER_NONE) > +virPCIStubDriverTypeFromString(driver) == VIR_PCI_STUB_DRIVER_NONE) { > +VIR_DEBUG("Unbind from stub skipped for PCI device %s because of " > + "unknown stub driver", dev->name); > goto remove_slot; > +} > > -VIR_DEBUG("Found stub driver %s", driver); > +VIR_DEBUG("Found stub driver %s for PCI device %s", driver, dev->name); > +VIR_DEBUG("Unbinding PCI device %s", dev->name); Redundant - How about "Found stub driver %s to unbind PCI device %s" or "Unbinding PCI device %s for stub driver %s" (don't forget to change order of args ;-)) IOW: No need to have two messages. Hope they help some day! 
ACK - John > > if (virPCIDeviceUnbind(dev) < 0) > goto cleanup; > dev->unbind_from_stub = false; > > remove_slot: > -if (!dev->remove_slot) > +if (!dev->remove_slot) { > +VIR_DEBUG("Slot removal skipped for PCI device %s", dev->name); > goto reprobe; > +} > + > +VIR_DEBUG("Removing slot for PCI device %s", dev->name); > > /* Xen's pciback.ko wants you to use remove_slot on the specific device > */ > if (!(path = virPCIDriverFile(driver, "remove_slot"))) > @@ -1141,10 +1152,13 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) > > reprobe: > if (!dev->reprobe) { > +VIR_DEBUG("Reprobe skipped for PCI device %s", dev->name); > result = 0; > goto cleanup; > } > > +VIR_DEBUG("Reprobing for PCI device %s", dev->name); > + > /* Trigger a re-probe of the device is not in the stub's dynamic > * ID table. If the stub is available, but 'remove_id' isn't > * available, then re-probing would just cause the device to be > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2 7/9] hostdev: Update comments
On 01/25/2016 11:21 AM, Andrea Bolognani wrote: > Some of the comments are no longer accurate after the recent changes, > others have been expanded and hopefully improved. > > The initial summary of the operations has been removed so that two > separate edits are not needed when making changes. > --- > src/util/virhostdev.c | 68 > +++ > 1 file changed, 30 insertions(+), 38 deletions(-) > 6 of one, half dozen of another, but I think these should have been done at the time of change... I'll make specific comments about changes here still though... > diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c > index ab17547..0892dbf 100644 > --- a/src/util/virhostdev.c > +++ b/src/util/virhostdev.c This comment block starts "We have to use 9 loops here." - that should be adjusted too as there are now 7 steps. (Is it any coincidence that there are also 7 stages of grief and 7 steps to great health? ;-)) > @@ -675,11 +675,10 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > * must be reset before being marked as active. > */ > > -/* Loop 1: validate that non-managed device isn't in use, eg > - * by checking that device is either un-bound, or bound > - * to pci-stub.ko > - */ > +/* Detaching devices from the host involves several steps; each of them > + * is described at length below */ > > +/* Step 1: perform safety checks, eg. ensure the devices are assignable > */ > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > bool strict_acs_check = !!(flags & VIR_HOSTDEV_STRICT_ACS_CHECK); > @@ -710,7 +709,8 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > } > } > > -/* Loop 2: detach managed devices (i.e. bind to appropriate stub driver) > */ > +/* Step 2: detach managed devices (i.e. bind to appropriate stub driver). > + * detachPCIDevices() will also mark devices as inactive */ s/detachPCIDevices/virHostdevOnlyDetachPCIDevice Or whatever it ends up being named. 
s/will also mark devices as inactive/will place device on inactive list */ > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > > @@ -718,11 +718,10 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > goto reattachdevs; > } > > -/* At this point, all devices are attached to the stub driver and have > +/* At this point, devices are attached to the stub driver and have > * been marked as inactive */ > > -/* Loop 3: Now that all the PCI hostdevs have been detached, we > - * can safely reset them */ > +/* Step 3: perform a PCI reset on all devices */ > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); > > @@ -732,8 +731,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > goto reattachdevs; > } > > -/* Loop 4: For SRIOV network devices, Now that we have detached the > - * the network device, set the netdev config */ > +/* Step 4: set the netdev config for SRIOV network devices */ > for (i = 0; i < nhostdevs; i++) { > virDomainHostdevDefPtr hostdev = hostdevs[i]; > if (!virHostdevIsPCINetDevice(hostdev)) > @@ -762,9 +760,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > goto inactivedevs; > } > Made my Step 5 comment earlier > -/* Loop 7: Now set the used_by_domain of the device in > - * activePCIHostdevs as domain name. 
> - */ > +/* Step 6: set driver and domain information */ > for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { > virPCIDevicePtr dev, activeDev; > > @@ -777,7 +773,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > virPCIDeviceSetUsedBy(activeDev, drv_name, dom_name); > } > > -/* Loop 8: Now set the original states for hostdev def */ > +/* Step 7: set the original states for hostdev def */ > for (i = 0; i < nhostdevs; i++) { > virPCIDevicePtr dev; > virPCIDevicePtr pcidev; > @@ -792,10 +788,9 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr > hostdev_mgr, > dev = virPCIDeviceNew(pcisrc->addr.domain, pcisrc->addr.bus, >pcisrc->addr.slot, pcisrc->addr.function); > > -/* original states "unbind_from_stub", "remove_slot", > - * "reprobe" were already set by pciDettachDevice in > - * loop 2. > - */ > +/* original states for "unbind_from_stub", "remove_slot" and > + * "reprobe" (used when reattaching) were already set by > + * detachPCIDevices() in a previous step */ > VIR_DEBUG("Saving network configuration of PCI device %s", >
Re: [libvirt] [PATCH v2 6/9] hostdev: Minor style adjustments
On 01/25/2016 11:21 AM, Andrea Bolognani wrote: > Mostly label names and whitespace. > > No functional changes. > --- > src/util/virhostdev.c | 23 ++- > 1 file changed, 14 insertions(+), 9 deletions(-) > Seems reasonable, ACK John
[libvirt] [PATCH v3 3/4] virsh: ensure SIGINT action is reset on all errors
If virTimeMillisNow() fails, the SIGINT action must be reset back to its previous state. Signed-off-by: Michael Chapman --- tools/virsh-domain.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index cdeccac..750b273 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1857,7 +1857,7 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data) if (data->timeout && virTimeMillisNow() < 0) { vshSaveLibvirtError(); -return -1; +goto cleanup; } last.cur = last.end = 0; -- 2.4.3
[libvirt] [PATCH v3 2/4] virsh: be consistent with style of loop exit
When waiting for a block job, the various statuses (COMPLETED, READY, CANCELED, etc.) should all be treated consistently by having the loop be exited with "break". Use "goto cleanup" for the error cases only, when no block job status is available. Signed-off-by: Michael Chapman --- tools/virsh-domain.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 83de02a..cdeccac 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1876,21 +1876,23 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data) * the waiting loop */ if ((data->cb_id >= 0 || data->cb_id2 >= 0) && data->status != -1) { ret = data->status; -goto cleanup; +break; } /* since virsh can't guarantee that the path provided by the user will * later be matched in the event we will need to keep the fallback * approach and claim success if the block job finishes or vanishes. */ -if (result == 0) +if (result == 0) { +ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED; break; +} /* for two-phase jobs we will try to wait in the synchronized phase * for event arrival since 100% completion doesn't necessarily mean that * the block job has finished and can be terminated with success */ if (info.end == info.cur && --retries == 0) { ret = VIR_DOMAIN_BLOCK_JOB_READY; -goto cleanup; +break; } if (data->verbose && (info.cur != last.cur || info.end != last.end)) @@ -1911,21 +1913,19 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data) } ret = VIR_DOMAIN_BLOCK_JOB_CANCELED; -goto cleanup; +break; } usleep(500 * 1000); } -ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED; - - cleanup: /* print 100% completed */ if (data->verbose && (ret == VIR_DOMAIN_BLOCK_JOB_COMPLETED || ret == VIR_DOMAIN_BLOCK_JOB_READY)) virshPrintJobProgress(data->job_name, 0, 1); + cleanup: sigaction(SIGINT, &old_sig_action, NULL); return ret; } -- 2.4.3
[libvirt] [PATCH v3 0/4] virshBlockJobWait improvements
v2 discussion at: http://www.redhat.com/archives/libvir-list/2016-January/msg01063.html As requested I've split this into separate commits. All of the comment nitpicks have been applied. With regards to using "break" or "goto cleanup" to exit the loop, I decided to use "goto" on all the error cases and "break" everywhere else. I had noticed another bug where the SIGINT signal action wasn't reset if the virTimeMillisNow() call failed; that's fixed in patch 3 in this series. Michael Chapman (4): virsh: avoid unnecessary progress updates virsh: be consistent with style of loop exit virsh: ensure SIGINT action is reset on all errors virsh: improve waiting for block job readiness tools/virsh-domain.c | 65 ++-- 1 file changed, 37 insertions(+), 28 deletions(-) -- 2.4.3
[libvirt] [PATCH v3 1/4] virsh: avoid unnecessary progress updates
There is no need to call virshPrintJobProgress() unless the block job's cur
or end cursors have changed since the last iteration.

Signed-off-by: Michael Chapman
---
 tools/virsh-domain.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index c2146d2..83de02a 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -1837,7 +1837,7 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
     unsigned int abort_flags = 0;
     int ret = -1;
-    virDomainBlockJobInfo info;
+    virDomainBlockJobInfo info, last;
     int result;
 
     if (!data)
@@ -1860,6 +1860,8 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
         return -1;
     }
 
+    last.cur = last.end = 0;
+
     while (true) {
         pthread_sigmask(SIG_BLOCK, , );
         result = virDomainGetBlockJobInfo(data->dom, data->dev, , 0);
@@ -1891,9 +1893,10 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
             goto cleanup;
         }
 
-        if (data->verbose)
+        if (data->verbose && (info.cur != last.cur || info.end != last.end))
             virshPrintJobProgress(data->job_name, info.end - info.cur,
                                   info.end);
+        last = info;
 
         if (data->timeout && virTimeMillisNow() < 0) {
             vshSaveLibvirtError();
-- 
2.4.3
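The cursor-comparison idea in this patch can be shown in a minimal C sketch. The names (`job_info`, `report_progress`, the poll array) are invented for illustration and are not the virsh structures; the logic mirrors keeping a `last` copy and only emitting an update when `cur` or `end` actually moved.

```c
struct job_info { unsigned long long cur, end; };

/* Walk a series of polled job states and count how many progress updates
 * would actually be printed: one per change in (cur, end), mirroring the
 * "last" comparison added by the patch. */
static int report_progress(const struct job_info *polls, int npolls)
{
    struct job_info last = { 0, 0 };
    int updates = 0;

    for (int i = 0; i < npolls; i++) {
        struct job_info info = polls[i];
        if (info.cur != last.cur || info.end != last.end)
            updates++;          /* virshPrintJobProgress() would go here */
        last = info;
    }
    return updates;
}
```

With five polls of which two repeat the previous state, only three updates are emitted instead of five.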
[libvirt] [PATCH v3 4/4] virsh: improve waiting for block job readiness
After a block job hits 100%, we only need to apply a timeout waiting for a
block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2 callbacks was
able to be registered. If neither callback could be registered, there's
clearly no need for a timeout. If both callbacks were registered, then we're
guaranteed to eventually get one of the events: the path being used by virsh
must be exactly the source path or target device in the domain's disk
definition, and these are the respective strings sent back in these two
events.

Signed-off-by: Michael Chapman
---
 tools/virsh-domain.c | 50 --
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 750b273..bf65a60 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -1810,14 +1810,17 @@ virshBlockJobWaitFree(virshBlockJobWaitDataPtr data)
  * virshBlockJobWait:
  * @data: private data initialized by virshBlockJobWaitInit
  *
- * Waits for the block job to complete. This function prefers to get an event
- * from libvirt but still has fallback means if the device name can't be matched
+ * Waits for the block job to complete. This function prefers to wait for a
+ * matching VIR_DOMAIN_EVENT_ID_BLOCK_JOB or VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2
+ * event from libvirt; however, it has a fallback mode should either of these
+ * events not be available.
  *
- * This function returns values from the virConnectDomainEventBlockJobStatus enum
- * or -1 in case of a internal error. Fallback states if a block job vanishes
- * without triggering the event is VIR_DOMAIN_BLOCK_JOB_COMPLETED. For two phase
- * jobs after the retry count for waiting for the event expires is
- * VIR_DOMAIN_BLOCK_JOB_READY.
+ * This function returns values from the virConnectDomainEventBlockJobStatus
+ * enum or -1 in case of an internal error.
+ *
+ * If the fallback mode is activated the returned event is
+ * VIR_DOMAIN_BLOCK_JOB_COMPLETED if the block job vanishes or
+ * VIR_DOMAIN_BLOCK_JOB_READY if the block job reaches 100%.
  */
 static int
 virshBlockJobWait(virshBlockJobWaitDataPtr data)
@@ -1872,27 +1875,30 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
             goto cleanup;
         }
 
-        /* if we've got an event for the device we are waiting for we can end
-         * the waiting loop */
+        /* If either callback could be registered and we've got an event, we
+         * can end the waiting loop */
         if ((data->cb_id >= 0 || data->cb_id2 >= 0) && data->status != -1) {
             ret = data->status;
             break;
         }
 
-        /* since virsh can't guarantee that the path provided by the user will
-         * later be matched in the event we will need to keep the fallback
-         * approach and claim success if the block job finishes or vanishes. */
-        if (result == 0) {
-            ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
-            break;
-        }
+        /* Fallback behaviour is only needed if one or both callbacks could
+         * not be registered */
+        if (data->cb_id < 0 || data->cb_id2 < 0) {
+            /* If the block job vanishes, synthesize a COMPLETED event */
+            if (result == 0) {
+                ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
+                break;
+            }
 
-        /* for two-phase jobs we will try to wait in the synchronized phase
-         * for event arrival since 100% completion doesn't necessarily mean that
-         * the block job has finished and can be terminated with success */
-        if (info.end == info.cur && --retries == 0) {
-            ret = VIR_DOMAIN_BLOCK_JOB_READY;
-            break;
+            /* If the block job hits 100%, wait a little while for a possible
+             * event from libvirt unless both callbacks could not be
+             * registered, in order to synthesize our own READY event */
+            if (info.end == info.cur &&
+                ((data->cb_id < 0 && data->cb_id2 < 0) || --retries == 0)) {
+                ret = VIR_DOMAIN_BLOCK_JOB_READY;
+                break;
+            }
         }
 
         if (data->verbose && (info.cur != last.cur || info.end != last.end))
-- 
2.4.3
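The readiness patch's decision table can be condensed into a small C helper. The names here (`may_synthesize_ready`, `have_cb`, `have_cb2`) are invented for illustration; the logic mirrors the reasoning in the commit message: with both callbacks an event is guaranteed, with one callback a retry grace period applies, and with none the READY result can be synthesized immediately at 100%.

```c
#include <stdbool.h>

/* One iteration of the wait loop's fallback decision: may we synthesize a
 * READY result for a two-phase job sitting at 100% completion? */
static bool may_synthesize_ready(bool have_cb, bool have_cb2,
                                 bool at_100_percent, int *retries)
{
    /* both events guaranteed to arrive eventually: just keep waiting */
    if (have_cb && have_cb2)
        return false;

    if (!at_100_percent)
        return false;

    /* no event can ever arrive: report READY right away */
    if (!have_cb && !have_cb2)
        return true;

    /* exactly one callback registered: give the event a grace period */
    return --*retries == 0;
}
```

Each call with exactly one callback registered burns one retry, so the timeout only applies in that mixed case, matching the commit message.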
[libvirt] [PATCH] util: fix build without macvtap
Commit 370608b added new functions:
 - virNetDevMacVLanReleaseName,
 - virNetDevMacVLanReserveName.

Add stubbed versions of them to fix build without macvtap.
---
 src/util/virnetdevmacvlan.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c
index da71eef..9293510 100644
--- a/src/util/virnetdevmacvlan.c
+++ b/src/util/virnetdevmacvlan.c
@@ -1330,4 +1330,21 @@ int virNetDevMacVLanVPortProfileRegisterCallback(const char *ifname ATTRIBUTE_UN
                          _("Cannot create macvlan devices on this platform"));
     return -1;
 }
+
+int
+virNetDevMacVLanReserveName(const char *name ATTRIBUTE_UNUSED,
+                            bool quietFail ATTRIBUTE_UNUSED)
+{
+    virReportSystemError(ENOSYS, "%s",
+                         _("The macvlan devices are not supported on this platform"));
+    return -1;
+}
+
+int
+virNetDevMacVLanReleaseName(const char *name ATTRIBUTE_UNUSED)
+{
+    virReportSystemError(ENOSYS, "%s",
+                         _("The macvlan devices are not supported on this platform"));
+    return -1;
+}
 #endif /* ! WITH_MACVTAP */
-- 
2.4.6
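The general pattern this patch applies, keeping every public entry point defined even when the feature is compiled out, can be sketched outside libvirt. Everything here is invented for illustration (`FEATURE_ENABLED`, `feature_reserve_name`); the disabled branch fails with ENOSYS the way the macvtap stubs report "not supported on this platform".

```c
#include <errno.h>

/* Toggle standing in for WITH_MACVTAP; set to 0 to build the stub */
#define FEATURE_ENABLED 0

#if FEATURE_ENABLED
static int feature_reserve_name(const char *name)
{
    (void)name;
    return 0;               /* real implementation would live here */
}
#else
/* Stub keeps the same signature so callers link and get a uniform
 * "not supported" error, mirroring virReportSystemError(ENOSYS, ...) */
static int feature_reserve_name(const char *name)
{
    (void)name;
    errno = ENOSYS;
    return -1;
}
#endif
```

Callers can then handle the unsupported case like any other error instead of needing their own #ifdefs.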
Re: [libvirt] Ability to add arbitrary data to snapshots
On 01/25/2016 04:16 PM, Daniel P. Berrange wrote:
> On Mon, Jan 25, 2016 at 02:55:30PM +0300, Alexander Burluka wrote:
>> For example, we want to store suspended state of VM. I'm aware that some
>> other careless application dealing with libvirt may erase metadata
>> section and info about additional snapshot data will be lost.
>
> What do you mean by suspended state of the VM ? Are you referring to the
> VM memory & device state ? If so, I think that is something that should
> be explicitly represented in the API, not an opaque blob.
>
> Regards,
> Daniel

We distinguish paused and suspended-to-disk domain states. For suspend to
disk we use the virDomainSaveFlags API call and then stop the domain. This
API call stores the VM memory and device state in a file. If the user then
calls virDomainSnapshotCreateXML, the resulting snapshot lacks the VM memory
and device state, because the domain appears to be stopped and the snapshot
code is not aware of the state file. Thus, switching to this snapshot does
not work as expected.

We need to store this file somewhere, so what API change could be done here?
Or maybe we can emulate suspend to disk in another way? Please advise. Thank
you!

-- 
Regards,
Alexander Burluka