Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu
On Thu, Mar 12, 2015 at 14:23:48 -0600, Eric Blake wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
> after a series of disk snapshots into existing destination images,
> followed by active commits of the top image, it is possible for
> qemu 2.2 and earlier to end up tracking a different name for the
> image than what it would have had when opening the chain afresh.
> That is, when starting with the chain 'a <- b <- c', the name
> associated with 'b' is how it was spelled in the metadata of 'c',
> but when starting with 'a', taking two snapshots into 'a <- b <- c',
> then committing 'c' back into 'b', the name associated with 'b' is
> now the name used when taking the first snapshot.
>
> Sadly, older qemu doesn't know how to treat different spellings of
> the same filename as identical files (it uses strcmp() instead of
> checking for the same inode), which means libvirt's attempt to
> commit an image using solely the names learned from qcow2 metadata
> fails with a cryptic:
>
> error: internal error: unable to execute QEMU command 'block-commit':
> Top image file /tmp/images/c/../b/b not found
>
> even though the file exists.  Trying to teach libvirt the rules on
> which name qemu will expect is not worth the effort (besides, we'd
> have to remember it across libvirtd restarts, and track whether a
> file was opened via metadata or via snapshot creation for a given
> qemu process); it is easier to just always directly ask qemu what
> string it expects to see in the first place.
>
> As a safety valve, we validate that any name returned by qemu
> still maps to the same local file as we have tracked it, so that
> a compromised qemu cannot accidentally cause us to act on an
> incorrect file.

It would still allow acting on remote storage, though. Also, if qemu
were corrupted in a way that lets it lie to us convincingly via the
monitor, it would most probably also be able to act on the file itself.
As the labelling is done from the internal structures, it should not
allow anything beyond what the instance is already allowed to do.

A bigger problem would be that, since we don't store the backing chain
internally all the time, qemu could rewrite the metadata in the image
and libvirt would happily accept it. Corrupting qemu in that way is
very improbable, though, IMO.

>
> * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
> prototype.
> * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
> Likewise.
> * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
> * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
> (qemuMonitorJSONDiskNameLookupOne): Likewise.
> * src/qemu/qemu_driver.c (qemuDomainBlockCommit)
> (qemuDomainBlockJobImpl): Use it.
>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>
> v2: as suggested by Dan, add a sanity checking valve to ensure we
> don't use qemu's string until vetting that it resolves to the same
> local name we are already tracking
>
>  src/qemu/qemu_driver.c       | 28 ++---
>  src/qemu/qemu_monitor.c      | 20 -
>  src/qemu/qemu_monitor.h      |  8 +++-
>  src/qemu/qemu_monitor_json.c | 97 +++-
>  src/qemu/qemu_monitor_json.h |  9 +++-
>  5 files changed, 144 insertions(+), 18 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index b3263ac..f0e530d 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c

...

> @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>      }
>
>      qemuDomainObjEnterMonitor(driver, vm);
> -    ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
> -                              speed, mode, async);
> +    if (baseSource)
> +        basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,

I remember that at some point accessing the domain definition while in
the monitor was not okay for some reason, but I can't remember now why,
nor whether it was fixed.

> +                                             baseSource);
> +    if (!baseSource || basePath)
> +        ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
> +                                  speed, mode, async);
>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>          ret = -1;
>      if (ret < 0) {

...

> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index d869a72..cf7dc5e 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -1,7 +1,7 @@
>  /*
>   * qemu_monitor.c: interaction with QEMU monitor console
>   *
> - * Copyright (C) 2006-2014 Red Hat, Inc.
> + * Copyright (C) 2006-2015 Red Hat, Inc.

Shouldn't we employ something like what gnulib does, where copyrights
are bumped everywhere at once?

>   * Copyright (C) 2006 Daniel P. Berrange
>   *
>   * This library is free software; you can redistribute it and/or

...

> diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
> index b30da34..e67d800 100644
> --- a/src/qemu/qemu_monitor.h
> +++ b/src/qemu/qemu_monitor.h
> @@ -1,7 +1,7 @@
>  /*
>   *
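The strcmp()-versus-inode distinction raised in the quoted commit message can be illustrated with a small sketch. This is not code from the patch, from libvirt, or from qemu; same_spelling() and same_local_file() are hypothetical helper names, and the sketch assumes plain local POSIX paths (it says nothing about remote storage, which is exactly the gap noted above).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: the comparison older qemu is described as using.
 * Two spellings match only if they are byte-for-byte identical. */
static bool
same_spelling(const char *a, const char *b)
{
    assert(a && b);
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: treat two spellings as the same file when they
 * resolve to the same device and inode. Returns false if either path
 * cannot be stat()ed. */
static bool
same_local_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a && b);
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With these helpers, two spellings such as "/tmp/images/b/b" and "/tmp/images/c/../b/b" fail same_spelling() but, provided both resolve to the same existing file, pass same_local_file(). A check of the second kind is roughly what the "safety valve" in v2 is described as doing before a qemu-supplied name is trusted.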
Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu
On 03/13/2015 02:02 AM, Peter Krempa wrote:
>> @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>>      }
>>
>>      qemuDomainObjEnterMonitor(driver, vm);
>> -    ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
>> -                              speed, mode, async);
>> +    if (baseSource)
>> +        basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
>
> I remember that at some point accessing the domain definition while in
> the monitor was not okay for some reason, but I can't remember now why,
> nor whether it was fixed.

Oh, right. You're thinking of CVE-2013-6458. That problem was that as
soon as we enter the monitor, we drop locks. If we do not already own a
block job, then some other parallel API could be hot-unplugging a disk
before we regain control, freeing 'disk' before we dereference it. But
we fixed that problem by guaranteeing that we always own the job early
enough (no other thread can hot-unplug the disk as long as we own the
job), so it is not an issue for this patch.

>> - * Copyright (C) 2006-2014 Red Hat, Inc.
>> + * Copyright (C) 2006-2015 Red Hat, Inc.
>
> Shouldn't we employ something like what gnulib does, where copyrights
> are bumped everywhere at once?

Might be nice, but one wrinkle. Gnulib has a single copyright holder
(FSF), so they can afford to bump all files at once (the bump is also
owned by FSF, so FSF adding another year to its copyright is
appropriate). But libvirt is split among multiple copyright holders -
Red Hat can't claim copyright over all files, so it wouldn't be wise to
bump all files, just the ones that Red Hat has already touched.
Personally, I've just got an emacs hook that checks if any file I touch
has an up-to-date copyright line.

>> +static char *
>> +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image,
>> +                                 virStorageSourcePtr top,
>> +                                 virStorageSourcePtr target)
>> +{
>> +    virJSONValuePtr backing;
>> +    char *ret;
>> +
>> +    if (!top)
>> +        return NULL;
>
> In case the backing chain as remembered by libvirt is shorter than what
> qemu sees, you don't report an error. Since the caller checks whether an
> error was set and, if not, adds one, please state this fact in a comment
> here, as it's not obvious until you follow the call chain.

Will do.

>> +    if (top != target) {
>> +        backing = virJSONValueObjectGet(image, "backing-image");
>> +        return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore,
>> +                                                target);
>
> Also, the recursion doesn't take into account that for some reason qemu
> might report a shorter chain than libvirt thinks, which would crash
> here.

Oh, good catch (and it looks like it explains what Shanzhi reported).

>> +        if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) {
>
> [1]
>
>> +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                           _("block info device entry was not in expected format"));
>> +            goto cleanup;
>> +        }
>> +
>> +        if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) {
>
> You are mixing styles of checking a pointer for non-NULL within a few
> lines ([1]).

Copy-and-paste from another recursive parser of query-block
information, but I can make it more consistent.

> ACK if you add the comment and fix the potential crash. I'm currently OK
> with accessing the domain definition while it's unlocked (but guarded
> via the domain job), as I don't have a counterexample where it wouldn't
> work correctly.

I'm still a bit worried by Shanzhi's report of a crash; maybe I still
have a race condition. That is, we change libvirt's notion of the chain
length after a commit based on response to a qemu event rather than a
user command - I was thinking that libvirt's chain and qemu's chain will
always be the same length, but since Shanzhi provided a stack trace
where that is not true, I'm wondering if the qemu chain being shorter
than the libvirt chain might mean that we have some sort of window where
a qemu event happens at the wrong moment when repeatedly hammering on
consecutive commits. So I'll post a v3 after more testing rather than
just blindly going on this ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
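The crash discussed in this reply (recursing into qemu's chain after it has already ended) comes down to walking two linked chains in lockstep and checking only one of them for NULL. A minimal sketch of the guarded walk follows. The struct and function names are hypothetical stand-ins, not libvirt's virStorageSource or virJSONValue types, and this is not the v3 fix itself, only an illustration of the missing check under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libvirt's tracked backing chain. */
struct tracked_image {
    struct tracked_image *backing;
};

/* Hypothetical stand-in for the nested "backing-image" objects that
 * qemu reports, reduced to the one field of interest. */
struct reported_image {
    const char *filename;
    struct reported_image *backing;
};

/* Walk both chains together, starting from the active layer. Returns
 * the name qemu uses for 'target', or NULL when either chain ends
 * before 'target' is reached. No error is reported here; as in the
 * review comment above, the caller is assumed to do that. */
static const char *
lookup_name(const struct reported_image *image,
            const struct tracked_image *top,
            const struct tracked_image *target)
{
    assert(target != NULL);

    /* Checking 'image' as well as 'top' covers the case where qemu
     * reports a shorter chain than the one being tracked. */
    if (!top || !image)
        return NULL;

    if (top != target)
        return lookup_name(image->backing, top->backing, target);

    return image->filename;
}
```

If the tracked chain has three layers but the reported chain only two, a lookup of the deepest layer returns NULL instead of dereferencing a NULL pointer.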
Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu
On Fri, Mar 13, 2015 at 07:01:06AM -0600, Eric Blake wrote:
> On 03/13/2015 02:02 AM, Peter Krempa wrote:
>>> @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>>>      }
>>>
>>>      qemuDomainObjEnterMonitor(driver, vm);
>>> -    ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
>>> -                              speed, mode, async);
>>> +    if (baseSource)
>>> +        basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
>>
>> I remember that at some point accessing the domain definition while in
>> the monitor was not okay for some reason, but I can't remember now why,
>> nor whether it was fixed.
>
> Oh, right. You're thinking of CVE-2013-6458. That problem was that as
> soon as we enter the monitor, we drop locks. If we do not already own a
> block job, then some other parallel API could be hot-unplugging a disk
> before we regain control, freeing 'disk' before we dereference it. But
> we fixed that problem by guaranteeing that we always own the job early
> enough (no other thread can hot-unplug the disk as long as we own the
> job), so it is not an issue for this patch.
>
>>> - * Copyright (C) 2006-2014 Red Hat, Inc.
>>> + * Copyright (C) 2006-2015 Red Hat, Inc.
>>
>> Shouldn't we employ something like what gnulib does, where copyrights
>> are bumped everywhere at once?
>
> Might be nice, but one wrinkle. Gnulib has a single copyright holder
> (FSF), so they can afford to bump all files at once (the bump is also
> owned by FSF, so FSF adding another year to its copyright is
> appropriate). But libvirt is split among multiple copyright holders -
> Red Hat can't claim copyright over all files, so it wouldn't be wise to
> bump all files, just the ones that Red Hat has already touched.
> Personally, I've just got an emacs hook that checks if any file I touch
> has an up-to-date copyright line.

Technically there is no need to actually assert copyright over the code
at all, since copyright is an automatic right you get the moment you
author the code. Given that the copyright notice is not even required in
the first place, asserting a year alongside the copyright notice is by
implication not required either, nor is updating the year when you
change code.

Adding the Copyright lines is at most an informative step, to assist
those reading the code in seeing its provenance and ownership. Of course
git history is much more useful for that purpose, but not everyone will
receive a copy of the git repo when they receive the code.

In essence, the Copyright lines have a moderate benefit in clarifying
ownership, but no legal benefit. By all means include a date when first
starting a new file, but I think updating existing dates is pretty much
a waste of time.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu
I sometimes hit a libvirtd crash when testing this patch (I also hit it
when testing v1 yesterday, but cannot reproduce it 100% of the time).

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9d39700 (LWP 25413)]
virJSONValueObjectGetString (object=0x0, key=key@entry=0x7fffe4f72429 "filename")
    at util/virjson.c:1074
1074        if (object->type != VIR_JSON_TYPE_OBJECT)
(gdb) t a a bt

Thread 6 (Thread 0x7fffe9d39700 (LWP 25413)):
#0  virJSONValueObjectGetString (object=0x0, key=key@entry=0x7fffe4f72429 "filename")
    at util/virjson.c:1074
#1  0x7fffe4f2a1f4 in qemuMonitorJSONDiskNameLookupOne (image=<optimized out>,
    top=0x7fffd40013b0, target=target@entry=0x7fffd40013b0)
    at qemu/qemu_monitor_json.c:3901
#2  0x7fffe4f2a1bc in qemuMonitorJSONDiskNameLookupOne (image=<optimized out>,
    top=top@entry=0x7fffdc0fc940, target=target@entry=0x7fffd40013b0)
    at qemu/qemu_monitor_json.c:3898
#3  0x7fffe4f31800 in qemuMonitorJSONDiskNameLookup (mon=<optimized out>,
    device=0x7fffd429cee0 "drive-virtio-disk0", top=0x7fffdc0fc940,
    target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor_json.c:3963
#4  0x7fffe4f1f87e in qemuMonitorDiskNameLookup (mon=<optimized out>,
    device=<optimized out>, top=<optimized out>,
    target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor.c:3475
#5  0x7fffe4f55775 in qemuDomainBlockCommit (dom=<optimized out>,
    path=<optimized out>, base=<optimized out>, top=<optimized out>,
    bandwidth=<optimized out>, flags=<optimized out>)
    at qemu/qemu_driver.c:16937
#6  0x775ff933 in virDomainBlockCommit (dom=dom@entry=0x7fffd429d630,
    disk=0x7fffd40010a0 "vda", base=0x0, top=0x0, bandwidth=0, flags=5)
    at libvirt-domain.c:10218
#7  0x555736fe in remoteDispatchDomainBlockCommit (server=<optimized out>,
    msg=<optimized out>, args=0x7fffd429d9c0, rerr=0x7fffe9d38cb0,
    client=<optimized out>) at remote_dispatch.h:2594
#8  remoteDispatchDomainBlockCommitHelper (server=<optimized out>,
    client=<optimized out>, msg=<optimized out>, rerr=0x7fffe9d38cb0,
    args=0x7fffd429d9c0, ret=<optimized out>) at remote_dispatch.h:2564
#9  0x77653db9 in virNetServerProgramDispatchCall (msg=0x557d8240,
    client=0x557ce4a0, server=0x557cc820, prog=0x557d4a40)
    at rpc/virnetserverprogram.c:437
#10 virNetServerProgramDispatch (prog=0x557d4a40, server=server@entry=0x557cc820,
    client=0x557ce4a0, msg=0x557d8240) at rpc/virnetserverprogram.c:307
#11 0x555989d8 in virNetServerProcessMsg (msg=<optimized out>,
    prog=<optimized out>, client=<optimized out>, srv=0x557cc820)
    at rpc/virnetserver.c:172
#12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x557cc820)
    at rpc/virnetserver.c:193
#13 0x7755ed8e in virThreadPoolWorker (opaque=opaque@entry=0x557d8370)
    at util/virthreadpool.c:144
#14 0x7755e72e in virThreadHelper (data=<optimized out>) at util/virthread.c:197
#15 0x75de252a in start_thread (arg=0x7fffe9d39700) at pthread_create.c:310
#16 0x75b1e22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

On 03/13/2015 04:23 AM, Eric Blake wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
> after a series of disk snapshots into existing destination images,
> followed by active commits of the top image, it is possible for
> qemu 2.2 and earlier to end up tracking a different name for the
> image than what it would have had when opening the chain afresh.
> That is, when starting with the chain 'a <- b <- c', the name
> associated with 'b' is how it was spelled in the metadata of 'c',
> but when starting with 'a', taking two snapshots into 'a <- b <- c',
> then committing 'c' back into 'b', the name associated with 'b' is
> now the name used when taking the first snapshot.
>
> Sadly, older qemu doesn't know how to treat different spellings of
> the same filename as identical files (it uses strcmp() instead of
> checking for the same inode), which means libvirt's attempt to
> commit an image using solely the names learned from qcow2 metadata
> fails with a cryptic:
>
> error: internal error: unable to execute QEMU command 'block-commit':
> Top image file /tmp/images/c/../b/b not found
>
> even though the file exists.  Trying to teach libvirt the rules on
> which name qemu will expect is not worth the effort (besides, we'd
> have to remember it across libvirtd restarts, and track whether a
> file was opened via metadata or via snapshot creation for a given
> qemu process); it is easier to just always directly ask qemu what
> string it expects to see in the first place.
>
> As a safety valve, we validate that any name returned by qemu
> still maps to the same local file as we have tracked it, so that
> a compromised qemu cannot accidentally cause us to act on an
> incorrect file.
>
> * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
> prototype.
> * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
> Likewise.
> * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
> * src/qemu/qemu_monitor_json.c
[libvirt] [PATCH v2] qemu: read backing chain names from qemu
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
after a series of disk snapshots into existing destination images,
followed by active commits of the top image, it is possible for
qemu 2.2 and earlier to end up tracking a different name for the
image than what it would have had when opening the chain afresh.
That is, when starting with the chain 'a <- b <- c', the name
associated with 'b' is how it was spelled in the metadata of 'c',
but when starting with 'a', taking two snapshots into 'a <- b <- c',
then committing 'c' back into 'b', the name associated with 'b' is
now the name used when taking the first snapshot.

Sadly, older qemu doesn't know how to treat different spellings of
the same filename as identical files (it uses strcmp() instead of
checking for the same inode), which means libvirt's attempt to
commit an image using solely the names learned from qcow2 metadata
fails with a cryptic:

error: internal error: unable to execute QEMU command 'block-commit':
Top image file /tmp/images/c/../b/b not found

even though the file exists.  Trying to teach libvirt the rules on
which name qemu will expect is not worth the effort (besides, we'd
have to remember it across libvirtd restarts, and track whether a
file was opened via metadata or via snapshot creation for a given
qemu process); it is easier to just always directly ask qemu what
string it expects to see in the first place.

As a safety valve, we validate that any name returned by qemu
still maps to the same local file as we have tracked it, so that
a compromised qemu cannot accidentally cause us to act on an
incorrect file.

* src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
(qemuMonitorJSONDiskNameLookupOne): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit)
(qemuDomainBlockJobImpl): Use it.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

v2: as suggested by Dan, add a sanity checking valve to ensure we
don't use qemu's string until vetting that it resolves to the same
local name we are already tracking

 src/qemu/qemu_driver.c       | 28 ++---
 src/qemu/qemu_monitor.c      | 20 -
 src/qemu/qemu_monitor.h      |  8 +++-
 src/qemu/qemu_monitor_json.c | 97 +++-
 src/qemu/qemu_monitor_json.h |  9 +++-
 5 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b3263ac..f0e530d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16132,9 +16132,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
         goto endjob;

     if (baseSource) {
-        if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0)
-            goto endjob;
-
         if (flags & VIR_DOMAIN_BLOCK_REBASE_RELATIVE) {
             if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) {
                 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
@@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
     }

     qemuDomainObjEnterMonitor(driver, vm);
-    ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
-                              speed, mode, async);
+    if (baseSource)
+        basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
+                                             baseSource);
+    if (!baseSource || basePath)
+        ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
+                                  speed, mode, async);
     if (qemuDomainObjExitMonitor(driver, vm) < 0)
         ret = -1;
     if (ret < 0) {
@@ -16903,12 +16904,6 @@ qemuDomainBlockCommit(virDomainPtr dom,
                                            VIR_DISK_CHAIN_READ_WRITE) < 0))
         goto endjob;

-    if (qemuGetDriveSourceString(topSource, NULL, &topPath) < 0)
-        goto endjob;
-
-    if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0)
-        goto endjob;
-
     if (flags & VIR_DOMAIN_BLOCK_COMMIT_RELATIVE &&
         topSource != disk->src) {
         if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) {
@@ -16939,9 +16934,14 @@ qemuDomainBlockCommit(virDomainPtr dom,
         disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT;
     }
     qemuDomainObjEnterMonitor(driver, vm);
-    ret = qemuMonitorBlockCommit(priv->mon, device,
-                                 topPath, basePath, backingPath,
-                                 speed);
+    basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
+                                         baseSource);
+    topPath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
+                                        topSource);
+    if (basePath && topPath)
+        ret = qemuMonitorBlockCommit(priv->mon, device,
+                                     topPath,