Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu

2015-03-13 Thread Peter Krempa
On Thu, Mar 12, 2015 at 14:23:48 -0600, Eric Blake wrote:
 https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
 after a series of disk snapshots into existing destination images,
 followed by active commits of the top image, it is possible for
 qemu 2.2 and earlier to end up tracking a different name for the
 image than what it would have had when opening the chain afresh.
 That is, when starting with the chain 'a - b - c', the name
 associated with 'b' is how it was spelled in the metadata of 'c',
 but when starting with 'a', taking two snapshots into 'a - b - c',
 then committing 'c' back into 'b', the name associated with 'b' is
 now the name used when taking the first snapshot.
 
 Sadly, older qemu doesn't know how to treat different spellings of
 the same filename as identical files (it uses strcmp() instead of
 checking for the same inode), which means libvirt's attempt to
 commit an image using solely the names learned from qcow2 metadata
 fails with a cryptic:
 
 error: internal error: unable to execute QEMU command 'block-commit': Top 
 image file /tmp/images/c/../b/b not found
 
 even though the file exists.  Trying to teach libvirt the rules on
 which name qemu will expect is not worth the effort (besides, we'd
 have to remember it across libvirtd restarts, and track whether a
 file was opened via metadata or via snapshot creation for a given
 qemu process); it is easier to just always directly ask qemu what
 string it expects to see in the first place.
 
 As a safety valve, we validate that any name returned by qemu
 still maps to the same local file as we have tracked it, so that
 a compromised qemu cannot accidentally cause us to act on an
 incorrect file.

It would still allow to act on remote storage though. Also if qemu is
corrupted in a way that it'd lie to us correctly via monitor it would be
most probably also able to act on the file itself.

As the labelling is done from the internal structures it should not
allow to do anything besides what the instance is already allowed.

A bigger problem though would be that since we don't store the backing
chain internally all the time, qemu could rewrite the metadata in the
image and libvirt would happily accept those.

Corrupting qemu in that way is very unprobable though IMO.

 
 * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
 prototype.
 * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
 Likewise.
 * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
 * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
 (qemuMonitorJSONDiskNameLookupOne): Likewise.
 * src/qemu/qemu_driver.c (qemuDomainBlockCommit)
 (qemuDomainBlockJobImpl): Use it.
 
 Signed-off-by: Eric Blake ebl...@redhat.com
 ---
 
 v2: as suggested by Dan, add a sanity checking valve to ensure we
 don't use qemu's string until vetting that it resolves to the same
 local name we are already tracking
 
  src/qemu/qemu_driver.c   | 28 ++---
  src/qemu/qemu_monitor.c  | 20 -
  src/qemu/qemu_monitor.h  |  8 +++-
  src/qemu/qemu_monitor_json.c | 97 
 +++-
  src/qemu/qemu_monitor_json.h |  9 +++-
  5 files changed, 144 insertions(+), 18 deletions(-)
 
 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index b3263ac..f0e530d 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c

...

 @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
  }
 
  qemuDomainObjEnterMonitor(driver, vm);
 -ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
 -  speed, mode, async);
 +if (baseSource)
 +basePath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,

I remember that at some point accessing of domain definition while in
the monitor was not okay for some reason, but I can't now remember why
nor whether it was fixed.

 + baseSource);
 +if (!baseSource || basePath)
 +ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
 +  speed, mode, async);
  if (qemuDomainObjExitMonitor(driver, vm)  0)
  ret = -1;
  if (ret  0) {

...

 diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
 index d869a72..cf7dc5e 100644
 --- a/src/qemu/qemu_monitor.c
 +++ b/src/qemu/qemu_monitor.c
 @@ -1,7 +1,7 @@
  /*
   * qemu_monitor.c: interaction with QEMU monitor console
   *
 - * Copyright (C) 2006-2014 Red Hat, Inc.
 + * Copyright (C) 2006-2015 Red Hat, Inc.

Shouldn't we employ something as in gnulib, where copyrights would be
bumped at once everywhere?

   * Copyright (C) 2006 Daniel P. Berrange
   *
   * This library is free software; you can redistribute it and/or

...

 diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
 index b30da34..e67d800 100644
 --- a/src/qemu/qemu_monitor.h
 +++ b/src/qemu/qemu_monitor.h
 @@ -1,7 +1,7 @@
  /*
   * 

Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu

2015-03-13 Thread Eric Blake
On 03/13/2015 02:02 AM, Peter Krempa wrote:
 @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
  }

  qemuDomainObjEnterMonitor(driver, vm);
 -ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
 -  speed, mode, async);
 +if (baseSource)
 +basePath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,
 
 I remember that at some point accessing of domain definition while in
 the monitor was not okay for some reason, but I can't now remember why
 nor whether it was fixed.

Oh, right.  You're thinking of CVE-2013-6458. That problem was that as
soon as we enter the monitor, we drop locks.  If we do not already own a
block job, then some other parallel API could be hot-unplugging a disk
before we regain control, freeing 'disk' before we dereference it.  But
we fixed that problem by guaranteeing that we always own the job early
enough (no other thread can hot-unplug the disk as long as we own the
job), so it is not an issue for this patch.


 - * Copyright (C) 2006-2014 Red Hat, Inc.
 + * Copyright (C) 2006-2015 Red Hat, Inc.
 
 Shouldn't we employ something as in gnulib, where copyrights would be
 bumped at once everywhere?

Might be nice, but one wrinkle.  Gnulib has a single copyright holder
(FSF), so they can afford to bump all files at once (the bump is also
owned by FSF, so FSF adding another year to its copyright is
appropriate).  But libvirt is split among multiple copyright holders -
Red Hat can't claim copyright over all files, so it wouldn't be wise to
bump all files, just the ones that Red Hat has already touched.

Personally, I've just got an emacs hook that checks if any file I touch
has an up-to-date copyright line.


 +static char *
 +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image,
 + virStorageSourcePtr top,
 + virStorageSourcePtr target)
 +{
 +virJSONValuePtr backing;
 +char *ret;
 +
 +if (!top)
 +return NULL;
 
 In case the backing chain as remembered by libvirt is shorter than what
 qemu sees you don't report error. Since the caller checks whether an
 error was set and if not then adds one, please state this fact in a
 comment here as it's not obvious until you follow the call chain.

Will do.

 
 +if (top != target) {
 +backing = virJSONValueObjectGet(image, backing-image);
 +return qemuMonitorJSONDiskNameLookupOne(backing, top-backingStore,
 +target);
 
 Also the recursion doesn't take into account that for some reason qemu
 might report a shorter chain than libvirt thinks, which would crash
 here.

Oh, good catch (and looks like it explains what Shanzhi reported).


 +if (!dev || dev-type != VIR_JSON_TYPE_OBJECT) {
 
 [1]
 
 +virReportError(VIR_ERR_INTERNAL_ERROR, %s,
 +   _(block info device entry was not in expected 
 format));
 +goto cleanup;
 +}
 +
 +if ((thisdev = virJSONValueObjectGetString(dev, device)) == NULL) 
 {
 
 You are mixing styles of cheching of the pointer to be non-null within a
 few lines ([1])

Copy-and-paste from another recursive parser of query-block
information, but I can make it more consistent.

 
 ACK if you add the comment and fix the potential crash. I'm currently OK
 with accessing domain definition while it's unlocked (but guarded via
 the domain job) as I don't have an counter example where it wouldn't
 work correctly.

I'm still a bit worried by Shanzhi's report of a crash; maybe I still
have a race condition.  That is, we change libvirt's notion of the chain
length after a commit based on response to a qemu event rather than a
user command - I was thinking that libvirt's chain and qemu's chain will
always be the same length, but since Shanzhi provided a stack trace
where it is not true, I'm wondering if the qemu chain being shorter than
the libvirt chain might mean that we have some sort of window where a
qemu event happens at the wrong moment when repeatedly hammering on
consecutive commits.  So I'll post a v3 after more testing rather than
just blindly going on this ACK.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu

2015-03-13 Thread Daniel P. Berrange
On Fri, Mar 13, 2015 at 07:01:06AM -0600, Eric Blake wrote:
 On 03/13/2015 02:02 AM, Peter Krempa wrote:
  @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
   }
 
   qemuDomainObjEnterMonitor(driver, vm);
  -ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
  -  speed, mode, async);
  +if (baseSource)
  +basePath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,
  
  I remember that at some point accessing of domain definition while in
  the monitor was not okay for some reason, but I can't now remember why
  nor whether it was fixed.
 
 Oh, right.  You're thinking of CVE-2013-6458. That problem was that as
 soon as we enter the monitor, we drop locks.  If we do not already own a
 block job, then some other parallel API could be hot-unplugging a disk
 before we regain control, freeing 'disk' before we dereference it.  But
 we fixed that problem by guaranteeing that we always own the job early
 enough (no other thread can hot-unplug the disk as long as we own the
 job), so it is not an issue for this patch.
 
 
  - * Copyright (C) 2006-2014 Red Hat, Inc.
  + * Copyright (C) 2006-2015 Red Hat, Inc.
  
  Shouldn't we employ something as in gnulib, where copyrights would be
  bumped at once everywhere?
 
 Might be nice, but one wrinkle.  Gnulib has a single copyright holder
 (FSF), so they can afford to bump all files at once (the bump is also
 owned by FSF, so FSF adding another year to its copyright is
 appropriate).  But libvirt is split among multiple copyright holders -
 Red Hat can't claim copyright over all files, so it wouldn't be wise to
 bump all files, just the ones that Red Hat has already touched.
 
 Personally, I've just got an emacs hook that checks if any file I touch
 has an up-to-date copyright line.

Technically there is no need to actually assert copyright over the
code at all, since copyright is an automatic right you get the moment
you author the code.  Given that the copyright notice is not even
required in the first place, asserting a year alongside the copyright
notice is by implication not required either, nor is updating the year
when you change code.

Adding the Copyright lines is at most an informative step, to assist
those reading the code in seeing its providence  ownership. Of course
GIT history is much more useful for that purpose, but not everyone will
receive a copy of GIT repo when they receive the code.

In essence, the Copyright lines had a moderate benefit in clarifying
ownership, but no legal benefit. By all means include a date when first
starting a new file, but I think updating existing dates is pretty
much a waste of time.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2] qemu: read backing chain names from qemu

2015-03-12 Thread Shanzhi Yu
I do meet libvirtd crash sometime when test this patch(I also
met it when test v1 yesterday, but can not reproduce it 100%.)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9d39700 (LWP 25413)]
virJSONValueObjectGetString (object=0x0, key=key@entry=0x7fffe4f72429
filename) at util/virjson.c:1074
1074if (object-type != VIR_JSON_TYPE_OBJECT)
(gdb) t a a bt

Thread 6 (Thread 0x7fffe9d39700 (LWP 25413)):
#0  virJSONValueObjectGetString (object=0x0,
key=key@entry=0x7fffe4f72429 filename) at util/virjson.c:1074
#1  0x7fffe4f2a1f4 in qemuMonitorJSONDiskNameLookupOne
(image=optimized out, top=0x7fffd40013b0,
target=target@entry=0x7fffd40013b0)
at qemu/qemu_monitor_json.c:3901
#2  0x7fffe4f2a1bc in qemuMonitorJSONDiskNameLookupOne
(image=optimized out, top=top@entry=0x7fffdc0fc940,
target=target@entry=0x7fffd40013b0)
at qemu/qemu_monitor_json.c:3898
#3  0x7fffe4f31800 in qemuMonitorJSONDiskNameLookup (mon=optimized
out, device=0x7fffd429cee0 drive-virtio-disk0, top=0x7fffdc0fc940,
target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor_json.c:3963
#4  0x7fffe4f1f87e in qemuMonitorDiskNameLookup (mon=optimized
out, device=optimized out, top=optimized out,
target=target@entry=0x7fffd40013b0)
at qemu/qemu_monitor.c:3475
#5  0x7fffe4f55775 in qemuDomainBlockCommit (dom=optimized out,
path=optimized out, base=optimized out, top=optimized out,
bandwidth=optimized out,
flags=optimized out) at qemu/qemu_driver.c:16937
#6  0x775ff933 in virDomainBlockCommit
(dom=dom@entry=0x7fffd429d630, disk=0x7fffd40010a0 vda, base=0x0,
top=0x0, bandwidth=0, flags=5)
at libvirt-domain.c:10218
#7  0x555736fe in remoteDispatchDomainBlockCommit
(server=optimized out, msg=optimized out, args=0x7fffd429d9c0,
rerr=0x7fffe9d38cb0,
client=optimized out) at remote_dispatch.h:2594
#8  remoteDispatchDomainBlockCommitHelper (server=optimized out,
client=optimized out, msg=optimized out, rerr=0x7fffe9d38cb0,
args=0x7fffd429d9c0,
ret=optimized out) at remote_dispatch.h:2564
#9  0x77653db9 in virNetServerProgramDispatchCall
(msg=0x557d8240, client=0x557ce4a0, server=0x557cc820,
prog=0x557d4a40)
at rpc/virnetserverprogram.c:437
#10 virNetServerProgramDispatch (prog=0x557d4a40,
server=server@entry=0x557cc820, client=0x557ce4a0,
msg=0x557d8240) at rpc/virnetserverprogram.c:307
#11 0x555989d8 in virNetServerProcessMsg (msg=optimized out,
prog=optimized out, client=optimized out, srv=0x557cc820) at
rpc/virnetserver.c:172
#12 virNetServerHandleJob (jobOpaque=optimized out,
opaque=0x557cc820) at rpc/virnetserver.c:193
#13 0x7755ed8e in virThreadPoolWorker
(opaque=opaque@entry=0x557d8370) at util/virthreadpool.c:144
#14 0x7755e72e in virThreadHelper (data=optimized out) at
util/virthread.c:197
#15 0x75de252a in start_thread (arg=0x7fffe9d39700) at
pthread_create.c:310
#16 0x75b1e22d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109



On 03/13/2015 04:23 AM, Eric Blake wrote:
 https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
 after a series of disk snapshots into existing destination images,
 followed by active commits of the top image, it is possible for
 qemu 2.2 and earlier to end up tracking a different name for the
 image than what it would have had when opening the chain afresh.
 That is, when starting with the chain 'a - b - c', the name
 associated with 'b' is how it was spelled in the metadata of 'c',
 but when starting with 'a', taking two snapshots into 'a - b - c',
 then committing 'c' back into 'b', the name associated with 'b' is
 now the name used when taking the first snapshot.

 Sadly, older qemu doesn't know how to treat different spellings of
 the same filename as identical files (it uses strcmp() instead of
 checking for the same inode), which means libvirt's attempt to
 commit an image using solely the names learned from qcow2 metadata
 fails with a cryptic:

 error: internal error: unable to execute QEMU command 'block-commit': Top 
 image file /tmp/images/c/../b/b not found

 even though the file exists.  Trying to teach libvirt the rules on
 which name qemu will expect is not worth the effort (besides, we'd
 have to remember it across libvirtd restarts, and track whether a
 file was opened via metadata or via snapshot creation for a given
 qemu process); it is easier to just always directly ask qemu what
 string it expects to see in the first place.

 As a safety valve, we validate that any name returned by qemu
 still maps to the same local file as we have tracked it, so that
 a compromised qemu cannot accidentally cause us to act on an
 incorrect file.

 * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
 prototype.
 * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
 Likewise.
 * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
 * src/qemu/qemu_monitor_json.c 

[libvirt] [PATCH v2] qemu: read backing chain names from qemu

2015-03-12 Thread Eric Blake
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that
after a series of disk snapshots into existing destination images,
followed by active commits of the top image, it is possible for
qemu 2.2 and earlier to end up tracking a different name for the
image than what it would have had when opening the chain afresh.
That is, when starting with the chain 'a - b - c', the name
associated with 'b' is how it was spelled in the metadata of 'c',
but when starting with 'a', taking two snapshots into 'a - b - c',
then committing 'c' back into 'b', the name associated with 'b' is
now the name used when taking the first snapshot.

Sadly, older qemu doesn't know how to treat different spellings of
the same filename as identical files (it uses strcmp() instead of
checking for the same inode), which means libvirt's attempt to
commit an image using solely the names learned from qcow2 metadata
fails with a cryptic:

error: internal error: unable to execute QEMU command 'block-commit': Top image 
file /tmp/images/c/../b/b not found

even though the file exists.  Trying to teach libvirt the rules on
which name qemu will expect is not worth the effort (besides, we'd
have to remember it across libvirtd restarts, and track whether a
file was opened via metadata or via snapshot creation for a given
qemu process); it is easier to just always directly ask qemu what
string it expects to see in the first place.

As a safety valve, we validate that any name returned by qemu
still maps to the same local file as we have tracked it, so that
a compromised qemu cannot accidentally cause us to act on an
incorrect file.

* src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New
prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup)
(qemuMonitorJSONDiskNameLookupOne): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit)
(qemuDomainBlockJobImpl): Use it.

Signed-off-by: Eric Blake ebl...@redhat.com
---

v2: as suggested by Dan, add a sanity checking valve to ensure we
don't use qemu's string until vetting that it resolves to the same
local name we are already tracking

 src/qemu/qemu_driver.c   | 28 ++---
 src/qemu/qemu_monitor.c  | 20 -
 src/qemu/qemu_monitor.h  |  8 +++-
 src/qemu/qemu_monitor_json.c | 97 +++-
 src/qemu/qemu_monitor_json.h |  9 +++-
 5 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b3263ac..f0e530d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16132,9 +16132,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
 goto endjob;

 if (baseSource) {
-if (qemuGetDriveSourceString(baseSource, NULL, basePath)  0)
-goto endjob;
-
 if (flags  VIR_DOMAIN_BLOCK_REBASE_RELATIVE) {
 if (!virQEMUCapsGet(priv-qemuCaps, 
QEMU_CAPS_CHANGE_BACKING_FILE)) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, %s,
@@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
 }

 qemuDomainObjEnterMonitor(driver, vm);
-ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
-  speed, mode, async);
+if (baseSource)
+basePath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,
+ baseSource);
+if (!baseSource || basePath)
+ret = qemuMonitorBlockJob(priv-mon, device, basePath, backingPath,
+  speed, mode, async);
 if (qemuDomainObjExitMonitor(driver, vm)  0)
 ret = -1;
 if (ret  0) {
@@ -16903,12 +16904,6 @@ qemuDomainBlockCommit(virDomainPtr dom,
VIR_DISK_CHAIN_READ_WRITE)  0))
 goto endjob;

-if (qemuGetDriveSourceString(topSource, NULL, topPath)  0)
-goto endjob;
-
-if (qemuGetDriveSourceString(baseSource, NULL, basePath)  0)
-goto endjob;
-
 if (flags  VIR_DOMAIN_BLOCK_COMMIT_RELATIVE 
 topSource != disk-src) {
 if (!virQEMUCapsGet(priv-qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) {
@@ -16939,9 +16934,14 @@ qemuDomainBlockCommit(virDomainPtr dom,
 disk-mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT;
 }
 qemuDomainObjEnterMonitor(driver, vm);
-ret = qemuMonitorBlockCommit(priv-mon, device,
- topPath, basePath, backingPath,
- speed);
+basePath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,
+ baseSource);
+topPath = qemuMonitorDiskNameLookup(priv-mon, device, disk-src,
+topSource);
+if (basePath  topPath)
+ret = qemuMonitorBlockCommit(priv-mon, device,
+ topPath,