Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Kevin Wolf
On 21.07.2011 17:01, Stefan Hajnoczi wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.
 
 The backing file information changes when image streaming completes.
 
 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)
 
 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.
 
 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.

And how do you implement that in a way that works on all platforms, and
without root privileges? I can't see this happen unless it stays
completely optional.

Kevin



Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Avi Kivity

On 07/20/2011 04:51 PM, Kevin Wolf wrote:


  The problem is that QEMU will find backing file names inside the
  images which it will be unable to open. How do you suggest we get around
  that?

This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.


It can be done without blockdev.  Have a dictionary that translates 
filenames, and populate it from the command line (for a bonus, translate 
a filename to a file descriptor inherited from the caller or passed via 
the monitor).
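
The translation-table idea above can be sketched roughly as follows. This is only an illustration: the `name=fd` string syntax and the helper names `parse_fd_map` and `open_image` are hypothetical and do not correspond to any existing QEMU option or function.

```python
import os
import tempfile

def parse_fd_map(specs):
    """Build a filename -> fd table from hypothetical 'name=fd' strings."""
    table = {}
    for spec in specs:
        # Split on the last '=' so file names containing '=' still work.
        name, _, fd_text = spec.rpartition("=")
        if not name:
            raise ValueError(f"expected name=fd, got {spec!r}")
        table[name] = int(fd_text)
    return table

def open_image(name, table):
    """Return an fd for `name`, preferring an inherited fd over open()."""
    if name in table:
        # dup() so the caller can close its copy without touching the table.
        return os.dup(table[name])
    return os.open(name, os.O_RDONLY)

if __name__ == "__main__":
    with tempfile.TemporaryFile() as backing:
        backing.write(b"base image data")
        backing.flush()
        # The name stored in the image header need not exist on disk at all.
        table = parse_fd_map([f"/images/fedora.img={backing.fileno()}"])
        fd = open_image("/images/fedora.img", table)
        print(os.pread(fd, 4, 0))
        os.close(fd)
```

The point of the sketch is that the name found in the image header is only a lookup key; the process never has to open() that path itself.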


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Kevin Wolf
On 22.07.2011 09:36, Avi Kivity wrote:
 On 07/20/2011 04:51 PM, Kevin Wolf wrote:

  The problem is that QEMU will find backing file names inside the
  images which it will be unable to open. How do you suggest we get around
  that?

 This is the part with allowing libvirt to override the backing file. Of
 course, this is not something that we can add with five lines of code,
 it requires -blockdev.
 
 It can be done without blockdev.  Have a dictionary that translates 
 filenames, and populate it from the command line (for a bonus, translate 
 a filename to a file descriptor inherited from the caller or passed via 
 the monitor).

Sure, you can always add ugly hacks, but it isn't the right solution
that we want to use for all time. However, once we use it, it will show
up in the external API and we'll never get rid of it again.

Kevin



Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Stefan Hajnoczi
On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf kw...@redhat.com wrote:
 On 21.07.2011 17:01, Stefan Hajnoczi wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.

 The backing file information changes when image streaming completes.

 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)

 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.

 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.

 And how do you implement that in a way that works on all platforms, and
 without root privileges? I can't see this happen unless it stays
 completely optional.

The cross-platform way would be an iSCSI target that understands image
formats.  But iSCSI requires copying when doing I/O and we can't pass
through virtio-blk.

I'm not sure I see the root privilege issue.

Stefan



Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Blue Swirl
On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl blauwir...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.

 The backing file information changes when image streaming completes.

 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)

 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.

 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.
 Today we seem to be sprinkling a little bit of storage management into
 QEMU and a little bit into libvirt :(.

 In that spirit it is much nicer to think of storage like a SAN
 appliance where you have LUNs that you access as block devices.  It
 also provides an API for snapshotting, cloning LUNs, etc.

 Let's move to that model instead of worrying about how to spread
 storage logic across QEMU and libvirt.

 Would the NBD protocol fit this purpose, or is it too simple? Then
 libvirt would handle the storage format completely and present an NBD
 interface to QEMU (or give an fd to an external service) and QEMU
 would not care about the storage format in this mode at all.

 NBD does not support flush (fdatasync).  Therefore it only supports
 the slow cache=writethrough mode in a safe manner.

Maybe NBD could still be used in networked setups as a secondary alternative.

 It would be neat to use virtio-blk as the interface because it can be
 passed through to the guest.  The guest talks directly to the storage
 management service without going through QEMU.  The trick is to do
 something like vhost:
 1. An ioeventfd for virtqueue (guest->host) kicks
 2. An irqfd for host->guest kicks
 3. Shared memory for vring and zero-copy data access

 The storage management service provides a UNIX domain socket over
 which fds can be passed to set up the vhost-like virtio-blk interface.
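
The fd-passing step described above can be sketched as follows, assuming Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`. Plain pipes stand in for the ioeventfd, irqfd and shared-memory descriptors, since the real ones come from KVM and are not created here; the function names and the message text are hypothetical.

```python
import os
import socket

def send_channel_fds(sock, kick_fd, call_fd, mem_fd):
    """Pass three fds in a single message over a UNIX domain socket."""
    socket.send_fds(sock, [b"virtio-blk-setup"], [kick_fd, call_fd, mem_fd])

def recv_channel_fds(sock):
    """Receive the message together with the fds that ride along with it."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 64, 3)
    return msg, fds

if __name__ == "__main__":
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    kick_r, kick_w = os.pipe()   # stand-in for an ioeventfd
    call_r, call_w = os.pipe()   # stand-in for an irqfd
    mem_r, mem_w = os.pipe()     # stand-in for a shared-memory fd
    send_channel_fds(sender, kick_w, call_r, mem_r)
    msg, (kick, call, mem) = recv_channel_fds(receiver)
    # The received fd numbers differ, but they refer to the same objects.
    os.write(kick, b"\x01")
    print(msg, os.read(kick_r, 1))
```

Once the descriptors have crossed the socket, the two sides can signal each other without any further involvement from whoever brokered the setup.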

 Moving the image format code into a separate program makes it possible
 to safely write to a backing file while VMs are using it because the
 storage service can be host-wide, not per-VM.  For example, streaming
 a shared backing file over NFS while running VMs using copy-on-write
 images.  If we ever want to do deduplication or other global
 operations, then this approach is nice too.

 To summarize:
 The storage service manages image files including creation, deletion,
 snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
 interface and can pass it directly into the guest.  libvirt uses the
 storage service API without needing to parse image files or keep track
 of backing file relationships.

Excellent plan. If one day the kernel provides built-in virtio-blk services
which can be passed via libvirt and QEMU to the guest, we'll even have
zero copy all the way.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Blue Swirl
On Fri, Jul 22, 2011 at 12:11 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf kw...@redhat.com wrote:
 On 21.07.2011 17:01, Stefan Hajnoczi wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.

 The backing file information changes when image streaming completes.

 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)

 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.

 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.

 And how do you implement that in a way that works on all platforms, and
 without root privileges? I can't see this happen unless it stays
 completely optional.

 The cross-platform way would be an iSCSI target that understands image
 formats.  But iSCSI requires copying when doing I/O and we can't pass
 through virtio-blk.

The guest could use iSCSI directly using the network interface without
virtio-blk. This setup wouldn't give max performance in local use but
it could also be useful in some networked setups and probably more
useful than NBD.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Blue Swirl
On Fri, Jul 22, 2011 at 11:11 AM, Kevin Wolf kw...@redhat.com wrote:
 On 22.07.2011 09:36, Avi Kivity wrote:
 On 07/20/2011 04:51 PM, Kevin Wolf wrote:

  The problem is that QEMU will find backing file names inside the
  images which it will be unable to open. How do you suggest we get around
  that?

 This is the part with allowing libvirt to override the backing file. Of
 course, this is not something that we can add with five lines of code,
 it requires -blockdev.

 It can be done without blockdev.  Have a dictionary that translates
 filenames, and populate it from the command line (for a bonus, translate
 a filename to a file descriptor inherited from the caller or passed via
 the monitor).

 Sure, you can always add ugly hacks, but it isn't the right solution
 that we want to use for all time. However, once we use it, it will show
 up in the external API and we'll never get rid of it again.

Fully agree. This would also be a highly specific API for QCOW2 and
similar formats.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Eric Blake

On 07/20/2011 12:00 PM, Blue Swirl wrote:

Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?


The fix for CVE-2010-2238 already deals with this: if primary image C refers
to backing file CA of raw format, but does not state what file format CA
contains, then a malicious guest can modify the contents of CA to appear to
be yet another qcow2 image.  At which point, if libvirt follows the backing
file specified in CA, then yes, the malicious guest really can cause libvirt
to expose arbitrary file CB for manipulation by the guest.  But that
security hole was already plugged - by default, libvirt refuses to probe
backing files parsed from qcow2 headers for file format, but instead
requires the outer qcow2 header to also include a file format
designation for the backing file.  At which point, you then have a safe
chain: if C refers to CA, then libvirt knows that both C and CA are
essential to the storage presented by giving qemu the file name C, and the
guest will already be modifying CA, but there is no storage corruption
involved.


But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?


Why is qemu trying to access CA?

Either because CA was mentioned as a backing file for C (in which case 
libvirt already knows about it, because either libvirt handed C to qemu 
at startup time after already parsing C's headers to learn that CA is a 
backing file, or because libvirt called the snapshot_blkdev command with 
the intent of having qemu populate CA with C as its backing file), or 
because qemu has a bug (in which case, libvirt should refuse the access 
to CA).


Libvirt is already perfectly capable of tracking all files that qemu 
might need to access, and whether it is qemu or libvirt that does the 
open() of those files, we can still have libvirt validate whether each 
request for a file makes sense given the context of all previous files 
in use from the time the qemu command line was invoked and across all 
monitor commands in the meantime.
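
As a rough illustration of the bookkeeping described above (not libvirt's actual code; the class and method names are hypothetical), the set of files a given qemu process may open can be tracked from the startup chain and updated on each snapshot request:

```python
class AllowedFiles:
    """Track which image files one qemu process may legitimately open."""

    def __init__(self, chain):
        # `chain` is the backing chain known at startup, topmost image first.
        self._allowed = set(chain)

    def note_snapshot(self, new_top, old_top):
        """Record a snapshot request that creates new_top backed by old_top."""
        if old_top not in self._allowed:
            raise PermissionError(f"{old_top} is not part of this guest's storage")
        self._allowed.add(new_top)

    def check_open(self, path):
        """Return True only if opening `path` makes sense given prior context."""
        return path in self._allowed

if __name__ == "__main__":
    tracker = AllowedFiles(["C"])
    tracker.note_snapshot("CA", "C")   # snapshot_blkdev: CA with C as backing
    print(tracker.check_open("CA"))
    print(tracker.check_open("CB"))
```

Whether the decision is enforced through SELinux labels or through fd passing, the check itself is the same lookup.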


On non-NFS solutions, where every file can have a SELinux label, the 
security is already present: libvirt relabels all such files to a unique 
label for that particular qemu process, and SELinux enforces that qemu 
cannot open() anything but what libvirt has already labeled.  And since 
libvirt already knows which files to label in the non-NFS scenario, it 
also knows which fds to pass in the NFS scenario, at which point the 
ability to prevent qemu from open()ing an NFS file is a security 
enhancement.


Your question about qemu wanting to use CA is thus answered 
independently of whether the fd management solution is solved by libvirt 
handing an fd for CA to qemu prior to any monitor command where qemu 
will then need to use CA, or whether qemu is taught to asynchronously 
ask libvirt to open an fd for CA on qemu's behalf.  The answer is that 
libvirt already tracks whether qemu should access CA, and just needs a 
way to enforce that knowledge.  The enforcement already exists for 
non-NFS via SELinux labels, and the proposal to add fd handling will 
expand that enforcement to also cover NFS.


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Blue Swirl
On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote:
 On 20.07.2011 15:25, Jes Sorensen wrote:
 On 07/20/11 12:01, Kevin Wolf wrote:
 Right, we're stuck with the two horrors of NFS and selinux, so we need
 something that gets around the problem. In a sane world we would simply
 say 'no NFS, no selinux', but as you say that will never happen.

 My suggestion of a callback mechanism where libvirt registers the
 callback with QEMU for open() calls, allowing libvirt to perform the
 open and return the open file descriptor would get around this problem.
 To me this sounds more like a problem than a solution. It basically
 means that during an open (which may even be initiated by a monitor
 command), you need monitor interaction. It basically means that open
 becomes asynchronous, and requires clients to deal with that, which
 sounds at least interesting... Also you have to add some magic to all
 places opening something.

 I think if libvirt wants qemu to use an fd instead of a file name, it
 shouldn't pass a file name but an fd in the first place. Which means
 that the two things that we need are support for an fd: protocol (patches on
 the list, need review), and a way for libvirt to override the backing
 file of an image.

 The problem is that QEMU will find backing file names inside the
 images which it will be unable to open. How do you suggest we get around
 that?

 This is the part with allowing libvirt to override the backing file. Of
 course, this is not something that we can add with five lines of code,
 it requires -blockdev.

There could still be some issues:
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?

This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Blue Swirl
On Wed, Jul 20, 2011 at 8:41 PM, Eric Blake ebl...@redhat.com wrote:
 On 07/20/2011 11:20 AM, Blue Swirl wrote:

 There could still be some issues:
 Let's have files A, B, C etc. with backing files AA etc. How would
 libvirt know that when QEMU wants to write to file CA, this is because
 it's needed to access C, or is it just trickery by a devious guest to
 corrupt storage?

 The fix for CVE-2010-2238 already deals with this: if primary image C refers
 to backing file CA of raw format, but does not state what file format CA
 contains, then a malicious guest can modify the contents of CA to appear to
 be yet another qcow2 image.  At which point, if libvirt follows the backing
 file specified in CA, then yes, the malicious guest really can cause libvirt
 to expose arbitrary file CB for manipulation by the guest.  But that
 security hole was already plugged - by default, libvirt refuses to probe
 backing files parsed from qcow2 headers for file format, but instead
 requires the outer qcow2 header to also include a file format
 designation for the backing file.  At which point, you then have a safe
 chain: if C refers to CA, then libvirt knows that both C and CA are
 essential to the storage presented by giving qemu the file name C, and the
 guest will already be modifying CA, but there is no storage corruption
 involved.

But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?

 That is, as long as libvirt can already accurately read the chain of backing
 files from any starting point, then it can hand that entire chain of backing
 files (whether by the topmost file name as it does now, or whether by a
 series of fds as is being proposed) to qemu.


 This could be handled so that instead of naming the backing file, QEMU
 asks for a descriptor for the backing file by presenting the
 descriptor to main file C, but I think the real solution is that
 libvirt should handle the storage formats completely and it should
 present QEMU with only a raw file like interface (read/write/seek) for
 the data. Then any backing files would be handled within libvirt.
 Performance could suffer, though.

 The monitor interface was not designed to throw the read()/write()/seek()
 burden back on libvirt, and indeed that would kill performance so it is a
 non-starter idea.  All we need for security is the open() burden to be
 shifted out of qemu and into libvirt.

Obviously the interface should be faster than monitor, for example a
pair of sockets with some efficient protocol. Monitor could still be
used to set up these.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Stefan Hajnoczi
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.
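
For illustration, the header re-check described in the quoted text might look something like the sketch below. It relies only on the first 20 bytes of the qcow2 header (magic, version, backing file offset, backing file size, all big-endian); the function names are hypothetical, and this is not how libvirt implements it.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

def read_backing_file(path):
    """Return the backing file name stored in a qcow2 header, or None."""
    with open(path, "rb") as f:
        header = f.read(20)
        if len(header) < 20:
            raise ValueError("file too short to be a qcow2 image")
        # magic (4 bytes), version (u32), backing offset (u64), backing size (u32)
        magic, _version, offset, size = struct.unpack(">4sIQI", header)
        if magic != QCOW2_MAGIC:
            raise ValueError("not a qcow2 image")
        if offset == 0:
            return None  # the image has no backing file
        f.seek(offset)
        return f.read(size).decode("utf-8", errors="replace")

def backing_file_unchanged(path, expected):
    """Compare the header's backing file with what the manager recorded."""
    return read_backing_file(path) == expected
```

A management tool would record the expected backing file when it starts the guest and call something like `backing_file_unchanged()` before restarting from the same image.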

The backing file information changes when image streaming completes.

Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)

The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
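
A toy model of that operation, with images represented as plain dicts rather than real QED files (all names here are made up for the example):

```python
def read_cluster(image, backing, index):
    """Copy-on-write read: use the image's cluster, else fall back to backing."""
    if index in image["clusters"]:
        return image["clusters"][index]
    if backing is not None:
        return backing["clusters"].get(index, b"\0")
    return b"\0"

def stream(image, backing):
    """Copy every cluster missing from the image, then drop the backing link."""
    for index, data in backing["clusters"].items():
        # Clusters the guest has already written must not be overwritten.
        image["clusters"].setdefault(index, data)
    image["backing_file"] = None

if __name__ == "__main__":
    fedora = {"clusters": {0: b"A", 1: b"B"}, "backing_file": None}
    my_vm = {"clusters": {1: b"X"}, "backing_file": "fedora.img"}
    before = [read_cluster(my_vm, fedora, i) for i in range(2)]
    stream(my_vm, fedora)
    after = [read_cluster(my_vm, None, i) for i in range(2)]
    print(before == after, my_vm["backing_file"])
```

The guest sees the same data before and after; only the header's backing file field changes, which is exactly the change a management tool has to expect.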

I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.

In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices.  It
also provides an API for snapshotting, cloning LUNs, etc.

Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.

Stefan



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Michael Roth

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:

On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:

On 07/19/11 16:24, Eric Blake wrote:

[adding the libvir-list]
On 07/19/2011 08:09 AM, Jes Sorensen wrote:

Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.


But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.


What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.


Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.


We all know that documentation is rarely up to date, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.


This would be possible if QEMU were to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.


How wrong would it be to call out to qemu-img to handle this instead? 
Seems like a more stable interface (assuming the output of `qemu-img 
info` is treated as an API of sorts, or perhaps some other output mode 
is added to qemu-img that is considered stable).
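
A sketch of what calling out to qemu-img could look like, assuming a qemu-img new enough to accept `--output=json` for `info` (the machine-readable mode wished for above). The helper names are hypothetical, and only the `format` and `backing-filename` keys of the JSON output are used.

```python
import json
import subprocess

def parse_image_info(text):
    """Extract the fields of interest from `qemu-img info --output=json` text."""
    info = json.loads(text)
    return {
        "format": info.get("format"),
        # The key is absent when the image has no backing file.
        "backing_file": info.get("backing-filename"),
    }

def image_info(path):
    """Run qemu-img on `path` and return the parsed result."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", path],
        check=True, capture_output=True, text=True,
    )
    return parse_image_info(result.stdout)

if __name__ == "__main__":
    sample = '{"filename": "my_vm.qed", "format": "qed", "backing-filename": "fedora.img"}'
    print(parse_image_info(sample))
```

Keeping the parsing separate from the subprocess call means the caller depends only on the documented output, not on any image format internals.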




Daniel





Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Blue Swirl
On Thu, Jul 21, 2011 at 11:25 AM, Kevin Wolf kw...@redhat.com wrote:
 On 20.07.2011 19:20, Blue Swirl wrote:
 On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote:
 On 20.07.2011 15:25, Jes Sorensen wrote:
 On 07/20/11 12:01, Kevin Wolf wrote:
 Right, we're stuck with the two horrors of NFS and selinux, so we need
 something that gets around the problem. In a sane world we would simply
 say 'no NFS, no selinux', but as you say that will never happen.

 My suggestion of a callback mechanism where libvirt registers the
 callback with QEMU for open() calls, allowing libvirt to perform the
 open and return the open file descriptor would get around this problem.
 To me this sounds more like a problem than a solution. It basically
 means that during an open (which may even be initiated by a monitor
 command), you need monitor interaction. It basically means that open
 becomes asynchronous, and requires clients to deal with that, which
 sounds at least interesting... Also you have to add some magic to all
 places opening something.

 I think if libvirt wants qemu to use an fd instead of a file name, it
 shouldn't pass a file name but an fd in the first place. Which means
 that the two things that we need are support for an fd: protocol (patches on
 the list, need review), and a way for libvirt to override the backing
 file of an image.

 The problem is that QEMU will find backing file names inside the
 images which it will be unable to open. How do you suggest we get around
 that?

 This is the part with allowing libvirt to override the backing file. Of
 course, this is not something that we can add with five lines of code,
 it requires -blockdev.

 There could still be some issues:
 Let's have files A, B, C etc. with backing files AA etc. How would
 libvirt know that when QEMU wants to write to file CA, this is because
 it's needed to access C, or is it just trickery by a devious guest to
 corrupt storage?

 This could be handled so that instead of naming the backing file, QEMU
 asks for a descriptor for the backing file by presenting the
 descriptor to main file C,

 qemu shouldn't ask for anything. libvirt shouldn't give it a filename in
 the first place. It should do something like this:

 { "execute": "blockdev_add", "arguments": {
  "driver": "fd", "fd": 4, "backing-file": {
    "driver": "fd", "fd": 5
  }
 }}

 And then qemu doesn't even have a reason to know that there is something
 called CA.

Yes, that's better.

 but I think the real solution is that
 libvirt should handle the storage formats completely and it should
 present QEMU with only a raw file like interface (read/write/seek) for
 the data. Then any backing files would be handled within libvirt.
 Performance could suffer, though.

 I like your humour. :-)

Well, for some applications, security is more important than
performance or convenience.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Blue Swirl
On Thu, Jul 21, 2011 at 11:07 AM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 07/20/11 21:51, Blue Swirl wrote:
 And the snapshot_blkdev monitor command is a case where qemu needs to create
  a new qcow2 image on the fly, while referencing the name of an existing
  file.  What backing name do you put in the new qcow2 file unless you 
  already
  have a name association for all fds already open for the existing backing
  file?
 For backing file with original name of /path/in/storage, QEMU could
 present the fd and the backing path by requesting something like
 fd:12,/path/in/storage. The next file in chain /path2/file would
 be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing
 /path/in/storage with spaces and funny characters escaped etc.

 Rather than trying to do this by mangling files on the disk, I would
 suggest we allow registering a call-back open function, which calls back
 into libvirt and requests it to open a given file. It can then do all
 its security foo to decide whether or not to allow the file to be opened.

Just to clarify: I was not proposing any mangling of the files.

 This is relatively clean and avoids the mess of relying on outside
 processes messing about in the images.

 Cheers,
 Jes





Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Blue Swirl
On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.

 The backing file information changes when image streaming completes.

 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)

 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.

 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.
 Today we seem to be sprinkling a little bit of storage management into
 QEMU and a little bit into libvirt :(.

 In that spirit it is much nicer to think of storage like a SAN
 appliance where you have LUNs that you access as block devices.  It
 also provides an API for snapshotting, cloning LUNs, etc.

 Let's move to that model instead of worrying about how to spread
 storage logic across QEMU and libvirt.

Would the NBD protocol fit this purpose, or is it too simple? Then
libvirt would handle the storage format completely and present an NBD
interface to QEMU (or give an fd to an external service) and QEMU
would not care about the storage format in this mode at all.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-21 Thread Stefan Hajnoczi
On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl blauwir...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote:
 Thank you for persisting - you've found another hole that needs to be
 plugged.  It sounds like you are proposing that, after a qemu process dies,
 libvirt re-reads the qcow2 metadata headers, and validates that the
 backing file information has not changed in a manner unexpected by libvirt.
  If it has, then the qemu process that just died was compromised to the
 point that restarting a new qemu process from the old image is now a
 security risk.  So this is _yet another_ security aspect that needs to be
 coded into libvirt as part of hardening sVirt.

 The backing file information changes when image streaming completes.

 Before: fedora.img <- my_vm.qed
 After: my_vm.qed (fedora.img is no longer referenced)

 The image streaming operation copies data out of fedora.img and
 populates my_vm.qed.  When image streaming completes, the backing file
 is no longer needed and my_vm.qed is updated to drop the backing file.

 I think we need to design carefully to prevent QEMU and libvirt making
 incorrect assumptions about who does what.  I really wish that all
 this image file business was outside QEMU and libvirt - that we had a
 separate storage management service which handled the details.  QEMU
 would only do block device operations (no image format manipulation),
 and libvirt would only delegate to the storage management service.
 Today we seem to be sprinkling a little bit of storage management into
 QEMU and a little bit into libvirt :(.

 In that spirit it is much nicer to think of storage like a SAN
 appliance where you have LUNs that you access as block devices.  It
 also provides an API for snapshotting, cloning LUNs, etc.

 Let's move to that model instead of worrying about how to spread
 storage logic across QEMU and libvirt.

 Would NBD protocol fit to this purpose, or is it too simple? Then
 libvirt would handle the storage format completely and present an NBD
 interface to QEMU (or give an fd to an external service) and QEMU
 would not care about the storage format in this mode at all.

NBD does not support flush (fdatasync).  Therefore it only supports
the slow cache=writethrough mode in a safe manner.

It would be neat to use virtio-blk as the interface because it can be
passed through to the guest.  The guest talks directly to the storage
management service without going through QEMU.  The trick is to do
something like vhost:
1. An ioeventfd for virtqueue (guest->host) kicks
2. An irqfd for host->guest kicks
3. Shared memory for vring and zero-copy data access

The storage management service provides a UNIX domain socket over
which fds can be passed to set up the vhost-like virtio-blk interface.
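
As a rough illustration of the fd-passing step (not an existing QEMU or
libvirt interface), the sketch below hands an open file descriptor across
a UNIX domain socket using SCM_RIGHTS. The names send_fd, recv_fd and
demo are assumptions made for the example, and a socketpair stands in
for the connection between the storage service and QEMU.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd, label=b"fd"):
    """Send one file descriptor plus a short label as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", [fd]).tobytes())]
    sock.sendmsg([label], ancillary)


def recv_fd(sock):
    """Receive the label and the single file descriptor sent by send_fd()."""
    fds = array.array("i")
    # Reserve ancillary buffer space for exactly one descriptor.
    msg, ancdata, _flags, _addr = sock.recvmsg(64, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return msg, fds[0]


def demo():
    # The two ends play the roles of the storage service and QEMU.
    service, qemu = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as image:
        image.write(b"disk data")
        image.flush()
        send_fd(service, image.fileno(), b"image")
        label, fd = recv_fd(qemu)
    # The received descriptor is a duplicate, so it stays usable even
    # though the sender has closed its own copy.
    try:
        data = os.pread(fd, 9, 0)
    finally:
        os.close(fd)
        service.close()
        qemu.close()
    return label, data
```

The receiver never sees a path name, only a descriptor that the sender
chose to open, which is the property the vhost-like setup relies on.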

Moving the image format code into a separate program makes it possible
to safely write to a backing file while VMs are using it because the
storage service can be host-wide, not per-VM.  For example, streaming
a shared backing file over NFS while running VMs using copy-on-write
images.  If we ever want to do deduplication or other global
operations, then this approach is nice too.

To summarize:
The storage service manages image files including creation, deletion,
snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
interface and can pass it directly into the guest.  libvirt uses the
storage service API without needing to parse image files or keep track
of backing file relationships.

Stefan



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Markus Armbruster
Daniel P. Berrange berra...@redhat.com writes:

 On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
 On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com 
 wrote:
  On 07/19/11 16:24, Eric Blake wrote:
  [adding the libvir-list]
  On 07/19/2011 08:09 AM, Jes Sorensen wrote:
  Urgh, libvirt parsing image files is really unfortunate, it really
  doesn't give me warm fuzzy feelings :( libvirt really should not know
  about internals of image formats.
 
  But even if you add new features to qemu to avoid needing this in the
  future, it doesn't change the past - libvirt will always have to know
  how to parse image files understood by older qemu, and so as long as
  libvirt already knows how to do that parsing, we might as well take
  advantage of it.
 
  What has been done here in the past is plain wrong. Continuing to do it
  isn't the right thing to do here.
 
  Besides, I feel that having a well-documented file format, so that
  independent applications can both parse the same file with the same
  semantics by obeying the file format specification, is a good design goal.
 
  We all know that documentation is rarely uptodate, new features may not
  get added and libvirt will never be able to keep up. The driver for a
  file format belongs in QEMU and nowhere else.
 
 It should be a goal to avoid dependencies in multiple layers of the
 stack because it becomes hard to add new features - they require
 coordinated changes in multiple layers.  Having both QEMU and libvirt
 know the internals of image files is such a multi-dependency.  If I
 want to add a new format or change an existing format I have to touch
 both layers.
 
 For fd-passing perhaps we have an opportunity to use a callback
 mechanism (QEMU request: filename - libvirt response: fd) and do all
 the image format parsing in QEMU.

 The reason why libvirt does the parsing of file headers to determine
 backing files is to maintain the trust boundary. Everything run from
 the exec() of QEMU onwards is considered untrusted code. So having
 QEMU parsing the file headers & passing back open() requests to libvirt
 is breaking the trust boundary.

Exactly.

The block drivers form a tree.  Each driver opens its children.  For
convenience, some of them have information on how to open their children
encoded in their image (e.g. COW backing images).  Others receive it as
configuration, encoded in their filename string (e.g. blkdebug).

Thus, we have direct control only over the root block driver.  We can
pass it fds easily.

We control the non-root block drivers only indirectly, through the block
drivers along the path from the root.  We don't have a way to pass them
fds.

If we had a way to construct the tree bottom-up, we could directly
control all the block drivers.

Requires knowing the number of children, and extracting child
configuration from images where applicable.
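
To make the bottom-up idea concrete, here is a small sketch under
assumptions of my own: BlockNode, build_chain and leaf_fds are
hypothetical names, not QEMU structures, and the fds are plain integers
supplied by the management layer. Each format node is created only after
its children exist, so nothing in the tree ever opens a file by name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlockNode:
    driver: str                      # e.g. "file" or "qcow2" (illustrative)
    fd: Optional[int] = None         # only protocol (leaf) nodes hold an fd
    children: List["BlockNode"] = field(default_factory=list)


def build_chain(fds_base_first, format_driver="qcow2"):
    """Build a backing chain bottom-up from fds ordered base image first.

    Every format node gets a protocol child holding its own fd and, if
    present, the already-built node of its backing image.
    """
    node = None
    for fd in fds_base_first:
        children = [BlockNode("file", fd=fd)]
        if node is not None:
            children.append(node)    # backing image built in the previous step
        node = BlockNode(format_driver, children=children)
    return node


def leaf_fds(node):
    """Collect the fds of all leaf nodes, root's own file first."""
    if not node.children:
        return [node.fd]
    collected = []
    for child in node.children:
        collected.extend(leaf_fds(child))
    return collected
```

Calling build_chain([10, 11]) yields a root whose own file is fd 11 and
whose backing image uses fd 10. The caller has to know the number of
children up front, which is exactly the requirement noted above.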

[...]



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Jes Sorensen
On 07/19/11 18:46, Daniel P. Berrange wrote:
 On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
 For fd-passing perhaps we have an opportunity to use a callback
 mechanism (QEMU request: filename - libvirt response: fd) and do all
 the image format parsing in QEMU.
 
 The reason why libvirt does the parsing of file headers to determine
 backing files is to maintain the trust boundary. Everything run from
 the exec() of QEMU onwards is considered untrusted code. So having
 QEMU parsing the file headers & passing back open() requests to libvirt
 is breaking the trust boundary.

Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.

 NB, i'm not happy about libvirt having to have knowledge of file format
 headers, but we needed something more efficient & reliable than invoking
 qemu-img info & parsing the output. Ideally QEMU (or something else)
 would provide a library libblockformat.so with stable APIs for at least
 reading metadata about image formats. If it had APIs for image creation,
 etc too that would be a bonus, but we're more or less ok spawning qemu-img
 for those cases currently.

Even having a library for libvirt to link against is suboptimal here.
Two processes shouldn't be fighting over the internals of metadata, the
ownership of the metadata belongs solely with QEMU. In addition you have
the constant issue of dependencies there, hence if QEMU is updated and
it provides a newer block format library, it may prevent libvirt from
running forcing an update of libvirt as well. That is not acceptable for
development.

Cheers,
Jes



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Jes Sorensen
On 07/19/11 18:14, Anthony Liguori wrote:
 As nice as that sentiment is, it will never fly, because it would be a
 regression in current behavior.  The whole reason that the virt_use_nfs
 SELinux bool exists is that some people are willing to make the partial
 security tradeoff.  Besides, the use of sVirt via SELinux is more than
 just open() protection - while the current virt_use_nfs bool makes NFS
 less secure than otherwise possible, it still gives some nice guarantees
 to the rest of the qemu process such as passthrough accesses to local
 pci devices.

 Well leaving things at status quo is not making it worse, it just leaves
 an evil in place.
 
 NFS and SELinux is a fundamental problem with SELinux and NFS.  We can
 piss and moan as much as we want about it but it's reality.  SELinux
 fundamentally requires extended attributes.  By the time NFS adds
 extended attribute support, we'll all be flying around in hover cars.
 
 As terrible as NFS is, people use it all of the time.
 
 It would be nice if libvirt had the ability to make better use of DAC to
 support isolation.  The fact that MAC is the only way you can do
 isolation between guests is pretty unfortunate.  If I could assign
 specific UIDs to a guest and use that to enforce isolation, it would go
 a long ways to solving this problem.

Right, we're stuck with the two horrors of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.

My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.

Jes



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Jes Sorensen
On 07/19/11 18:47, Daniel P. Berrange wrote:
 On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
 On 07/19/11 16:24, Eric Blake wrote:
 Besides, I feel that having a well-documented file format, so that
 independent applications can both parse the same file with the same
 semantics by obeying the file format specification, is a good design goal.

 We all know that documentation is rarely uptodate, new features may not
 get added and libvirt will never be able to keep up. The driver for a
 file format belongs in QEMU and nowhere else.
 
 This would be possible if QEMU were to provide a libblockformat.so library
 which allowed apps to extract metadata from file formats using a stable
 API.

There is no reason for libvirt or any external process to mess about
with the internals of image files. You have the same problem if the
image file is encrypted.

Jes




Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Daniel P. Berrange
On Wed, Jul 20, 2011 at 10:26:49AM +0200, Jes Sorensen wrote:
 On 07/19/11 18:47, Daniel P. Berrange wrote:
  On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
  On 07/19/11 16:24, Eric Blake wrote:
  Besides, I feel that having a well-documented file format, so that
  independent applications can both parse the same file with the same
  semantics by obeying the file format specification, is a good design goal.
 
  We all know that documentation is rarely uptodate, new features may not
  get added and libvirt will never be able to keep up. The driver for a
  file format belongs in QEMU and nowhere else.
  
  This would be possible if QEMU were to provide a libblockformat.so library
  which allowed apps to extract metadata from file formats using a stable
  API.
 
 There is no reason for libvirt or any external process to mess about
 with the internals of image files. You have the same problem if the
 image file is encrypted.

Just repeating many times that libvirt doesn't need to do this doesn't
make it true. I have described quite clearly why we need to read the disk
image to determine its backing files ahead of QEMU being launched.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Daniel P. Berrange
On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
 On 07/19/11 18:46, Daniel P. Berrange wrote:
  On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
  For fd-passing perhaps we have an opportunity to use a callback
  mechanism (QEMU request: filename - libvirt response: fd) and do all
  the image format parsing in QEMU.
  
  The reason why libvirt does the parsing of file headers to determine
  backing files is to maintain the trust boundary. Everything run from
  the exec() of QEMU onwards is considered untrusted code. So having
  QEMU parsing the file headers & passing back open() requests to libvirt
  is breaking the trust boundary.
 
 Pardon, but I fail to see the issue here. If QEMU passes a filename back
 to libvirt, libvirt still gets to make the decision whether or not it is
 legitimate for QEMU to get that file descriptor or not. It doesn't
 change anything wrt who actually opens the file, hence the 'trust' is
 unchanged.

To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
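
A minimal sketch of that check, assuming libvirt has already derived the
expected chain from its own reading of the image headers. The function
name is_allowed and the example paths are hypothetical. Paths are
resolved with realpath so that "../" components or symlinks cannot be
used to slip a different file past the comparison.

```python
import os


def is_allowed(requested, expected_chain):
    """Return True only if the requested path is a file in the expected chain.

    expected_chain is the list of image paths the management layer worked
    out itself; the requested name comes from the untrusted process.
    """
    wanted = os.path.realpath(requested)
    return any(wanted == os.path.realpath(path) for path in expected_chain)
```

For example, with an expected chain of /images/my_vm.qed and
/images/fedora.img, a request for /images/../images/fedora.img is
accepted while a request for /etc/shadow is refused.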

  NB, i'm not happy about libvirt having to have knowledge of file format
  headers, but we needed something more efficient  reliable than invoking
  qemu-img info  parsing the output. Ideally QEMU (or something else)
  would provide a library libblockformat.so with stable APIs for at least
  reading metadata about image formats. If it had APIs for image creation,
  etc too that would be a bonus, but we're more or less ok spawning qemu-img
  for those cases currently.
 
 Even having a library for libvirt to link against is suboptimal here.
 Two processes shouldn't be fighting over the internals of metadata, the
 ownership of the metadata belongs solely with QEMU. In addition you have
 the constant issue of dependencies there, hence if QEMU is updated and
 it provides a newer block format library, it may prevent libvirt from
 running forcing an update of libvirt as well. That is not acceptable for
 development.

We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library to do that, or we write the probing code for each
file format in libvirt. Currently we have the latter.
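
For illustration only, this is roughly what such read-only probing looks
like for qcow2. It is a sketch, not libvirt's actual code. It relies on
the start of the qcow2 header as documented in the format specification:
a 4-byte magic "QFI\xfb", a 32-bit version, a 64-bit backing file offset
and a 32-bit backing file name length, all big-endian. The helper names
qcow2_backing_file and demo are assumptions made for the example.

```python
import os
import struct
import tempfile

QCOW2_MAGIC = b"QFI\xfb"
# magic, version, backing_file_offset, backing_file_size (big-endian, 20 bytes)
HEADER = struct.Struct(">4sIQI")


def qcow2_backing_file(path):
    """Return the backing file name stored in a qcow2 header, or None."""
    with open(path, "rb") as f:           # read-only: the image is never modified
        raw = f.read(HEADER.size)
        if len(raw) < HEADER.size:
            raise ValueError("file too short for a qcow2 header")
        magic, _version, offset, size = HEADER.unpack(raw)
        if magic != QCOW2_MAGIC:
            raise ValueError("not a qcow2 image")
        if offset == 0:
            return None                   # no backing file recorded
        if size > 1023:                   # limit given in the qcow2 specification
            raise ValueError("backing file name too long")
        f.seek(offset)
        name = f.read(size)
        if len(name) != size:
            raise ValueError("truncated backing file name")
        return name.decode("utf-8", errors="replace")


def demo():
    """Write a synthetic header-only image and probe it."""
    name = b"fedora.img"
    header = HEADER.pack(QCOW2_MAGIC, 2, 64, len(name))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(header.ljust(64, b"\0") + name)
        path = tmp.name
    try:
        return qcow2_backing_file(path)
    finally:
        os.unlink(path)
```

The whole probe is one open, two reads and a close, and it works on a
host that has no QEMU installed at all.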

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Kevin Wolf
Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
 On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
 On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com 
 wrote:
 On 07/19/11 16:24, Eric Blake wrote:
 [adding the libvir-list]
 On 07/19/2011 08:09 AM, Jes Sorensen wrote:
 Urgh, libvirt parsing image files is really unfortunate, it really
 doesn't give me warm fuzzy feelings :( libvirt really should not know
 about internals of image formats.

 But even if you add new features to qemu to avoid needing this in the
 future, it doesn't change the past - libvirt will always have to know
 how to parse image files understood by older qemu, and so as long as
 libvirt already knows how to do that parsing, we might as well take
 advantage of it.

 What has been done here in the past is plain wrong. Continuing to do it
 isn't the right thing to do here.

 Besides, I feel that having a well-documented file format, so that
 independent applications can both parse the same file with the same
 semantics by obeying the file format specification, is a good design goal.

 We all know that documentation is rarely uptodate, new features may not
 get added and libvirt will never be able to keep up. The driver for a
 file format belongs in QEMU and nowhere else.

 It should be a goal to avoid dependencies in multiple layers of the
 stack because it becomes hard to add new features - they require
 coordinated changes in multiple layers.  Having both QEMU and libvirt
 know the internals of image files is such a multi-dependency.  If I
 want to add a new format or change an existing format I have to touch
 both layers.

 For fd-passing perhaps we have an opportunity to use a callback
 mechanism (QEMU request: filename - libvirt response: fd) and do all
 the image format parsing in QEMU.
 
 The reason why libvirt does the parsing of file headers to determine
 backing files is to maintain the trust boundary. Everything run from
 the exec() of QEMU onwards is considered untrusted code. So having
 QEMU parsing the file headers & passing back open() requests to libvirt
 is breaking the trust boundary.
 
 NB, i'm not happy about libvirt having to have knowledge of file format
 headers, but we needed something more efficient & reliable than invoking
 qemu-img info & parsing the output.

What's the real problem with this approach? Parsing the data meant for
humans, from an interface that is potentially unstable? If this is it,
it should be easy enough to add a JSON output mode to qemu-img info.

 Ideally QEMU (or something else)
 would provide a library libblockformat.so with stable APIs for at least
 reading metadata about image formats. If it had APIs for image creation,
 etc too that would be a bonus, but we're more or less ok spawning qemu-img
 for those cases currently.

I'm afraid the block drivers have too many dependencies on the qemu core
for this to be an option without investing a lot of effort.

Kevin



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Daniel P. Berrange
On Wed, Jul 20, 2011 at 11:50:53AM +0200, Kevin Wolf wrote:
 Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
  On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
  On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com 
  wrote:
  On 07/19/11 16:24, Eric Blake wrote:
  [adding the libvir-list]
  On 07/19/2011 08:09 AM, Jes Sorensen wrote:
  Urgh, libvirt parsing image files is really unfortunate, it really
  doesn't give me warm fuzzy feelings :( libvirt really should not know
  about internals of image formats.
 
  But even if you add new features to qemu to avoid needing this in the
  future, it doesn't change the past - libvirt will always have to know
  how to parse image files understood by older qemu, and so as long as
  libvirt already knows how to do that parsing, we might as well take
  advantage of it.
 
  What has been done here in the past is plain wrong. Continuing to do it
  isn't the right thing to do here.
 
  Besides, I feel that having a well-documented file format, so that
  independent applications can both parse the same file with the same
  semantics by obeying the file format specification, is a good design 
  goal.
 
  We all know that documentation is rarely uptodate, new features may not
  get added and libvirt will never be able to keep up. The driver for a
  file format belongs in QEMU and nowhere else.
 
  It should be a goal to avoid dependencies in multiple layers of the
  stack because it becomes hard to add new features - they require
  coordinated changes in multiple layers.  Having both QEMU and libvirt
  know the internals of image files is such a multi-dependency.  If I
  want to add a new format or change an existing format I have to touch
  both layers.
 
  For fd-passing perhaps we have an opportunity to use a callback
  mechanism (QEMU request: filename - libvirt response: fd) and do all
  the image format parsing in QEMU.
  
  The reason why libvirt does the parsing of file headers to determine
  backing files is to maintain the trust boundary. Everything run from
  the exec() of QEMU onwards is considered untrusted code. So having
  QEMU parsing the file headers & passing back open() requests to libvirt
  is breaking the trust boundary.
  
  NB, i'm not happy about libvirt having to have knowledge of file format
  headers, but we needed something more efficient & reliable than invoking
  qemu-img info & parsing the output.
 
 What's the real problem with this approach? Parsing the data meant for
 humans, from an interface that is potentially unstable? If this is it,
 it should be easy enough to add a JSON output mode to qemu-img info.

It is a really heavyweight solution to have to spawn qemu-img every
time we need to access this data, when it can be done with a trivial
open+read+close sequence. In addition the output data format is not
entirely pleasant for machine reading (some fields only have data
rounded up to MB, not the raw byte count). Finally, we also wanted
to be able to extract some basic metadata about disk image formats on
non-QEMU hosts, for our storage management APIs which are used on Xen
or VMWare hosts where many of these same disk image formats are also
used. A JSON output mode would be helpful, but unfortunately can't
really address the other issues.

  Ideally QEMU (or something else)
  would provide a library libblockformat.so with stable APIs for at least
  reading metadata about image formats. If it had APIs for image creation,
  etc too that would be a bonus, but we're more or less ok spawning qemu-img
  for those cases currently.
 
 I'm afraid the block drivers have too many dependencies on the qemu core
 for this to be an option without investing a lot of effort.

That's why I sort of think there is value in having someone provide a
standalone library API for querying some core set of block format
metadata. QEMU is but one project with virtual disk formats, there are
plenty of others out there in existence, so while reusing QEMU block
code would be nice, it isn't leading to any significant reduction in
copies of block format parsing code amongst all the virt projects in
existence.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Kevin Wolf
Am 20.07.2011 10:25, schrieb Jes Sorensen:
 On 07/19/11 18:14, Anthony Liguori wrote:
 As nice as that sentiment is, it will never fly, because it would be a
 regression in current behavior.  The whole reason that the virt_use_nfs
 SELinux bool exists is that some people are willing to make the partial
 security tradeoff.  Besides, the use of sVirt via SELinux is more than
 just open() protection - while the current virt_use_nfs bool makes NFS
 less secure than otherwise possible, it still gives some nice guarantees
 to the rest of the qemu process such as passthrough accesses to local
 pci devices.

 Well leaving things at status quo is not making it worse, it just leaves
 an evil in place.

 NFS and SELinux is a fundamental problem with SELinux and NFS.  We can
 piss and moan as much as we want about it but it's reality.  SELinux
 fundamentally requires extended attributes.  By the time NFS adds
 extended attribute support, we'll all be flying around in hover cars.

 As terrible as NFS is, people use it all of the time.

 It would be nice if libvirt had the ability to make better use of DAC to
 support isolation.  The fact that MAC is the only way you can do
 isolation between guests is pretty unfortunate.  If I could assign
 specific UIDs to a guest and use that to enforce isolation, it would go
 a long ways to solving this problem.
 
 Right, we're stuck with the two horrors of NFS and selinux, so we need
 something that gets around the problem. In a sane world we would simply
 say 'no NFS, no selinux', but as you say that will never happen.
 
 My suggestion of a callback mechanism where libvirt registers the
 callback with QEMU for open() calls, allowing libvirt to perform the
 open and return the open file descriptor would get around this problem.

To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least interesting... Also you have to add some magic to all
places opening something.

I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two things that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.

Kevin



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Cleber Rosa

On 07/19/2011 12:14 PM, Anthony Liguori wrote:

On 07/19/2011 09:30 AM, Jes Sorensen wrote:

On 07/19/11 16:24, Eric Blake wrote:

[adding the libvir-list]
On 07/19/2011 08:09 AM, Jes Sorensen wrote:

Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.


But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.


What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.


Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design 
goal.


We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.



It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu.  In fact, this has
already been proposed:


A cleaner solution seems to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.


Yes, that could probably be made to work with libvirt.


I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.


If we cannot do something like this, I would prefer that backing
files on NFS simply not be supported when running in an selinux
setup.


As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior.  The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff.  Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.


Well leaving things at status quo is not making it worse, it just leaves
an evil in place.


NFS and SELinux is a fundamental problem with SELinux and NFS.  We can 
piss and moan as much as we want about it but it's reality.  SELinux 
fundamentally requires extended attributes.  By the time NFS adds 
extended attribute support, we'll all be flying around in hover cars.


As terrible as NFS is, people use it all of the time.

It would be nice if libvirt had the ability to make better use of DAC 
to support isolation.  The fact that MAC is the only way you can do 
isolation between guests is pretty unfortunate.  If I could assign 
specific UIDs to a guest and use that to enforce isolation, it would 
go a long ways to solving this problem.


Just as a reminder: with DAC, if a guest is compromised and somehow 
escalates to QEMU, it could disable its isolation (i.e., by setting its 
own image files world readable). I guess we shouldn't try to fix the DAC 
model, but fix what's preventing us from fully using MAC, even though 
it's outside of QEMU.


CR.



Regards,

Anthony Liguori






Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Jes Sorensen
On 07/20/11 12:01, Kevin Wolf wrote:
  Right, we're stuck with the two horrors of NFS and selinux, so we need
  something that gets around the problem. In a sane world we would simply
  say 'no NFS, no selinux', but as you say that will never happen.
  
  My suggestion of a callback mechanism where libvirt registers the
  callback with QEMU for open() calls, allowing libvirt to perform the
  open and return the open file descriptor would get around this problem.
 To me this sounds more like a problem than a solution. It basically
 means that during an open (which may even be initiated by a monitor
 command), you need monitor interaction. It basically means that open
 becomes asynchronous, and requires clients to deal with that, which
 sounds at least interesting... Also you have to add some magic to all
 places opening something.
 
 I think if libvirt wants qemu to use an fd instead of a file name, it
 shouldn't pass a file name but an fd in the first place. Which means
 that the two that we need are support for an fd: protocol (patches on
 the list, need review), and a way for libvirt to override the backing
 file of an image.

The problem is that QEMU will find backing file names inside the
images which it will be unable to open. How do you suggest we get around
that?

Jes




Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Kevin Wolf
Am 20.07.2011 15:25, schrieb Jes Sorensen:
 On 07/20/11 12:01, Kevin Wolf wrote:
 Right, we're stuck with the two horrors of NFS and selinux, so we need
 something that gets around the problem. In a sane world we would simply
 say 'no NFS, no selinux', but as you say that will never happen.

 My suggestion of a callback mechanism where libvirt registers the
 callback with QEMU for open() calls, allowing libvirt to perform the
 open and return the open file descriptor would get around this problem.
 To me this sounds more like a problem than a solution. It basically
 means that during an open (which may even be initiated by a monitor
 command), you need monitor interaction. It basically means that open
 becomes asynchronous, and requires clients to deal with that, which
 sounds at least interesting... Also you have to add some magic to all
 places opening something.

 I think if libvirt wants qemu to use an fd instead of a file name, it
 shouldn't pass a file name but an fd in the first place. Which means
 that the two that we need are support for an fd: protocol (patches on
 the list, need review), and a way for libvirt to override the backing
 file of an image.
 
 The problem is that QEMU will find backing file file names inside the
 images which it will be unable to open. How do you suggest we get around
 that?

This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.

Kevin



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Anthony Liguori

On 07/20/2011 08:50 AM, Cleber Rosa wrote:

Just as a reminder: with DAC, if a guest is compromised and somehow
escalates to QEMU, it could disable its isolation (ie, by setting their
own image files world readable). I guess we shouldn't try to fix the DAC
model, but fix what's preventing us from fully using MAC, even though
it's outside of QEMU.


I don't see how a guest making its data world readable is a fundamental 
problem.


DAC is a fundamental part of the Unix design and is something that 
administrators understand very well.  I completely understand the value 
of MAC but to argue that we shouldn't present DAC as an option I think 
is fundamentally wrong.


Regards,

Anthony Liguori












Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Eric Blake

On 07/20/2011 07:25 AM, Jes Sorensen wrote:

I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.


The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?


We've already told you - qemu must have a way to be passed fds which are 
associated with names, and when a file refers to another backing file by 
name, then qemu falls back on its fd/name mapping to use the 
already-passed fd instead.  Which implies that someone else, either 
libvirt or a qemu-maintained libblockformat.so, needs to have a stable 
interface for parsing the backing file name out of an arbitrary qcow2 
file, and that this interface must work no matter how many other 
extensions are added to qcow2.
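
To make the parsing requirement concrete, here is a minimal Python sketch of pulling the backing file name out of a qcow2 header.  It is not libvirt or qemu code; the function name is made up for illustration.  It relies only on the fixed start of the qcow2 header (magic, version, backing_file_offset, backing_file_size, all big-endian) and ignores every other field and extension.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def read_qcow2_backing_name(image_bytes):
    """Return the backing file name recorded in a qcow2 header, or None.

    Hypothetical helper for illustration; takes the first bytes of the image.
    """
    # Fixed header prefix, big-endian: magic (4 bytes), version (4),
    # backing_file_offset (8), backing_file_size (4).
    if len(image_bytes) < 20:
        raise ValueError("too short to be a qcow2 image")
    magic, _version, offset, size = struct.unpack(">4sIQI", image_bytes[:20])
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if offset == 0:
        return None  # no backing file recorded
    if offset + size > len(image_bytes):
        raise ValueError("backing file name lies outside the data provided")
    # The name is stored without a terminating NUL.
    raw = image_bytes[offset:offset + size]
    return raw.decode("utf-8", errors="surrogateescape")
```

A caller would read the first cluster of the image and pass it in.  Because only the fixed prefix is consulted, later header extensions do not affect the result, which is the stability property asked for above.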


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Anthony Liguori

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:

This would be possible if QEMU were to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.


I'm in 100% agreement that we need to provide the equivalent of a 
libblockformat.so down the road.


But the block layer needs some work before we can support a stable 
interface.


Regards,

Anthony Liguori



Daniel





Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Eric Blake

On 07/20/2011 11:27 AM, Blue Swirl wrote:

We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the already-passed
fd instead.  Which implies that someone else, either libvirt or a
qemu-maintained libblockformat.so, needs to have a stable interface for
parsing the backing file name out of an arbitrary qcow2 file, and that this
interface must work no matter how many other extensions are added to qcow2.


I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.


Won't work.  Well, it might work within the context of a single qemu 
process.  But when that process ends, then libvirt would have to touch 
up the qcow2 headers of that file to replace the /etc/shadow name with 
the real backing file name, otherwise, the next time you restart 
qemu-img or a new qemu guest using the same image, the information has 
been lost, since the fd has been closed in the meantime.


We really _do_ need a way to give qemu both an fd and the name of the 
file that the fd is tied to.  On Linux, qemu could use /proc/self/fd to 
reconstruct the name from the fd, but that's not portable to other OSes.  
And we've already discussed that, in the libvirt model, libvirt is deemed 
more secure than qemu.  Therefore, I think it is reasonable for qemu to 
assume, if it exposes a monitor command where the supervisor (libvirt or 
otherwise) can pass in both an fd and a file name, that either the 
supervisor is passing in correct information, or that the bug is in the 
supervisor and not in qemu if the supervisor passes in wrong information 
and things blow up.
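
A rough Python sketch of the two approaches, for comparison.  The names name_from_fd and FdNameMap are invented for this example and are not qemu interfaces.  The first function is the Linux-only /proc/self/fd lookup; the class is the portable alternative, where the supervisor supplies the name along with the fd and qemu simply trusts it.

```python
import os


def name_from_fd(fd):
    """Best-effort reverse lookup of an fd's path via /proc (Linux-only)."""
    try:
        return os.readlink("/proc/self/fd/%d" % fd)
    except OSError:
        return None  # no /proc on this OS, or the fd is not open


class FdNameMap:
    """Names handed over by the supervisor together with each fd."""

    def __init__(self):
        self._by_name = {}

    def register(self, name, fd):
        # The supervisor is trusted: whatever name it gives is recorded as-is.
        self._by_name[name] = fd

    def lookup(self, name):
        # None means no fd was passed for this name; the caller should
        # refuse the access instead of falling back to open().
        return self._by_name.get(name)
```

With the map, a backing file name found in an image header is only ever used as a key into what the supervisor already provided.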


And the snapshot_blkdev monitor command is a case where qemu needs to 
create a new qcow2 image on the fly, while referencing the name of an 
existing file.  What backing name do you put in the new qcow2 file 
unless you already have a name association for all fds already open for 
the existing backing file?


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Blue Swirl
On Wed, Jul 20, 2011 at 4:46 PM, Eric Blake ebl...@redhat.com wrote:
 On 07/20/2011 07:25 AM, Jes Sorensen wrote:

 I think if libvirt wants qemu to use an fd instead of a file name, it
 shouldn't pass a file name but an fd in the first place. Which means
 that the two that we need are support for an fd: protocol (patches on
 the list, need review), and a way for libvirt to override the backing
 file of an image.

 The problem is that QEMU will find backing file file names inside the
 images which it will be unable to open. How do you suggest we get around
 that?

 We've already told you - qemu must have a way to be passed fds which are
 associated with names, and when a file refers to another backing file by
 name, then qemu falls back on its fd/name mapping to use the already-passed
 fd instead.  Which implies that someone else, either libvirt or a
 qemu-maintained libblockformat.so, needs to have a stable interface for
 parsing the backing file name out of an arbitrary qcow2 file, and that this
 interface must work no matter how many other extensions are added to qcow2.

I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Cleber Rosa

On 07/20/2011 10:34 AM, Anthony Liguori wrote:

On 07/20/2011 08:50 AM, Cleber Rosa wrote:

Just as a reminder: with DAC, if a guest is compromised and somehow
escalates to QEMU, it could disable its isolation (ie, by setting their
own image files world readable). I guess we shouldn't try to fix the DAC
model, but fix what's preventing us from fully using MAC, even though
it's outside of QEMU.


I don't see how a guest making its data world readable is a 
fundamental problem.


Well, if we're discussing security models and how to provide the best 
isolation we can to VMs/QEMU instances, then a VM being able to read (or 
even write) data of another VM *is* a fundamental problem. Setting 
their own image files world readable is just one example of how that 
could be accomplished.




DAC is a fundamental part of the Unix design and is something that 
administrators understand very well.


That is a true statement, but it does not make DAC the most appropriate 
solution here.


  I completely understand the value of MAC but to argue that we 
shouldn't present DAC as an option I think is fundamentally wrong.


I never said, and really don't think, that we shouldn't provide other 
security options/models; this is actually part of the well-accepted 
strategy of security in multiple layers.


I did assume, though, we were aiming for the best isolation level, and 
that is definitely MAC. DAC may indeed be good enough for some, but 
definitely not good enough for many others.


CR.















Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Eric Blake

On 07/20/2011 11:20 AM, Blue Swirl wrote:

There could still be some issues:
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?


The fix for CVE-2010-2238 already deals with this: if primary image C 
refers to backing file CA of raw format, but does not state what file 
format CA contains, then a malicious guest can modify the contents of CA 
to appear to be yet another qcow2 image.  At which point, if libvirt 
follows the backing file specified in CA, then yes, the malicious guest 
really can cause libvirt to expose arbitrary file CB for manipulation by 
the guest.  But that security hole was already plugged - by default, 
libvirt refuses to probe backing files parsed from qcow2 headers for 
file format, but instead requires the outer qcow2 header to also include 
a file format designation for the backing file.  At which point, you 
then have a safe chain: if C refers to CA, then libvirt knows that both 
C and CA are essential to the storage presented by giving qemu the file 
name C, and the guest will already be modifying CA, but there is no 
storage corruption involved.


That is, as long as libvirt can already accurately read the chain of 
backing files from any starting point, then it can hand that entire 
chain of backing files (whether by the topmost file name as it does now, 
or whether by a series of fds as is being proposed) to qemu.
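
As a sketch of that rule (a Python illustration only, not libvirt's actual code): walk the chain from the top image, never look inside a file declared raw, and refuse to continue when a header names a backing file without stating its format.  The headers dict stands in for already-parsed image headers, and its key names are assumptions of this example.

```python
def resolve_chain(headers, top, top_format):
    """Return [(name, format), ...] for every file behind `top`.

    `headers` maps a file name to {'backing': ..., 'backing_format': ...};
    this layout is a stand-in for parsed image headers.
    """
    chain = []
    seen = set()
    name, fmt = top, top_format
    while name is not None:
        if name in seen:
            raise ValueError("backing chain loops at %s" % name)
        seen.add(name)
        chain.append((name, fmt))
        if fmt == "raw":
            break  # raw content is guest-controlled; never parse it as a header
        hdr = headers[name]
        backing = hdr.get("backing")
        if backing is None:
            break
        backing_fmt = hdr.get("backing_format")
        if backing_fmt is None:
            # Probing here would reopen the hole described above.
            raise ValueError("%s names backing file %s without a format"
                             % (name, backing))
        name, fmt = backing, backing_fmt
    return chain
```

With C backed by raw file CA, a forged qcow2 header inside CA pointing at CB is never consulted, so CB does not enter the chain.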




This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.


The monitor interface was not designed to throw the 
read()/write()/seek() burden back on libvirt, and indeed that would kill 
performance so it is a non-starter idea.  All we need for security is 
the open() burden to be shifted out of qemu and into libvirt.


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Blue Swirl
On Wed, Jul 20, 2011 at 8:47 PM, Eric Blake ebl...@redhat.com wrote:
 On 07/20/2011 11:27 AM, Blue Swirl wrote:

 We've already told you - qemu must have a way to be passed fds which are
 associated with names, and when a file refers to another backing file by
 name, then qemu falls back on its fd/name mapping to use the already-passed
 fd instead.  Which implies that someone else, either libvirt or a
 qemu-maintained libblockformat.so, needs to have a stable interface for
 parsing the backing file name out of an arbitrary qcow2 file, and that this
 interface must work no matter how many other extensions are added to qcow2.

 I'd avoid any name based access in this case. If QEMU has write access
 to main file, it could forge the backing file name in main file to
 point to for example /etc/shadow and then request libvirt to perform
 the opening.

 Won't work.  Well, it might work within the context of a single qemu
 process.  But when that process ends, then libvirt would have to touch up
 the qcow2 headers of that file to replace the /etc/shadow name with the real
 backing file name, otherwise, the next time you restart qemu-img or a new
 qemu guest using the same image, the information has been lost, since the fd
 has been closed in the meantime.

How would libvirt know to do this touch up?

 We really _do_ need a way to give qemu both an fd and the name of the file
 that the fd is tied to.  On Linux, qemu could use /proc/self/fd to
 reconstruct the name from fd, but that's not portable to other OS.  And
 we've already discussed how in the libvirt model, that libvirt is deemed
 more secure than qemu.  Therefore, I think it is reasonable for qemu to make
 the assumptions that if it exposes a monitor command where the supervisor
 (libvirt or otherwise) can pass in both an fd and a file name, that either
 the supervisor is passing in correct information, or that the bug is in the
 supervisor and not in qemu if the supervisor passes in wrong information and
 things blow up.

Yes, I'm not worried about QEMU getting confused by libvirt.

 And the snapshot_blkdev monitor command is a case where qemu needs to create
 a new qcow2 image on the fly, while referencing the name of an existing
 file.  What backing name do you put in the new qcow2 file unless you already
 have a name association for all fds already open for the existing backing
 file?

For backing file with original name of /path/in/storage, QEMU could
present the fd and the backing path by requesting something like
fd:12,/path/in/storage. The next file in chain /path2/file would
be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Blue Swirl
On Wed, Jul 20, 2011 at 9:17 PM, Eric Blake ebl...@redhat.com wrote:
 On 07/20/2011 12:00 PM, Blue Swirl wrote:

 Let's have files A, B, C etc. with backing files AA etc. How would
 libvirt know that when QEMU wants to write to file CA, this is because
 it's needed to access C, or is it just trickery by a devious guest to
 corrupt storage?

 The fix for CVE-2010-2238 already deals with this: if primary image C refers
 to backing file CA of raw format, but does not state what file format CA
 contains, then a malicious guest can modify the contents of CA to appear to
 be yet another qcow2 image.  At which point, if libvirt follows the backing
 file specified in CA, then yes, the malicious guest really can cause libvirt
 to expose arbitrary file CB for manipulation by the guest.  But that
 security hole was already plugged - by default, libvirt refuses to probe
 backing files parsed from qcow2 headers for file format, but instead
 requires the outer qcow2 header to also include a file format
 designation for the backing file.  At which point, you then have a safe
 chain: if C refers to CA, then libvirt knows that both C and CA are
 essential to the storage presented by giving qemu the file name C, and the
 guest will already be modifying CA, but there is no storage corruption
 involved.

 But what if CA is accessed even if C is not? For example, QEMU opens C
 (to determine CA and write new information about the path), closes it
 and then requests CA?

 Why is qemu trying to access CA?

 Either because CA was mentioned as a backing file for C (in which case
 libvirt already knows about it, because either libvirt handed C to qemu at
 startup time after already parsing C's headers to learn that CA is a backing
 file, or because libvirt called the snapshot_blkdev command with the intent
 of having qemu populate CA with C as its backing file), or because qemu has
 a bug (in which case, libvirt should refuse the access to CA).

So no new backing files can be introduced by QEMU after it has started
without libvirt knowing it?

 Libvirt is already perfectly capable of tracking all files that qemu might
 need to access, and whether it is qemu or libvirt that does the open() of
 those files, we can still have libvirt validate whether each request for a
 file makes sense given the context of all previous files in use from the
 time the qemu command line was invoked and across all monitor commands in
 the meantime.

 On non-NFS solutions, where every file can have a SELinux label, then the
 security is then present by merely having libvirt relabel all such files to
 a unique label for that particular qemu process, and SELinux merely enforces
 that qemu cannot open() anything but what libvirt has already labeled.  And
 since libvirt already knows which files to label in the non-NFS scenario, it
 already knows which fds to pass in the NFS scenario, at which point the
 ability to prevent qemu from open()ing an NFS file is a security
 enhancement.

 Your question about qemu wanting to use CA is thus answered independently of
 whether the fd management solution is solved by libvirt handing an fd for CA
 to qemu prior to any monitor command where qemu will then need to use CA, or
 whether qemu is taught to asynchronously ask libvirt to open an fd for CA on
 qemu's behalf.  The answer is that libvirt already tracks whether qemu
 should access CA, and just needs a way to enforce that knowledge.  The
 enforcement already exists for non-NFS via SELinux labels, and the proposal
 to add fd handling will expand that enforcement to also cover NFS.

OK. I think fds would be useful internally in a privilege separation
mode in plain QEMU too.



Re: [Qemu-devel] live snapshot wiki updated

2011-07-20 Thread Eric Blake

On 07/20/2011 02:01 PM, Blue Swirl wrote:

Either because CA was mentioned as a backing file for C (in which case
libvirt already knows about it, because either libvirt handed C to qemu at
startup time after already parsing C's headers to learn that CA is a backing
file, or because libvirt called the snapshot_blkdev command with the intent
of having qemu populate CA with C as its backing file), or because qemu has
a bug (in which case, libvirt should refuse the access to CA).


So no new backing files can be introduced by QEMU after it has started
without libvirt knowing it?


No, you missed my point.  After qemu has started, a new backing file can 
only be introduced when libvirt issues one of a finite set of monitor 
commands.  These include disk hotplug (libvirt adds to the list of files 
known to be accessed by qemu, by reading the image headers of the new 
disk to be hot-plugged prior to issuing the monitor command), disk 
hot-unplug (libvirt revokes the access to the files making up that disk, 
which it remembers from before the disk was added), and snapshot_blkdev 
(libvirt is explicitly requesting a new qcow2 file with the old file as 
the backing image, so it knows the new relationship of files to be 
accessed by that block device).  Since libvirt issued the monitor 
commands, libvirt always knows which files qemu should be trying to 
access when servicing those block devices to the guest.
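
The bookkeeping this implies can be sketched in a few lines of Python.  The class and method names are hypothetical and do not correspond to libvirt internals; the point is only that the set of permitted files changes in exactly three places, each driven by a command the supervisor itself issued.

```python
class DiskAccessTracker:
    """Supervisor-side record of which files each block device may touch."""

    def __init__(self):
        self._disks = {}  # device id -> list of file names, topmost first

    def hotplug(self, device, chain):
        # `chain` comes from reading the image headers before the command.
        self._disks[device] = list(chain)

    def hot_unplug(self, device):
        # Revoke everything remembered for this device.
        self._disks.pop(device, None)

    def snapshot(self, device, new_top):
        # The new qcow2 file goes on top; the old chain is now its backing.
        self._disks[device].insert(0, new_top)

    def may_access(self, name):
        return any(name in chain for chain in self._disks.values())
```

Any request from qemu for a file where may_access() is false is then either a qemu bug or a compromised guest, and can be refused.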




OK. I think fds would be useful internally in a privilege separation
mode in plain QEMU too.


Yes, there's more than one reason to add fd support to all possible 
situations where qemu is currently resorting to open().


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Jes Sorensen
On 07/18/11 16:08, Stefan Hajnoczi wrote:
 On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 I have been updating the live snapshot wiki for qemu to try and cover
 the commands we will want for async snapshot handling too.

 http://wiki.qemu.org/Features/Snapshots
 
 Regarding fd passing, do we even support SELinux today with backing files?

Not sure I understand what you mean. The current code should be happy to
take an existing file or a raw device for the snapshot.

Jes



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Stefan Hajnoczi
On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 07/18/11 16:08, Stefan Hajnoczi wrote:
 On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com 
 wrote:
 I have been updating the live snapshot wiki for qemu to try and cover
 the commands we will want for async snapshot handling too.

 http://wiki.qemu.org/Features/Snapshots

 Regarding fd passing, do we even support SELinux today with backing files?

 Not sure I understand what you mean. The current code should be happy to
 take an existing file or a raw device for the snapshot.

Sorry, I was off on a tangent.

I think today QEMU does not support opening image files with a backing
file purely using file descriptors.  We currently require the ability
to open files.

Stefan



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Jes Sorensen
On 07/19/11 15:23, Stefan Hajnoczi wrote:
 On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 07/18/11 16:08, Stefan Hajnoczi wrote:
 On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com 
 wrote:
 I have been updating the live snapshot wiki for qemu to try and cover
 the commands we will want for async snapshot handling too.

 http://wiki.qemu.org/Features/Snapshots

 Regarding fd passing, do we even support SELinux today with backing files?

 Not sure I understand what you mean. The current code should be happy to
 take an existing file or a raw device for the snapshot.
 
 Sorry, I was off on a tangent.
 
 I think today QEMU does not support opening image files with a backing
 file purely using file descriptors.  We currently require the ability
 to open files.

I see what you mean - I don't actually know how that would work, since
the backing file specified in the front image will be a file name.

Eric, what happens if libvirt in an selinux environment tells QEMU to
launch using an image file that is backed by backing file(s)?

Cheers,
Jes



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Eric Blake

On 07/19/2011 07:27 AM, Jes Sorensen wrote:

On 07/19/11 15:23, Stefan Hajnoczi wrote:

On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote:

On 07/18/11 16:08, Stefan Hajnoczi wrote:

On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote:

I have been updating the live snapshot wiki for qemu to try and cover
the commands we will want for async snapshot handling too.

http://wiki.qemu.org/Features/Snapshots


Regarding fd passing, do we even support SELinux today with backing files?


Not sure I understand what you mean. The current code should be happy to
take an existing file or a raw device for the snapshot.


Sorry, I was off on a tangent.

I think today QEMU does not support opening image files with a backing
file purely using file descriptors.  We currently require the ability
to open files.


I see what you mean - I don't actually know how that would work, since
the backing file specified in the front image will be a file name.

Eric, what happens if libvirt in an selinux environment tells QEMU to
launch using an image file that is backed by backing file(s)?


Before starting qemu, libvirt first parses all the image files, to see 
if any of them have backing images.  For every qcow2 or qed image with a 
backing file, libvirt sets the SELinux context of both the qcow2 image 
and its backing file so that qemu will be able to successfully open() 
them.  But if any of those files reside on NFS, then it is not possible 
to label individual files, so it requires setting the SELinux bool 
virt_use_nfs, which thus gives qemu the power to open() arbitrary files 
on NFS, and you've lost security.


It would be nice if libvirt had a way to pass fds for every disk and 
backing file up front; then, SELinux can work around the lack of NFS 
per-file labelling by blocking open() in qemu.  In fact, this has 
already been proposed:


http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02072.html
http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01992.html

That thread mentioned both a command-line syntax for passing in fds for 
backing files, as well as an extension to the getfd monitor command to 
allow association of a runtime fd with a filename.
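
For readers unfamiliar with the underlying mechanism: on Unix, an open fd can be handed to another process over a Unix domain socket as SCM_RIGHTS ancillary data.  The Python sketch below (assuming Python 3.9 or later for socket.send_fds and socket.recv_fds) sends an fd together with the name it should be known by.  The helper names and the idea of carrying the name as the message payload are assumptions of this example, not the syntax proposed in those threads.

```python
import os
import socket


def send_named_fd(sock, name, fd):
    """Send one open fd plus the file name it should be associated with."""
    # The name travels as ordinary data; the fd rides along as SCM_RIGHTS.
    socket.send_fds(sock, [name.encode("utf-8")], [fd])


def recv_named_fd(sock, max_name_len=4096):
    """Receive a (name, fd) pair sent by send_named_fd()."""
    data, fds, _flags, _addr = socket.recv_fds(sock, max_name_len, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one fd")
    return data.decode("utf-8"), fds[0]
```

The receiver gets its own descriptor referring to the same open file, so it never needs permission to open() the path itself.  That is what lets SELinux block open() on NFS while the guest still runs.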


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Jes Sorensen
On 07/19/11 15:58, Eric Blake wrote:
 On 07/19/2011 07:27 AM, Jes Sorensen wrote:
 Eric, what happens if libvirt in an selinux environment tells QEMU to
 launch using an image file that is backed by backing file(s)?
 
 Before starting qemu, libvirt first parses all the image files, to see
 if any of them have backing images.  For every qcow2 or qed image with a
 backing file, libvirt sets the SELinux context of both the qcow2 image
 and its backing file so that qemu will be able to successfully open()
 them.  But if any of those files reside on NFS, then it is not possible
 to label individual files, so it requires setting the SELinux bool
 virt_use_nfs, which thus gives qemu the power to open() arbitrary files
 on NFS, and you've lost security.

Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.

 It would be nice if libvirt had a way to pass fds for every disk and
 backing file up front; then, SELinux can work around the lack of NFS
 per-file labelling by blocking open() in qemu.  In fact, this has
 already been proposed:

A cleaner solution seems to be to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.

If we cannot do something like this, I would prefer that backing
files on NFS simply not be supported when running in an selinux
setup.

Cheers,
Jes



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Jes Sorensen
On 07/19/11 16:24, Eric Blake wrote:
 [adding the libvir-list]
 On 07/19/2011 08:09 AM, Jes Sorensen wrote:
 Urgh, libvirt parsing image files is really unfortunate, it really
 doesn't give me warm fuzzy feelings :( libvirt really should not know
 about internals of image formats.
 
 But even if you add new features to qemu to avoid needing this in the
 future, it doesn't change the past - libvirt will always have to know
 how to parse image files understood by older qemu, and so as long as
 libvirt already knows how to do that parsing, we might as well take
 advantage of it.

What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.

 Besides, I feel that having a well-documented file format, so that
 independent applications can both parse the same file with the same
 semantics by obeying the file format specification, is a good design goal.

We all know that documentation is rarely up to date, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.


 It would be nice if libvirt had a way to pass fds for every disk and
 backing file up front; then, SELinux can work around the lack of NFS
 per-file labelling by blocking open() in qemu.  In fact, this has
 already been proposed:

 A cleaner solution seems to be to have libvirt provide a call-back allowing
 QEMU to call out and have libvirt open a file descriptor instead. This
 way libvirt can validate it and open it for QEMU and pass it back.
 
 Yes, that could probably be made to work with libvirt.

I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.

 If we cannot do something like this, I would prefer that backing
 files on NFS simply not be supported when running in an selinux
 setup.
 
 As nice as that sentiment is, it will never fly, because it would be a
 regression in current behavior.  The whole reason that the virt_use_nfs
 SELinux bool exists is that some people are willing to make the partial
 security tradeoff.  Besides, the use of sVirt via SELinux is more than
 just open() protection - while the current virt_use_nfs bool makes NFS
 less secure than otherwise possible, it still gives some nice guarantees
 to the rest of the qemu process such as passthrough accesses to local
 pci devices.

Well leaving things at status quo is not making it worse, it just leaves
an evil in place.

 Just because it is currently not as secure to mix NFS shared storage
 with backing files doesn't stop some people from wanting to do it [in
 fact, that's my current development setup - I use qcow2 images on NFS
 shared storage, keep SELinux enabled, and enable the virt_use_nfs bool].
  This discussion is about adding enhancements that make SELinux even
 more powerful when using NFS shared storage, by adding fd passing
 (whether libvirt parses in advance, or whether qemu raises an event and
 requires feedback from libvirt), and not about crippling the existing
 capability to use the virt_use_nfs selinux bool.

I do not believe we should try and add extra interfaces to support
something which is inherently broken. This really boils down to whether
we should support fd passing for snapshots in the first place. If it is
to support the broken setup of libvirt parsing image files, then I am
totally against it, if we work on a proper solution that involves this
in some way, then we can discuss it.

Cheers,
Jes




Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Eric Blake

[adding the libvir-list]

On 07/19/2011 08:09 AM, Jes Sorensen wrote:

On 07/19/11 15:58, Eric Blake wrote:

On 07/19/2011 07:27 AM, Jes Sorensen wrote:

Eric, what happens if libvirt in an selinux environment tells QEMU to
launch using an image file that is backed by backing file(s)?


Before starting qemu, libvirt first parses all the image files, to see
if any of them have backing images.  For every qcow2 or qed image with a
backing file, libvirt sets the SELinux context of both the qcow2 image
and its backing file so that qemu will be able to successfully open()
them.  But if any of those files reside on NFS, then it is not possible
to label individual files, so it requires setting the SELinux bool
virt_use_nfs, which thus gives qemu the power to open() arbitrary files
on NFS, and you've lost security.


Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.


But even if you add new features to qemu to avoid needing this in the 
future, it doesn't change the past - libvirt will always have to know 
how to parse image files understood by older qemu, and so as long as 
libvirt already knows how to do that parsing, we might as well take 
advantage of it.


Besides, I feel that having a well-documented file format, so that 
independent applications can both parse the same file with the same 
semantics by obeying the file format specification, is a good design goal.





It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu.  In fact, this has
already been proposed:


A cleaner solution seems to be to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.


Yes, that could probably be made to work with libvirt.



If we cannot do something like this, I would prefer that backing
files on NFS simply not be supported when running in an selinux
setup.


As nice as that sentiment is, it will never fly, because it would be a 
regression in current behavior.  The whole reason that the virt_use_nfs 
SELinux bool exists is that some people are willing to make the partial 
security tradeoff.  Besides, the use of sVirt via SELinux is more than 
just open() protection - while the current virt_use_nfs bool makes NFS 
less secure than otherwise possible, it still gives some nice guarantees 
to the rest of the qemu process such as passthrough accesses to local 
pci devices.


Just because it is currently not as secure to mix NFS shared storage 
with backing files doesn't stop some people from wanting to do it [in 
fact, that's my current development setup - I use qcow2 images on NFS 
shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. 
 This discussion is about adding enhancements that make SELinux even 
more powerful when using NFS shared storage, by adding fd passing 
(whether libvirt parses in advance, or whether qemu raises an event and 
requires feedback from libvirt), and not about crippling the existing 
capability to use the virt_use_nfs SELinux bool.


--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Stefan Hajnoczi
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 07/19/11 16:24, Eric Blake wrote:
 [adding the libvir-list]
 On 07/19/2011 08:09 AM, Jes Sorensen wrote:
 Urgh, libvirt parsing image files is really unfortunate, it really
 doesn't give me warm fuzzy feelings :( libvirt really should not know
 about internals of image formats.

 But even if you add new features to qemu to avoid needing this in the
 future, it doesn't change the past - libvirt will always have to know
 how to parse image files understood by older qemu, and so as long as
 libvirt already knows how to do that parsing, we might as well take
 advantage of it.

 What has been done here in the past is plain wrong. Continuing to do it
 isn't the right thing to do here.

 Besides, I feel that having a well-documented file format, so that
 independent applications can both parse the same file with the same
 semantics by obeying the file format specification, is a good design goal.

 We all know that documentation is rarely up to date, new features may not
 get added and libvirt will never be able to keep up. The driver for a
 file format belongs in QEMU and nowhere else.

It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes hard to add new features - they require
coordinated changes in multiple layers.  Having both QEMU and libvirt
know the internals of image files is such a multi-dependency.  If I
want to add a new format or change an existing format I have to touch
both layers.

For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename - libvirt response: fd) and do all
the image format parsing in QEMU.
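
To make the request/response shape concrete, here is a minimal sketch of
passing an open file descriptor over a Unix domain socket with SCM_RIGHTS.
It is not how QEMU or libvirt implement anything: the function names
(send_fd, recv_fd, manager_serve_one, demo) and the one-message wire format
are assumptions made up for this illustration, and the "manager" simply
plays the role libvirt would play.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd, payload=b"fd"):
    """Send one open file descriptor over a Unix socket using SCM_RIGHTS."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", [fd]).tobytes())]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock, bufsize=64):
    """Receive one file descriptor that was sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise OSError("no file descriptor received")
    return fds[0]


def manager_serve_one(sock):
    """Management side (hypothetical): read a filename, open it, return the fd."""
    filename = sock.recv(4096).decode()
    fd = os.open(filename, os.O_RDONLY)
    try:
        send_fd(sock, fd)
    finally:
        os.close(fd)  # the receiver gets its own duplicate of the descriptor


def demo():
    """Run one request/response round trip in a single process."""
    manager, emulator = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"backing data")
        path = f.name
    try:
        emulator.sendall(path.encode())   # request: filename
        manager_serve_one(manager)        # response: fd
        fd = recv_fd(emulator)            # emulator never calls open() itself
        with os.fdopen(fd, "rb") as image:
            return image.read()
    finally:
        manager.close()
        emulator.close()
        os.unlink(path)
```

The point of the split is that the side receiving the descriptor never needs
permission to open() the path itself, which is what would let SELinux block
open() in the emulator process.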

Stefan



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Anthony Liguori

On 07/19/2011 09:30 AM, Jes Sorensen wrote:

On 07/19/11 16:24, Eric Blake wrote:

[adding the libvir-list]
On 07/19/2011 08:09 AM, Jes Sorensen wrote:

Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.


But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.


What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.


Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.


We all know that documentation is rarely up to date, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.



It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu.  In fact, this has
already been proposed:


A cleaner solution seems to be to have libvirt provide a callback
that QEMU can call to have libvirt open a file descriptor instead. This
way libvirt can validate the file, open it for QEMU, and pass it back.


Yes, that could probably be made to work with libvirt.


I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.


If we cannot do something like this, I would prefer that backing
files on NFS simply not be supported when running in an SELinux
setup.


As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior.  The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff.  Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.


Well leaving things at status quo is not making it worse, it just leaves
an evil in place.


NFS and SELinux is a fundamental problem with SELinux and NFS.  We can 
piss and moan as much as we want about it but it's reality.  SELinux 
fundamentally requires extended attributes.  By the time NFS adds 
extended attribute support, we'll all be flying around in hover cars.


As terrible as NFS is, people use it all of the time.

It would be nice if libvirt had the ability to make better use of DAC to 
support isolation.  The fact that MAC is the only way you can do 
isolation between guests is pretty unfortunate.  If I could assign 
specific UIDs to a guest and use that to enforce isolation, it would go 
a long ways to solving this problem.


Regards,

Anthony Liguori



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Daniel P. Berrange
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
 On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote:
  On 07/19/11 16:24, Eric Blake wrote:
  [adding the libvir-list]
  On 07/19/2011 08:09 AM, Jes Sorensen wrote:
  Urgh, libvirt parsing image files is really unfortunate, it really
  doesn't give me warm fuzzy feelings :( libvirt really should not know
  about internals of image formats.
 
  But even if you add new features to qemu to avoid needing this in the
  future, it doesn't change the past - libvirt will always have to know
  how to parse image files understood by older qemu, and so as long as
  libvirt already knows how to do that parsing, we might as well take
  advantage of it.
 
  What has been done here in the past is plain wrong. Continuing to do it
  isn't the right thing to do here.
 
  Besides, I feel that having a well-documented file format, so that
  independent applications can both parse the same file with the same
  semantics by obeying the file format specification, is a good design goal.
 
  We all know that documentation is rarely up to date, new features may not
  get added and libvirt will never be able to keep up. The driver for a
  file format belongs in QEMU and nowhere else.
 
 It should be a goal to avoid dependencies in multiple layers of the
 stack because it becomes hard to add new features - they require
 coordinated changes in multiple layers.  Having both QEMU and libvirt
 know the internals of image files is such a multi-dependency.  If I
 want to add a new format or change an existing format I have to touch
 both layers.
 
 For fd-passing perhaps we have an opportunity to use a callback
 mechanism (QEMU request: filename - libvirt response: fd) and do all
 the image format parsing in QEMU.

The reason libvirt parses file headers to determine backing files is
to maintain the trust boundary. Everything run from the exec() of QEMU
onwards is considered untrusted code. So having QEMU parse the file
headers and pass open() requests back to libvirt breaks the trust
boundary.
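
As a rough illustration of what keeping the trust boundary means for a
callback design, the sketch below only opens a requested path if it is in a
set of paths the trusted side worked out by itself. The function name
open_if_expected and the allow-list parameter are hypothetical; the sketch
says nothing about how libvirt actually validates requests.

```python
import os


def open_if_expected(requested, allowed_paths, flags=os.O_RDONLY):
    """Open `requested` only if it resolves to a path the trusted side expects.

    `allowed_paths` must come from the trusted side (for example, from its own
    reading of the image headers), never from the untrusted requester.
    """
    resolved = os.path.realpath(requested)
    allowed = {os.path.realpath(p) for p in allowed_paths}
    if resolved not in allowed:
        raise PermissionError(
            "refusing to open unexpected path: %s" % requested)
    # Note: resolving and then opening leaves a window for a race; a real
    # implementation would need to close that gap.
    return os.open(resolved, flags)
```

The allow-list is the crux: if the trusted side cannot derive it without
asking the untrusted process, the check adds nothing.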

NB, I'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient and reliable than invoking
qemu-img info and parsing the output. Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.
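
For readers unfamiliar with what "reading metadata" involves, here is a
minimal sketch of pulling the backing file name out of a qcow2 header. It
relies only on the first header fields (magic, version, backing_file_offset,
backing_file_size, all big-endian); the function names and the make_header
helper are made up for this example, and the fake header it builds is not a
valid image. This is not libvirt's parser and not the libblockformat.so API,
which does not exist.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# magic (4 bytes), version (u32), backing_file_offset (u64),
# backing_file_size (u32); all fields are big-endian.
HEADER = struct.Struct(">4sIQI")


def qcow2_backing_file(data):
    """Return the backing file name recorded in a qcow2 header, or None."""
    if len(data) < HEADER.size:
        raise ValueError("data too short for a qcow2 header")
    magic, _version, offset, size = HEADER.unpack_from(data)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if offset == 0:
        return None  # no backing file recorded
    if offset + size > len(data):
        raise ValueError("backing file name lies outside the data read")
    return data[offset:offset + size].decode("utf-8", errors="replace")


def make_header(backing=None):
    """Build a minimal fake header for demonstration (hypothetical helper)."""
    name = backing.encode() if backing else b""
    offset = HEADER.size if name else 0
    return HEADER.pack(QCOW2_MAGIC, 2, offset, len(name)) + name
```

For example, qcow2_backing_file(make_header("fedora.img")) returns
"fedora.img", and a header with no backing file yields None. Even this small
reader has to duplicate format knowledge that also lives in QEMU, which is
the maintenance cost being argued about in this thread.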

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] live snapshot wiki updated

2011-07-19 Thread Daniel P. Berrange
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
 On 07/19/11 16:24, Eric Blake wrote:
  [adding the libvir-list]
  On 07/19/2011 08:09 AM, Jes Sorensen wrote:
  Urgh, libvirt parsing image files is really unfortunate, it really
  doesn't give me warm fuzzy feelings :( libvirt really should not know
  about internals of image formats.
  
  But even if you add new features to qemu to avoid needing this in the
  future, it doesn't change the past - libvirt will always have to know
  how to parse image files understood by older qemu, and so as long as
  libvirt already knows how to do that parsing, we might as well take
  advantage of it.
 
 What has been done here in the past is plain wrong. Continuing to do it
 isn't the right thing to do here.
 
  Besides, I feel that having a well-documented file format, so that
  independent applications can both parse the same file with the same
  semantics by obeying the file format specification, is a good design goal.
 
 We all know that documentation is rarely up to date, new features may not
 get added and libvirt will never be able to keep up. The driver for a
 file format belongs in QEMU and nowhere else.

This would be possible if QEMU were to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.

Daniel



Re: [Qemu-devel] live snapshot wiki updated

2011-07-18 Thread Stefan Hajnoczi
On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 I have been updating the live snapshot wiki for qemu to try and cover
 the commands we will want for async snapshot handling too.

 http://wiki.qemu.org/Features/Snapshots

Regarding fd passing, do we even support SELinux today with backing files?

Stefan



[Qemu-devel] live snapshot wiki updated

2011-07-15 Thread Jes Sorensen
Hi,

I have been updating the live snapshot wiki for qemu to try and cover
the commands we will want for async snapshot handling too.

http://wiki.qemu.org/Features/Snapshots

Cheers,
Jes