Re: [Qemu-devel] live snapshot wiki updated
Am 21.07.2011 17:01, schrieb Stefan Hajnoczi: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional. Kevin
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 04:51 PM, Kevin Wolf wrote: The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] live snapshot wiki updated
Am 22.07.2011 09:36, schrieb Avi Kivity: On 07/20/2011 04:51 PM, Kevin Wolf wrote: The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). Sure, you can always add ugly hacks, but it isn't the right solution that we want to use for all times. However, once we use it, it will show up in the external API and we'll never get rid of it again. Kevin
Re: [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf kw...@redhat.com wrote: Am 21.07.2011 17:01, schrieb Stefan Hajnoczi: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional. The cross-platform way would be an iSCSI target that understands image formats. But iSCSI requires copying when doing I/O and we can't pass through virtio-blk. I'm not sure I see the root privilege issue. Stefan
Re: [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl blauwir...@gmail.com wrote: On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all. NBD does not support flush (fdatasync). Therefore it only supports the slow cache=writethrough mode in a safe manner. Maybe NBD could still be used in networked setups as a secondary alternative. It would be neat to use virtio-blk as the interface because it can be passed through to the guest. The guest talks directly to the storage management service without going through QEMU. The trick is to do something like vhost: 1. An ioeventfd for virtqueue (guest-host) kicks 2. An irqfd for host-guest kicks 3. Shared memory for vring and zero-copy data access The storage management service provides a UNIX domain socket over which fds can be passed to set up the vhost-like virtio-blk interface. Moving the image format code into a separate program makes it possible to safely write to a backing file while VMs are using it because the storage service can be host-wide, not per-VM. For example, streaming a shared backing file over NFS while running VMs using copy-on-write images. If we ever want to do deduplication or other global operations, then this approach is nice too. To summarize: The storage service manages image files including creation, deletion, snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk interface and can pass it directly into the guest. libvirt uses the storage service API without needing to parse image files or keep track of backing file relationships. Excellent plan. If one day kernel provides builtin virtio-blk services which can be passed via libvirt and QEMU to the guest, we'll even have zero copy all the way.
Re: [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 12:11 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf kw...@redhat.com wrote: Am 21.07.2011 17:01, schrieb Stefan Hajnoczi: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional. The cross-platform way would be an iSCSI target that understands image formats. But iSCSI requires copying when doing I/O and we can't pass through virtio-blk. The guest could use iSCSI directly using the network interface without virtio-blk. This setup wouldn't give max performance in local use but it could also be useful in some networked setups and probably more useful than NBD.
Re: [Qemu-devel] live snapshot wiki updated
On Fri, Jul 22, 2011 at 11:11 AM, Kevin Wolf kw...@redhat.com wrote: Am 22.07.2011 09:36, schrieb Avi Kivity: On 07/20/2011 04:51 PM, Kevin Wolf wrote: The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). Sure, you can always add ugly hacks, but it isn't the right solution that we want to use for all times. However, once we use it, it will show up in the external API and we'll never get rid of it again. Fully agree. This would also be a highly specific API for QCOW2 and similar formats.
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 12:00 PM, Blue Swirl wrote: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA? Why is qemu trying to access CA? Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA). Libvirt is already perfectly capable of tracking all files that qemu might need to access, and whether it is qemu or libvirt that does the open() of those files, we can still have libvirt validate whether each request for a file makes sense given the context of all previous files in use from the time the qemu command line was invoked and across all monitor commands in the meantime. On non-NFS solutions, where every file can have a SELinux label, then the security is then present by merely having libvirt relabel all such files to a unique label for that particular qemu process, and SELinux merely enforces that qemu cannot open() anything but what libvirt has already labeled. And since libvirt already knows which files to label in the non-NFS scenario, it already knows which fds to pass in the NFS scenario, at which point the ability to prevent qemu from open()ing an NFS file is a security enhancement. Your question about qemu wanting to use CA is thus answered independently of whether the fd management solution is solved by libvirt handing an fd for CA to qemu prior to any monitor command where qemu will then need to use CA, or whether qemu is taught to asynchronously ask libvirt to open an fd for CA on qemu's behalf. The answer is that libvirt already tracks whether qemu should access CA, and just needs a way to enforce that knowledge. The enforcement already exists for non-NFS via SELinux labels, and the proposal to add fd handling will expand that enforcement to also cover NFS. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 15:25, schrieb Jes Sorensen: On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 8:41 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 11:20 AM, Blue Swirl wrote: There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA? That is, as long as libvirt can already accurately read the chain of backing files from any starting point, then it can hand that entire chain of backing files (whether by the topmost file name as it does now, or whether by a series of fds as is being proposed) to qemu. This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. The monitor interface was not designed to throw the read()/write()/seek() burden back on libvirt, and indeed that would kill performance so it is a non-starter idea. All we need for security is the open() burden to be shifted out of qemu and into libvirt. Obviously the interface should be faster than monitor, for example a pair of sockets with some efficient protocol. Monitor could still be used to set up these.
Re: [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Stefan
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/2011 11:47 AM, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. How wrong would it be to call out to qemu-img to handle this instead? Seems like a more stable interface (assuming the output of `qemu-img info` is treated as an API of sorts, or perhaps some other output mode is added to qemu-img that is considered stable). Daniel
Re: [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 11:25 AM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 19:20, schrieb Blue Swirl: On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf kw...@redhat.com wrote: Am 20.07.2011 15:25, schrieb Jes Sorensen: On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, qemu shouldn't ask for anything. libvirt shouldn't give it a filename in the first place. It should do something like this: { execute: blockdev_add, arguments= { driver: fd, fd: 4, backing-file: { driver: fd, fd: 5 } }} And then qemu doesn't even have a reason to know that there is something called CA. Yes, that's better. but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. I like your humour. :-) Well, for some applications, security is more important than performance or convenience.
Re: [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 11:07 AM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/20/11 21:51, Blue Swirl wrote: And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of /path/in/storage, QEMU could present the fd and the backin path by requesting something like fd:12,/path/in/storage. The next file in chain /path2/file would be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc. Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open. Just to clarify: I was not proposing any mangling of the files. This is relatively clean and avoids the mess of relying on outside processes messing about in the images. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all.
Re: [Qemu-devel] live snapshot wiki updated
On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl blauwir...@gmail.com wrote: On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake ebl...@redhat.com wrote: Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. The backing file information changes when image streaming completes. Before: fedora.img - my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all. NBD does not support flush (fdatasync). Therefore it only supports the slow cache=writethrough mode in a safe manner. It would be neat to use virtio-blk as the interface because it can be passed through to the guest. The guest talks directly to the storage management service without going through QEMU. The trick is to do something like vhost: 1. An ioeventfd for virtqueue (guest-host) kicks 2. An irqfd for host-guest kicks 3. Shared memory for vring and zero-copy data access The storage management service provides a UNIX domain socket over which fds can be passed to set up the vhost-like virtio-blk interface. Moving the image format code into a separate program makes it possible to safely write to a backing file while VMs are using it because the storage service can be host-wide, not per-VM. For example, streaming a shared backing file over NFS while running VMs using copy-on-write images. If we ever want to do deduplication or other global operations, then this approach is nice too. To summarize: The storage service manages image files including creation, deletion, snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk interface and can pass it directly into the guest. libvirt uses the storage service API without needing to parse image files or keep track of backing file relationships. Stefan
Re: [Qemu-devel] live snapshot wiki updated
Daniel P. Berrange berra...@redhat.com writes: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. Exactly. The block drivers form a tree. Each driver opens its children. For convenience, some of them have information on how to open their children encoded in their image (e.g. COW backing images). Others receive it as configuration, encoded in their filename string (e.g. blkdebug). Thus, we have direct control only over the root block driver. We can pass it fds easily. We control the non-root block drivers only indirectly, through the block drivers along the path from the root. We don't have a way to pass them fds. If we had a way to construct the tree bottom-up, we could directly control all the block drivers. Requires knowing the number of children, and extracting child configuration from images where applicable. [...]
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:46, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. Even having a library for libvirt to link against is suboptimal here. Two processes shouldn't be fighting over the internals of metadata, the ownership of the metadata belongs solely with QEMU. In addition you have the constant issue of dependencies there, hence if QEMU is updated and it provides a newer block format library, it may prevent libvirt from running forcing an update of libvirt as well. That is not acceptable for development. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:14, Anthony Liguori wrote: As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 18:47, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted. Jes
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 10:26:49AM +0200, Jes Sorensen wrote: On 07/19/11 18:47, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted. Just repeating libvirt doesn't need todo this many times doesn't make it true. I have described why we need to read the disk image to determine its backing files ahead of QEMU being launched quite clearly. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote: On 07/19/11 18:46, Daniel P. Berrange wrote: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged. To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. Even having a library for libvirt to link against is suboptimal here. Two processes shouldn't be fighting over the internals of metadata, the ownership of the metadata belongs solely with QEMU. In addition you have the constant issue of dependencies there, hence if QEMU is updated and it provides a newer block format library, it may prevent libvirt from running forcing an update of libvirt as well. That is not acceptable for development. We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] live snapshot wiki updated
Am 19.07.2011 18:46, schrieb Daniel P. Berrange: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. What's the real problem with this approach? Parsing the data meant for humans, from an interface that is potentially unstable? If this is it, it should be easy enough to add a JSON output mode to qemu-img info. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. I'm afraid the block drivers have too many dependencies on the qemu core for this to be an option without investing a lot of effort. Kevin
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 11:50:53AM +0200, Kevin Wolf wrote: Am 19.07.2011 18:46, schrieb Daniel P. Berrange: On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. What's the real problem with this approach? Parsing the data meant for humans, from an interface that is potentially unstable? If this is it, it should be easy enough to add a JSON output mode to qemu-img info. It is a really heavyweight solution to have to spawn qemu-img every time we need to access this data, when it can be done with a trivial open+read+close sequence. In addition the output data format is not entirely pleasant for machine reading (some fields only have data rounded up to MB, not the raw byte count). Finally, we also wanted to be able to extract some basic metdata about disk image formats on non-QEMU hosts, for our storage management APIs which are used on Xen or VMWare hosts where many of these same disk image formats are also used. A JSON output mode would be helpful, but unfortunately can't really address the other issues. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. I'm afraid the block drivers have too many dependencies on the qemu core for this to be an option without investing a lot of effort. That's why I sort of think there is value in having someone provide a standalone library API for querying some core set of block format metadata. QEMU is but one project with virtual disk formats, there are plenty of others out there in existance, so while reusing QEMU block code would be nice, it isn't leading to any significant reduction in copies of block format parsing code amongst all the virt projects in existance. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] live snapshot wiki updated
Am 20.07.2011 10:25, schrieb Jes Sorensen: On 07/19/11 18:14, Anthony Liguori wrote: As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. Kevin
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/2011 12:14 PM, Anthony Liguori wrote: On 07/19/2011 09:30 AM, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. Yes, that could probably be made to work with libvirt. I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Just as a reminder: with DAC, if a guest is compromised and somehow escalates to QEMU, it could disable its isolation (ie, by setting their own image files world readable). I guess we shouldn't try to fix the DAC model, but fix what's preventing us from fully using MAC, even though it's outside of QEMU. CR. Regards, Anthony Liguori
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? Jes
Re: [Qemu-devel] live snapshot wiki updated
Am 20.07.2011 15:25, schrieb Jes Sorensen: On 07/20/11 12:01, Kevin Wolf wrote: Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least interesting... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. Kevin
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 08:50 AM, Cleber Rosa wrote: Just as a reminder: with DAC, if a guest is compromised and somehow escalates to QEMU, it could disable its isolation (ie, by setting their own image files world readable). I guess we shouldn't try to fix the DAC model, but fix what's preventing us from fully using MAC, even though it's outside of QEMU. I don't see how a guest making its data world readable is a fundamental problem. DAC is a fundamental part of the Unix design and is something that administrators understand very well. I completely understand the value of MAC but to argue that we shouldn't present DAC as an option I think is fundamentally wrong. Regards, Anthony Liguori CR. Regards, Anthony Liguori
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 07:25 AM, Jes Sorensen wrote: I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/2011 11:47 AM, Daniel P. Berrange wrote: This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. I'm in 100% agreement that we need to provide the equivalent of a libblockformat.so down the road. But the block layer needs some work before we can support a stable interface. Regards, Anthony Liguori Daniel
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 11:27 AM, Blue Swirl wrote: We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening. Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime. We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up. And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 4:46 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 07:25 AM, Jes Sorensen wrote: I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening.
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 10:34 AM, Anthony Liguori wrote: On 07/20/2011 08:50 AM, Cleber Rosa wrote: Just as a reminder: with DAC, if a guest is compromised and somehow escalates to QEMU, it could disable its isolation (ie, by setting their own image files world readable). I guess we shouldn't try to fix the DAC model, but fix what's preventing us from fully using MAC, even though it's outside of QEMU. I don't see how a guest making its data world readable is a fundamental problem. Well, if we're discussing security models and how to provide the best isolation we can to VMs/QEMU instances, then a VM being able to read (or even write) data of another VM *is* a fundamental problem. setting their own imagine files world readable is just one example of how that could be accomplished. DAC is a fundamental part of the Unix design and is something that administrators understand very well. That's is a true sentence, but it does not make DAC the most appropriate solution here. I completely understand the value of MAC but to argue that we shouldn't present DAC as an option I think is fundamentally wrong. I never said, and really don't think we shouldn't provide other security options/models, this is actually part of the well accepted security in multiple layers strategy. I did assume, though, we were aiming for the best isolation level, and that is definitely MAC. DAC may indeed be good enough for some, but definitely not good enough for many others. CR. Regards, Anthony Liguori CR. Regards, Anthony Liguori
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 11:20 AM, Blue Swirl wrote: There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. That is, as long as libvirt can already accurately read the chain of backing files from any starting point, then it can hand that entire chain of backing files (whether by the topmost file name as it does now, or whether by a series of fds as is being proposed) to qemu. This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though. The monitor interface was not designed to throw the read()/write()/seek() burden back on libvirt, and indeed that would kill performance so it is a non-starter idea. All we need for security is the open() burden to be shifted out of qemu and into libvirt. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 8:47 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 11:27 AM, Blue Swirl wrote: We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening. Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime. How would libvirt know to do this touch up? We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up. Yes, I'm not worried about QEMU getting confused by libvirt. And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of /path/in/storage, QEMU could present the fd and the backin path by requesting something like fd:12,/path/in/storage. The next file in chain /path2/file would be fd:12,fd:34,/path2/file. Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc.
Re: [Qemu-devel] live snapshot wiki updated
On Wed, Jul 20, 2011 at 9:17 PM, Eric Blake ebl...@redhat.com wrote: On 07/20/2011 12:00 PM, Blue Swirl wrote: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA? Why is qemu trying to access CA? Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA). So no new backing files can be introduced by QEMU after it has started without libvirt knowing it? Libvirt is already perfectly capable of tracking all files that qemu might need to access, and whether it is qemu or libvirt that does the open() of those files, we can still have libvirt validate whether each request for a file makes sense given the context of all previous files in use from the time the qemu command line was invoked and across all monitor commands in the meantime. On non-NFS solutions, where every file can have a SELinux label, then the security is then present by merely having libvirt relabel all such files to a unique label for that particular qemu process, and SELinux merely enforces that qemu cannot open() anything but what libvirt has already labeled. And since libvirt already knows which files to label in the non-NFS scenario, it already knows which fds to pass in the NFS scenario, at which point the ability to prevent qemu from open()ing an NFS file is a security enhancement. Your question about qemu wanting to use CA is thus answered independently of whether the fd management solution is solved by libvirt handing an fd for CA to qemu prior to any monitor command where qemu will then need to use CA, or whether qemu is taught to asynchronously ask libvirt to open an fd for CA on qemu's behalf. The answer is that libvirt already tracks whether qemu should access CA, and just needs a way to enforce that knowledge. The enforcement already exists for non-NFS via SELinux labels, and the proposal to add fd handling will expand that enforcement to also cover NFS. OK. I think fds would be useful internally in a privilege separation mode in plain QEMU too.
Re: [Qemu-devel] live snapshot wiki updated
On 07/20/2011 02:01 PM, Blue Swirl wrote: Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA). So no new backing files can be introduced by QEMU after it has started without libvirt knowing it? No, you missed my point. A new backing file can only be introduced by qemu after it has started by libvirt using a finite set of monitor commands. These include disk hotplug (libvirt adds to the list of files known to be accessed by qemu, by reading the image headers of the new disk to be hot-plugged prior to issuing the monitor command), by disk hot-unplug (libvirt revokes the access to the files making up that disk, which it remembers from before the disk was added), and snapshot_blkdev (libvirt is explicitly requesting a new qcow2 file with the old file as the backing image, so it knows the new relationship of files to be accessed by that block device). Since libvirt issued the monitor commands, libvirt always knows which files qemu should be trying to access when servicing those block devices to the guest. OK. I think fds would be useful internally in a privilege separation mode in plain QEMU too. Yes, there's more than one reason to add fd support to all possible situations where qemu is currently resorting to open(). -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Jes
Re: [Qemu-devel] live snapshot wiki updated
On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Sorry, I was off on a tangent. I think today QEMU does not support opening image files with a backing file purely using file descriptors. We currently require the ability to open files. Stefan
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 15:23, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Sorry, I was off on a tangent. I think today QEMU does not support opening image files with a backing file purely using file descriptors. We currently require the ability to open files. I see what you mean - I don't actually know how that would work, since the backing file specified in the front image will be a file name. Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/2011 07:27 AM, Jes Sorensen wrote: On 07/19/11 15:23, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 8:24 AM, Jes Sorensenjes.soren...@redhat.com wrote: On 07/18/11 16:08, Stefan Hajnoczi wrote: On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensenjes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Not sure I understand what you mean. The current code should be happy to take an existing file or a raw device for the snapshot. Sorry, I was off on a tangent. I think today QEMU does not support opening image files with a backing file purely using file descriptors. We currently require the ability to open files. I see what you mean - I don't actually know how that would work, since the backing file specified in the front image will be a file name. Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Before starting qemu, libvirt first parses all the image files, to see if any of them have backing images. For every qcow2 or qed image with a backing file, libvirt sets the SELinux context of both the qcow2 image and its backing file so that qemu will be able to successfully open() them. But if any of those files reside on NFS, then it is not possible to label individual files, so it requires setting the SELinux bool virt_use_nfs, which thus gives qemu the power to open() arbitrary files on NFS, and you've lost security. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02072.html http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01992.html That thread mentioned both a command-line syntax for passing in fds for backing files, as well as an extension to the getfd monitor command to allow association of a runtime fd with a filename. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 15:58, Eric Blake wrote: On 07/19/2011 07:27 AM, Jes Sorensen wrote: Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Before starting qemu, libvirt first parses all the image files, to see if any of them have backing images. For every qcow2 or qed image with a backing file, libvirt sets the SELinux context of both the qcow2 image and its backing file so that qemu will be able to successfully open() them. But if any of those files reside on NFS, then it is not possible to label individual files, so it requires setting the SELinux bool virt_use_nfs, which thus gives qemu the power to open() arbitrary files on NFS, and you've lost security. Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. Yes, that could probably be made to work with libvirt. I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. Just because it is currently not as secure to mix NFS shared storage with backing files doesn't stop some people from wanting to do it [in fact, that's my current development setup - I use qcow2 images on NFS shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. This discussion is about adding enhancements that make SELinux even more powerful when using NFS shared storage, by adding fd passing (whether libvirt parses in advance, or whether qemu raises an event and requires feedback from libvirt), and not about crippling the existing capability to use the virt_use_nfs selinux bool. I do not believe we should try and add extra interfaces to support something which is inherently broken. This really boils down to whether we should support fd passing for snapshots in the first place. If it is to support the broken setup of libvirt parsing image files, then I am totally against it, if we work on a proper solution that involves this in some way, then we can discuss it. Cheers, Jes
Re: [Qemu-devel] live snapshot wiki updated
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: On 07/19/11 15:58, Eric Blake wrote: On 07/19/2011 07:27 AM, Jes Sorensen wrote: Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)? Before starting qemu, libvirt first parses all the image files, to see if any of them have backing images. For every qcow2 or qed image with a backing file, libvirt sets the SELinux context of both the qcow2 image and its backing file so that qemu will be able to successfully open() them. But if any of those files reside on NFS, then it is not possible to label individual files, so it requires setting the SELinux bool virt_use_nfs, which thus gives qemu the power to open() arbitrary files on NFS, and you've lost security. Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. Yes, that could probably be made to work with libvirt. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Just because it is currently not as secure to mix NFS shared storage with backing files doesn't stop some people from wanting to do it [in fact, that's my current development setup - I use qcow2 images on NFS shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. This discussion is about adding enhancements that make SELinux even more powerful when using NFS shared storage, by adding fd passing (whether libvirt parses in advance, or whether qemu raises an event and requires feedback from libvirt), and not about crippling the existing capability to use the virt_use_nfs selinux bool. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] live snapshot wiki updated
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. Stefan
Re: [Qemu-devel] live snapshot wiki updated
On 07/19/2011 09:30 AM, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed: A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back. Yes, that could probably be made to work with libvirt. I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files. If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup. As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Well leaving things at status quo is not making it worse, it just leaves an evil in place. NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Regards, Anthony Liguori
Re: [Qemu-devel] live snapshot wiki updated
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote: On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen jes.soren...@redhat.com wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename - libvirt response: fd) and do all the image format parsing in QEMU. The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers passing back open() requests to libvirt is breaking the trust boundary. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient reliable than invoking qemu-img info parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] live snapshot wiki updated
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote: On 07/19/11 16:24, Eric Blake wrote: [adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote: Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats. But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal. We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else. This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] live snapshot wiki updated
On Fri, Jul 15, 2011 at 3:58 PM, Jes Sorensen jes.soren...@redhat.com wrote: I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Regarding fd passing, do we even support SELinux today with backing files? Stefan
[Qemu-devel] live snapshot wiki updated
Hi, I have been updating the live snapshot wiki for qemu to try and cover the commands we will want for async snapshot handling too. http://wiki.qemu.org/Features/Snapshots Cheers, Jes