Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/17/2012 09:14 AM, Eric Blake wrote: On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote: The -open-hook-fd approach allows QEMU to support file descriptor passing without changing -drive. It also supports snapshot_blkdev and other commands By the way, how will it support them? The problem with snapshot_blkdev is that closing a file and opening a new file cannot be done by the QEMU process when an SELinux policy is in place to prevent opening files. snapshot_blkdev can take an fd:name instead of a /path/to/file for the file to open, in which case libvirt can pass in the named fd _prior_ to the snapshot_blkdev using the 'getfd' monitor command. The -open-hook-fd approach works even when the QEMU process is not allowed to open files since file descriptor passing over a UNIX domain socket is used to open files on behalf of QEMU. The -open-hook-fd approach would indeed allow snapshot_blkdev to ask for the fd after the fact, but it's much more painful. Consider a case with a two-disk snapshot: with the fd:name approach, the sequence is: libvirt calls getfd:name1 over normal monitor qemu responds libvirt calls getfd:name2 over normal monitor qemu responds libvirt calls transaction around blockdev-snapshot-sync over normal monitor, using fd:name1 and fd:name2 qemu responds but with -open-hook-fd, the approach would be: libvirt calls transaction qemu calls open(file1) over hook libvirt responds qemu calls open(file2) over hook libvirt responds qemu responds to the original transaction The 'transaction' operation is thus blocked by the time it takes to do two intermediate opens over a second channel, which kind of defeats the purpose of making the transaction take effect with minimal guest downtime. How are you defining "guest down time"? It's important to note that code running in QEMU does not equate to guest visible down time unless QEMU does an explicit vm_stop() which is not happening here. 
Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mutex while QEMU is holding it. If your concern is qemu_mutex being held while waiting for libvirt, it would be fairly easy to implement a qemu_open_async() that allowed dropping back to the main loop and then calling a callback when the open completes. It would be pretty trivial to convert qmp_transaction to use such a command. But this is all speculative. There's no reason to believe that an RPC would have a noticeable guest visible latency unless you assume there's lock contention. I would strongly suspect that the bdrv_flush() is going to be a much greater source of lock contention than the RPC would be. An RPC is only bound by scheduler latency whereas synchronous disk I/O is bound by spinning a platter. And libvirt code becomes a lot trickier to deal with the fact that two channels are in use, and that the channel that issued the 'transaction' command must block while the other channel for handling hooks must be responsive. All libvirt needs to do is listen on a socket and delegate access according to a white list. Whatever is providing fd's needs to have no knowledge of anything other than what the guest is allowed to access which shouldn't depend on an executing command. Regards, Anthony Liguori I'm really disliking the hook-fd approach, when a better solution is to make use of 'getfd' in advance of any operation that will need to open new fds.
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 07/09/2012 02:00 PM, Anthony Liguori wrote: >> with the fd:name approach, the sequence is: >> >> libvirt calls getfd:name1 over normal monitor >> qemu responds >> libvirt calls getfd:name2 over normal monitor >> qemu responds >> libvirt calls transaction around blockdev-snapshot-sync over normal >> monitor, using fd:name1 and fd:name2 >> qemu responds This general layout is true whether we rewrite all commands to understand fd:nnn (proposal 1) or whether we add new magic parsing (/dev/fd/nnn of proposal 3, or even /dev/fdset/nnn of proposal 5), all as called out in these messages: https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00227.html https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01098.html >> >> but with -open-hook-fd, the approach would be: >> >> libvirt calls transaction >> qemu calls open(file1) over hook >> libvirt responds >> qemu calls open(file2) over hook >> libvirt responds >> qemu responds to the original transaction whereas this approach is quite different in semantics, but may indeed be easier for qemu to implement, at the expense of some more complexity on the part of libvirt. At the high level, I think both approaches have one thing in common: by refactoring all qemu code to go through qemu_open(), we can then implement our desired complexity (whether fd:nn, /dev/fd/nnn, /dev/fdset/nnn, or some other magic name parsing; or whether it is an rpc call over a second socket in parallel to the monitor socket) in just one location. Likewise, both approaches have to deal with libvirtd restarts (magic name parsing by changing an 'inuse' flag when the monitor detects EOF; rpc passing by failing a qemu_open() when the rpc socket detects EOF). >> >> The 'transaction' operation is thus blocked by the time it takes to do >> two intermediate opens over a second channel, which kind of defeats the >> purpose of making the transaction take effect with minimal guest >> downtime. > > How are you defining "guest down time"? 
> > It's important to note that code running in QEMU does not equate to > guest visible down time unless QEMU does an explicit vm_stop() which is > not happening here. > > Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mutex > while QEMU is holding it. > > If your concern is qemu_mutex being held while waiting for libvirt, it > would be fairly easy to implement a qemu_open_async() that > allowed dropping back to the main loop and then calling a callback when > the open completes. > > It would be pretty trivial to convert qmp_transaction to use such a > command. In other words, remembering that transactions are divided into phases: phase 1 - prepare: obtain all needed fds (whether by pre-opening them via 'pass-fd' or other new 'getfd' relative, or whether by RPC calls); no guest downtime, and with cleanup that avoids any leaks on any failures phase 2 - commit: flush all devices and actually make the changes in qemu state to use the fds obtained in phase 1 and where the guest downtime (if any) is more likely due to flushing changes in phase 2 > > But this is all speculative. There's no reason to believe that an RPC > would have a noticeable guest visible latency unless you assume there's > lock contention. I would strongly suspect that the bdrv_flush() is going > to be a much greater source of lock contention than the RPC would be. > An RPC is only bound by scheduler latency whereas synchronous disk I/O > is bound by spinning a platter. > >> And libvirt code becomes a lot trickier to deal with the fact >> that two channels are in use, and that the channel that issued the >> 'transaction' command must block while the other channel for handling >> hooks must be responsive. > > All libvirt needs to do is listen on a socket and delegate access > according to a white list. Whatever is providing fd's needs to have no > knowledge of anything other than what the guest is allowed to access > which shouldn't depend on an executing command. 
That's not quite accurate. What the guest is allowed to access should indeed change depending on the executing command. That is, if I start a guest with: base <- delta then I only want to permit O_RDONLY access to base but O_RDWR access to delta. If I then call 'blockdev-snapshot-sync', I want to change to the situation: base <- delta <- snap and give O_RDWR permissions to snap; it would also be nice if qemu attempts to reopen delta with O_RDONLY permissions (although from a trust perspective, libvirt must assume that delta is still O_RDWR because qemu may have been compromised and lie about the tightening of permissions); at any rate, depending on SELinux capabilities of the file, libvirt may be able to enforce no further writes to 'delta' by toggling a SELinux label (obviously, this should only be done after 'blockdev-snapshot-sync' completes). On the other hand, the user could decide to do a 'block-commit', to squash things into: base where base is now O_RDWR. But libvirt doesn't w
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 07/09/2012 03:29 PM, Eric Blake wrote: On 07/09/2012 02:00 PM, Anthony Liguori wrote: with the fd:name approach, the sequence is: libvirt calls getfd:name1 over normal monitor qemu responds libvirt calls getfd:name2 over normal monitor qemu responds libvirt calls transaction around blockdev-snapshot-sync over normal monitor, using fd:name1 and fd:name2 qemu responds This general layout is true whether we rewrite all commands to understand fd:nnn (proposal 1) or whether we add new magic parsing (/dev/fd/nnn of proposal 3, or even /dev/fdset/nnn of proposal 5), all as called out in these messages: https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00227.html https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01098.html but with -open-hook-fd, the approach would be: libvirt calls transaction qemu calls open(file1) over hook libvirt responds qemu calls open(file2) over hook libvirt responds qemu responds to the original transaction whereas this approach is quite different in semantics, but may indeed be easier for qemu to implement, at the expense of some more complexity on the part of libvirt. At the high level, I think both approaches have one thing in common: by refactoring all qemu code to go through qemu_open(), we can then implement our desired complexity (whether fd:nn, /dev/fd/nnn, /dev/fdset/nnn, or some other magic name parsing; or whether it is an rpc call over a second socket in parallel to the monitor socket) in just one location. Likewise, both approaches have to deal with libvirtd restarts (magic name parsing by changing an 'inuse' flag when the monitor detects EOF; rpc passing by failing a qemu_open() when the rpc socket detects EOF). Ack. The 'transaction' operation is thus blocked by the time it takes to do two intermediate opens over a second channel, which kind of defeats the purpose of making the transaction take effect with minimal guest downtime. How are you defining "guest down time"? 
It's important to note that code running in QEMU does not equate to guest visible down time unless QEMU does an explicit vm_stop() which is not happening here. Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mutex while QEMU is holding it. If your concern is qemu_mutex being held while waiting for libvirt, it would be fairly easy to implement a qemu_open_async() that allowed dropping back to the main loop and then calling a callback when the open completes. It would be pretty trivial to convert qmp_transaction to use such a command. In other words, remembering that transactions are divided into phases: phase 1 - prepare: obtain all needed fds (whether by pre-opening them via 'pass-fd' or other new 'getfd' relative, or whether by RPC calls); no guest downtime, and with cleanup that avoids any leaks on any failures phase 2 - commit: flush all devices and actually make the changes in qemu state to use the fds obtained in phase 1 and where the guest downtime (if any) is more likely due to flushing changes in phase 2 Not quite. A synchronous flush can cause lock contention. We need to separate out the problem of lock contention from guest down time. Also, there's no obvious need to move the flushes before opens. The main issue is that we use qemu_mutex to effectively create a write queue. You can imagine a simple write queueing mechanism that would obviate the need for this such that we could flush, queue upcoming writes, and drop qemu_mutex to sleep waiting for libvirt to send us our fds. But this is all speculative. There's no reason to believe that an RPC would have a noticeable guest visible latency unless you assume there's lock contention. I would strongly suspect that the bdrv_flush() is going to be a much greater source of lock contention than the RPC would be. An RPC is only bound by scheduler latency whereas synchronous disk I/O is bound by spinning a platter. 
And libvirt code becomes a lot trickier to deal with the fact that two channels are in use, and that the channel that issued the 'transaction' command must block while the other channel for handling hooks must be responsive. All libvirt needs to do is listen on a socket and delegate access according to a white list. Whatever is providing fd's needs to have no knowledge of anything other than what the guest is allowed to access which shouldn't depend on an executing command. That's not quite accurate. What the guest is allowed to access should indeed change depending on the executing command. That is, if I start a guest with: I should have spoken more clearly. libvirt may change the white list for various reasons dynamically. But there shouldn't be a direct dependency on whatever is serving up fd's and whatever is changing the white list. Basically, you just need a shared hash table for each guest. It should be quite simple. Maybe the only reason that I'm still leaning towards a 'pass-fd' solution instead of a hook fd solution is that
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/01/2012 02:25 PM, Anthony Liguori wrote: > Thanks for sending this out Stefan. Indeed. >> This series adds the -open-hook-fd command-line option. Whenever QEMU >> needs to >> open an image file it sends a request over the given UNIX domain >> socket. The >> response includes the file descriptor or an errno on failure. Please >> see the >> patches for details on the protocol. >> >> The -open-hook-fd approach allows QEMU to support file descriptor passing >> without changing -drive. It also supports snapshot_blkdev and other >> commands >> that re-open image files. >> >> Anthony Liguori wrote most of these patches. I >> added a >> demo -open-hook-fd server and added some small fixes. Since Anthony is >> traveling right now I'm sending the RFC for discussion. > > What I like about this approach is that it's useful outside the block > layer and is conceptionally simple from a QEMU PoV. We simply delegate > open() to libvirt and let libvirt enforce whatever rules it wants. > > This is not meant to be an alternative to blockdev, but even with > blockdev, I think we still want to use a mechanism like this even with > blockdev. The overall series looks like it would be rather interesting. What sort of timing restrictions are there? For example, the proposed 'drive-reopen' command (probably now delegated to qemu 1.2) would mean that qemu would be calling back into libvirt in order to do the reopen. If libvirt takes its time in passing back an open fd, is it going to starve qemu from answering unrelated monitor commands in the meantime? I definitely want to make sure we avoid deadlock where libvirt is waiting on a monitor command, but the monitor command is waiting on libvirt to pass an fd. Is this also an opportunity to request whether a particular fd must be seekable vs. acceptable as a one-pass read or write, perhaps by whether the command is 1 (seekable open) or 2 (one-pass open)? 
For example, migration is one-pass (and therefore libvirt passes a pipe which is hooked up to a helper app that uses O_DIRECT), while block devices must be seekable. -- Eric Blake ebl...@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
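To make the request/response idea from the cover letter concrete ("the response includes the file descriptor or an errno on failure"), here is a minimal Python sketch of an open-hook exchange over a UNIX domain socket. The wire format below is an assumption made up for illustration; the real protocol is defined in the RFC patches and is not reproduced here. The function names serve_one_open, request_open and demo are hypothetical, and socket.send_fds / socket.recv_fds require Python 3.9 or later on a Unix host.

```python
import errno
import os
import socket
import struct
import tempfile
import threading

# Hypothetical wire format (NOT the one from the RFC patches):
#   request  = UTF-8 path
#   response = 4-byte signed status (0 or an errno value); when the status
#              is 0 the opened fd travels as SCM_RIGHTS ancillary data.
STATUS = struct.Struct("!i")

def serve_one_open(conn, allowed):
    """Management side: answer one open request against a set of allowed paths."""
    path = conn.recv(4096).decode()
    if path not in allowed:
        conn.sendall(STATUS.pack(errno.EACCES))
        return
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        conn.sendall(STATUS.pack(e.errno))
        return
    try:
        socket.send_fds(conn, [STATUS.pack(0)], [fd])
    finally:
        os.close(fd)  # the receiver gets its own duplicate of the descriptor

def request_open(conn, path):
    """Emulator side: return an fd, or raise OSError carrying the reported errno."""
    conn.sendall(path.encode())
    data, fds, _flags, _addr = socket.recv_fds(conn, STATUS.size, 1)
    (status,) = STATUS.unpack(data)
    if status != 0:
        raise OSError(status, os.strerror(status), path)
    return fds[0]

def demo():
    """Round trip over a socketpair; returns the bytes read through the passed fd."""
    with tempfile.NamedTemporaryFile() as img:
        img.write(b"image-bytes")
        img.flush()
        qemu_side, mgmt_side = socket.socketpair()
        with qemu_side, mgmt_side:
            t = threading.Thread(target=serve_one_open,
                                 args=(mgmt_side, {img.name}))
            t.start()
            fd = request_open(qemu_side, img.name)
            t.join()
            with os.fdopen(fd, "rb") as f:
                return f.read()
```

The process that calls request_open never opens the path itself, which is the property that matters when SELinux or DAC forbids it from doing so. A seekable versus one-pass distinction, as asked about above, could be carried as an extra request field, but that too would be an extension of this made-up format.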
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/01/2012 03:56 PM, Eric Blake wrote: On 05/01/2012 02:25 PM, Anthony Liguori wrote: Thanks for sending this out Stefan. Indeed. This series adds the -open-hook-fd command-line option. Whenever QEMU needs to open an image file it sends a request over the given UNIX domain socket. The response includes the file descriptor or an errno on failure. Please see the patches for details on the protocol. The -open-hook-fd approach allows QEMU to support file descriptor passing without changing -drive. It also supports snapshot_blkdev and other commands that re-open image files. Anthony Liguori wrote most of these patches. I added a demo -open-hook-fd server and added some small fixes. Since Anthony is traveling right now I'm sending the RFC for discussion. What I like about this approach is that it's useful outside the block layer and is conceptionally simple from a QEMU PoV. We simply delegate open() to libvirt and let libvirt enforce whatever rules it wants. This is not meant to be an alternative to blockdev, but even with blockdev, I think we still want to use a mechanism like this even with blockdev. The overall series looks like it would be rather interesting. What sort of timing restrictions are there? For example, the proposed 'drive-reopen' command (probably now delegated to qemu 1.2) would mean that qemu would be calling back into libvirt in order to do the reopen. If libvirt takes its time in passing back an open fd, is it going to starve qemu from answering unrelated monitor commands in the meantime? s/libvirt/kernel/g and your concerns are equally valid. Doing open() should never be done in a path that could block things. There's always the possibility that we're on top of NFS and the open could timeout. For something like drive_reopen, we should use an asynchronous open() that dispatched the open() in the posix-aio thread pool. That's part of what's nice about this approach, we could still call file_open() in the posix-aio thread pool... 
I definitely want to make sure we avoid deadlock where libvirt is waiting on a monitor command, but the monitor command is waiting on libvirt to pass an fd. Is this also an opportunity to request whether a particular fd must be seekable vs. acceptable as a one-pass read or write, perhaps by whether the command is 1 (seekable open) or 2 (one-pass open)? I'm not really sure where the distinction lies... I want the RPC to behave exactly like open(). So if we're assuming that open() of a /dev/ file returns something that is ioctl()'able, then that's what libvirt should return. If we want to sort of do fd-transformation where a special protocol is used for things like ioctl, that's fine, but it ought to be a different mechanism (that's probably not nearly as generic). For example, migration is one-pass (and therefore libvirt passes a pipe which is hooked up to a helper app that uses O_DIRECT), while block devices must be seekable. But migration doesn't involve doing an open(). This is not a replacement for fd passing. This is a replacement for open() to make up for the facts that (1) some management tools like libvirt cannot isolate guests with DAC and (2) SELinux cannot be used to isolate guests across all file systems. I would really prefer that the kernel fix this problem for us, but from what I'm told, the problem lies in the NFS standards committee so short of forking the NFS protocol, there isn't much that the kernel can do. Regards, Anthony Liguori
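The asynchronous open() suggested above can be sketched in Python as follows. This is an illustration of the pattern only, not QEMU code: open_async, run_main_loop_once and the completions queue are hypothetical names, and a ThreadPoolExecutor stands in for the posix-aio thread pool. The blocking open runs in a worker thread, and the result is handed back to the main loop, which invokes the callback.

```python
import os
import queue
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for a posix-aio style pool
completions = queue.Queue()                # worker threads -> main loop

def open_async(path, flags, callback):
    """Start os.open() in a worker thread; callback(fd, error) runs later
    from the main loop, so the caller never blocks on a slow open."""
    def work():
        try:
            result = (os.open(path, flags), None)
        except OSError as e:
            result = (-1, e)
        completions.put((callback, result))
    _pool.submit(work)

def run_main_loop_once(timeout=5.0):
    """Dispatch one completed open; a real main loop would also service
    monitor commands and other events while opens are in flight."""
    callback, (fd, err) = completions.get(timeout=timeout)
    callback(fd, err)
```

With this shape it does not matter whether the slow step is the kernel (for example an NFS timeout) or an RPC to the management tool: the caller only sees a completion callback.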
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/01/2012 03:53 PM, Anthony Liguori wrote: >> I think (correct me if I'm wrong) libvirt should be aware of any file >> that qemu >> asks it to open. So from a security point of view, libvirt can prevent >> opening a >> file if it isn't affiliated with the guest. > > Right, libvirt can maintain a whitelist of files QEMU is allowed to open > (which it already has because it needs to label these files). Indeed. > The only > complexity is that it's not a straight strcmp(). The path needs to be > (carefully) broken into components with '.' and '..' handled > appropriately. But this shouldn't be that difficult to do. Libvirt would probably canonicalize path names, both when sticking them in the whitelist, and in validating the requests from qemu - agreed that it's not difficult. More importantly, libvirt needs to start tracking the backing chain of any qcow2 or qed file as part of the domain XML; and operations like 'block-stream' would update not only the chain, but also the whitelist. In the drive-reopen case, this means that libvirt would have to be careful when to change labeling - provide access to the new files before drive-reopen, then revoke access to files after drive-reopen completes. In other words, having the -open-hook-fd client pass a command to libvirt at the time it is closing an fd would help libvirt know when qemu has quit using a file, which might make it easier to revoke SELinux labels at that time. -- Eric Blake
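A canonicalizing whitelist of the kind discussed above might look like the following Python sketch. It is not libvirt code: GuestWhitelist and its methods are hypothetical names, and the per-path access mode is an assumption added to reflect the idea that a backing file may be read-only while the active layer is read-write. Paths are canonicalized with os.path.realpath both when they are added and when a request is validated, so '.' and '..' components and symlinks cannot be used to sidestep a plain string comparison.

```python
import os

class GuestWhitelist:
    """Per-guest table: canonical path -> most permissive access mode allowed."""

    def __init__(self):
        self._modes = {}

    def allow(self, path, mode):
        """mode is os.O_RDONLY or os.O_RDWR."""
        self._modes[os.path.realpath(path)] = mode

    def revoke(self, path):
        self._modes.pop(os.path.realpath(path), None)

    def permits(self, path, flags):
        """True if the requested open flags fit within the allowed mode."""
        allowed = self._modes.get(os.path.realpath(path))
        if allowed is None:
            return False
        wants_write = (flags & os.O_ACCMODE) != os.O_RDONLY
        return allowed == os.O_RDWR or not wants_write
```

For a chain such as base <- delta, the management tool would call allow(base, os.O_RDONLY) and allow(delta, os.O_RDWR), then update the table as the chain changes. To avoid a gap between the check and the open, whatever serves the fd should open the canonical path it validated rather than the string it was sent.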
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/01/2012 05:15 PM, Eric Blake wrote: On 05/01/2012 03:53 PM, Anthony Liguori wrote: I think (correct me if I'm wrong) libvirt should be aware of any file that qemu asks it to open. So from a security point of view, libvirt can prevent opening a file if it isn't affiliated with the guest. Right, libvirt can maintain a whitelist of files QEMU is allowed to open (which it already has because it needs to label these files). Indeed. The only complexity is that it's not a straight strcmp(). The path needs to be (carefully) broken into components with '.' and '..' handled appropriately. But this shouldn't be that difficult to do. Libvirt would probably canonicalize path names, both when sticking them in the whitelist, and in validating the requests from qemu - agreed that it's not difficult. More importantly, libvirt needs to start tracking the backing chain of any qcow2 or qed file as part of the domain XML; and operations like 'block-stream' would update not only the chain, but also the whitelist. In the drive-reopen case, this means that libvirt would have to be careful when to change labeling Would you give QEMU open access or change the way you label to only allow read/write access? I think the latter is probably the better approach. So presumably, you'll need to adjust the sVirt policy too... You'll need to detect if a file is on NFS too and figure out what the default label is that was given so you can build the rules correctly. Regards, Anthony Liguori
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote: > Am 01.05.2012 22:25, schrieb Anthony Liguori: > > Thanks for sending this out Stefan. > > > > On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote: > >> Libvirt can take advantage of SELinux to restrict the QEMU process and > >> prevent > >> it from opening files that it should not have access to. This improves > >> security because it prevents the attacker from escaping the QEMU process if > >> they manage to gain control. > >> > >> NFS has been a pain point for SELinux because it does not support labels > >> (which > >> I believe are stored in extended attributes). In other words, it's not > >> possible to use SELinux goodness on QEMU when image files are located on > >> NFS. > >> Today we have to allow QEMU access to any file on the NFS export rather > >> than > >> restricting specifically to the image files that the guest requires. > >> > >> File descriptor passing is a solution to this problem and might also come > >> in > >> handy elsewhere. Libvirt or another external process chooses files which > >> QEMU > >> is allowed to access and provides just those file descriptors - QEMU cannot > >> open the files itself. > >> > >> This series adds the -open-hook-fd command-line option. Whenever QEMU > >> needs to > >> open an image file it sends a request over the given UNIX domain socket. > >> The > >> response includes the file descriptor or an errno on failure. Please see > >> the > >> patches for details on the protocol. > >> > >> The -open-hook-fd approach allows QEMU to support file descriptor passing > >> without changing -drive. It also supports snapshot_blkdev and other > >> commands > >> that re-open image files. > >> > >> Anthony Liguori wrote most of these patches. I > >> added a > >> demo -open-hook-fd server and added some small fixes. Since Anthony is > >> traveling right now I'm sending the RFC for discussion. 
> > > > What I like about this approach is that it's useful outside the block layer > > and > > is conceptionally simple from a QEMU PoV. We simply delegate open() to > > libvirt > > and let libvirt enforce whatever rules it wants. > > > > This is not meant to be an alternative to blockdev, but even with blockdev, > > I > > think we still want to use a mechanism like this even with blockdev. > > What does it provide on top? > > This doesn't look like something that I'd like a lot. qemu should be > able to continue to run no matter what the management tool does, whether > it responds to RPCs properly or whether it has crashed. You need a > really good use case for the RPC that cannot be covered otherwise in > order to justify this. Indeed, this solution breaks if you stop or restart libvirtd while QEMU is running. Restarting libvirt while QEMU is running is something we must support, since installing RPM updates will restart libvirtd and we cannot let guests die in this case. I would much prefer to see us be able to pass FDs in directly alongside the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be fixed so that you do not need to re-open FDs on the fly. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
Am 02.05.2012 10:53, schrieb Daniel P. Berrange: > On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote: >> Am 01.05.2012 22:25, schrieb Anthony Liguori: >>> Thanks for sending this out Stefan. >>> >>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote: Libvirt can take advantage of SELinux to restrict the QEMU process and prevent it from opening files that it should not have access to. This improves security because it prevents the attacker from escaping the QEMU process if they manage to gain control. NFS has been a pain point for SELinux because it does not support labels (which I believe are stored in extended attributes). In other words, it's not possible to use SELinux goodness on QEMU when image files are located on NFS. Today we have to allow QEMU access to any file on the NFS export rather than restricting specifically to the image files that the guest requires. File descriptor passing is a solution to this problem and might also come in handy elsewhere. Libvirt or another external process chooses files which QEMU is allowed to access and provides just those file descriptors - QEMU cannot open the files itself. This series adds the -open-hook-fd command-line option. Whenever QEMU needs to open an image file it sends a request over the given UNIX domain socket. The response includes the file descriptor or an errno on failure. Please see the patches for details on the protocol. The -open-hook-fd approach allows QEMU to support file descriptor passing without changing -drive. It also supports snapshot_blkdev and other commands that re-open image files. Anthony Liguori wrote most of these patches. I added a demo -open-hook-fd server and added some small fixes. Since Anthony is traveling right now I'm sending the RFC for discussion. >>> >>> What I like about this approach is that it's useful outside the block layer >>> and >>> is conceptionally simple from a QEMU PoV. We simply delegate open() to >>> libvirt >>> and let libvirt enforce whatever rules it wants. 
>>> >>> This is not meant to be an alternative to blockdev, but even with blockdev, >>> I >>> think we still want to use a mechanism like this even with blockdev. >> >> What does it provide on top? >> >> This doesn't look like something that I'd like a lot. qemu should be >> able to continue to run no matter what the management tool does, whether >> it responds to RPCs properly or whether it has crashed. You need a >> really good use case for the RPC that cannot be covered otherwise in >> order to justify this. > > Indeed, this solution breaks if you stop or restart libvirtd while > QEMU is running. Restarting libvirt while QEMU is running is something > we must support, since installing RPM updates will restart libvirtd > and we cannot let guests die in this case. > > I would much prefer to see us be able to pass FDs in directly alongside > the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be > fixed so that you do not need to re-open FDs on the fly. I agree, and this is what -blockdev would give us. Part of why I don't like the RFC (apart from RPCing the management tool being just wrong) is that once again it's trying to take shortcuts and only provide a hack for the urgent need instead of doing it properly and implementing -blockdev. I suspect that if we take something half-baked like this, we will keep being unhappy with the situation in the block layer, but it won't hurt enough any more to actually spend effort on it, so that we'll go another five years with it. Kevin
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Wed, May 02, 2012 at 11:45:26AM +0200, Kevin Wolf wrote: > Am 02.05.2012 10:53, schrieb Daniel P. Berrange: > > On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote: > >> Am 01.05.2012 22:25, schrieb Anthony Liguori: > >>> Thanks for sending this out Stefan. > >>> > >>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote: > Libvirt can take advantage of SELinux to restrict the QEMU process and > prevent > it from opening files that it should not have access to. This improves > security because it prevents the attacker from escaping the QEMU process > if > they manage to gain control. > > NFS has been a pain point for SELinux because it does not support labels > (which > I believe are stored in extended attributes). In other words, it's not > possible to use SELinux goodness on QEMU when image files are located on > NFS. > Today we have to allow QEMU access to any file on the NFS export rather > than > restricting specifically to the image files that the guest requires. > > File descriptor passing is a solution to this problem and might also > come in > handy elsewhere. Libvirt or another external process chooses files > which QEMU > is allowed to access and provides just those file descriptors - QEMU > cannot > open the files itself. > > This series adds the -open-hook-fd command-line option. Whenever QEMU > needs to > open an image file it sends a request over the given UNIX domain socket. > The > response includes the file descriptor or an errno on failure. Please > see the > patches for details on the protocol. > > The -open-hook-fd approach allows QEMU to support file descriptor passing > without changing -drive. It also supports snapshot_blkdev and other > commands > that re-open image files. > > Anthony Liguori wrote most of these patches. I > added a > demo -open-hook-fd server and added some small fixes. Since Anthony is > traveling right now I'm sending the RFC for discussion. 
> >>> > >>> What I like about this approach is that it's useful outside the block > >>> layer and > >>> is conceptionally simple from a QEMU PoV. We simply delegate open() to > >>> libvirt > >>> and let libvirt enforce whatever rules it wants. > >>> > >>> This is not meant to be an alternative to blockdev, but even with > >>> blockdev, I > >>> think we still want to use a mechanism like this even with blockdev. > >> > >> What does it provide on top? > >> > >> This doesn't look like something that I'd like a lot. qemu should be > >> able to continue to run no matter what the management tool does, whether > >> it responds to RPCs properly or whether it has crashed. You need a > >> really good use case for the RPC that cannot be covered otherwise in > >> order to justify this. > > > > Indeed, this solution breaks if you stop or restart libvirtd while > > QEMU is running. Restarting libvirt while QEMU is running is something > > we must support, since installing RPM updates will restart libvirtd > > and we cannot let guests die in this case. > > > > I would much prefer to see us be able to pass FDs in directly alongside > > the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be > > fixed so that you do not need to re-open FDs on the fly. > > I agree, and this is what -blockdev would give us. > > Part of why I don't like the RFC (apart from RPCing the management tool > being just wrong) is that once again it's trying to take shortcuts and > only provide a hack for the urgent need instead of doing it properly and > implementing -blockdev. I suspect that if we take something half-baked > like this, we will keep being unhappy with the situation in the block > layer, but it won't hurt enough any more to actually spend effort on it, > so that we'll go another five years with it. I tend to agree - we have been talking about -blockdev for far too long without (AFAICT) making any real progress towards getting it done. 
I'd love to see someone bite the bullet & have a go at implementing it Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 01/05/2012 22:56, Eric Blake wrote:
> What sort
> of timing restrictions are there? For example, the proposed
> 'drive-reopen' command (probably now delegated to qemu 1.2) would mean
> that qemu would be calling back into libvirt in order to do the reopen.
> If libvirt takes its time in passing back an open fd, is it going to
> starve qemu from answering unrelated monitor commands in the meantime?
> I definitely want to make sure we avoid deadlock where libvirt is
> waiting on a monitor command, but the monitor command is waiting on
> libvirt to pass an fd.

FWIW I'm going to kill drive-reopen in favor of something like
block-job-complete that will not require reopening (it will require
opening the backing files though, and that can also take time).

Paolo
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 02/05/2012 11:56, Daniel P. Berrange wrote:
> I tend to agree - we have been talking about -blockdev for far too long
> without (AFAICT) making any real progress towards getting it done. I'd
> love to see someone bite the bullet & have a go at implementing it

Having a spec would help somewhat...

Paolo
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/02/2012 04:45 AM, Kevin Wolf wrote:
> On 02.05.2012 10:53, Daniel P. Berrange wrote:
>> I would much prefer to see us be able to pass FDs in directly alongside
>> the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be
>> fixed so that you do not need to re-open FDs on the fly.
>
> I agree, and this is what -blockdev would give us.
>
> Part of why I don't like the RFC (apart from RPCing the management tool
> being just wrong) is that once again it's trying to take shortcuts and
> only provide a hack for the urgent need instead of doing it properly and
> implementing -blockdev.

The proper way to address this problem is *not* -blockdev. -blockdev is
another short cut. The proper way to solve this problem is to add
extended attribute to SELinux.

Another proper solution is for libvirt to launch guests with different
UIDs and use DAC to prevent guests from opening files.

> I suspect that if we take something half-baked like this, we will keep
> being unhappy with the situation in the block layer, but it won't hurt
> enough any more to actually spend effort on it, so that we'll go another
> five years with it.

Wanting to refactor the block layer is great. I am fully in support of
it. But holding practical features hostage is not reasonable. There is
nothing intrinsically cleaner about using -blockdev fd=X versus using an
RPC like this.

-blockdev has a lot of nice characteristics but solving this problem is
not one of them.

Regards,

Anthony Liguori

> Kevin
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi wrote:
> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> it from opening files that it should not have access to. This improves
> security because it prevents the attacker from escaping the QEMU process if
> they manage to gain control.
>
> NFS has been a pain point for SELinux because it does not support labels (which
> I believe are stored in extended attributes). In other words, it's not
> possible to use SELinux goodness on QEMU when image files are located on NFS.
> Today we have to allow QEMU access to any file on the NFS export rather than
> restricting specifically to the image files that the guest requires.
>
> File descriptor passing is a solution to this problem and might also come in
> handy elsewhere. Libvirt or another external process chooses files which QEMU
> is allowed to access and provides just those file descriptors - QEMU cannot
> open the files itself.
>
> This series adds the -open-hook-fd command-line option. Whenever QEMU needs to
> open an image file it sends a request over the given UNIX domain socket. The
> response includes the file descriptor or an errno on failure. Please see the
> patches for details on the protocol.
>
> The -open-hook-fd approach allows QEMU to support file descriptor passing
> without changing -drive. It also supports snapshot_blkdev and other commands
By the way, How will it support them?
> that re-open image files.
>
> Anthony Liguori wrote most of these patches. I added a
> demo -open-hook-fd server and added some small fixes. Since Anthony is
> traveling right now I'm sending the RFC for discussion.
>
> Anthony Liguori (3):
>   block: add open() wrapper that can be hooked by libvirt
>   block: add new command line parameter that and protocol description
>   block: plumb up open-hook-fd option
>
> Stefan Hajnoczi (2):
>   osdep: add qemu_recvmsg() wrapper
>   Example -open-hook-fd server
>
>  block.c           |  107 ++
>  block.h           |    2 +
>  block/raw-posix.c |   18 +++
>  block/raw-win32.c |    2 +-
>  block/vdi.c       |    2 +-
>  block/vmdk.c      |    6 +--
>  block/vpc.c       |    2 +-
>  block/vvfat.c     |    4 +-
>  block_int.h       |   12 +
>  osdep.c           |   46 +
>  qemu-common.h     |    2 +
>  qemu-options.hx   |   42 +++
>  test-fd-passing.c |  147 +
>  vl.c              |    3 ++
>  14 files changed, 378 insertions(+), 17 deletions(-)
>  create mode 100644 test-fd-passing.c
>
> --
> 1.7.10

-- 
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/01/2012 06:15 PM, Eric Blake wrote:
> On 05/01/2012 03:53 PM, Anthony Liguori wrote:
>>> I think (correct me if I'm wrong) libvirt should be aware of any file
>>> that qemu asks it to open. So from a security point of view, libvirt
>>> can prevent opening a file if it isn't affiliated with the guest.
>>
>> Right, libvirt can maintain a whitelist of files QEMU is allowed to open
>> (which it already has because it needs to label these files).
>
> Indeed.
>
>> The only complexity is that it's not a straight strcmp(). The path needs
>> to be (carefully) broken into components with '.' and '..' handled
>> appropriately. But this shouldn't be that difficult to do.
>
> Libvirt would probably canonicalize path names, both when sticking them
> in the whitelist, and in validating the requests from qemu - agreed that
> it's not difficult.
>
> More importantly, libvirt needs to start tracking the backing chain of
> any qcow2 or qed file as part of the domain XML; and operations like
> 'block-stream' would update not only the chain, but also the whitelist.
> In the drive-reopen case, this means that libvirt would have to be
> careful when to change labeling - provide access to the new files before
> drive-reopen, then revoke access to files after drive-reopen completes.
> In other words, having the -open-hook-fd client pass a command to
> libvirt at the time it is closing an fd would help libvirt know when
> qemu has quit using a file, which might make it easier to revoke SELinux
> labels at that time.

If we were to go with this approach, I think the following updates would
be required for libvirt. Could you let me know if I'm missing anything?

libvirt tasks:
- Introduce a data structure to store file whitelist per guest
- Add -open-hook-fd option to QEMU command line and pass Unix domain
  socket fd to QEMU
- Create open() handler that handles requests from QEMU to open files
  and passes back fd
- Potentially also handle close requests from QEMU? Would allow libvirt
  to update XML and whitelist (as well as SELinux labels).
- Canonicalize path names when putting them in whitelist and when
  validating requests from QEMU
- XML updates to track backing chain of qcow2 and qed files
- Update whitelist and XML chain when QEMU monitor commands are used to
  open new files: block-stream, drive-reopen, drive_add, savevm,
  snapshot_blkdev, change

Updates would also be required for SELinux and AppArmor policy to allow
libvirt open of NFS files, and allow QEMU read/write (no open allowed)
of NFS files.

-- 
Regards,

Corey
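
To make the "not a straight strcmp()" point concrete, here is a minimal
sketch of a per-guest whitelist that canonicalizes paths both when they
are added and when a request is validated. It is only an illustration:
the OpenWhitelist class and its method names are hypothetical and are not
libvirt code, and os.path.realpath() stands in for whatever
canonicalization libvirt would actually use.

```python
import os


class OpenWhitelist:
    """Hypothetical per-guest whitelist of image paths (not libvirt code)."""

    def __init__(self, paths=()):
        self._allowed = {self._canon(p) for p in paths}

    @staticmethod
    def _canon(path):
        # realpath() collapses '.' and '..' and resolves symlinks, so two
        # spellings of the same file compare equal.
        return os.path.realpath(path)

    def add(self, path):
        # e.g. before a monitor command that makes QEMU open a new file
        self._allowed.add(self._canon(path))

    def discard(self, path):
        # e.g. once QEMU has stopped using a file
        self._allowed.discard(self._canon(path))

    def is_allowed(self, path):
        # Canonicalize the request too, then do an exact comparison.
        return self._canon(path) in self._allowed
```

A request for "images/../images/./guest1.qcow2" then matches the entry
for "images/guest1.qcow2", while a request that climbs out of the image
directory with '..' does not match anything.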
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi wrote:
> > The -open-hook-fd approach allows QEMU to support file descriptor passing
> > without changing -drive. It also supports snapshot_blkdev and other
> > commands
> By the way, How will it support them?

The problem with snapshot_blkdev is that closing a file and opening a
new file cannot be done by the QEMU process when an SELinux policy is in
place to prevent opening files.

The -open-hook-fd approach works even when the QEMU process is not
allowed to open files since file descriptor passing over a UNIX domain
socket is used to open files on behalf of QEMU.

Stefan
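
For readers unfamiliar with the mechanism, the following is a rough
sketch of opening a file on behalf of another process over a UNIX domain
socket. It is not the protocol from the patches: the wire format (path
as the request, "OK" plus an SCM_RIGHTS fd or "ERR <errno>" as the
response) and the names serve_one_open, hooked_open and demo are
assumptions made up for illustration. It needs Python 3.9 or later for
socket.send_fds() and socket.recv_fds().

```python
import errno
import os
import socket
import tempfile
import threading


def serve_one_open(sock, allowed):
    """Manager side: answer one open request with an fd or an errno."""
    path = sock.recv(4096).decode()
    if path not in allowed:
        sock.send(b"ERR %d" % errno.EACCES)
        return
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        sock.send(b"ERR %d" % e.errno)
        return
    try:
        # The fd travels as SCM_RIGHTS ancillary data next to the payload.
        socket.send_fds(sock, [b"OK"], [fd])
    finally:
        os.close(fd)  # the receiver gets its own duplicate


def hooked_open(sock, path):
    """Restricted side: ask the manager to open path; return an fd."""
    sock.send(path.encode())
    msg, fds, _flags, _addr = socket.recv_fds(sock, 64, 1)
    if msg == b"OK" and fds:
        return fds[0]
    raise OSError(int(msg.split()[1]), "open refused by hook server")


def demo():
    """Run both ends in one process; return (data, errno of refusal)."""
    with tempfile.NamedTemporaryFile() as img:
        img.write(b"image")
        img.flush()
        # Datagram sockets keep each request and response in one message.
        mgr, qemu = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        with mgr, qemu:
            def server():
                for _ in range(2):
                    serve_one_open(mgr, {img.name})
            t = threading.Thread(target=server)
            t.start()
            fd = hooked_open(qemu, img.name)
            data = os.read(fd, 5)
            os.close(fd)
            try:
                hooked_open(qemu, "/etc/shadow")
                refused = None
            except OSError as e:
                refused = e.errno
            t.join()
    return data, refused
```

The side calling hooked_open() never calls open() itself, which is the
property that lets a policy forbid open() for that process while still
allowing read/write on descriptors it is handed.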
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi wrote:
> On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
>> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi wrote:
>> > The -open-hook-fd approach allows QEMU to support file descriptor passing
>> > without changing -drive. It also supports snapshot_blkdev and other
>> > commands
>> By the way, How will it support them?
>
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.
>
> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.
Do you mean that libvirt will provide a service to QEMU? When QEMU needs
to open or close a file, can it send a request to libvirt?
>
> Stefan

-- 
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi wrote:
> On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
>> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi wrote:
>> > The -open-hook-fd approach allows QEMU to support file descriptor passing
>> > without changing -drive. It also supports snapshot_blkdev and other
>> > commands
>> By the way, How will it support them?
>
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.
>
> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.
I thought that the patch set only lets QEMU passively receive an fd
passed in as a parameter by the upper-level application.
>
> Stefan

-- 
Regards,

Zhi Yong Wu
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote:
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>> without changing -drive. It also supports snapshot_blkdev and other
>>> commands
>> By the way, How will it support them?
>
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.

snapshot_blkdev can take an fd:name instead of a /path/to/file for the
file to open, in which case libvirt can pass in the named fd _prior_ to
the snapshot_blkdev using the 'getfd' monitor command.

> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.

The -open-hook-fd approach would indeed allow snapshot_blkdev to ask
for the fd after the fact, but it's much more painful. Consider a case
with a two-disk snapshot:

with the fd:name approach, the sequence is:

libvirt calls getfd:name1 over normal monitor
qemu responds
libvirt calls getfd:name2 over normal monitor
qemu responds
libvirt calls transaction around blockdev-snapshot-sync over normal
monitor, using fd:name1 and fd:name2
qemu responds

but with -open-hook-fd, the approach would be:

libvirt calls transaction
qemu calls open(file1) over hook
libvirt responds
qemu calls open(file2) over hook
libvirt responds
qemu responds to the original transaction

The 'transaction' operation is thus blocked by the time it takes to do
two intermediate opens over a second channel, which kind of defeats the
purpose of making the transaction take effect with minimal guest
downtime. And libvirt code becomes a lot trickier to deal with the fact
that two channels are in use, and that the channel that issued the
'transaction' command must block while the other channel for handling
hooks must be responsive.

I'm really disliking the hook-fd approach, when a better solution is to
make use of 'getfd' in advance of any operation that will need to open
new fds.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
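
The getfd-first sequence can be written out as the QMP messages a
management tool would send, in order. This is only a sketch under stated
assumptions: the 'getfd' and 'transaction' commands and the
'blockdev-snapshot-sync' action exist in QMP, but passing "fd:name" as
the snapshot-file is the proposal under discussion in this thread rather
than something the sketch claims QEMU accepts, and the function name
getfd_sequence and the device names are hypothetical.

```python
import json


def getfd_sequence(snapshots):
    """Build the ordered QMP commands for the getfd-first approach.

    snapshots is a list of (device, fdname) pairs. Each 'getfd' must be
    sent together with the real file descriptor as SCM_RIGHTS ancillary
    data on the monitor socket; that part is not modelled here.
    """
    cmds = [{"execute": "getfd", "arguments": {"fdname": fdname}}
            for _device, fdname in snapshots]
    actions = [{"type": "blockdev-snapshot-sync",
                # "fd:<name>" as snapshot-file is the thread's proposal.
                "data": {"device": device, "snapshot-file": "fd:" + fdname}}
               for device, fdname in snapshots]
    cmds.append({"execute": "transaction",
                 "arguments": {"actions": actions}})
    return cmds


if __name__ == "__main__":
    # Hypothetical device and fd names for a two-disk snapshot.
    for cmd in getfd_sequence([("virtio0", "name1"), ("virtio1", "name2")]):
        print(json.dumps(cmd))
```

All of the fd hand-off happens before the 'transaction' command is
issued, so nothing in the transaction has to wait on a second channel.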
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Thu, May 17, 2012 at 08:14:15AM -0600, Eric Blake wrote:
> The 'transaction' operation is thus blocked by the time it takes to do
> two intermediate opens over a second channel, which kind of defeats the
> purpose of making the transaction take effect with minimal guest
> downtime. And libvirt code becomes a lot trickier to deal with the fact
> that two channels are in use, and that the channel that issued the
> 'transaction' command must block while the other channel for handling
> hooks must be responsive.
>
> I'm really disliking the hook-fd approach, when a better solution is to
> make use of 'getfd' in advance of any operation that will need to open
> new fds.

This is a good technical argument for using getfd. I agree with you.

Stefan
Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
On Thu, May 17, 2012 at 10:02:01PM +0800, Zhi Yong Wu wrote:
> On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi wrote:
> > The problem with snapshot_blkdev is that closing a file and opening a
> > new file cannot be done by the QEMU process when an SELinux policy is in
> > place to prevent opening files.
> >
> > The -open-hook-fd approach works even when the QEMU process is not
> > allowed to open files since file descriptor passing over a UNIX domain
> > socket is used to open files on behalf of QEMU.
> I thought that the patchset can only let QEMU passively get passed fd
> parameter from upper application.

No. What this patch series does is make QEMU request file descriptors
from an external process (e.g. libvirt) each time it wants to open an
image file.

Stefan