Re: [Qemu-devel] Block layer roadmap on wiki
On Mon, Aug 22, 2011 at 04:01:08PM -0500, Anthony Liguori wrote:
> On 08/22/2011 03:48 PM, Ryan Harper wrote:
>> [...]
>
> Configuring block devices is fundamentally a privileged operation. QEMU fundamentally is designed to be useful as an unprivileged user.
>
> That's the trouble with something like LVM. Only root can create LVM snapshots and it's an all-or-nothing security model.
>
> If you want to get QEMU out of the snapshot business, you need a file system that's widely available that allows non-privileged users to take snapshots of individual files.

I don't think we should remove qcow2 internal snapshots or blockdev_snapshot. But they have performance limitations, and that is where it makes sense to start using existing storage support instead of reimplementing efficient and scalable snapshots ourselves.

btrfs is maturing and its BTRFS_IOC_CLONE ioctl is unprivileged, so we can offer that option to unprivileged users.

Stefan
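The unprivileged btrfs route can be sketched in C as follows, assuming a Linux host. `clone_or_copy()` and `clone_image()` are hypothetical helpers, not QEMU or libvirt code. `BTRFS_IOC_CLONE` is the real btrfs ioctl. It is defined locally with the value used in the kernel's btrfs header so the sketch builds without btrfs headers installed. If the ioctl fails for any reason (for example, on a file system without clone support), the sketch falls back to a plain byte copy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Normally provided by <linux/btrfs.h>; defined here so the sketch builds
 * without btrfs headers. */
#ifndef BTRFS_IOC_CLONE
#define BTRFS_IOC_CLONE _IOW(0x94, 9, int)
#endif

/* Make dst_fd a copy of src_fd. On btrfs the ioctl makes dst share src's
 * extents (a cheap per-file snapshot that needs no root privileges). If the
 * ioctl fails for any reason, fall back to an ordinary byte copy.
 * Returns 0 on success, -1 on error. */
int clone_or_copy(int src_fd, int dst_fd)
{
    if (ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd) == 0) {
        return 0;
    }
    if (ftruncate(dst_fd, 0) < 0) {
        return -1;
    }
    char buf[65536];
    off_t off = 0;
    for (;;) {
        ssize_t n = pread(src_fd, buf, sizeof(buf), off);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            return 0;                 /* reached the end of the source */
        }
        for (ssize_t done = 0; done < n;) {
            ssize_t w = pwrite(dst_fd, buf + done, (size_t)(n - done),
                               off + done);
            if (w < 0 && errno == EINTR) {
                continue;
            }
            if (w < 0) {
                return -1;
            }
            done += w;
        }
        off += n;
    }
}

/* Clone the image at src_path into a new file at dst_path. */
int clone_image(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    if (src < 0) {
        return -1;
    }
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        close(src);
        return -1;
    }
    int ret = clone_or_copy(src, dst);
    if (close(dst) < 0) {
        ret = -1;
    }
    close(src);
    return ret;
}
```

Used as `clone_image("vm.img", "vm-snap.img")`, this gives a cheap per-file snapshot on btrfs without root privileges. It still works elsewhere, just more slowly. The fallback behaves much like `cp --reflink=auto`.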
Re: [Qemu-devel] Block layer roadmap on wiki
On 22.08.2011 23:01, Anthony Liguori wrote:
> On 08/22/2011 03:48 PM, Ryan Harper wrote:
>> [...]
>
> Configuring block devices is fundamentally a privileged operation. QEMU fundamentally is designed to be useful as an unprivileged user.
>
> That's the trouble with something like LVM. Only root can create LVM snapshots and it's an all-or-nothing security model.
>
> If you want to get QEMU out of the snapshot business, you need a file system that's widely available that allows non-privileged users to take snapshots of individual files.

I agree with you there (and it's interesting how differently the results of the same BoF can be perceived ;-))

It's probably true that there are ways to do certain things on host block devices, and we should definitely support such use cases better. By "we" I mostly mean the management layer, but we could possibly integrate things into qemu, such as a file-btrfs or lvm_device backend that supports snapshots.

It isn't for everyone, though, and this is why I tried to point out in the BoF that image formats aren't going to go away and we still need good support for them. Providing only raw for running VMs and declaring the rest of the formats to be intended for import/export only doesn't work.

Kevin
Re: [Qemu-devel] Block layer roadmap on wiki
On Tue, Aug 23, 2011 at 12:25 PM, Kevin Wolf kw...@redhat.com wrote:
> On 22.08.2011 23:01, Anthony Liguori wrote:
> [...]
> It isn't for everyone, though, and this is why I tried to point out in the BoF that image formats aren't going to go away and we still need good support for them. Providing only raw for running VMs and declaring the rest of the formats to be intended for import/export only doesn't work.

I have said that block/*.c doesn't go away.
But we need to look at exploiting storage features rather than reinventing them. Snapshots are an example: we do not have a scalable snapshot mechanism in QEMU.

External snapshots are inefficient in two cases. The first is when you build up multiple levels, because reads have to follow the backing file chain. The second is when you delete a snapshot, because data has to be copied back into the backing file.

Internal snapshots in qcow2 involve operations that traverse the image metadata. This traversal becomes a problem when image files grow large (e.g. 1 TB and beyond). The I/O required can then take more than 1 second, which is problematic for taking snapshots while the VM is running.

There are known ways of doing better internal snapshots, along the lines of what ZFS, btrfs, and thin-dev do. But that means redesigning the image metadata and reimplementing these storage systems in userspace.

What I'm suggesting is that we draw the line here. Keep what we've got and continue the optimizations that we have in the pipeline. But when we hit significant new features, work with existing storage systems. Why? Because we need to support existing storage anyway, so reinventing our own is not a good use of resources.

Stefan
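To make the cost argument concrete, here is a small hypothetical C model (not QEMU code).

`chain_lookup()` shows why reads degrade as external snapshots pile up. An unallocated cluster is looked up layer by layer, from the active overlay down to the base image.

`l2_metadata_bytes()` estimates how much L2 table data an internal-snapshot operation has to walk. It assumes the qcow2 layout of 8-byte L2 entries stored in one-cluster L2 tables, with the default 64 KB cluster size. Refcount blocks are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One layer of an external-snapshot chain. chain[0] is the active overlay,
 * the last element is the base image. */
struct layer {
    const bool *allocated;   /* allocated[i]: this layer holds cluster i */
};

/* Walk the chain from top to base. Returns the index of the layer that
 * satisfies the read, or -1 if no layer has the cluster (it reads as
 * zeroes). *probes reports how many layers had to be consulted. */
int chain_lookup(const struct layer *chain, size_t depth, size_t cluster,
                 size_t *probes)
{
    *probes = 0;
    for (size_t i = 0; i < depth; i++) {
        (*probes)++;
        if (chain[i].allocated[cluster]) {
            return (int)i;
        }
    }
    return -1;
}

/* Each L2 table fills one cluster with 8-byte entries, so one table maps
 * (cluster_size / 8) clusters. Returns the total size of the L2 tables for
 * a fully allocated image of image_size bytes. */
uint64_t l2_metadata_bytes(uint64_t image_size, uint64_t cluster_size)
{
    uint64_t entries_per_table = cluster_size / 8;
    uint64_t bytes_mapped_per_table = entries_per_table * cluster_size;
    uint64_t tables = (image_size + bytes_mapped_per_table - 1)
                      / bytes_mapped_per_table;
    return tables * cluster_size;
}
```

For a fully allocated 1 TiB image, this gives 2048 L2 tables, or 128 MiB of metadata. At an assumed 100 MB/s, reading that alone takes more than a second, which is consistent with the estimate in the message.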
Re: [Qemu-devel] Block layer roadmap on wiki
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]:
> At KVM Forum Kevin, Christoph, and I had an opportunity to get together for a Block Layer BoF. We went through the recent roadmap mailing list thread and touched on each proposed feature.
>
> Here is the block layer roadmap wiki page:
> http://wiki.qemu.org/BlockRoadmap
>
> Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you mentioned you want it for the next release.
>
> My main take-away from the BoF was that integrating support for host block devices and storage appliances will allow us to reduce the amount of effort spent on image formats. In order to make image formats support the desired features and performance we end up implementing much of the storage stack and file systems in userspace - code that is duplicated and cannot take advantage of the existing storage stack.

+1

> Storage management features are not just available in remote SAN and NAS appliances anymore. For local storage, btrfs has file-level clones and thin-dev is significantly improving LVM snapshots.
>
> Thin-dev is bringing a much more efficient and scalable snapshot model to LVM. This device-mapper feature will make LVM attractive for high performance I/O without giving up snapshot and clone features. It also supports cloning off block devices that are not in the pool (e.g. external storage, much like QEMU's backing files feature):
> https://github.com/jthornber/linux-2.6/tree/thin-dev
>
> This will not replace image formats overnight because image formats are still widely used and will continue to be useful for transferring and sharing disk images. But focussing on the larger storage stack where either local LVM, btrfs, or storage appliances do the storage management means we exploit those options instead of implementing equivalent functionality ourselves. QEMU then runs with plain old raw in more cases.

Any thoughts on how to make this easily usable for LVM? If there were an export/import to/from file to LVM, is that sufficient? Anything like this in existence?

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
Re: [Qemu-devel] Block layer roadmap on wiki
On Mon, Aug 22, 2011 at 09:27:12AM -0500, Ryan Harper wrote:
> * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]:
> [...]
> Any thoughts on how to make this easily usable for LVM? If there were an export/import to/from file to LVM, is that sufficient? Anything like this in existence?

Forgot to mention a major advantage of a raw-oriented storage stack: we need good support for raw + storage appliance anyway.
Users want to hook up their SAN or NAS just like they can with other hypervisors. Time spent on image formats would be better spent fleshing out integration with LVM, btrfs, SAN, NAS, and friends.

Back to import/export, it serves two purposes:

1. Efficient transport. Uploading and downloading image files in a compact form that represents zero blocks efficiently and perhaps compresses data.
2. Compatibility with other hypervisors and external tools. Here it's all about using a well-defined file format.

In order to pull off a raw-oriented storage stack we need to do import/export well, so this is an area where we have to focus.

Image streaming is a good approach for import because it allows the VM to start instantly (even before the image is fully imported). A qemu-nbd process serves up image data and we stream it into a logical volume.

For export we can do a FUSE file system that presents logical volumes as image files. That way existing applications can get at the data as if there were real image files sitting on the file system. Sequential read access is easy for all formats. Random read is more difficult but should be doable for most formats; the exception would be stream-compressed formats that are not designed for random access.

So moving to a raw-oriented storage stack does not mean we get rid of image formats. We still need them, but they are outside the critical I/O path. Their role changes, since we don't push features into the formats anymore.

Side note: iSCSI vs NBD came up during the BoF. Although NBD has not seen maintenance or activity recently, it's perfectly possible to build on it. The first feature we need is a flush command (so that NBD can do non-O_DSYNC accesses for speed). At that point we have a bare-bones remote block protocol that can be used for migration and for connecting up userspace image formats.
iSCSI is more complex and suited for permanent storage, whereas NBD is simple but perhaps not a protocol we want to access data over for a long period of time.

Stefan
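The flush command Stefan mentions is a small addition to the wire protocol. The sketch below encodes an NBD request header in C. The magic number, the command numbers and the 28-byte layout follow the NBD protocol documentation as we understand it. Current documents split the old 32-bit type field into 16-bit flags and a 16-bit type, which encodes identically when the flags are zero. The helper names are hypothetical. A real client should only send a flush when the server advertises flush support during negotiation.

```c
#include <stddef.h>
#include <stdint.h>

/* Request layout as described in the NBD protocol documentation. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_READ      0
#define NBD_CMD_WRITE     1
#define NBD_CMD_DISC      2
#define NBD_CMD_FLUSH     3
#define NBD_REQUEST_SIZE  28   /* 4 + 2 + 2 + 8 + 8 + 4 bytes */

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Serialize one request header; all fields are big-endian. */
size_t nbd_encode_request(uint8_t *buf, uint16_t type, uint64_t handle,
                          uint64_t offset, uint32_t length)
{
    put_be32(buf, NBD_REQUEST_MAGIC);
    put_be16(buf + 4, 0);          /* command flags: none */
    put_be16(buf + 6, type);
    put_be64(buf + 8, handle);     /* echoed back in the server's reply */
    put_be64(buf + 16, offset);
    put_be32(buf + 24, length);
    return NBD_REQUEST_SIZE;
}

/* A flush carries no data, so offset and length are zero. */
size_t nbd_encode_flush(uint8_t *buf, uint64_t handle)
{
    return nbd_encode_request(buf, NBD_CMD_FLUSH, handle, 0, 0);
}
```

With flush available, the server no longer has to make every write stable (for instance via O_DSYNC). The client sends a flush only when the guest asks for one.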
Re: [Qemu-devel] Block layer roadmap on wiki
On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
> At KVM Forum Kevin, Christoph, and I had an opportunity to get together for a Block Layer BoF. [...]
>
> My main take-away from the BoF was that integrating support for host block devices and storage appliances will allow us to reduce the amount of effort spent on image formats. In order to make image formats support the desired features and performance we end up implementing much of the storage stack and file systems in userspace - code that is duplicated and cannot take advantage of the existing storage stack.

The flip side is, tighter integration either makes features hard to consume or makes QEMU enter a space it currently hasn't. Many features require root privileges to configure and a system-wide scope. That's not QEMU today.

In addition, it makes QEMU tied to a specific platform (most likely Linux).

None of this is especially bad, I guess, but none of it is a simple problem. You could certainly rm -rf block/* and still be able to accomplish much of what's done today, but it would be extremely painful to do in practice.

We have to find a balance of not reinventing things and making sure that simple things are simple to do.

That may require tighter integration and more focus on the higher up pieces in the stack to really enable this.

Regards,

Anthony Liguori

> [...]
Re: [Qemu-devel] Block layer roadmap on wiki
On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws wrote:
> On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
>> [...]
>
> The flip side is, tighter integration either makes features hard to consume or makes QEMU enter a space it currently hasn't. Many features require root privileges to configure and a system-wide scope. That's not QEMU today.

QEMU itself should be about emulation and virtualization. Storage management needs to be done outside of QEMU.

Today you can already take an LVM snapshot - it happens outside of QEMU. It's at the libvirt level that different storage systems get abstracted (LVM, directory, iSCSI, etc.) and there is a single API/command set to invoke management functions.

But even without libvirt you can do it yourself, and I think this separation makes sense so that QEMU can be focussed on running a single VM rather than managing storage.

> In addition, it makes QEMU tied to a specific platform (most likely Linux).

QEMU will still work, but certain features might not be available. For example, this is true today if you're using a storage appliance that does deduplication - that's a feature you're getting on top of the emulation/virtualization that QEMU does. But it doesn't tie QEMU to a particular platform.

> You could certainly rm -rf block/* and still be able to accomplish much of what's done today, but it would be extremely painful to do in practice.
>
> We have to find a balance of not reinventing things and making sure that simple things are simple to do.

We wouldn't rm -rf block/* because we still need qemu-nbd. It probably makes sense to keep what we have today. I'm talking more about a shift from writing our own image format to integrating existing storage support.

> That may require tighter integration and more focus on the higher up pieces in the stack to really enable this.

Yes, exactly. Much of it shouldn't be inside QEMU.

Stefan
Re: [Qemu-devel] Block layer roadmap on wiki
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
> On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws wrote:
> [...]
> We wouldn't rm -rf block/* because we still need qemu-nbd. It probably makes sense to keep what we have today. I'm talking more about a shift from writing our own image format to integrating existing storage support.

I think this is a key point. While I do like the idea of keeping QEMU focused on single VM, I think we don't help ourselves by not consuming the hypervisor platform services and integrating/exploiting those features to make using QEMU easier.

That said, it does mean that some things, like system-wide config and privs, are hard and aren't strictly virtualization issues, but that doesn't mean we can't integrate some sort of solution.
Re: [Qemu-devel] Block layer roadmap on wiki
On 08/22/2011 03:48 PM, Ryan Harper wrote:
> * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
>> We wouldn't rm -rf block/* because we still need qemu-nbd. It probably makes sense to keep what we have today. I'm talking more about a shift from writing our own image format to integrating existing storage support.
>
> I think this is a key point. While I do like the idea of keeping QEMU focused on single VM, I think we don't help ourselves by not consuming the hypervisor platform services and integrating/exploiting those features to make using QEMU easier.

Let's avoid the h-word here as it's not terribly relevant to the discussion.

Configuring block devices is fundamentally a privileged operation. QEMU fundamentally is designed to be useful as an unprivileged user.

That's the trouble with something like LVM. Only root can create LVM snapshots and it's an all-or-nothing security model.

If you want to get QEMU out of the snapshot business, you need a file system that's widely available that allows non-privileged users to take snapshots of individual files.

Regards,

Anthony Liguori
Re: [Qemu-devel] Block layer roadmap
Frediano suggested:

Enhanced volume key in qcow3
* LUKS-like key scheme that allows changing the passphrase without re-encrypting image data
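The LUKS idea works like this. Image data is encrypted with a random master key that never changes, and each passphrase only protects a copy of that master key stored in the header. Changing the passphrase therefore rewrites a few bytes of header instead of the whole image.

The C sketch below shows just that key-slot logic. It is a toy and NOT secure:

* `toy_kdf()` is an FNV-1a based stand-in for a real password KDF such as PBKDF2.
* `toy_digest()` is an FNV-1a based stand-in for a proper key digest.
* The master key is "wrapped" with a plain XOR.
* All names and the header layout are hypothetical, not taken from any qcow3 proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 16

/* TOY ONLY, NOT SECURE: stand-in for a real password-based KDF. Expands an
 * FNV-1a hash of the passphrase into KEY_LEN bytes. */
static void toy_kdf(const char *pass, uint8_t out[KEY_LEN])
{
    uint64_t h = 14695981039346656037ULL;          /* FNV-1a offset basis */
    for (const char *p = pass; *p; p++) {
        h = (h ^ (uint8_t)*p) * 1099511628211ULL;  /* FNV-1a prime */
    }
    for (size_t i = 0; i < KEY_LEN; i++) {
        h = (h ^ i) * 1099511628211ULL;
        out[i] = (uint8_t)(h >> 56);
    }
}

/* TOY ONLY: lets us recognise the right master key after unwrapping. */
static uint64_t toy_digest(const uint8_t key[KEY_LEN])
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < KEY_LEN; i++) {
        h = (h ^ key[i]) * 1099511628211ULL;
    }
    return h;
}

/* Hypothetical header: data is encrypted under a fixed master key, and the
 * header stores that key wrapped by the passphrase. */
struct key_header {
    uint8_t wrapped[KEY_LEN];  /* master key XOR toy_kdf(passphrase) */
    uint64_t digest;           /* toy_digest(master key) */
};

void header_set(struct key_header *h, const uint8_t master[KEY_LEN],
                const char *pass)
{
    uint8_t k[KEY_LEN];
    toy_kdf(pass, k);
    for (size_t i = 0; i < KEY_LEN; i++) {
        h->wrapped[i] = master[i] ^ k[i];
    }
    h->digest = toy_digest(master);
}

/* Recover the master key; returns false if the passphrase is wrong. */
bool header_unlock(const struct key_header *h, const char *pass,
                   uint8_t master[KEY_LEN])
{
    uint8_t k[KEY_LEN];
    toy_kdf(pass, k);
    for (size_t i = 0; i < KEY_LEN; i++) {
        master[i] = h->wrapped[i] ^ k[i];
    }
    return toy_digest(master) == h->digest;
}

/* Changing the passphrase rewrites only the header; image data encrypted
 * under the master key is left untouched. */
bool header_change_passphrase(struct key_header *h, const char *old_pass,
                              const char *new_pass)
{
    uint8_t master[KEY_LEN];
    if (!header_unlock(h, old_pass, master)) {
        return false;
    }
    header_set(h, master, new_pass);
    return true;
}
```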
Re: [Qemu-devel] Block layer roadmap
On 27.07.2011 14:37, Stefan Hajnoczi wrote:
> Hi,
> Here is a list of block layer and storage changes that have been discussed. It is useful to have a roadmap of changes in order to avoid duplication, allow more developers to contribute, and to communicate the direction of storage in QEMU.
>
> I suggest we first do a braindump of all changes that have been discussed. Later we can discuss specific changes and if/when they fit into the roadmap - please don't jump into discussion about specific changes yet.
>
> Kevin: I hope this is a useful starting point.
>
> Here are all the items that I am aware of:
>
> =Material for next QEMU release=
>
> Coroutines in the block layer [Kevin]
> * Programming model to simplify block drivers without blocking QEMU threads
>
> Generic copy-on-read [Stefan]
> * Populate image file to avoid fetching same block from backing file again later
>
> Generic image streaming [Stefan]
> * Make block_stream commands available for all image formats that support backing files
>
> Live block copy [Marcelo/Kevin/Stefan?]
> * Copy the contents of an image file while a guest is using it
>
> In-place qcow2 -> qed conversion [Devin, GSoC 2011]:
> * Fast conversion between qcow2 and qed image formats without copying all data
>
> VMDK enhancements [Fam, GSoC 2011]
> * Implement latest VMDK specs to support modern image files
>
> Block I/O limits [Zhi Yong]
> * Resource control for guest I/O bandwidth/iops consumption

Another item that just came up on IRC again: allow guests to toggle WCE, so that we can finally get rid of the cache=writethrough default. We should definitely get this done before 1.0.

Kevin
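Generic copy-on-read and image streaming fit together: streaming is copy-on-read applied to every cluster. The following hypothetical in-memory C model illustrates both. It is not QEMU's block layer API, and the cluster size and count are tiny for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_SIZE 4
#define NUM_CLUSTERS 8

/* Hypothetical overlay image on top of a raw backing file. */
struct image {
    bool allocated[NUM_CLUSTERS];
    char data[NUM_CLUSTERS * CLUSTER_SIZE];
    const char *backing;      /* raw backing data, or NULL when independent */
    unsigned backing_reads;   /* how often we had to go to the backing file */
};

/* Read one cluster. With copy_on_read, data fetched from the backing file
 * is also written into the overlay, so the next read is served locally. */
void image_read(struct image *img, size_t idx, char *out, bool copy_on_read)
{
    char *local = img->data + idx * CLUSTER_SIZE;
    if (img->allocated[idx]) {
        memcpy(out, local, CLUSTER_SIZE);
    } else if (img->backing != NULL) {
        memcpy(out, img->backing + idx * CLUSTER_SIZE, CLUSTER_SIZE);
        img->backing_reads++;
        if (copy_on_read) {
            memcpy(local, out, CLUSTER_SIZE);
            img->allocated[idx] = true;
        }
    } else {
        memset(out, 0, CLUSTER_SIZE);  /* unallocated and no backing file */
    }
}

/* Image streaming: copy-on-read every cluster, then drop the backing file
 * because the overlay now holds all of the data. A real implementation
 * would do this in the background, interleaved with guest I/O. */
void image_stream(struct image *img)
{
    char buf[CLUSTER_SIZE];
    for (size_t i = 0; i < NUM_CLUSTERS; i++) {
        image_read(img, i, buf, true);
    }
    img->backing = NULL;
}
```

After `image_stream()` every cluster is local and the overlay no longer needs its backing file. At that point a streamed import is complete.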
Re: [Qemu-devel] Block layer roadmap
On Wed, Jul 27, 2011 at 1:37 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> =Changes where I am not aware of firm plans=

qcow2 online resize
* Handle snapshots
* Support shrinking

qed online resize
* Support shrinking
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
> Another item that just came up on IRC again: allow guests to toggle WCE, so that we can finally get rid of the cache=writethrough default. We should definitely get this done before 1.0.

Guest toggling is nice and cool, but the real feature is to allow it on the command line. As mentioned about 10 times before, guests generally don't fiddle with it without the administrator triggering it manually. So specifying it in the qemu config is the much better interface for 95% of the use cases.
Re: [Qemu-devel] Block layer roadmap
On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
> Coroutines in the block layer [Kevin]
> * Programming model to simplify block drivers without blocking QEMU threads

Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading?
Re: [Qemu-devel] Block layer roadmap
2011/7/28 Christoph Hellwig h...@lst.de:
> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>> Coroutines in the block layer [Kevin]
>> * Programming model to simplify block drivers without blocking QEMU threads
>
> Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading?

This has nothing (or little) to do with threads. Instead of splitting functions into callbacks at every synchronous call, it allows us to write more readable code. For instance, converting the qcow code from the current model to coroutines removes about 400 lines (about 30%). This will help with maintaining the code and developing more complicated optimizations.

As for threading, you can do it with AIO as well as with coroutines.

Frediano
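Frediano's point about splitting functions into callbacks can be seen in a hypothetical C example. The async API, the fake event loop and the two-step lookup (read an L2 entry, then read the data it points to) are invented for illustration. They are not taken from the qcow or qed drivers.

The callback version needs two completion functions and a state struct to carry its locals. The sequential version shows the shape a coroutine lets a driver keep while still yielding to the event loop at each read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical async read API: completions are queued and delivered later
 * by run_event_loop(). The fake "disk" returns offset * 2 for any read. */
typedef void io_cb(void *opaque, int ret);

struct pending {
    io_cb *cb;
    void *opaque;
    uint64_t *out;
    uint64_t value;
};

static struct pending queue[16];
static size_t queue_len;

static void fake_aio_read(uint64_t offset, uint64_t *out, io_cb *cb,
                          void *opaque)
{
    assert(queue_len < sizeof(queue) / sizeof(queue[0]));
    queue[queue_len++] = (struct pending){cb, opaque, out, offset * 2};
}

static void run_event_loop(void)
{
    /* Callbacks may submit more requests, so queue_len is re-checked. */
    for (size_t i = 0; i < queue_len; i++) {
        *queue[i].out = queue[i].value;
        queue[i].cb(queue[i].opaque, 0);
    }
    queue_len = 0;
}

/* Callback style: each point that would block ends a function, and every
 * local that must survive is bundled into a state struct. */
struct lookup_state {
    uint64_t l2_entry;
    uint64_t data;
    uint64_t *result;
};

static void lookup_data_done(void *opaque, int ret)
{
    struct lookup_state *s = opaque;
    if (ret == 0) {
        *s->result = s->data;
    }
}

static void lookup_l2_done(void *opaque, int ret)
{
    struct lookup_state *s = opaque;
    if (ret == 0) {
        fake_aio_read(s->l2_entry, &s->data, lookup_data_done, s);
    }
}

void lookup_async(struct lookup_state *s, uint64_t l2_offset,
                  uint64_t *result)
{
    s->result = result;
    fake_aio_read(l2_offset, &s->l2_entry, lookup_l2_done, s);
}

/* The same logic written sequentially. */
static uint64_t blocking_read(uint64_t offset)
{
    return offset * 2;
}

uint64_t lookup_sequential(uint64_t l2_offset)
{
    uint64_t l2_entry = blocking_read(l2_offset);
    return blocking_read(l2_entry);
}
```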
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 02:15:38PM +0200, Frediano Ziglio wrote:
> This has nothing (or little) to do with threads. Instead of splitting functions into callbacks at every synchronous call, it allows us to write more readable code.

Thanks for repeating my statement that it doesn't fix the real thing but just adds syntactic sugar.
Re: [Qemu-devel] Block layer roadmap
On 28.07.2011 14:08, Christoph Hellwig wrote:
> On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
>> [...]
>
> Guest toggling is nice and cool, but the real feature is to allow it on the command line. As mentioned about 10 times before, guests generally don't fiddle with it without the administrator triggering it manually. So specifying it in the qemu config is the much better interface for 95% of the use cases.

I don't really care about guest toggling. I care about the default cache mode, and Anthony said that he doesn't agree with changing it unless guests can toggle it.

Kevin
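The difference between the two cache modes can be modelled in a few lines of C. This is a hypothetical sketch: the struct and functions are invented. A "host flush" stands for whatever makes data stable on the host, such as an fdatasync() call or O_DSYNC on every write. Whether WCE comes from the command line, as Christoph prefers, or from the guest, both paths end up setting the same `wce` flag here.

```c
#include <stdbool.h>

/* Hypothetical model of an emulated disk's write cache enable (WCE) bit. */
struct emulated_disk {
    bool wce;               /* from the command line, or toggled by the guest */
    unsigned host_flushes;  /* how many host flushes were issued */
    unsigned dirty;         /* completed writes not yet known to be stable */
};

static void host_flush(struct emulated_disk *d)
{
    d->host_flushes++;
    d->dirty = 0;
}

/* Guest write. WCE off (writethrough): the write must be stable before it
 * completes. WCE on (writeback): completing after the host write is enough. */
void disk_write(struct emulated_disk *d)
{
    d->dirty++;
    if (!d->wce) {
        host_flush(d);
    }
}

/* Guest flush command: only needed when something is still volatile. */
void disk_flush(struct emulated_disk *d)
{
    if (d->dirty > 0) {
        host_flush(d);
    }
}

/* Toggling WCE. Turning the cache off flushes first so that earlier
 * writeback writes do not stay volatile (a conservative choice here). */
void disk_set_wce(struct emulated_disk *d, bool enable)
{
    if (!enable) {
        disk_flush(d);
    }
    d->wce = enable;
}
```

In writethrough mode every guest write costs a host flush. In writeback mode, flushes happen only when the guest asks for them, which is why the default matters so much for performance.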
Re: [Qemu-devel] Block layer roadmap
On 28.07.2011 14:09, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>> Coroutines in the block layer [Kevin]
>> * Programming model to simplify block drivers without blocking QEMU threads
>
> Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading?

The only current block driver that really does everything in an async state machine is qed. It's definitely not nice code, and having to convert all of the other block drivers to this would be a lot of work.

So if it only means that we're making things async that until now would block the VCPU, isn't that a great improvement already?

The advantage compared to threading is that it allows an easy and incremental transition.

Kevin
Re: [Qemu-devel] Block layer roadmap
On 07/28/2011 07:09 AM, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>> Coroutines in the block layer [Kevin]
>> * Programming model to simplify block drivers without blocking QEMU threads
>
> Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading?

It is threading, just with a common locking model where a single big lock is held to make up for the fact that most of QEMU isn't reentrant.

By restructuring the code to be threaded, we can incrementally remove the big lock if we audit for use of non-reentrant functions and introduce more granular locking.

The whole ucontext/setjmp thing is just an optimization. I would hope it entirely disappears long term as we promote coroutines to full threads.

Regards,

Anthony Liguori
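Anthony's remark about ucontext refers to how coroutines can be implemented in user space. The following is a minimal C sketch of that technique, assuming Linux/glibc `getcontext()`, `makecontext()` and `swapcontext()`. The `coro_*` names and the example request are hypothetical, and this is not QEMU's coroutine API.

Each coroutine gets its own stack. `coro_yield()` switches back to whoever called `coro_enter()`, and the next `coro_enter()` resumes the coroutine where it stopped.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)

typedef void coro_entry_fn(void *opaque);

struct coroutine {
    ucontext_t ctx;      /* saved state of the coroutine */
    ucontext_t caller;   /* where yield and termination switch back to */
    coro_entry_fn *entry;
    void *opaque;
    int finished;
    char *stack;
};

static struct coroutine *current;

static void coro_trampoline(void)
{
    struct coroutine *co = current;
    co->entry(co->opaque);
    co->finished = 1;
    /* Returning resumes uc_link, i.e. co->caller. */
}

struct coroutine *coro_create(coro_entry_fn *entry, void *opaque)
{
    struct coroutine *co = calloc(1, sizeof(*co));
    if (co == NULL) {
        return NULL;
    }
    co->stack = malloc(CORO_STACK_SIZE);
    if (co->stack == NULL || getcontext(&co->ctx) < 0) {
        free(co->stack);
        free(co);
        return NULL;
    }
    co->entry = entry;
    co->opaque = opaque;
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = CORO_STACK_SIZE;
    co->ctx.uc_link = &co->caller;
    makecontext(&co->ctx, coro_trampoline, 0);
    return co;
}

/* Run the coroutine until it yields or finishes. */
void coro_enter(struct coroutine *co)
{
    struct coroutine *prev = current;
    current = co;
    swapcontext(&co->caller, &co->ctx);
    current = prev;
}

/* Called inside a coroutine: switch back to whoever entered it. */
void coro_yield(void)
{
    struct coroutine *co = current;
    swapcontext(&co->ctx, &co->caller);
}

void coro_delete(struct coroutine *co)
{
    free(co->stack);
    free(co);
}

/* Example: sequential code that "waits" for I/O by yielding. */
struct request {
    int step;
    int result;
};

void request_entry(void *opaque)
{
    struct request *req = opaque;
    req->step = 1;   /* submit first I/O, then wait */
    coro_yield();
    req->step = 2;   /* submit second I/O, then wait */
    coro_yield();
    req->result = 42;
}
```

In a block driver, the completion callback of an AIO request would call `coro_enter()` on the coroutine that issued it. That way the driver code itself reads sequentially.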
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote:
> On 28.07.2011 14:09, Christoph Hellwig wrote:
> [...]
> The only current block driver that really does everything in an async state machine is qed. It's definitely not nice code, and having to convert all of the other block drivers to this would be a lot of work.

Thanks Kevin :). I do agree about the clumsiness of async callback programming: a lot of code is spent bundling and unbundling variables to pass between functions, not to mention that the control flow is much harder to follow.

> So if it only means that we're making things async that until now would block the VCPU, isn't that a great improvement already?
>
> The advantage compared to threading is that it allows an easy and incremental transition.

Also remember that we already have a threads-based implementation of coroutines. Later on we can do fine-grained locking and switch to threads directly instead of coroutines, if need be.

Stefan
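The threads-based alternative Stefan mentions can be sketched too. This hypothetical C version assumes POSIX threads; the `tco_*` names are invented and this is not QEMU's implementation. Every coroutine is a real thread. A single turn flag, guarded by a mutex and condition variable, ensures that only the caller or the coroutine runs at any time, so code written for coroutines keeps its semantics. Replacing that strict handoff with fine-grained locking would be the later step described in the message.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical thread-backed coroutine with strict caller/coroutine handoff. */
struct tco {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool co_turn;        /* true: coroutine may run; false: caller may run */
    bool finished;
    void (*entry)(struct tco *co, void *opaque);
    void *opaque;
};

/* Must be called with co->lock held. */
static void wait_for_turn(struct tco *co, bool want)
{
    while (co->co_turn != want) {
        pthread_cond_wait(&co->cond, &co->lock);
    }
}

static void *tco_thread(void *arg)
{
    struct tco *co = arg;
    pthread_mutex_lock(&co->lock);
    wait_for_turn(co, true);          /* block until the first tco_enter() */
    pthread_mutex_unlock(&co->lock);

    co->entry(co, co->opaque);

    pthread_mutex_lock(&co->lock);
    co->finished = true;
    co->co_turn = false;              /* hand control back for good */
    pthread_cond_broadcast(&co->cond);
    pthread_mutex_unlock(&co->lock);
    return NULL;
}

int tco_create(struct tco *co, void (*entry)(struct tco *, void *),
               void *opaque)
{
    co->co_turn = false;
    co->finished = false;
    co->entry = entry;
    co->opaque = opaque;
    pthread_mutex_init(&co->lock, NULL);
    pthread_cond_init(&co->cond, NULL);
    int ret = pthread_create(&co->thread, NULL, tco_thread, co);
    if (ret != 0) {
        pthread_cond_destroy(&co->cond);
        pthread_mutex_destroy(&co->lock);
    }
    return ret;
}

/* Caller side: let the coroutine run until it yields or finishes. */
void tco_enter(struct tco *co)
{
    pthread_mutex_lock(&co->lock);
    if (!co->finished) {
        co->co_turn = true;
        pthread_cond_broadcast(&co->cond);
        wait_for_turn(co, false);
    }
    pthread_mutex_unlock(&co->lock);
}

/* Coroutine side: give control back and wait to be re-entered. */
void tco_yield(struct tco *co)
{
    pthread_mutex_lock(&co->lock);
    co->co_turn = false;
    pthread_cond_broadcast(&co->cond);
    wait_for_turn(co, true);
    pthread_mutex_unlock(&co->lock);
}

/* Join the thread once the coroutine has finished. */
void tco_destroy(struct tco *co)
{
    pthread_join(co->thread, NULL);
    pthread_cond_destroy(&co->cond);
    pthread_mutex_destroy(&co->lock);
}

/* Example entry: records its progress, yielding in between. */
void counting_entry(struct tco *co, void *opaque)
{
    int *steps = opaque;
    *steps = 1;
    tco_yield(co);
    *steps = 2;
}
```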
Re: [Qemu-devel] Block layer roadmap
On 28.07.2011 14:54, Stefan Hajnoczi wrote:
> On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote:
>> [...]
>> The only current block driver that really does everything in an async state machine is qed. It's definitely not nice code, and having to convert all of the other block drivers to this would be a lot of work.
>
> Thanks Kevin :).

I certainly didn't mean to attack your code, or you personally. It's not that qed is done particularly badly or anything. That the code isn't really nice is just the natural result of the callback-based programming model.

>> So if it only means that we're making things async that until now would block the VCPU, isn't that a great improvement already?
>>
>> The advantage compared to threading is that it allows an easy and incremental transition.
>
> Also remember that we already have a threads-based implementation of coroutines. Later on we can do fine-grained locking and switch to threads directly instead of coroutines, if need be.

That might be an option for the future, but for now there's enough left to do to take real advantage of coroutines.

Kevin
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 2:10 PM, Kevin Wolf kw...@redhat.com wrote: Am 28.07.2011 14:54, schrieb Stefan Hajnoczi: On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote: Am 28.07.2011 14:09, schrieb Christoph Hellwig: On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: Coroutines in the block layer [Kevin] * Programming model to simplify block drivers without blocking QEMU threads Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading? The only current block driver that really does everything in an async state machine is qed. It's definitely not nice code, and having to convert all of the other block drivers to this would be a lot of work. Thanks Kevin :). I certainly didn't mean to attack your code or even yourself. It's not that qed is done particularly badly or anything. That the code isn't really nice is just the natural result of the callback-based programming model. No worries, no offence taken :) Stefan
Re: [Qemu-devel] Block layer roadmap
On 07/27/2011 07:37 AM, Stefan Hajnoczi wrote:
Hi,
Here is a list of block layer and storage changes that have been discussed. It is useful to have a roadmap of changes in order to avoid duplication, allow more developers to contribute, and to communicate the direction of storage in QEMU.
Thanks for writing this up Stefan!
I suggest we first do a braindump of all changes that have been discussed. Later we can discuss specific changes and if/when they fit into the roadmap - please don't jump into discussion about specific changes yet.
Kevin: I hope this is a useful starting point.
Here are all the items that I am aware of:

=Material for next QEMU release=

Coroutines in the block layer [Kevin]
* Programming model to simplify block drivers without blocking QEMU threads

Generic copy-on-read [Stefan]
* Populate the image file to avoid fetching the same block from the backing file again later

Generic image streaming [Stefan]
* Make block_stream commands available for all image formats that support backing files

Live block copy [Marcelo/Kevin/Stefan?]
* Copy the contents of an image file while a guest is using it

In-place qcow2 <-> qed conversion [Devin, GSoC 2011]
* Fast conversion between qcow2 and qed image formats without copying all data

VMDK enhancements [Fam, GSoC 2011]
* Implement the latest VMDK specs to support modern image files

Block I/O limits [Zhi Yong]
* Resource control for guest I/O bandwidth/iops consumption

=Changes where I am not aware of firm plans=

Cow overlay [Dong Xu Robert]
* Allow live block copy and image streaming to raw destination files

snapshot_blkdev and Backup API [Jes, Jagane]
* Support for consistent disk snapshots and dirty block tracking
* Allow backup software to integrate with QEMU

-blockdev [Markus?]
* Explicit user control over block device trees
I'm planning on helping out here however I can.
Regards,

Anthony Liguori

QCOW3
* Extend the qcow2 format to address current and future image format challenges

iSCSI/NBD/Remote block device integration
* Enable remote access to disk images for live migration and other tasks

Pre/post block copy
* Working block migration

Avoid blocking QEMU threads
* Today loss of NFS connectivity can hang guests
* It's critical never to block the vcpu thread
* The iothread should also not block while the qemu mutex is held
* All blocking operations must be done asynchronously or in a worker thread

virtio-scsi [Paolo/Stefan]
* The next step after virtio-blk: full SCSI command set, appears as a SCSI HBA in the guest

tcm_vhost [Stefan]
* Directly connect virtio-scsi with the Linux in-kernel SCSI target
* Pass-through of host SCSI devices
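Generic copy-on-read and generic image streaming from the list fit together: streaming can be viewed as copy-on-read applied to every cluster in the background until the backing file is no longer needed. A minimal in-memory sketch, assuming a fixed cluster size and a per-cluster allocation flag; Image, image_read_cluster() and image_stream() are hypothetical and do not model any real image format's metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CLUSTER_SIZE 4    /* tiny sizes, just for illustration */
#define NUM_CLUSTERS 8

/* Hypothetical in-memory model of an image with a backing file. */
typedef struct {
    char data[NUM_CLUSTERS][CLUSTER_SIZE];
    bool allocated[NUM_CLUSTERS];          /* cluster present in this image? */
    const char (*backing)[CLUSTER_SIZE];   /* backing file contents */
    int backing_reads;                     /* fetches from the backing file */
} Image;

/* Read one cluster. With copy_on_read, data fetched from the backing file is
 * also written into the image, so later reads of it are served locally. */
static void image_read_cluster(Image *img, int idx, char *buf, bool copy_on_read)
{
    if (img->allocated[idx]) {
        memcpy(buf, img->data[idx], CLUSTER_SIZE);
        return;
    }
    memcpy(buf, img->backing[idx], CLUSTER_SIZE);
    img->backing_reads++;
    if (copy_on_read) {
        memcpy(img->data[idx], buf, CLUSTER_SIZE);
        img->allocated[idx] = true;
    }
}

/* Image streaming: copy-on-read over every cluster, after which the image
 * no longer depends on its backing file. */
static void image_stream(Image *img)
{
    char buf[CLUSTER_SIZE];

    for (int i = 0; i < NUM_CLUSTERS; i++) {
        image_read_cluster(img, i, buf, true);
    }
}

static bool image_fully_allocated(const Image *img)
{
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        if (!img->allocated[i]) {
            return false;
        }
    }
    return true;
}
```

Clusters already populated by guest reads are skipped by the stream, so the two mechanisms share the same "populate on first access" path.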
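For the block I/O limits item, one common way to cap both bandwidth and iops is a pair of token buckets that a request must pass together. This is a sketch of that general technique, swapped in for illustration and not the design proposed for QEMU; Bucket, Throttle, throttle_allow() and the explicit now parameter (seconds, supplied by the caller instead of a real clock) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One token bucket: refills at 'rate' per second, holds at most 'burst'. */
typedef struct {
    double rate;
    double burst;
    double tokens;
} Bucket;

/* Hypothetical throttle state combining a bandwidth and an iops limit. */
typedef struct {
    Bucket bps;      /* tokens are bytes */
    Bucket iops;     /* tokens are requests */
    double last;     /* time of the last refill, in seconds */
} Throttle;

static void bucket_refill(Bucket *b, double elapsed)
{
    b->tokens += elapsed * b->rate;
    if (b->tokens > b->burst) {
        b->tokens = b->burst;
    }
}

static void throttle_init(Throttle *t, double bps, double iops, double now)
{
    t->bps = (Bucket){ bps, bps, bps };      /* burst = one second's worth */
    t->iops = (Bucket){ iops, iops, iops };
    t->last = now;
}

/* Returns true if a request of 'bytes' may be issued at time 'now'; if not,
 * the caller is expected to queue it and retry later. Both limits are checked
 * before either is charged, so a rejected request consumes nothing. */
static bool throttle_allow(Throttle *t, size_t bytes, double now)
{
    double elapsed = now - t->last;

    bucket_refill(&t->bps, elapsed);
    bucket_refill(&t->iops, elapsed);
    t->last = now;

    if (t->bps.tokens < (double)bytes || t->iops.tokens < 1.0) {
        return false;
    }
    t->bps.tokens -= (double)bytes;
    t->iops.tokens -= 1.0;
    return true;
}
```

Checking both buckets before charging either keeps the two limits independent: a request blocked by the iops limit does not silently eat into the bandwidth budget.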
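The last point under "Avoid blocking QEMU threads" says blocking operations belong in a worker thread. A minimal POSIX sketch of that pattern, assuming one thread per request and a pipe to wake the main loop; WorkRequest, worker_submit(), process_completion() and the demo helpers fake_blocking_read()/on_complete() are hypothetical names. Only the worker ever sits in the blocking call, and the completion callback runs back in the main loop thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

typedef struct WorkRequest {
    int (*blocking_fn)(void *arg);              /* call that may block */
    void *arg;
    int ret;
    void (*complete)(struct WorkRequest *req);  /* runs in the main loop */
    int notify_fd;                              /* write end of a pipe */
    pthread_t thread;
} WorkRequest;

static void *worker_main(void *opaque)
{
    WorkRequest *req = opaque;

    /* Only this worker thread waits here, never the vcpu or iothread. */
    req->ret = req->blocking_fn(req->arg);

    /* Wake up the main loop by sending the request pointer over the pipe
     * (a pointer-sized write is below PIPE_BUF, so it is atomic). */
    if (write(req->notify_fd, &req, sizeof(req)) != (ssize_t)sizeof(req)) {
        /* a real implementation would need error handling here */
    }
    return NULL;
}

static int worker_submit(WorkRequest *req)
{
    return pthread_create(&req->thread, NULL, worker_main, req);
}

/* Main loop side: call when the pipe's read end is readable (e.g. via poll).
 * pthread_join also makes req->ret visible to this thread. */
static void process_completion(int read_fd)
{
    WorkRequest *req;

    if (read(read_fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        pthread_join(req->thread, NULL);
        req->complete(req);
    }
}

/* Hypothetical demo helpers. */
static int fake_blocking_read(void *arg)
{
    return *(int *)arg;   /* stands in for e.g. a pread() on NFS */
}

static int completed_ret = -1;

static void on_complete(WorkRequest *req)
{
    completed_ret = req->ret;
}
```

A thread per request keeps the sketch short; a fixed pool of workers pulling from a queue avoids unbounded thread creation when many requests hang at once.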