Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-23 Thread Stefan Hajnoczi
On Mon, Aug 22, 2011 at 04:01:08PM -0500, Anthony Liguori wrote:
 On 08/22/2011 03:48 PM, Ryan Harper wrote:
 * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
 We wouldn't rm -rf block/* because we still need qemu-nbd.  It
 probably makes sense to keep what we have today.  I'm talking more
 about a shift from writing our own image format to integrating
 existing storage support.
 
 I think this is a key point.  While I do like the idea of keeping QEMU
 focused on single VM, I think we don't help ourselves by not consuming
 the hypervisor platform services and integrating/exploiting those
 features to make using QEMU easier.
 
 Let's avoid the h-word here as it's not terribly relevant to the discussion.
 
 Configuring block devices is fundamentally a privileged operation.
 QEMU fundamentally is designed to be useful as an unprivileged user.
 
 That's the trouble with something like LVM.  Only root can create
 LVM snapshots and it's an all-or-nothing security model.
 
 If you want to get QEMU out of the snapshot business, you need a
 file system that's widely available that allows non-privileged users
 to take snapshots of individual files.

I don't think we should remove qcow2 internal snapshots or
blockdev_snapshot.  But they have performance limitations where it makes
sense to start using existing storage support instead of reimplementing
efficient and scalable snapshots ourselves.

btrfs is maturing and its BTRFS_IOC_CLONE ioctl is unprivileged.  So we
can offer that option for unprivileged users.
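
As a rough illustration of the unprivileged clone path, the sketch below
asks the file system for a file-level clone and falls back to a plain copy
when the ioctl is not supported.  The request number is the value of
BTRFS_IOC_CLONE on x86 and similar Linux architectures (an assumption; some
architectures encode ioctl numbers differently), and the helper name
clone_or_copy is made up for this example.

```python
import fcntl
import shutil

# Value of BTRFS_IOC_CLONE, i.e. _IOW(0x94, 9, int), on x86 and similar
# Linux architectures.  Assumption: other architectures may differ.
BTRFS_IOC_CLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Try a file-level clone of src into dst; fall back to a full copy.

    Returns "clone" or "copy" so the caller can tell which path was taken.
    No root privileges are needed for either path.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The ioctl is issued on the destination and takes the source fd.
            fcntl.ioctl(dst.fileno(), BTRFS_IOC_CLONE, src.fileno())
            return "clone"
        except OSError:
            # Not btrfs, or cloning is not supported here: copy the data.
            shutil.copyfileobj(src, dst)
            return "copy"
```

On a btrfs mount the call should return "clone" and share extents with the
source; elsewhere it degrades to an ordinary copy.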

Stefan



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-23 Thread Kevin Wolf
Am 22.08.2011 23:01, schrieb Anthony Liguori:
 On 08/22/2011 03:48 PM, Ryan Harper wrote:
 * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
 We wouldn't rm -rf block/* because we still need qemu-nbd.  It
 probably makes sense to keep what we have today.  I'm talking more
 about a shift from writing our own image format to integrating
 existing storage support.

 I think this is a key point.  While I do like the idea of keeping QEMU
 focused on single VM, I think we don't help ourselves by not consuming
 the hypervisor platform services and integrating/exploiting those
 features to make using QEMU easier.
 
 Let's avoid the h-word here as it's not terribly relevant to the discussion.
 
 Configuring block devices is fundamentally a privileged operation.  QEMU 
 fundamentally is designed to be useful as an unprivileged user.
 
 That's the trouble with something like LVM.  Only root can create LVM 
 snapshots and it's an all-or-nothing security model.
 
 If you want to get QEMU out of the snapshot business, you need a file 
 system that's widely available that allows non-privileged users to take 
 snapshots of individual files.

I agree with you there (and it's interesting how different perceptions of
the BoF results can be ;-))

It's probably true that there are ways to do certain things on host
block devices and we should definitely support such use cases better
(where "we" means mostly the management layer, but we can possibly
integrate things into qemu like a file-btrfs or lvm_device backend that
supports snapshots or something).

It isn't for everyone, though, and this is why I tried to point out in
the BoF that image formats aren't going to go away and we still need
good support for them. Providing only raw for running VMs and declaring
the rest of the formats to be intended for import/export only doesn't work.

Kevin



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-23 Thread Stefan Hajnoczi
On Tue, Aug 23, 2011 at 12:25 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 22.08.2011 23:01, schrieb Anthony Liguori:
 On 08/22/2011 03:48 PM, Ryan Harper wrote:
 * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
 We wouldn't rm -rf block/* because we still need qemu-nbd.  It
 probably makes sense to keep what we have today.  I'm talking more
 about a shift from writing our own image format to integrating
 existing storage support.

 I think this is a key point.  While I do like the idea of keeping QEMU
 focused on single VM, I think we don't help ourselves by not consuming
 the hypervisor platform services and integrating/exploiting those
 features to make using QEMU easier.

 Let's avoid the h-word here as it's not terribly relevant to the discussion.

 Configuring block devices is fundamentally a privileged operation.  QEMU
 fundamentally is designed to be useful as an unprivileged user.

 That's the trouble with something like LVM.  Only root can create LVM
 snapshots and it's an all-or-nothing security model.

 If you want to get QEMU out of the snapshot business, you need a file
 system that's widely available that allows non-privileged users to take
 snapshots of individual files.

 I agree with you there (and it's interesting how different perceptions of
 the BoF results can be ;-))

 It's probably true that there are ways to do certain things on host
 block devices and we should definitely support such use cases better
 (where "we" means mostly the management layer, but we can possibly
 integrate things into qemu like a file-btrfs or lvm_device backend that
 supports snapshots or something).

 It isn't for everyone, though, and this is why I tried to point out in
 the BoF that image formats aren't going to go away and we still need
 good support for them. Providing only raw for running VMs and declaring
 the rest of the formats to be intended for import/export only doesn't work.

I have said that block/*.c doesn't go away.  But we need to look at
exploiting storage features rather than reinventing them.

Snapshots are an example: we do not have a scalable snapshot mechanism
in QEMU.  External snapshots are inefficient when you build up
multiple levels (due to having to follow the backing file chain) and
when you delete a snapshot (due to copying data back into the backing
file).  Internal snapshots in qcow2 involve operations that traverse
the image metadata.  This traversal becomes a problem when image files
grow large (e.g. 1 TB and beyond) because the I/O required can take
more than 1 second which is problematic for taking snapshots while the
VM is running.
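
To make the cost of external snapshots concrete, here is a toy model of a
backing file chain.  It is only a sketch: the Image class, snapshot() and
commit() are invented names and do not correspond to QEMU's block layer
code.  It shows that a read of an old cluster has to visit every image in
the chain, and that deleting a snapshot copies data back into the backing
image.

```python
class Image:
    """Toy image: a dict of cluster index -> data, plus an optional backing image."""

    def __init__(self, backing=None):
        self.clusters = {}
        self.backing = backing

    def write(self, index, data):
        # Writes always land in the active (topmost) image.
        self.clusters[index] = data

    def read(self, index):
        """Return (data, images_visited); unallocated clusters read as None."""
        image, visited = self, 0
        while image is not None:
            visited += 1
            if index in image.clusters:
                return image.clusters[index], visited
            image = image.backing  # follow the backing file chain
        return None, visited


def snapshot(active):
    """External snapshot: the old image becomes the backing file of a new, empty one."""
    return Image(backing=active)


def commit(top):
    """Delete the newest snapshot by copying its clusters into the backing image."""
    copied = 0
    for index, data in top.clusters.items():
        top.backing.clusters[index] = data
        copied += 1
    return top.backing, copied
```

With three snapshots on top of a base image, a read of a cluster that was
only ever written in the base visits four images.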

There are known ways of doing better internal snapshots along the
lines of what ZFS, btrfs, and thin-dev do.  But that means redesigning
the image metadata and reimplementing these storage systems in
userspace.

What I'm suggesting is that we draw the line here.  Keep what we've
got and continue the optimizations that we have in the pipeline.  But
when we hit significant new features, work with existing storage
systems.  Why?  Because we need to support existing storage anyway and
therefore reinventing our own is not a good use of resources.

Stefan



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Ryan Harper
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]:
 At KVM Forum Kevin, Christoph, and I had an opportunity to get
 together for a Block Layer BoF.  We went through the recent roadmap
 mailing list thread and touched on each proposed feature.
 
 Here is the block layer roadmap wiki page:
 http://wiki.qemu.org/BlockRoadmap
 
 Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
 mentioned you want it for the next release.
 
 My main take-away from the BoF was that integrating support for host
 block devices and storage appliances will allow us to reduce the
 amount of effort spent on image formats.  In order to make image
 formats support the desired features and performance we end up
 implementing much of the storage stack and file systems in userspace -
 code that is duplicated and cannot take advantage of the existing
 storage stack.

+1

 
 Storage management features are not just available in remote SAN and
 NAS appliances anymore.  For local storage, btrfs has file-level
 clones and thin-dev is significantly improving LVM snapshots.
 
 Thin-dev is bringing a much more efficient and scalable snapshot model
 to LVM.  This device-mapper feature will make LVM attractive for high
 performance I/O without giving up snapshot and clone features.  It
 also supports cloning off block devices that are not in the pool (e.g.
 external storage, much like QEMU's backing files feature):
 https://github.com/jthornber/linux-2.6/tree/thin-dev
 
 This will not replace image formats overnight because image formats
 are still widely used and will continue to be useful for
 transferring and sharing disk images.  But focussing on the larger

Any thoughts on how to make this easily usable for LVM?  If there were an
export/import between files and LVM, would that be sufficient?  Does
anything like this exist?

 storage stack where either local LVM, btrfs, or storage appliances do
 the storage management means we exploit those options instead of
 implementing equivalent functionality ourselves.  QEMU then runs with
 plain old raw in more cases.
 
 Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Stefan Hajnoczi
On Mon, Aug 22, 2011 at 09:27:12AM -0500, Ryan Harper wrote:
 * Stefan Hajnoczi stefa...@gmail.com [2011-08-22 08:35]:
  At KVM Forum Kevin, Christoph, and I had an opportunity to get
  together for a Block Layer BoF.  We went through the recent roadmap
  mailing list thread and touched on each proposed feature.
  
  Here is the block layer roadmap wiki page:
  http://wiki.qemu.org/BlockRoadmap
  
  Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
  mentioned you want it for the next release.
  
  My main take-away from the BoF was that integrating support for host
  block devices and storage appliances will allow us to reduce the
  amount of effort spent on image formats.  In order to make image
  formats support the desired features and performance we end up
  implementing much of the storage stack and file systems in userspace -
  code that is duplicated and cannot take advantage of the existing
  storage stack.
 
 +1
 
  
  Storage management features are not just available in remote SAN and
  NAS appliances anymore.  For local storage, btrfs has file-level
  clones and thin-dev is significantly improving LVM snapshots.
  
  Thin-dev is bringing a much more efficient and scalable snapshot model
  to LVM.  This device-mapper feature will make LVM attractive for high
  performance I/O without giving up snapshot and clone features.  It
  also supports cloning off block devices that are not in the pool (e.g.
  external storage, much like QEMU's backing files feature):
  https://github.com/jthornber/linux-2.6/tree/thin-dev
  
  This will not replace image formats overnight because image formats
  are still widely used and will continue to be useful for
  transferring and sharing disk images.  But focussing on the larger
 
 Any thoughts on how to make this easily usable for LVM?  If there were an
 export/import between files and LVM, would that be sufficient?  Does
 anything like this exist?

Forgot to mention a major advantage of a raw-oriented storage stack: we need
good support for raw + storage appliance anyway.  Users want to hook up their
SAN or NAS just like they can with other hypervisors.  Time spent on image
formats would be better spent fleshing out integration with LVM, btrfs, SAN,
NAS, and friends.

Back to import/export, it serves two purposes:
1. Efficient transport.  Uploading and downloading image files in a
   compact form that represents zero blocks efficiently and perhaps
   compresses data.
2. Compatibility with other hypervisors and external tools.  Here it's
   all about using a well-defined file format.
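
The "efficient transport" purpose can be sketched in a few lines.  The
encoding below is purely hypothetical (it is not any existing image
format): it keeps only the non-zero blocks of a raw image together with
their offsets, and the block size is an arbitrary choice for the example.

```python
BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def pack(raw):
    """Encode raw bytes as (total_length, [(offset, data), ...]).

    Blocks that contain only zero bytes are not stored at all.
    """
    zero = bytes(BLOCK_SIZE)
    extents = []
    for offset in range(0, len(raw), BLOCK_SIZE):
        block = raw[offset:offset + BLOCK_SIZE]
        # The last block may be short, so compare against a zero prefix.
        if block != zero[:len(block)]:
            extents.append((offset, block))
    return len(raw), extents


def unpack(packed):
    """Rebuild the raw bytes; everything not listed reads back as zeros."""
    length, extents = packed
    raw = bytearray(length)
    for offset, block in extents:
        raw[offset:offset + len(block)] = block
    return bytes(raw)
```

A real format would add a header and probably compression, but the round
trip already shows why a mostly empty disk can be uploaded cheaply.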

In order to pull off a raw-oriented storage stack we need to do
import/export well.  So this is an area where we have to focus.

Image streaming is a good approach for import because it allows the VM
to start instantly (even before the image is fully imported).  A
qemu-nbd process serves up image data and we stream into a logical
volume.

For export we can do a FUSE file system that presents logical volumes as image
files.  That way existing applications can get at the data as if there were
real image files sitting on the file system.  Sequential read access is easy
for all formats, random read is more difficult but should be doable for most
formats (the exception would be stream compressed formats that are not designed
for random access).

So moving to a raw-oriented storage stack does not mean we get rid of
image formats.  We still need them but they are outside the critical I/O
path.  Their role is changed since we don't push features into the
formats anymore.

Side note: iSCSI vs NBD came up during the BoF.  Although NBD has not
seen maintenance or activity recently it's perfectly possible to build
on it.  The first feature we need is a flush command (so that NBD can do
non-O_DSYNC accesses for speed).  At that point we have a bare-bones
remote block protocol that can be used for migration and for connecting
up userspace image formats.  iSCSI is more complex and suited for
permanent storage, whereas NBD is simple but perhaps not a protocol we
want to access data over for a long period of time.
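
The reason a flush command matters can be shown with a small server-side
sketch.  The Export class below is invented for illustration and is not
qemu-nbd code: without a flush request every write has to be stable when it
returns (O_DSYNC), whereas with one the client decides when data must reach
the disk.

```python
import os


class Export:
    """Toy server-side handler for read, write and flush requests on one file."""

    def __init__(self, path, sync_every_write):
        flags = os.O_RDWR | os.O_CREAT
        if sync_every_write:
            # No flush command available: each write must be stable on return.
            flags |= os.O_DSYNC
        self.fd = os.open(path, flags, 0o600)

    def write(self, offset, data):
        os.pwrite(self.fd, data, offset)

    def flush(self):
        # With a flush command the client chooses when to pay for stability.
        os.fsync(self.fd)

    def read(self, offset, length):
        return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)
```

A client that batches many writes and then sends one flush avoids paying
the synchronous write cost on every request.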

Stefan



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Anthony Liguori

On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:

At KVM Forum Kevin, Christoph, and I had an opportunity to get
together for a Block Layer BoF.  We went through the recent roadmap
mailing list thread and touched on each proposed feature.

Here is the block layer roadmap wiki page:
http://wiki.qemu.org/BlockRoadmap

Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
mentioned you want it for the next release.

My main take-away from the BoF was that integrating support for host
block devices and storage appliances will allow us to reduce the
amount of effort spent on image formats.  In order to make image
formats support the desired features and performance we end up
implementing much of the storage stack and file systems in userspace -
code that is duplicated and cannot take advantage of the existing
storage stack.


The flip side is, tighter integration either makes features hard to 
consume or makes QEMU enter a space it currently hasn't.  Many features 
require root privileges to configure and a system-wide scope.  That's 
not QEMU today.


In addition, it makes QEMU tied to a specific platform (most likely Linux).

None of this is especially bad I guess, but none of it is a simple problem.

You could certainly rm -rf block/* and still be able to accomplish much 
of what's done today but it would be extremely painful to do in 
practice.  We have to find a balance of not reinventing things and 
making sure that simple things are simple to do.


That may require tighter integration and more focus on the higher up 
pieces in the stack to really enable this.


Regards,

Anthony Liguori



Storage management features are not just available in remote SAN and
NAS appliances anymore.  For local storage, btrfs has file-level
clones and thin-dev is significantly improving LVM snapshots.

Thin-dev is bringing a much more efficient and scalable snapshot model
to LVM.  This device-mapper feature will make LVM attractive for high
performance I/O without giving up snapshot and clone features.  It
also supports cloning off block devices that are not in the pool (e.g.
external storage, much like QEMU's backing files feature):
https://github.com/jthornber/linux-2.6/tree/thin-dev

This will not replace image formats overnight because image formats
are still widely used and will continue to be useful for
transferring and sharing disk images.  But focussing on the larger
storage stack where either local LVM, btrfs, or storage appliances do
the storage management means we exploit those options instead of
implementing equivalent functionality ourselves.  QEMU then runs with
plain old raw in more cases.

Stefan



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Stefan Hajnoczi
On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:

 At KVM Forum Kevin, Christoph, and I had an opportunity to get
 together for a Block Layer BoF.  We went through the recent roadmap
 mailing list thread and touched on each proposed feature.

 Here is the block layer roadmap wiki page:
 http://wiki.qemu.org/BlockRoadmap

 Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
 mentioned you want it for the next release.

 My main take-away from the BoF was that integrating support for host
 block devices and storage appliances will allow us to reduce the
 amount of effort spent on image formats.  In order to make image
 formats support the desired features and performance we end up
 implementing much of the storage stack and file systems in userspace -
 code that is duplicated and cannot take advantage of the existing
 storage stack.

 The flip side is, tighter integration either makes features hard to consume
 or makes QEMU enter a space it currently hasn't.  Many features require root
 privileges to configure and a system-wide scope.  That's not QEMU today.

QEMU itself should be about emulation and virtualization.  Storage
management needs to be done outside of QEMU.  Today you can already
take an LVM snapshot - it happens outside of QEMU.  It's at the
libvirt level where different storage systems get abstracted (LVM,
directory, iSCSI, etc) and there is a single API/command set to invoke
management functions.  But even without libvirt you can do it
yourself, and I think this separation makes sense so that QEMU can be
focussed on running a single VM rather than managing storage.

 In addition, it makes QEMU tied to a specific platform (most likely Linux).

QEMU will still work but certain features might not be available.  For
example, this is true today if you're using a storage appliance that
does deduplication - that's a feature you're getting on top of the
emulation/virtualization that QEMU does.  But it doesn't tie QEMU to a
particular platform.

 You could certainly rm -rf block/* and still be able to accomplish much of
 what's done today but it would be extremely painful to do in practice.  We
 have to find a balance of not reinventing things and making sure that simple
 things are simple to do.

We wouldn't rm -rf block/* because we still need qemu-nbd.  It
probably makes sense to keep what we have today.  I'm talking more
about a shift from writing our own image format to integrating
existing storage support.

 That may require tighter integration and more focus on the higher up pieces
 in the stack to really enable this.

Yes, exactly.  Much of it shouldn't be inside QEMU.

Stefan



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Ryan Harper
* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:
 On Mon, Aug 22, 2011 at 8:04 PM, Anthony Liguori anth...@codemonkey.ws wrote:
  On 08/22/2011 08:34 AM, Stefan Hajnoczi wrote:
 
  At KVM Forum Kevin, Christoph, and I had an opportunity to get
  together for a Block Layer BoF.  We went through the recent roadmap
  mailing list thread and touched on each proposed feature.
 
  Here is the block layer roadmap wiki page:
  http://wiki.qemu.org/BlockRoadmap
 
  Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
  mentioned you want it for the next release.
 
  My main take-away from the BoF was that integrating support for host
  block devices and storage appliances will allow us to reduce the
  amount of effort spent on image formats.  In order to make image
  formats support the desired features and performance we end up
  implementing much of the storage stack and file systems in userspace -
  code that is duplicated and cannot take advantage of the existing
  storage stack.
 
  The flip side is, tighter integration either makes features hard to consume
  or makes QEMU enter a space it currently hasn't.  Many features require root
  privileges to configure and a system-wide scope.  That's not QEMU today.
 
 QEMU itself should be about emulation and virtualization.  Storage
 management needs to be done outside of QEMU.  Today you can already
 take an LVM snapshot - it happens outside of QEMU.  It's at the
 libvirt level where different storage systems get abstracted (LVM,
 directory, iSCSI, etc) and there is a single API/command set to invoke
 management functions.  But even without libvirt you can do it
 yourself, and I think this separation makes sense so that QEMU can be
 focussed on running a single VM rather than managing storage.
 
  In addition, it makes QEMU tied to a specific platform (most likely Linux).
 
 QEMU will still work but certain features might not be available.  For
 example, this is true today if you're using a storage appliance that
 does deduplication - that's a feature you're getting on top of the
 emulation/virtualization that QEMU does.  But it doesn't tie QEMU to a
 particular platform.
 
  You could certainly rm -rf block/* and still be able to accomplish much of
  what's done today but it would be extremely painful to do in practice.  We
  have to find a balance of not reinventing things and making sure that simple
  things are simple to do.
 
 We wouldn't rm -rf block/* because we still need qemu-nbd.  It
 probably makes sense to keep what we have today.  I'm talking more
 about a shift from writing our own image format to integrating
 existing storage support.

I think this is a key point.  While I do like the idea of keeping QEMU
focused on single VM, I think we don't help ourselves by not consuming
the hypervisor platform services and integrating/exploiting those
features to make using QEMU easier.

That said, it does mean that some things like system-wide config and
privs are hard and aren't strictly virtualization issues, but that
doesn't mean we can't integrate some sort of solution.

 
  That may require tighter integration and more focus on the higher up pieces
  in the stack to really enable this.
 
 Yes, exactly.  Much of it shouldn't be inside QEMU.
 
 Stefan

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com



Re: [Qemu-devel] Block layer roadmap on wiki

2011-08-22 Thread Anthony Liguori

On 08/22/2011 03:48 PM, Ryan Harper wrote:

* Stefan Hajnoczi stefa...@gmail.com [2011-08-22 15:32]:

We wouldn't rm -rf block/* because we still need qemu-nbd.  It
probably makes sense to keep what we have today.  I'm talking more
about a shift from writing our own image format to integrating
existing storage support.


I think this is a key point.  While I do like the idea of keeping QEMU
focused on single VM, I think we don't help ourselves by not consuming
the hypervisor platform services and integrating/exploiting those
features to make using QEMU easier.


Let's avoid the h-word here as it's not terribly relevant to the discussion.

Configuring block devices is fundamentally a privileged operation.  QEMU 
fundamentally is designed to be useful as an unprivileged user.


That's the trouble with something like LVM.  Only root can create LVM 
snapshots and it's an all-or-nothing security model.


If you want to get QEMU out of the snapshot business, you need a file 
system that's widely available that allows non-privileged users to take 
snapshots of individual files.


Regards,

Anthony Liguori



Re: [Qemu-devel] Block layer roadmap

2011-07-29 Thread Stefan Hajnoczi
Frediano suggested:

Enhanced volume key in qcow3
 * luks-like key scheme that allows changing passphrase without
re-encrypting image data



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 27.07.2011 14:37, schrieb Stefan Hajnoczi:
 Hi,
 Here is a list of block layer and storage changes that have been discussed.  It
 is useful to have a roadmap of changes in order to avoid duplication, allow
 more developers to contribute, and to communicate the direction of storage in
 QEMU.
 
 I suggest we first do a braindump of all changes that have been discussed.
 Later we can discuss specific changes and if/when they fit into the roadmap -
 please don't jump into discussion about specific changes yet.
 
 Kevin: I hope this is a useful starting point.  Here are all the items
 that I am aware of:
 
 =Material for next QEMU release=
 
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads
 
 Generic copy-on-read [Stefan]
  * Populate image file to avoid fetching same block from backing file
 again later
 
 Generic image streaming [Stefan]
  * Make block_stream commands available for all image formats that
 support backing files
 
 Live block copy [Marcelo/Kevin/Stefan?]
  * Copy the contents of an image file while a guest is using it
 
 In-place qcow2 <-> qed conversion [Devin, GSoC 2011]
  * Fast conversion between qcow2 and qed image formats without copying all data
 
 VMDK enhancements [Fam, GSoC 2011]
  * Implement latest VMDK specs to support modern image files
 
 Block I/O limits [Zhi Yong]
  * Resource control for guest I/O bandwidth/iops consumption

Another item that just came up on IRC again: Allow guests to toggle WCE,
so that we finally can get rid of the cache=writethrough default. We
should definitely get this done before 1.0.

Kevin



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Wed, Jul 27, 2011 at 1:37 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 =Changes where I am not aware of firm plans=

qcow2 online resize
 * Handle snapshots
 * Support shrinking

qed online resize
 * Support shrinking



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
 Another item that just came up on IRC again: Allow guests to toggle WCE,
 so that we finally can get rid of the cache=writethrough default. We
 should definitely get this done before 1.0.

Guest toggling is nice and cool, but the real feature is to allow it on
the command line.  As mentioned about 10 times before guests generally don't
fiddle with it without the administrator triggering it manually.  So
specifying it in the qemu config is the much better interface for 95% of
the use cases.




Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

Can anyone explain what the whole point of this is?  It really just is
a bit of syntactic sugar for the current async state machines.  What does
it buy us over going for real threading?




Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Frediano Ziglio
2011/7/28 Christoph Hellwig h...@lst.de:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?


This has little or nothing to do with threads.  Instead of splitting a
function into callbacks at every call that used to be synchronous,
coroutines let you write more readable code.  For instance, converting the
current qcow code to coroutines removes about 400 lines (about 30%).  This
will help with maintaining the code and developing more complicated
optimizations.

As for threading, you can do threading with AIO as well as with coroutines.
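
The readability argument can be illustrated with a toy example.  This is a
sketch in Python generators, not QEMU's coroutine API; submit(), run() and
start() are invented helpers that stand in for an asynchronous read
interface and its event loop.  Both versions read two blocks and combine
them, on top of the same callback-based primitive.

```python
from collections import deque

pending = deque()  # toy event loop: queue of (callback, result) pairs


def submit(block, callback):
    """Pretend to start an async read; run() delivers the completion later."""
    pending.append((callback, "data%d" % block))


def run():
    while pending:
        callback, result = pending.popleft()
        callback(result)


# Callback style: one logical operation is split across functions and the
# intermediate result has to be bundled into shared state by hand.
def read_two_callbacks(done):
    state = {}

    def first_done(data):
        state["first"] = data
        submit(1, second_done)

    def second_done(data):
        done(state["first"] + data)

    submit(0, first_done)


# Coroutine style: the same logic reads top to bottom, with local variables.
def read_two_coroutine():
    first = yield 0   # "read block 0" and suspend until the data arrives
    second = yield 1
    return first + second


def start(coroutine, done):
    """Drive a generator-based coroutine on top of the callback API."""
    def step(value):
        try:
            block = coroutine.send(value)
        except StopIteration as stop:
            done(stop.value)
            return
        submit(block, step)
    step(None)
```

Note that the coroutine version still runs on the callback machinery; only
the way the driver code is written changes.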

Frediano



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 02:15:38PM +0200, Frediano Ziglio wrote:
 
 This has little or nothing to do with threads.  Instead of splitting a
 function into callbacks at every call that used to be synchronous,
 coroutines let you write more readable code.

Thanks for repeating my statement that it doesn't fix the real thing
but just adds syntactic sugar.  




Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:08, schrieb Christoph Hellwig:
 On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
 Another item that just came up on IRC again: Allow guests to toggle WCE,
 so that we finally can get rid of the cache=writethrough default. We
 should definitely get this done before 1.0.
 
 Guest toggling is nice and cool, but the real feature is to allow it on
 the command line.  As mentioned about 10 times before guests generally don't
 fiddle with it without the administrator triggering it manually.  So
 specifying it in the qemu config is the much better interface for 95% of
 the use cases.

I don't really care about guest toggling. I care about the default cache
mode, and Anthony said that he doesn't agree with changing it unless
guests can toggle it.

Kevin



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:09, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads
 
 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?

The only current block driver that really does everything in an async
state machine is qed. It's definitely not nice code, and having to
convert all of the other block drivers to this would be a lot of work.

So if it only means that we're making things async that would block the
VCPU until now, isn't that a great improvement already?

The advantage compared to threading is that it allows an easy and
incremental transition.

Kevin



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Anthony Liguori

On 07/28/2011 07:09 AM, Christoph Hellwig wrote:

On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:

Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads


Can anyone explain what the whole point of this is?  It really just is
a bit of syntactic sugar for the current async state machines.  What does
it buy us over going for real threading?


It is threading--just with a common locking model where a single big 
lock is held to make up for the fact that most of QEMU isn't reentrant.


By restructuring the code to be threaded, we can incrementally remove 
the big lock if we audit for use of non-reentrant functions and 
introduce more granular locking.


The whole ucontext/setjmp thing is just an optimization.  I would hope 
it entirely disappears long term as we promote coroutines to full threads.
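
A minimal sketch of the "threads under one big lock" model follows.  It
uses Python threads purely for illustration; big_lock, shared and
handle_request are invented names, not QEMU code.  Each request takes the
lock whenever it touches shared state and does not hold it while blocked.

```python
import threading

big_lock = threading.Lock()  # stands in for the single global lock
shared = {"in_flight": 0, "completed": []}  # state that is not safe to share unlocked


def handle_request(request_id, blocking_io):
    with big_lock:
        shared["in_flight"] += 1
    # The lock is not held while the request is blocked on I/O.
    result = blocking_io(request_id)
    with big_lock:
        shared["in_flight"] -= 1
        shared["completed"].append(result)


def run_requests(count, blocking_io):
    threads = [
        threading.Thread(target=handle_request, args=(i, blocking_io))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Finer-grained locking would then mean replacing big_lock, piece by piece,
with locks that protect smaller parts of the shared state.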


Regards,

Anthony Liguori



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 28.07.2011 14:09, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?

 The only current block driver that really does everything in an async
 state machine is qed. It's definitely not nice code, and having to
 convert all of the other block drivers to this would be a lot of work.

Thanks Kevin :).  I do agree with the clumsiness of async callback
programming - a lot of code is spent bundling and unbundling variables
to pass between functions, not to mention that the control flow is
much harder to follow.

 So if it only means that we're making things async that would block the
 VCPU until now, isn't that a great improvement already?

 The advantage compared to threading is that it allows an easy and
 incremental transition.

Also remember that we already have a threads-based implementation of
coroutines.  Later on we can do fine-grained locking and switch to
threads directly instead of coroutines, if need be.
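
For readers who have not looked at it, the idea of a threads-based coroutine can be sketched roughly as below. This is a toy in Python under my own assumptions (the ThreadCoroutine class and its method names are hypothetical and not taken from the QEMU source): each coroutine gets a real thread, but control is handed back and forth so that only one side runs at any time.

```python
import threading

class ThreadCoroutine:
    """Toy coroutine backed by a real thread; only one side runs at a time.
    Hypothetical sketch, not the QEMU implementation."""

    def __init__(self, fn):
        self._fn = fn
        self._to_co = threading.Semaphore(0)      # caller -> coroutine handoff
        self._to_caller = threading.Semaphore(0)  # coroutine -> caller handoff
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._started = False
        self.done = False

    def _run(self):
        self._to_co.acquire()        # wait for the first enter()
        self._fn(self)
        self.done = True
        self._to_caller.release()    # final handoff back to the caller

    def enter(self):
        """Transfer control into the coroutine until it yields or finishes."""
        if self.done:
            raise RuntimeError("coroutine already finished")
        if not self._started:
            self._started = True
            self._thread.start()
        self._to_co.release()
        self._to_caller.acquire()

    def yield_(self):
        """Called from inside the coroutine: hand control back to the caller."""
        self._to_caller.release()
        self._to_co.acquire()

trace = []

def worker(co):
    trace.append("co: step 1")
    co.yield_()
    trace.append("co: step 2")

co = ThreadCoroutine(worker)
trace.append("main: before")
co.enter()
trace.append("main: between")
co.enter()
trace.append("main: after")
print(trace, co.done)
```

Because the semaphores serialize the two threads, the code inside the coroutine needs no extra locking today. Dropping that strict handoff later is where fine-grained locking would have to come in.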

Stefan



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:54, schrieb Stefan Hajnoczi:
 On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 28.07.2011 14:09, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?

 The only current block driver that really does everything in an async
 state machine is qed. It's definitely not nice code, and having to
 convert all of the other block drivers to this would be a lot of work.
 
 Thanks Kevin :). 

I certainly didn't mean to attack your code or even yourself. It's not
that qed is done particularly badly or anything. That the code isn't
really nice is just the natural result of the callback-based programming
model.

 So if it only means that we're making things async that would block the
 VCPU until now, isn't that a great improvement already?

 The advantage compared to threading is that it allows an easy and
 incremental transition.
 
 Also remember that we already have a threads-based implementation of
 coroutines.  Later on we can do fine-grained locking and switch to
 threads directly instead of coroutines, if need be.

Might be an option for the future, but for now there's enough left to do
to take real advantage of coroutines.

Kevin



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 2:10 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 28.07.2011 14:54, schrieb Stefan Hajnoczi:
 On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 28.07.2011 14:09, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?

 The only current block driver that really does everything in an async
 state machine is qed. It's definitely not nice code, and having to
 convert all of the other block drivers to this would be a lot of work.

 Thanks Kevin :).

 I certainly didn't mean to attack your code or even yourself. It's not
 that qed is done particularly badly or anything. That the code isn't
 really nice is just the natural result of the callback-based programming
 model.

No worries, no offence taken :)

Stefan



Re: [Qemu-devel] Block layer roadmap

2011-07-27 Thread Anthony Liguori

On 07/27/2011 07:37 AM, Stefan Hajnoczi wrote:

Hi,
Here is a list of block layer and storage changes that have been discussed.  It
is useful to have a roadmap of changes in order to avoid duplication, allow
more developers to contribute, and to communicate the direction of storage in
QEMU.


Thanks for writing this up Stefan!


I suggest we first do a braindump of all changes that have been discussed.
Later we can discuss specific changes and if/when they fit into the roadmap -
please don't jump into discussion about specific changes yet.

Kevin: I hope this is a useful starting point.  Here are all the items
that I am aware of:

=Material for next QEMU release=

Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads

Generic copy-on-read [Stefan]
  * Populate image file to avoid fetching same block from backing file
    again later

Generic image streaming [Stefan]
  * Make block_stream commands available for all image formats that
    support backing files

Live block copy [Marcelo/Kevin/Stefan?]
  * Copy the contents of an image file while a guest is using it

In-place qcow2 <-> qed conversion [Devin, GSoC 2011]
  * Fast conversion between qcow2 and qed image formats without copying all data

VMDK enhancements [Fam, GSoC 2011]
  * Implement latest VMDK specs to support modern image files

Block I/O limits [Zhi Yong]
  * Resource control for guest I/O bandwidth/iops consumption
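
As a rough illustration of the generic copy-on-read item in the list above, the sketch below shows the basic idea in Python: a read that falls through to the backing file also populates the image, so the next read of that block is served locally. The Image class, BLOCK_SIZE and the read counter are hypothetical and only there to make the effect visible; this is not how any QEMU driver is structured.

```python
BLOCK_SIZE = 4  # toy block size in bytes (assumption for the example)

class Image:
    """Toy image: block number -> bytes, with an optional backing image."""

    def __init__(self, backing=None):
        self.blocks = {}
        self.backing = backing
        self.reads = 0            # how many reads reached this image

    def read(self, n, copy_on_read=False):
        self.reads += 1
        if n in self.blocks:
            return self.blocks[n]             # already allocated here
        if self.backing is None:
            return bytes(BLOCK_SIZE)          # unallocated reads as zeros
        data = self.backing.read(n)           # fall through to backing file
        if copy_on_read:
            self.blocks[n] = data             # populate so we never refetch
        return data

base = Image()
base.blocks[0] = b"boot"
top = Image(backing=base)

top.read(0)                       # goes to the backing file
top.read(0)                       # goes to the backing file again
top.read(0, copy_on_read=True)    # goes to the backing file, then populates
top.read(0, copy_on_read=True)    # served from the image itself
print(base.reads, top.blocks)
```

Without copy-on-read every read of block 0 reaches the backing file; with it, only the first one does.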

=Changes where I am not aware of firm plans=

Cow overlay [Dong Xu Robert]
  * Allow live block copy and image streaming to raw destination files

snapshot_blkdev and Backup API [Jes, Jagane]
  * Support for consistent disk snapshots and dirty block tracking
  * Allow backup software to integrate with QEMU

-blockdev [Markus?]
  * Explicit user control over block device trees


I'm planning on helping out here however I can.

Regards,

Anthony Liguori


QCOW3
  * Extend qcow2 format to address current and future image format challenges

iSCSI/NBD/Remote block device integration
  * Enable remote access to disk images for live migration and other tasks

Pre/post block copy
  * Working block migration

Avoid blocking QEMU threads
  * Today loss of NFS connectivity can hang guests
  * It's critical never to block the vcpu thread
  * The iothread should also not block while the qemu mutex is held
  * All blocking operations must be done asynchronously or in a worker thread

virtio-scsi [Paolo/Stefan]
  * The next step after virtio-blk: full SCSI command set, appearing
    as a SCSI HBA in the guest

tcm_vhost [Stefan]
  * Directly connect virtio-scsi with Linux in-kernel SCSI target
  * Pass-through of host SCSI devices
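
Going back to the "Avoid blocking QEMU threads" item above, the worker-thread option can be sketched roughly as follows. This is Python and entirely hypothetical (slow_read, the completions queue and the polling loop are stand-ins I made up, not QEMU's thread pool): the potentially stalling operation runs in a worker, and the main loop only picks up the completion when it is ready.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

completions = queue.Queue()   # worker -> main loop completion channel

def slow_read(label):
    """Stand-in for a read that may stall, e.g. on an unreachable NFS server."""
    time.sleep(0.2)
    return f"data from {label}"

def submit_async(pool, label):
    """Queue the blocking call on a worker and return immediately."""
    future = pool.submit(slow_read, label)
    future.add_done_callback(lambda f: completions.put(f.result()))

def main_loop(pool):
    """Keep 'servicing the guest' (counting ticks) while the read is in flight."""
    submit_async(pool, "nfs-image")
    ticks = 0
    while True:
        try:
            return ticks, completions.get_nowait()
        except queue.Empty:
            ticks += 1            # the main loop stays responsive here
            time.sleep(0.01)

with ThreadPoolExecutor(max_workers=1) as pool:
    ticks, result = main_loop(pool)
print(ticks, result)
```

The point is only that the loop keeps ticking while the read is outstanding. If slow_read were called directly from main_loop, nothing else would run until it returned.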