Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-19 Thread Michael S. Tsirkin
On Thu, Jun 16, 2011 at 11:01:22PM +0800, Asias He wrote:
 On 06/16/2011 10:28 PM, Michael S. Tsirkin wrote:
  On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
  Hi all,
 
  We’re proud to announce the second version of the Native Linux KVM tool! 
  We’re
  now officially aiming for merging to mainline in 3.1.
 
  Highlights:
 
  - Experimental GUI support using SDL and VNC
 
  - SMP support. tools/kvm/ now has a highly scalable, largely lockless 
  driver
interface and the individual drivers are using finegrained locks.
 
  - TAP-based virtio networking
  
  Wanted to ask for a while: would it make sense to use vhost-net?
  Or maybe use that exclusively?
  Less hypervisor code to support would help the focus.
  
 
 Sure. We are planning to use vhost-net; we're just out of time right now.
 We are currently working on simple user-mode network support which allows
 a plain user to use the network without root privileges.

Yes, qemu does this by implementing NAT and the
TCP stack in userspace.  What always made me unhappy
about this solution is that we have a perfectly fine NAT and TCP
stack in the kernel; we just lack APIs to let an unprivileged
user make use of it the way we want.
I hope you can avoid this duplication.

Another question is whether you want to implement a dhcp server.

-- 
MST


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-17 Thread Jeff Garzik

On 06/16/2011 07:22 AM, Ingo Molnar wrote:


* Christoph Hellwigh...@infradead.org  wrote:


On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:

Uh-oh. Someone needs to apply this patch to sync_file_range():


There actually are a few cases where using it makes sense. [...]


Such as? I don't think apps can actually know whether disk blocks
have been 'instantiated' by a particular filesystem or not, so the
manpage:

Some details
    None of these operations write out the file’s metadata.  Therefore,
    unless the application is strictly performing overwrites of
    already-instantiated disk blocks, there are no guarantees that the
    data will be available after a crash.

is rather misleading. This is a dangerous (and rather pointless)
syscall and this should be made much clearer in the manpage.


Not pointless at all -- see Linus's sync_file_range() examples in Re: 
Unexpected splice always copy behavior observed thread from May 2010.


Apps like MythTV may use it for streaming data to disk, basically 
shoving the VM out of the way to give the app more fine-grained writeout 
control.


Just don't mistake sync_file_range() for a data integrity syscall.
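
As a concrete illustration of that writeout-control use (and only that), here is a minimal sketch with made-up file descriptor, offsets and chunk size; it nudges VM writeback but flushes neither metadata nor the disk's write cache:

#define _GNU_SOURCE
#include <fcntl.h>

/* Push an already-written range of fd out to the block layer without
 * waiting behind every other dirty page in the page cache.  This is a
 * writeback hint only, not a durability guarantee. */
static void hint_writeout(int fd, off_t off, off_t len)
{
	sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Before recycling the range, wait for that writeback to finish:
 *   sync_file_range(fd, off, len, SYNC_FILE_RANGE_WAIT_BEFORE |
 *                   SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
 */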

Jeff





Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-17 Thread justin

On 2011-06-15 23:53, Pekka Enberg wrote:

or alternatively, if you already have a kernel source tree:

   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
   git remote update
   git checkout -b kvm-tool/master kvm-tool
I tried this, but it does not work; something went wrong when I
executed the 3rd git command.

I tried git checkout -b tools/kvm kvm-tool/master instead, and it seems to work fine.

--
justin



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-17 Thread Sasha Levin
On Fri, 2011-06-17 at 06:00 +0100, Stefan Hajnoczi wrote:
 On Fri, Jun 17, 2011 at 2:03 AM, Sasha Levin levinsasha...@gmail.com wrote:
  On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
  On 06/16/2011 09:48 AM, Pekka Enberg wrote:
   On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enbergpenb...@kernel.org  wrote:
   - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. 
   See the
 following URL for test result details: https://gist.github.com/1026888
  
   It turns out we were benchmarking the wrong guest kernel version for
   qemu-kvm which is why it performed so much worse. Here's a summary of
   qemu-kvm beating tools/kvm:
  
   https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
  
   I'd ask for a brown paper bag if I wasn't so busy eating my hat at the 
   moment.
 
  np, it happens.
 
  Is that still with QEMU with IDE emulation, cache=writethrough, and
  128MB of guest memory?
 
  Does your raw driver support multiple parallel requests?  It doesn't
  look like it does from how I read the code.  At some point, I'd be happy
  to help ya'll do some benchmarking against QEMU.
 
 
  Each virtio-blk device can process requests regardless of other
  virtio-blk devices, which means that we can do parallel requests for
  devices.
 
  Within each device, we support parallel requests in the sense that we do
  vectored IO for each head (which may contain multiple blocks) in the
  vring, we don't do multiple heads because when I've tried adding AIO
  I've noticed that at most there are 2-3 possible heads - and since it
  points to the same device it doesn't really help running them in
  parallel.
 
 One thing that QEMU does but I'm a little suspicious of is request
 merging.  virtio-blk will submit those 2-3 heads using
 bdrv_aio_multiwrite() if they become available in the same virtqueue
 notify.  The requests will be merged if possible.
 
 My feeling is that we should already have merged requests coming
 through virtio-blk and there should be no need to do any merging -
 which could be a workaround for a poor virtio-blk vring configuration
 that prevented the guest from sending large requests.  However, this
 feature did yield performance improvements with qcow2 image files when
 it was introduced, so that would be interesting to look at.
 
 Are you enabling indirect descriptors on the virtio-blk vring?  That
 should allow more requests to be made available because you don't run
 out of vring descriptors so easily.

No, but we're usually not getting close to running out of vring
descriptors either.

-- 

Sasha.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-17 Thread Anthony Liguori

On 06/16/2011 08:03 PM, Sasha Levin wrote:

On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
Each virtio-blk device can process requests regardless of other
virtio-blk devices, which means that we can do parallel requests for
devices.

Within each device, we support parallel requests in the sense that we do
vectored IO for each head (which may contain multiple blocks) in the
vring, we don't do multiple heads because when I've tried adding AIO


A scatter/gather list isn't multiple requests, it's just one.  So you 
handle one request at a time ATM.  There's nothing wrong with that, but 
there's no use in saying we support it in the sense... :-)



I've noticed that at most there are 2-3 possible heads - and since it
points to the same device it doesn't really help running them in
parallel.


Sure it does.  If you use the host page cache (and you do), then if you 
have two requests, A and B, and request A requires a disk access and 
request B can be satisfied from the page cache, then being able to 
submit both requests means that you can return B almost immediately 
instead of stalling out to finish A before starting B.


Not to mention that modern disks work better with multiple in-flight 
requests because they have their own cache and reordering algorithms in 
the drive's cache.  With RAID and higher-end storage devices, a single 
device may map to multiple spindles.  The only way to have them all spin 
at once is to submit parallel requests.


Regards,

Anthony Liguori


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Pekka Enberg
On Thu, Jun 16, 2011 at 1:44 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 That's probably why it's fast, it doesn't preserve data integrity :(

 Actually, I misread the code.  It does unstable writes but it does do
 fsync() on FLUSH.

On Thu, Jun 16, 2011 at 8:41 AM, Pekka Enberg penb...@kernel.org wrote:
 Yes. That's fine, right? Or did we misread how virtio block devices
 are supposed to work?

And btw, we use sync_file_range() to make sure the metadata part of a
QCOW2 image is never corrupted. The rationale here is that if the guest
doesn't do VIRTIO_BLK_T_FLUSH, you can corrupt your _guest filesystem_
but the _image_ will still work just fine and you can do fsck on it.

Also, Prasad ran xfstests and did over-night stress tests to iron out
corruption issues. Now we obviously can't promise that we'll never eat
your data but I can assure you that we've done as much as we've been
able to with the resources we have at the moment.
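
For concreteness, the FLUSH side boils down to something like the sketch below (the handler name is made up; and per the discussion elsewhere in this thread, fdatasync() rather than sync_file_range() is what actually reaches stable storage, including the disk write cache):

#include <unistd.h>

/* Handle VIRTIO_BLK_T_FLUSH: make everything written to the image so
 * far durable.  fdatasync() also commits the needed metadata and the
 * physical disk's write cache, which sync_file_range() never does. */
static int disk_image_flush(int image_fd)
{
	return fdatasync(image_fd);
}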

Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Ingo Molnar

* Pekka Enberg penb...@kernel.org wrote:

 On Thu, Jun 16, 2011 at 1:07 AM, Alexander Graf ag...@suse.de wrote:
  qemu-system-x86_64 -drive 
  file=/dev/shm/test.qcow2,cache=writeback,if=virtio
 
  Wouldn't this still be using threaded AIO mode? I thought KVM tools used 
  native AIO?
 
 We don't use AIO at all. It's just normal read()/write() with a 
 thread pool. I actually looked at AIO but didn't really see why 
 we'd want to use it.

We could certainly try kernel AIO; it would allow us to do all the 
virtio-blk logic from the vcpu thread without single-threading it - 
turning the QCOW2 logic into an AIO-driven state machine, in essence.

Advantages:

 - we wouldn't do context-switching between the vcpu thread and the 
   helper threads

 - it would potentially provide tighter caching and would allow 
   higher scalability.

Disadvantages:

 - the kaio codepaths are actually *more* complex than the regular 
   read()/write() IO codepaths - they keep track of an 'IO context', 
   so part of the efficiency advantages are spent on AIO tracking.

 - executing AIO in the vcpu thread eats up precious vcpu execution 
   time: combined QCOW2 throughput would be limited by a single 
   core's performance, and any time spent on QCOW2 processing would 
   not be spent running the guest CPU. (In such a model we certainly 
   couldn't do more intelligent, CPU-intensive storage solutions like on 
   the fly compress/decompress of QCOW2 data.)

 - state machines are also fragile in the sense that any 
   unintentional blocking of the vcpu context will kill the 
   performance and latencies of *all* processing in certain 
   circumstances. So we generally strive to keep the vcpu demux path 
   obvious, simple and atomic.

 - more advanced security models go out the window as well: we 
   couldn't isolate drivers from each other if all of them execute in 
   the same vcpu context ...

 - state machines are also notoriously difficult to develop,
   debug and maintain.

So careful performance, scalability, IO delay and maintainability 
measurements have to accompany such a model switch, because the 
disadvantages are numerous.

I'd only consider KAIO if it provides some *real*, measurable 
performance advantage of at least 10% in some important use case.
A few percent probably wouldn't be worth it.
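
For reference, the kernel AIO interface under discussion looks roughly like this through libaio (a standalone sketch with made-up request count and block size, not the tools/kvm code; build with -laio):

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NR_REQS 4
#define BLK     4096

int main(int argc, char **argv)
{
	io_context_t ctx;
	struct iocb iocbs[NR_REQS], *iocbp[NR_REQS];
	struct io_event events[NR_REQS];
	void *bufs[NR_REQS];
	int fd, i;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	memset(&ctx, 0, sizeof(ctx));
	if (io_setup(NR_REQS, &ctx) < 0)
		return 1;

	/* queue several reads at once; they complete out of order */
	for (i = 0; i < NR_REQS; i++) {
		if (posix_memalign(&bufs[i], BLK, BLK))
			return 1;
		io_prep_pread(&iocbs[i], fd, bufs[i], BLK, (long long)i * BLK);
		iocbp[i] = &iocbs[i];
	}
	if (io_submit(ctx, NR_REQS, iocbp) != NR_REQS)
		return 1;

	/* reap completions; a hypervisor would poll this from its event loop */
	if (io_getevents(ctx, NR_REQS, NR_REQS, events, NULL) != NR_REQS)
		return 1;

	printf("completed %d requests\n", NR_REQS);
	io_destroy(ctx);
	return 0;
}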

Thanks,

Ingo


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Pekka Enberg
Hi Ingo,

On Thu, Jun 16, 2011 at 10:24 AM, Ingo Molnar mi...@elte.hu wrote:
  - executing AIO in the vcpu thread eats up precious vcpu execution
   time: combined QCOW2 throughput would be limited by a single
   core's performance, and any time spent on QCOW2 processing would
   not be spent running the guest CPU. (In such a model we certainly
   couldnt do more intelligent, CPU-intense storage solutions like on
   the fly compress/decompress of QCOW2 data.)

Most image formats have optional on-the-fly compression/decompression
so we'd need to keep the current I/O thread scheme anyway.

 I'd only consider KAIO it if it provides some *real* measurable
 performance advantage of at least 10% in some important usecase.
 A few percent probably wouldnt be worth it.

I've only been following AIO kernel development from the sidelines but
I really haven't seen any reports of significant gains over
read()/write() from a thread pool. Are there any such reports?

  Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Ingo Molnar

* Pekka Enberg penb...@kernel.org wrote:

 Hi Ingo,
 
 On Thu, Jun 16, 2011 at 10:24 AM, Ingo Molnar mi...@elte.hu wrote:
   - executing AIO in the vcpu thread eats up precious vcpu execution
    time: combined QCOW2 throughput would be limited by a single
    core's performance, and any time spent on QCOW2 processing would
    not be spent running the guest CPU. (In such a model we certainly
    couldnt do more intelligent, CPU-intense storage solutions like on
    the fly compress/decompress of QCOW2 data.)
 
 Most image formats have optional on-the-fly 
 compression/decompression so we'd need to keep the current I/O 
 thread scheme anyway.

Yeah - although high-performance setups will probably not use that.

  I'd only consider KAIO it if it provides some *real* measurable 
  performance advantage of at least 10% in some important usecase. 
  A few percent probably wouldnt be worth it.
 
 I've only been following AIO kernel development from the sidelines 
 but I really haven't seen any reports of significant gains over 
 read()/write() from a thread pool. Are there any such reports?

I've measured such gains myself a couple of years ago, using an 
Oracle DB and a well-known OLTP benchmark, on a 64-way system.

I also profiled+tuned the kernel-side AIO implementation to be more 
scalable so I'm reasonably certain that the gains exist, and they 
were above 10%.

So the kaio gains existed back then but they needed sane userspace 
(POSIX AIO with signal notification sucks) and needed a well-tuned 
in-kernel implementation as well. (the current AIO code might have 
bitrotted)

Also, synchronous read()/write() [and scheduler() :-)] scalability 
improvements have not stopped in the past few years so the 
performance picture might have shifted in favor of a thread pool.

Thanks,

Ingo


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Stefan Hajnoczi
On Thu, Jun 16, 2011 at 8:24 AM, Ingo Molnar mi...@elte.hu wrote:
  - executing AIO in the vcpu thread eats up precious vcpu execution
   time: combined QCOW2 throughput would be limited by a single
   core's performance, and any time spent on QCOW2 processing would
   not be spent running the guest CPU. (In such a model we certainly
   couldnt do more intelligent, CPU-intense storage solutions like on
   the fly compress/decompress of QCOW2 data.)

This has been a problem in qemu-kvm.  io_submit(2) steals time from
the guest (I think it was around 20us on the system I measured last
year).

Add the fact that the guest kernel might be holding a spinlock and it
becomes a scalability problem for SMP guests.

Anything that takes noticeable CPU time should be done outside the vcpu thread.
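
A minimal sketch of that split, with made-up names and a single request slot for brevity: the vcpu thread only queues and signals, while the worker does the slow pread() and would then complete the virtio request:

#include <pthread.h>
#include <unistd.h>

/* One in-flight request, protected by a mutex/condvar pair. */
struct io_req {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             pending;
	int             fd;
	void           *buf;
	size_t          len;
	off_t           off;
};

static void *io_worker(void *arg)
{
	struct io_req *r = arg;

	pthread_mutex_lock(&r->lock);
	for (;;) {
		while (!r->pending)
			pthread_cond_wait(&r->cond, &r->lock);
		pthread_mutex_unlock(&r->lock);

		pread(r->fd, r->buf, r->len, r->off);	/* the slow part */
		/* ...complete the virtio request and inject the interrupt... */

		pthread_mutex_lock(&r->lock);
		r->pending = 0;
	}
	return NULL;
}

/* Called from the vcpu thread: hand off and return immediately. */
static void submit_from_vcpu(struct io_req *r)
{
	pthread_mutex_lock(&r->lock);
	r->pending = 1;
	pthread_cond_signal(&r->cond);
	pthread_mutex_unlock(&r->lock);
}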

Stefan


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote:
 And btw, we use sync_file_range() 

Which doesn't help you at all.  sync_file_range is just a hint for VM
writeback, but never commits filesystem metadata nor the physical
disk's write cache.  In short it's a completely dangerous interface, and
that is pretty well documented in the man page.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Pekka Enberg
Hi Christoph,

On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote:
 And btw, we use sync_file_range()

On Thu, Jun 16, 2011 at 12:24 PM, Christoph Hellwig h...@infradead.org wrote:
 Which doesn't help you at all.  sync_file_range is just a hint for VM
 writeback, but never commits filesystem metadata nor the physical
 disk's write cache.  In short it's a completely dangerous interface, and
 that is pretty well documented in the man page.

Doh - I didn't read it carefully enough and got hung up on:

Therefore, unless the application is strictly performing overwrites of
already-instantiated disk blocks, there are no guarantees that the data will
be available after a crash.

without noticing that it obviously doesn't work with filesystems like
btrfs that do copy-on-write.

What's the right thing to do here? Is fdatasync() sufficient?

Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2011 at 12:34:04PM +0300, Pekka Enberg wrote:
 Hi Christoph,
 
 On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote:
  And btw, we use sync_file_range()
 
 On Thu, Jun 16, 2011 at 12:24 PM, Christoph Hellwig h...@infradead.org 
 wrote:
  Which doesn't help you at all.  sync_file_range is just a hint for VM
  writeback, but never commits filesystem metadata nor the physical
  disk's write cache.  In short it's a completely dangerous interface, and
  that is pretty well documented in the man page.
 
 Doh - I didn't read it carefully enough and got hung up with:
 
 Therefore, unless the application is strictly performing overwrites of
 already-instantiated disk blocks, there are no guarantees that the data 
 will
 be available after a crash.
 
 without noticing that it obviously doesn't work with filesystems like
 btrfs that do copy-on-write.

You also missed:

 This system call does not flush disk write caches and thus does not
  provide any data integrity on systems with volatile disk write
  caches.

so it's not safe if you either have a cache, or are using btrfs, or
are using a sparse image, or are using an image preallocated using
fallocate/posix_fallocate.

 What's the right thing to do here? Is fdatasync() sufficient?

Yes.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Ingo Molnar

* Christoph Hellwig h...@infradead.org wrote:

  What's the right thing to do here? Is fdatasync() sufficient?
 
 Yes.

Prasad, Pekka, mind redoing the numbers with fdatasync()?

I'd be surprised if they were significantly worse but it has to be 
done to have apples-to-apples numbers.

Thanks,

Ingo


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Pekka Enberg
On Thu, Jun 16, 2011 at 12:48 PM, Christoph Hellwig h...@infradead.org wrote:
 You also missed:

  This system call does not flush disk write caches and thus does not
  provide any data integrity on systems with volatile disk write
  caches.

 so it's not safe if you either have a cache, or are using btrfs, or
 are using a sparse image, or are using an image preallocated using
 fallocate/posix_fallocate.

Uh-oh. Someone needs to apply this patch to sync_file_range():

diff --git a/fs/sync.c b/fs/sync.c
index ba76b96..32078aa 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -277,6 +277,8 @@ SYSCALL_DEFINE(sync_file_range)(int fd, loff_t offset, loff_t nbytes,
 	int fput_needed;
 	umode_t i_mode;
 
+	WARN_ONCE(1, "when this breaks, you get to keep both pieces");
+
 	ret = -EINVAL;
 	if (flags & ~VALID_FLAGS)
 		goto out;


 What's the right thing to do here? Is fdatasync() sufficient?

 Yes.

We'll fix that up. Thanks Christoph!

 Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:
 Uh-oh. Someone needs to apply this patch to sync_file_range():

There actually are a few cases where using it makes sense.  It's just
the minority.  



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Ingo Molnar

* Christoph Hellwig h...@infradead.org wrote:

 On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote:
  Uh-oh. Someone needs to apply this patch to sync_file_range():
 
 There actually are a few cases where using it makes sense. [...]

Such as? I don't think apps can actually know whether disk blocks 
have been 'instantiated' by a particular filesystem or not, so the 
manpage:

   Some details
       None of these operations write out the file’s metadata.  Therefore,
       unless the application is strictly performing overwrites of
       already-instantiated disk blocks, there are no guarantees that the
       data will be available after a crash.

is rather misleading. This is a dangerous (and rather pointless) 
syscall and this should be made much clearer in the manpage.

Thanks,

Ingo


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote:
 Such as? I don't think apps can actually know whether disk blocks 
 have been 'instantiated' by a particular filesystem or not, so the 
 manpage:

In general they can't.  The only good use case for sync_file_range
is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Ingo Molnar

* Christoph Hellwig h...@infradead.org wrote:

 On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote:

  Such as? I don't think apps can actually know whether disk blocks 
  have been 'instantiated' by a particular filesystem or not, so 
  the manpage:
 
 In general they can't.  The only good use case for sync_file_range 
 is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour.

Well, if overwrite is fundamentally safe on a filesystem (which is 
most of them) then sync_file_range() would work - and it has the big 
advantage that it's a pretty simple facility.

Filesystems that cannot guarantee that should map their 
sync_file_range() implementation to fdatasync() or so, right?

Thanks,

Ingo


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2011 at 01:40:45PM +0200, Ingo Molnar wrote:
 Filesystems that cannot guarantee that should map their 
 sync_file_range() implementation to fdatasync() or so, right?

Filesystems aren't even told about sync_file_range; it's purely a VM
thing, which is the root of the problem.

In-kernel we have all the infrastructure for a real ranged
fsync/fdatasync, and once we get a killer user for it we can trivially
export it at the syscall level.  I don't think mapping sync_file_range,
with its weird set of flags and confusing behaviour, to it is a good
idea, though.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Michael S. Tsirkin
On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
 Hi all,
 
 We’re proud to announce the second version of the Native Linux KVM tool! We’re
 now officially aiming for merging to mainline in 3.1.
 
 Highlights:
 
 - Experimental GUI support using SDL and VNC
 
 - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
   interface and the individual drivers are using finegrained locks.
 
 - TAP-based virtio networking

Wanted to ask for a while: would it make sense to use vhost-net?
Or maybe use that exclusively?
Less hypervisor code to support would help the focus.
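
For reference, the kernel-side data path is driven through a handful of ioctls on /dev/vhost-net. A heavily abbreviated sketch (the function name is made up, and the VHOST_SET_MEM_TABLE / VHOST_SET_VRING_* setup that a real implementation needs is elided):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Let the kernel shuttle packets between the virtio-net vrings and a
 * TAP device; the hypervisor keeps only the control plane. */
int vhost_net_attach(int tap_fd)
{
	uint64_t features;
	struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
	int vhost_fd = open("/dev/vhost-net", O_RDWR);

	if (vhost_fd < 0)
		return -1;
	if (ioctl(vhost_fd, VHOST_SET_OWNER, 0) < 0)
		goto err;
	if (ioctl(vhost_fd, VHOST_GET_FEATURES, &features) < 0)
		goto err;
	/* ...VHOST_SET_FEATURES, VHOST_SET_MEM_TABLE, VHOST_SET_VRING_*... */
	if (ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0)
		goto err;
	return vhost_fd;
err:
	perror("vhost-net setup");
	close(vhost_fd);
	return -1;
}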

-- 
MST


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Pekka Enberg
On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg penb...@kernel.org wrote:
 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
  following URL for test result details: https://gist.github.com/1026888

It turns out we were benchmarking the wrong guest kernel version for
qemu-kvm which is why it performed so much worse. Here's a summary of
qemu-kvm beating tools/kvm:

https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt

I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.

Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Asias He
On 06/16/2011 10:28 PM, Michael S. Tsirkin wrote:
 On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
 Hi all,

 We’re proud to announce the second version of the Native Linux KVM tool! 
 We’re
 now officially aiming for merging to mainline in 3.1.

 Highlights:

 - Experimental GUI support using SDL and VNC

 - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
   interface and the individual drivers are using finegrained locks.

 - TAP-based virtio networking
 
 Wanted to ask for a while: would it make sense to use vhost-net?
 Or maybe use that exclusively?
 Less hypervisor code to support would help the focus.
 

Sure. We are planning to use vhost-net; we're just out of time right now.
We are currently working on simple user-mode network support which allows
a plain user to use the network without root privileges.

-- 
Best Regards,
Asias He


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Anthony Liguori

On 06/16/2011 09:48 AM, Pekka Enberg wrote:

On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enbergpenb...@kernel.org  wrote:

- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
  following URL for test result details: https://gist.github.com/1026888


It turns out we were benchmarking the wrong guest kernel version for
qemu-kvm which is why it performed so much worse. Here's a summary of
qemu-kvm beating tools/kvm:

https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt

I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.


np, it happens.

Is that still with QEMU with IDE emulation, cache=writethrough, and 
128MB of guest memory?


Does your raw driver support multiple parallel requests?  It doesn't 
look like it does from how I read the code.  At some point, I'd be happy 
to help ya'll do some benchmarking against QEMU.


It would be very useful to compare as we have some ugly things in QEMU 
that we've never quite been able to determine how much they affect 
performance.  Having an alternative implementation to benchmark against 
would be quite helpful.


Regards,

Anthony Liguori



 Pekka


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Sasha Levin
On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
 On 06/16/2011 09:48 AM, Pekka Enberg wrote:
  On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enbergpenb...@kernel.org  wrote:
  - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See 
  the
following URL for test result details: https://gist.github.com/1026888
 
  It turns out we were benchmarking the wrong guest kernel version for
  qemu-kvm which is why it performed so much worse. Here's a summary of
  qemu-kvm beating tools/kvm:
 
  https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
 
  I'd ask for a brown paper bag if I wasn't so busy eating my hat at the 
  moment.
 
 np, it happens.
 
 Is that still with QEMU with IDE emulation, cache=writethrough, and 
 128MB of guest memory?
 
 Does your raw driver support multiple parallel requests?  It doesn't 
 look like it does from how I read the code.  At some point, I'd be happy 
 to help ya'll do some benchmarking against QEMU.
 

Each virtio-blk device can process requests regardless of other
virtio-blk devices, which means that we can do parallel requests across
devices.

Within each device, we support parallel requests in the sense that we do
vectored IO for each head (which may contain multiple blocks) in the
vring; we don't process multiple heads because when I tried adding AIO
I noticed that at most there are 2-3 possible heads - and since they
point to the same device it doesn't really help to run them in
parallel.
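
The vectored IO per head mentioned above amounts to something like this (a sketch; the helper name and the 512-byte sector size are illustrative):

#define _GNU_SOURCE
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Service one virtio-blk head: its data buffers arrive as an iovec,
 * so a single preadv() covers every segment of the request. */
static ssize_t read_one_head(int fd, struct iovec *iov, int iovcnt,
			     uint64_t sector)
{
	return preadv(fd, iov, iovcnt, sector * 512);
}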


 It would be very useful to compare as we have some ugly things in QEMU 
 that we've never quite been able to determine how much they affect 
 performance.  Having an alternative implementation to benchmark against 
 would be quite helpful.


-- 

Sasha.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Stefan Hajnoczi
On Fri, Jun 17, 2011 at 2:03 AM, Sasha Levin levinsasha...@gmail.com wrote:
 On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
 On 06/16/2011 09:48 AM, Pekka Enberg wrote:
  On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enbergpenb...@kernel.org  wrote:
  - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See 
  the
    following URL for test result details: https://gist.github.com/1026888
 
  It turns out we were benchmarking the wrong guest kernel version for
  qemu-kvm which is why it performed so much worse. Here's a summary of
  qemu-kvm beating tools/kvm:
 
  https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
 
  I'd ask for a brown paper bag if I wasn't so busy eating my hat at the 
  moment.

 np, it happens.

 Is that still with QEMU with IDE emulation, cache=writethrough, and
 128MB of guest memory?

 Does your raw driver support multiple parallel requests?  It doesn't
 look like it does from how I read the code.  At some point, I'd be happy
 to help ya'll do some benchmarking against QEMU.


 Each virtio-blk device can process requests regardless of other
 virtio-blk devices, which means that we can do parallel requests for
 devices.

 Within each device, we support parallel requests in the sense that we do
 vectored IO for each head (which may contain multiple blocks) in the
 vring, we don't do multiple heads because when I've tried adding AIO
 I've noticed that at most there are 2-3 possible heads - and since it
 points to the same device it doesn't really help running them in
 parallel.

One thing that QEMU does but I'm a little suspicious of is request
merging.  virtio-blk will submit those 2-3 heads using
bdrv_aio_multiwrite() if they become available in the same virtqueue
notify.  The requests will be merged if possible.

My feeling is that we should already have merged requests coming
through virtio-blk and there should be no need to do any merging -
it could just be a workaround for a poor virtio-blk vring configuration
that prevented the guest from sending large requests.  However, this
feature did yield performance improvements with qcow2 image files when
it was introduced, so it would be interesting to look at.

Are you enabling indirect descriptors on the virtio-blk vring?  That
should allow more requests to be made available because you don't run
out of vring descriptors so easily.
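
For reference, advertising that on the host side is a single feature bit (a sketch assuming the device model exposes a 32-bit host-features word, as virtio-pci did at the time):

#include <stdint.h>
#include <linux/virtio_ring.h>	/* VIRTIO_RING_F_INDIRECT_DESC */

/* Offer indirect descriptors so the guest can pack a whole request's
 * scatter-gather list into one descriptor slot and keep more requests
 * in flight before the vring runs out of descriptors. */
static uint32_t virtio_blk_host_features(void)
{
	return 1u << VIRTIO_RING_F_INDIRECT_DESC;
}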

Stefan


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-16 Thread Stefan Hajnoczi
On Thu, Jun 16, 2011 at 3:48 PM, Pekka Enberg penb...@kernel.org wrote:
 On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg penb...@kernel.org wrote:
 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
  following URL for test result details: https://gist.github.com/1026888

 It turns out we were benchmarking the wrong guest kernel version for
 qemu-kvm which is why it performed so much worse. Here's a summary of
 qemu-kvm beating tools/kvm:

 https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt

Thanks for digging into the results so quickly and rerunning.

Stefan


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Avi Kivity

On 06/15/2011 06:53 PM, Pekka Enberg wrote:

- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
   following URL for test result details: https://gist.github.com/1026888


This is surprising.  How is qemu invoked?

btw the dump above is a little hard to interpret.

--
error compiling committee.c: too many arguments to function



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Pekka Enberg
On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity a...@redhat.com wrote:
 On 06/15/2011 06:53 PM, Pekka Enberg wrote:

 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
 the
   following URL for test result details: https://gist.github.com/1026888

 This is surprising.  How is qemu invoked?

Prasad will have the details. Please note that the above are with Qemu
defaults, which don't use virtio. The results with virtio are a little
better but still in favor of tools/kvm.

 btw the dump above is a little hard to interpret.

It's what fio reports. The relevant bits are:


Qemu:

Run status group 0 (all jobs):
  READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s,
mint=2938msec, maxt=3349msec
 WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s,
mint=2292msec, maxt=2444msec

Run status group 1 (all jobs):
  READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s,
mint=3050msec, maxt=3315msec
 WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s,
mint=2485msec, maxt=2755msec

Run status group 2 (all jobs):
  READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s,
mint=30178msec, maxt=30467msec
 WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s,
mint=29682msec, maxt=30342msec

Run status group 3 (all jobs):
  READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s,
mint=28574msec, maxt=29279msec
 WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s,
mint=27121msec, maxt=28625msec

Disk stats (read/write):
 sdb: ios=60583/6652, merge=0/164, ticks=156340/672030,
in_queue=828230, util=82.71%

tools/kvm:

Run status group 0 (all jobs):
   READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s,
maxb=46030KB/s, mint=1139msec, maxt=1373msec
  WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s,
mint=804msec, maxt=891msec

Run status group 1 (all jobs):
   READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s,
maxb=57932KB/s, mint=905msec, maxt=1088msec
  WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s,
mint=570msec, maxt=760msec

Run status group 2 (all jobs):
   READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s,
mint=9830msec, maxt=10237msec
  WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s,
mint=9558msec, maxt=10071msec

Run status group 3 (all jobs):
   READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s,
mint=2923msec, maxt=3096msec
  WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s,
mint=2530msec, maxt=2605msec

Disk stats (read/write):
  vdb: ios=61002/6654, merge=0/183, ticks=27270/205780,
in_queue=232220, util=69.46%


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Prasad Joshi
On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg penb...@kernel.org wrote:
 On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity a...@redhat.com wrote:
 On 06/15/2011 06:53 PM, Pekka Enberg wrote:

 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
 the
   following URL for test result details: https://gist.github.com/1026888

 This is surprising.  How is qemu invoked?

 Prasad will have the details. Please note that the above are with Qemu
 defaults which doesn't use virtio. The results with virtio are little
 better but still in favor of tools/kvm.


The qcow2 image used for testing was copied onto /dev/shm to avoid
disk delays in the performance measurement.

QEMU was invoked with following parameters

$ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
/dev/shm/test.qcow2 -m 1024M

FIO job file used for measuring the numbers was

prasad@prasad-vm:~$ cat fio-mixed.job
; fio-mixed.job for autotest

[global]
name=fio-sync
directory=/mnt
rw=randrw
rwmixread=67
rwmixwrite=33
bsrange=16K-256K
direct=0
end_fsync=1
verify=crc32
;ioscheduler=x
numjobs=4

[file1]
size=50M
ioengine=sync
mem=malloc

[file2]
stonewall
size=50M
ioengine=aio
mem=shm
iodepth=4

[file3]
stonewall
size=50M
ioengine=mmap
mem=mmap
direct=1

[file4]
stonewall
size=50M
ioengine=splice
mem=malloc
direct=1

- The test generates 16 files, each of ~50MB, so in total ~800MB of data was written.
- The test.qcow2 was newly created before it was used with QEMU or KVM tool
- The size of the QCOW2 image was 1.5GB.
- The host machine had 2GB RAM.
- The guest machine in both the cases was started with 1GB memory.

Thanks and Regards,
Prasad

 btw the dump above is a little hard to interpret.

 It's what fio reports. The relevant bits are:


 Qemu:

 Run status group 0 (all jobs):
  READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s,
 mint=2938msec, maxt=3349msec
  WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s,
 mint=2292msec, maxt=2444msec

 Run status group 1 (all jobs):
  READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s,
 mint=3050msec, maxt=3315msec
  WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s,
 mint=2485msec, maxt=2755msec

 Run status group 2 (all jobs):
  READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s,
 mint=30178msec, maxt=30467msec
  WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s,
 mint=29682msec, maxt=30342msec

 Run status group 3 (all jobs):
  READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s,
 mint=28574msec, maxt=29279msec
  WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s,
 mint=27121msec, maxt=28625msec

 Disk stats (read/write):
  sdb: ios=60583/6652, merge=0/164, ticks=156340/672030,
 in_queue=828230, util=82.71%

 tools/kvm:

 Run status group 0 (all jobs):
   READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s,
 maxb=46030KB/s, mint=1139msec, maxt=1373msec
  WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s,
 mint=804msec, maxt=891msec

 Run status group 1 (all jobs):
   READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s,
 maxb=57932KB/s, mint=905msec, maxt=1088msec
  WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s,
 mint=570msec, maxt=760msec

 Run status group 2 (all jobs):
   READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s,
 mint=9830msec, maxt=10237msec
  WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s,
 mint=9558msec, maxt=10071msec

 Run status group 3 (all jobs):
   READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s,
 mint=2923msec, maxt=3096msec
  WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s,
 mint=2530msec, maxt=2605msec

 Disk stats (read/write):
  vdb: ios=61002/6654, merge=0/183, ticks=27270/205780,
 in_queue=232220, util=69.46%



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Sasha Levin
On Wed, 2011-06-15 at 21:13 +0100, Prasad Joshi wrote:
 On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg penb...@kernel.org wrote:
  On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity a...@redhat.com wrote:
  On 06/15/2011 06:53 PM, Pekka Enberg wrote:
 
  - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
  the
following URL for test result details: https://gist.github.com/1026888
 
  This is surprising.  How is qemu invoked?
 
  Prasad will have the details. Please note that the above are with Qemu
  defaults which doesn't use virtio. The results with virtio are little
  better but still in favor of tools/kvm.
 
 
 The qcow2 image used for testing was copied on to /dev/shm to avoid
 the disk delays in performance measurement.
 
 QEMU was invoked with following parameters
 
 $ qemu-system-x86_64 -hda disk image on hard disk -hdb
 /dev/shm/test.qcow2 -m 1024M
 

Prasad, Could you please run this test with '-drive
file=/dev/shm/test.qcow2,if=virtio' instead of the '-hdb' thing?

 FIO job file used for measuring the numbers was
 
 prasad@prasad-vm:~$ cat fio-mixed.job
 ; fio-mixed.job for autotest
 
 [global]
 name=fio-sync
 directory=/mnt
 rw=randrw
 rwmixread=67
 rwmixwrite=33
 bsrange=16K-256K
 direct=0
 end_fsync=1
 verify=crc32
 ;ioscheduler=x
 numjobs=4
 
 [file1]
 size=50M
 ioengine=sync
 mem=malloc
 
 [file2]
 stonewall
 size=50M
 ioengine=aio
 mem=shm
 iodepth=4
 
 [file3]
 stonewall
 size=50M
 ioengine=mmap
 mem=mmap
 direct=1
 
 [file4]
 stonewall
 size=50M
 ioengine=splice
 mem=malloc
 direct=1
 
 - The test generates 16 file each of ~50MB, so in total ~800MB data was 
 written.
 - The test.qcow2 was newly created before it was used with QEMU or KVM tool
 - The size of the QCOW2 image was 1.5GB.
 - The host machine had 2GB RAM.
 - The guest machine in both the cases was started with 1GB memory.
 
 Thanks and Regards,
 Prasad
 
  btw the dump above is a little hard to interpret.
 
  It's what fio reports. The relevant bits are:
 
 
  Qemu:
 
  Run status group 0 (all jobs):
   READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s,
  mint=2938msec, maxt=3349msec
   WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s,
  mint=2292msec, maxt=2444msec
 
  Run status group 1 (all jobs):
   READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s,
  mint=3050msec, maxt=3315msec
   WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s,
  mint=2485msec, maxt=2755msec
 
  Run status group 2 (all jobs):
   READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s,
  mint=30178msec, maxt=30467msec
   WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s,
  mint=29682msec, maxt=30342msec
 
  Run status group 3 (all jobs):
   READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s,
  mint=28574msec, maxt=29279msec
   WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s,
  mint=27121msec, maxt=28625msec
 
  Disk stats (read/write):
   sdb: ios=60583/6652, merge=0/164, ticks=156340/672030,
  in_queue=828230, util=82.71%
 
  tools/kvm:
 
  Run status group 0 (all jobs):
READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s,
  maxb=46030KB/s, mint=1139msec, maxt=1373msec
   WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s,
  mint=804msec, maxt=891msec
 
  Run status group 1 (all jobs):
READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s,
  maxb=57932KB/s, mint=905msec, maxt=1088msec
   WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s,
  mint=570msec, maxt=760msec
 
  Run status group 2 (all jobs):
READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s,
  mint=9830msec, maxt=10237msec
   WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s,
  mint=9558msec, maxt=10071msec
 
  Run status group 3 (all jobs):
READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s,
  mint=2923msec, maxt=3096msec
   WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s,
  mint=2530msec, maxt=2605msec
 
  Disk stats (read/write):
   vdb: ios=61002/6654, merge=0/183, ticks=27270/205780,
  in_queue=232220, util=69.46%
 

-- 

Sasha.



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Prasad Joshi
On Wed, Jun 15, 2011 at 9:23 PM, Sasha Levin levinsasha...@gmail.com wrote:
 On Wed, 2011-06-15 at 21:13 +0100, Prasad Joshi wrote:
 On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg penb...@kernel.org wrote:
  On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity a...@redhat.com wrote:
  On 06/15/2011 06:53 PM, Pekka Enberg wrote:
 
  - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
  the
    following URL for test result details: https://gist.github.com/1026888
 
  This is surprising.  How is qemu invoked?
 
  Prasad will have the details. Please note that the above are with Qemu
  defaults which doesn't use virtio. The results with virtio are little
  better but still in favor of tools/kvm.
 

 The qcow2 image used for testing was copied on to /dev/shm to avoid
 the disk delays in performance measurement.

 QEMU was invoked with following parameters

 $ qemu-system-x86_64 -hda disk image on hard disk -hdb
 /dev/shm/test.qcow2 -m 1024M


 Prasad, Could you please run this test with '-drive
 file=/dev/shm/test.qcow2,if=virtio' instead of the '-hdb' thing?


In fact I have already tried that. Like Pekka mentioned, the results
are still in favour of KVM tools.

The machine that I work on is not with me at the moment; I will be able
to mail the exact numbers tomorrow.

Thanks and Regards,
Prasad

 FIO job file used for measuring the numbers was

 prasad@prasad-vm:~$ cat fio-mixed.job
 ; fio-mixed.job for autotest

 [global]
 name=fio-sync
 directory=/mnt
 rw=randrw
 rwmixread=67
 rwmixwrite=33
 bsrange=16K-256K
 direct=0
 end_fsync=1
 verify=crc32
 ;ioscheduler=x
 numjobs=4

 [file1]
 size=50M
 ioengine=sync
 mem=malloc

 [file2]
 stonewall
 size=50M
 ioengine=aio
 mem=shm
 iodepth=4

 [file3]
 stonewall
 size=50M
 ioengine=mmap
 mem=mmap
 direct=1

 [file4]
 stonewall
 size=50M
 ioengine=splice
 mem=malloc
 direct=1

 - The test generates 16 file each of ~50MB, so in total ~800MB data was 
 written.
 - The test.qcow2 was newly created before it was used with QEMU or KVM tool
 - The size of the QCOW2 image was 1.5GB.
 - The host machine had 2GB RAM.
 - The guest machine in both the cases was started with 1GB memory.

 Thanks and Regards,
 Prasad

  btw the dump above is a little hard to interpret.
 
  It's what fio reports. The relevant bits are:
 
 
  Qemu:
 
  Run status group 0 (all jobs):
   READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s,
  mint=2938msec, maxt=3349msec
   WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s,
  mint=2292msec, maxt=2444msec
 
  Run status group 1 (all jobs):
   READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s,
  mint=3050msec, maxt=3315msec
   WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s,
  mint=2485msec, maxt=2755msec
 
  Run status group 2 (all jobs):
   READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s,
  mint=30178msec, maxt=30467msec
   WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s,
  mint=29682msec, maxt=30342msec
 
  Run status group 3 (all jobs):
   READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s,
  mint=28574msec, maxt=29279msec
   WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s,
  mint=27121msec, maxt=28625msec
 
  Disk stats (read/write):
   sdb: ios=60583/6652, merge=0/164, ticks=156340/672030,
  in_queue=828230, util=82.71%
 
  tools/kvm:
 
  Run status group 0 (all jobs):
    READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s,
  maxb=46030KB/s, mint=1139msec, maxt=1373msec
   WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s,
  mint=804msec, maxt=891msec
 
  Run status group 1 (all jobs):
    READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s,
  maxb=57932KB/s, mint=905msec, maxt=1088msec
   WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s,
  mint=570msec, maxt=760msec
 
  Run status group 2 (all jobs):
    READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s,
  mint=9830msec, maxt=10237msec
   WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s,
  mint=9558msec, maxt=10071msec
 
  Run status group 3 (all jobs):
    READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s,
  mint=2923msec, maxt=3096msec
   WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s,
  mint=2530msec, maxt=2605msec
 
  Disk stats (read/write):
   vdb: ios=61002/6654, merge=0/183, ticks=27270/205780,
  in_queue=232220, util=69.46%
 

 --

 Sasha.




Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Anthony Liguori

On 06/15/2011 10:53 AM, Pekka Enberg wrote:

Hi all,

We’re proud to announce the second version of the Native Linux KVM tool! We’re
now officially aiming for merging to mainline in 3.1.

Highlights:

- Experimental GUI support using SDL and VNC

- SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
   interface and the individual drivers are using finegrained locks.

- TAP-based virtio networking

- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
   following URL for test result details: https://gist.github.com/1026888


What was the commit hash for the QEMU you tested?

The following caused a major regression in qcow2:


commit a16c53b101a9897b0b2be96a1bb3bde7c04380f2
Author: Anthony Liguori aligu...@us.ibm.com
Date:   Mon Jun 6 08:25:06 2011 -0500

Fix regression introduced by -machine accel=

    Commit 85097db6 changed the timing when kvm_allowed is set until after
    kvm is initialized.  During initialization, the ioeventfd initialization
    code checks kvm_enabled() and after this change, ioeventfd is
    effectively disabled.


If it's not in your tree, it would be useful to rerun the test with the 
latest git.


Regards,

Anthony Liguori


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Anthony Liguori

On 06/15/2011 03:13 PM, Prasad Joshi wrote:

On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org  wrote:

On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com  wrote:

On 06/15/2011 06:53 PM, Pekka Enberg wrote:


- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
the
   following URL for test result details: https://gist.github.com/1026888


This is surprising.  How is qemu invoked?


Prasad will have the details. Please note that the above are with Qemu
defaults which doesn't use virtio. The results with virtio are little
better but still in favor of tools/kvm.



The qcow2 image used for testing was copied on to /dev/shm to avoid
the disk delays in performance measurement.


Our experience has been that this is actually not a great way to 
simulate fast storage.


Spindle-based storage has very different characteristics from memory, as 
there is a significant cost for seeking.


-hdb uses IDE too.  That's pretty unfair since IDE is limited to a 
single request at a time whereas virtio can support multiple requests 
(and the native kvm tool is using virtio).


Regards,

Anthony Liguori


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Alexander Graf

On 16.06.2011, at 00:04, Anthony Liguori wrote:

 On 06/15/2011 03:13 PM, Prasad Joshi wrote:
 On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org  wrote:
 On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com  wrote:
 On 06/15/2011 06:53 PM, Pekka Enberg wrote:
 
 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
 the
   following URL for test result details: https://gist.github.com/1026888
 
 This is surprising.  How is qemu invoked?
 
 Prasad will have the details. Please note that the above are with Qemu
 defaults which doesn't use virtio. The results with virtio are little
 better but still in favor of tools/kvm.
 
 
 The qcow2 image used for testing was copied on to /dev/shm to avoid
 the disk delays in performance measurement.
 
 QEMU was invoked with following parameters
 
 $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
 /dev/shm/test.qcow2 -m 1024M
 
 Looking more closely at native KVM tools, you would need to use the following 
 invocation to have an apples-to-apples comparison:
 
 qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

Wouldn't this still be using threaded AIO mode? I thought KVM tools used native 
AIO?


Alex



Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Anthony Liguori

On 06/15/2011 05:07 PM, Alexander Graf wrote:


On 16.06.2011, at 00:04, Anthony Liguori wrote:


On 06/15/2011 03:13 PM, Prasad Joshi wrote:

On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org   wrote:

On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com   wrote:

On 06/15/2011 06:53 PM, Pekka Enberg wrote:


- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See
the
   following URL for test result details: https://gist.github.com/1026888


This is surprising.  How is qemu invoked?


Prasad will have the details. Please note that the above are with Qemu
defaults, which don't use virtio. The results with virtio are a little
better but still in favor of tools/kvm.



The qcow2 image used for testing was copied on to /dev/shm to avoid
the disk delays in performance measurement.

QEMU was invoked with the following parameters:

$ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
/dev/shm/test.qcow2 -m 1024M


Looking more closely at native KVM tools, you would need to use the following 
invocation to have an apples-to-apples comparison:

qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio


Wouldn't this still be using threaded AIO mode? I thought KVM tools used native 
AIO?


Nope.  The relevant code is:


/* blk device ?*/
disk = blkdev__probe(filename, &st);
if (disk)
        return disk;

fd = open(filename, readonly ? O_RDONLY : O_RDWR);
if (fd < 0)
        return NULL;

/* qcow image ?*/
disk = qcow_probe(fd, readonly);
if (disk)
        return disk;

/* raw image ?*/
disk = raw_image__probe(fd, &st, readonly);
if (disk)
        return disk;


It uses a synchronous I/O model similar to qcow2 in QEMU, with what I 
assume is a global lock outside of the actual implementation.
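
For illustration, that kind of model boils down to something like the
following minimal sketch (hypothetical names, not the actual tools/kvm
code), where every request takes one global mutex and does a blocking
pread():

#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical global lock taken around every image-format operation. */
static pthread_mutex_t disk_op_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * All requests are serialized through one mutex and serviced with a
 * plain blocking pread(): simple, but only one request is ever inside
 * the image-format code at a time.
 */
static ssize_t qcow_read_locked(int fd, void *buf, size_t len, uint64_t offset)
{
        ssize_t ret;

        pthread_mutex_lock(&disk_op_mutex);
        ret = pread(fd, buf, len, offset);
        pthread_mutex_unlock(&disk_op_mutex);

        return ret;
}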


I think it lacks some of the caching that Kevin's added recently, though, 
so I assume that if QEMU were run with cache=writeback, it would probably 
do quite a bit better than the native KVM tool.


It also turns out that while they have the infrastructure to deal with 
FLUSH, they don't implement it for qcow2 :-/


So even if the guest does an fsync(), the native KVM tool will never 
actually sync the data to disk...


That's probably why it's fast: it doesn't preserve data integrity :(

Regards,

Anthony Liguori




Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Anthony Liguori

On 06/15/2011 05:20 PM, Anthony Liguori wrote:

On 06/15/2011 05:07 PM, Alexander Graf wrote:


On 16.06.2011, at 00:04, Anthony Liguori wrote:


On 06/15/2011 03:13 PM, Prasad Joshi wrote:

On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org
wrote:

On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com wrote:

On 06/15/2011 06:53 PM, Pekka Enberg wrote:


- Fast QCOW2 image read-write support beating Qemu in fio
benchmarks. See
the
following URL for test result details:
https://gist.github.com/1026888


This is surprising. How is qemu invoked?


Prasad will have the details. Please note that the above are with Qemu
defaults, which don't use virtio. The results with virtio are a little
better but still in favor of tools/kvm.



The qcow2 image used for testing was copied on to /dev/shm to avoid
the disk delays in performance measurement.

QEMU was invoked with the following parameters:

$ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
/dev/shm/test.qcow2 -m 1024M


Looking more closely at native KVM tools, you would need to use the
following invocation to have an apples-to-apples comparison:

qemu-system-x86_64 -drive
file=/dev/shm/test.qcow2,cache=writeback,if=virtio


Wouldn't this still be using threaded AIO mode? I thought KVM tools
used native AIO?


Nope. The relevant code is:


/* blk device ?*/
disk = blkdev__probe(filename, &st);
if (disk)
        return disk;

fd = open(filename, readonly ? O_RDONLY : O_RDWR);
if (fd < 0)
        return NULL;

/* qcow image ?*/
disk = qcow_probe(fd, readonly);
if (disk)
        return disk;

/* raw image ?*/
disk = raw_image__probe(fd, &st, readonly);
if (disk)
        return disk;


It uses a synchronous I/O model similar to qcow2 in QEMU, with what I
assume is a global lock outside of the actual implementation.

I think it lacks some of the caching that Kevin's added recently, though,
so I assume that if QEMU were run with cache=writeback, it would probably
do quite a bit better than the native KVM tool.

It also turns out that while they have the infrastructure to deal with
FLUSH, they don't implement it for qcow2 :-/

So even if the guest does an fsync(), the native KVM tool will never
actually sync the data to disk...

That's probably why it's fast: it doesn't preserve data integrity :(


Actually, I misread the code.  It does unstable writes but it does do 
fsync() on FLUSH.
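
To make that concrete, a rough sketch of the host-side handling could look 
like the code below; the request type constants are the standard ones from 
linux/virtio_blk.h, but the function and parameter names are made up for 
illustration and are not the actual tools/kvm code:

#include <linux/virtio_blk.h>   /* VIRTIO_BLK_T_IN/OUT/FLUSH */
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Ordinary writes are "unstable": they only reach the host page cache.
 * Only an explicit FLUSH from the guest (e.g. triggered by a guest-side
 * fsync()) forces the data out with fsync() on the host.
 */
static int handle_virtio_blk_req(int fd, uint32_t type, uint64_t sector,
                                 struct iovec *iov, int iovcnt)
{
        switch (type) {
        case VIRTIO_BLK_T_IN:           /* guest read */
                return preadv(fd, iov, iovcnt, sector << 9) < 0 ? -1 : 0;
        case VIRTIO_BLK_T_OUT:          /* unstable guest write */
                return pwritev(fd, iov, iovcnt, sector << 9) < 0 ? -1 : 0;
        case VIRTIO_BLK_T_FLUSH:        /* make previous writes durable */
                return fsync(fd);
        default:
                return -1;
        }
}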


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 11:04 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 06/15/2011 03:13 PM, Prasad Joshi wrote:

 On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org  wrote:

 On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com  wrote:

 On 06/15/2011 06:53 PM, Pekka Enberg wrote:

 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks.
 See
 the
   following URL for test result details:
 https://gist.github.com/1026888

 This is surprising.  How is qemu invoked?

 Prasad will have the details. Please note that the above are with Qemu
 defaults, which don't use virtio. The results with virtio are a little
 better but still in favor of tools/kvm.


 The qcow2 image used for testing was copied on to /dev/shm to avoid
 the disk delays in performance measurement.

 QEMU was invoked with the following parameters:
 
 $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
 /dev/shm/test.qcow2 -m 1024M

 Looking more closely at native KVM tools, you would need to use the
 following invocation to have an apples-to-apples comparison:

 qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

In addition to this, it is important to set identical guest RAM sizes
(QEMU's -m ram_mb option).

If you are comparing with qemu.git rather than qemu-kvm.git then you
need to ./configure --enable-io-thread and launch with QEMU's
-enable-kvm option.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Pekka Enberg
On Thu, Jun 16, 2011 at 1:44 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 That's probably why it's fast, it doesn't preserve data integrity :(

 Actually, I misread the code.  It does unstable writes but it does do
 fsync() on FLUSH.

Yes. That's fine, right? Or did we misread how virtio block devices
are supposed to work?

Btw, unstable writes don't really explain why *read* performance is better.

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Pekka Enberg
On Thu, Jun 16, 2011 at 8:29 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 11:04 PM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 On 06/15/2011 03:13 PM, Prasad Joshi wrote:

 On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enbergpenb...@kernel.org  wrote:

 On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivitya...@redhat.com  wrote:

 On 06/15/2011 06:53 PM, Pekka Enberg wrote:

 - Fast QCOW2 image read-write support beating Qemu in fio benchmarks.
 See
 the
   following URL for test result details:
 https://gist.github.com/1026888

 This is surprising.  How is qemu invoked?

 Prasad will have the details. Please note that the above are with Qemu
 defaults, which don't use virtio. The results with virtio are a little
 better but still in favor of tools/kvm.


 The qcow2 image used for testing was copied on to /dev/shm to avoid
 the disk delays in performance measurement.

 QEMU was invoked with the following parameters:
 
 $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb
 /dev/shm/test.qcow2 -m 1024M

 Looking more closely at native KVM tools, you would need to use the
 following invocation to have an apples-to-apples comparison:

 qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

 In addition to this, it is important to set identical guest RAM sizes
 (QEMU's -m ram_mb option).

Right.

 If you are comparing with qemu.git rather than qemu-kvm.git then you
 need to ./configure --enable-io-thread and launch with QEMU's
 -enable-kvm option.

I think Prasad is testing qemu-kvm.git.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Native Linux KVM tool v2

2011-06-15 Thread Pekka Enberg
On Thu, Jun 16, 2011 at 1:07 AM, Alexander Graf ag...@suse.de wrote:
 qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

 Wouldn't this still be using threaded AIO mode? I thought KVM tools used 
 native AIO?

We don't use AIO at all. It's just normal read()/write() with a thread
pool. I actually looked at AIO but didn't really see why we'd want to
use it.
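
Roughly speaking, each queued request then ends up in a worker thread doing
something like the sketch below (names invented for illustration, not the
actual tools/kvm code):

#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative job descriptor handed to a worker thread. */
struct io_job {
        int     fd;
        void    *buf;
        size_t  len;
        off_t   offset;
        int     is_write;
        void    (*complete)(struct io_job *job, ssize_t ret);
};

/*
 * A worker issues an ordinary blocking pread()/pwrite(); parallelism
 * comes from running several such workers, not from io_submit()-style
 * native AIO.
 */
static void *io_worker(void *arg)
{
        struct io_job *job = arg;
        ssize_t ret;

        if (job->is_write)
                ret = pwrite(job->fd, job->buf, job->len, job->offset);
        else
                ret = pread(job->fd, job->buf, job->len, job->offset);

        job->complete(job, ret);
        return NULL;
}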
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html