[Qemu-devel] virtio-blk throughput

2012-02-11 Thread Prateek Sharma
Hello everyone,
   I am testing virtio-blk throughput (single thread, using hdparm -tT).
I want to know what the guest/baremetal sequential read ratio is
with the current qemu / qemu-kvm builds.
I have tried with qemu 1.0, 0.15, and 0.14.1, but my guest throughput is
limited to 65 MB/s. Baremetal is at 112 MB/s.
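
The measurement looks roughly like this (device names are only examples;
the guest's virtio data disk and the host's backing disk may be named
differently):

  # inside the guest, against the virtio disk
  hdparm -tT /dev/vdb
  # on the host (baremetal), against the backing disk
  hdparm -tT /dev/sda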

Here is my VM configuration:

$QEMU  -cpu core2duo,+vmx  -drive file=$VM_PATH,if=virtio,aio=native
-drive file=viotest.img,if=virtio,index=2 -net tap -net
nic,macaddr=00:01:02:03:04:$VNC_OFFSET -kernel $KERNEL_IMAGE -append
'root=/dev/vda1 rw memmap=10M$100M kmemcheck=on' -m 1000 -vnc
0.0.0.0:$VNC_OFFSET

Is there something wrong in the configuration? I was hoping
virtio-blk would give ~75% of baremetal throughput, at least for large
streaming reads.

Thanks,
Prateek



Re: [Qemu-devel] virtio-blk throughput

2012-02-13 Thread Stefan Hajnoczi
On Sat, Feb 11, 2012 at 9:57 AM, Prateek Sharma  wrote:
> $QEMU  -cpu core2duo,+vmx  -drive file=$VM_PATH,if=virtio,aio=native
> -drive file=viotest.img,if=virtio,index=2

-drive cache=none is typically used for good performance when the
image is on a local disk.  Try that and I think you'll see an
improvement.
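
For example, adapting the two drives from your command line (untested
sketch):

  -drive file=$VM_PATH,if=virtio,aio=native,cache=none
  -drive file=viotest.img,if=virtio,index=2,cache=none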

Stefan



Re: [Qemu-devel] virtio-blk throughput

2012-02-13 Thread Prateek Sharma
On Mon, Feb 13, 2012 at 4:53 PM, Stefan Hajnoczi  wrote:
> On Sat, Feb 11, 2012 at 9:57 AM, Prateek Sharma  wrote:
>> $QEMU  -cpu core2duo,+vmx  -drive file=$VM_PATH,if=virtio,aio=native
>> -drive file=viotest.img,if=virtio,index=2
>
> -drive cache=none is typically used for good performance when the
> image is on a local disk.  Try that and I think you'll see an
> improvement.
>
> Stefan

Hi Stefan,
I did try setting cache=none in one of the runs, and saw a small
performance *drop* for sequential reads. Could it be because of the
host page-cache read-ahead and other factors?
In any case, I just wanted to know what the current qemu
virtio-blk numbers are, and whether I have misconfigured things badly.
What is the "fastest" way to do IO in qemu? virtio-blk, vhost-blk,
virtio-dataplane, something else?
Thanks!



Re: [Qemu-devel] virtio-blk throughput

2012-02-13 Thread Anthony Liguori

On 02/13/2012 05:23 AM, Stefan Hajnoczi wrote:
> On Sat, Feb 11, 2012 at 9:57 AM, Prateek Sharma  wrote:
>> $QEMU  -cpu core2duo,+vmx  -drive file=$VM_PATH,if=virtio,aio=native
>> -drive file=viotest.img,if=virtio,index=2
>
> -drive cache=none is typically used for good performance when the
> image is on a local disk.  Try that and I think you'll see an
> improvement.


We should throw an error on aio=native, cache != none.

linux-aio blocks on io_submit if the caching mode isn't O_DIRECT and that will 
kill performance.
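
In other words, for a virtio drive like the ones in the original post
(sketch):

  # aio=native with the default cache mode (writethrough): the image is
  # not opened O_DIRECT, so io_submit can block
  -drive file=viotest.img,if=virtio,aio=native

  # aio=native with cache=none: the image is opened O_DIRECT and linux-aio
  # stays asynchronous
  -drive file=viotest.img,if=virtio,cache=none,aio=native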


Regards,

Anthony Liguori



> Stefan



Re: [Qemu-devel] virtio-blk throughput

2012-02-13 Thread Stefan Hajnoczi
On Mon, Feb 13, 2012 at 11:39 AM, Prateek Sharma  wrote:
> On Mon, Feb 13, 2012 at 4:53 PM, Stefan Hajnoczi  wrote:
>> On Sat, Feb 11, 2012 at 9:57 AM, Prateek Sharma  
>> wrote:
>>> $QEMU  -cpu core2duo,+vmx  -drive file=$VM_PATH,if=virtio,aio=native
>>> -drive file=viotest.img,if=virtio,index=2
>>
>> -drive cache=none is typically used for good performance when the
>> image is on a local disk.  Try that and I think you'll see an
>> improvement.
>>
>> Stefan
>
> Hi Stefan,
>    I did try setting cache=none in one of the runs, and saw a small
> performance *drop* for sequential reads. Could it be because of the
> host page-cache read-ahead and other factors?
>    In any case, I just wanted to know what the current qemu
> virtio-blk numbers are, and whether I have misconfigured things badly.
>    What is the "fastest" way to do IO in qemu? virtio-blk, vhost-blk,
> virtio-dataplane, something else?

The fastest supported way on local disks tends to be
if=virtio,cache=none,aio=native.
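
Applied to the command line from the start of the thread, that would look
roughly like this (untested sketch, other options left unchanged):

$QEMU -cpu core2duo,+vmx \
  -drive file=$VM_PATH,if=virtio,cache=none,aio=native \
  -drive file=viotest.img,if=virtio,cache=none,aio=native,index=2 \
  -net tap -net nic,macaddr=00:01:02:03:04:$VNC_OFFSET \
  -kernel $KERNEL_IMAGE -append 'root=/dev/vda1 rw memmap=10M$100M kmemcheck=on' \
  -m 1000 -vnc 0.0.0.0:$VNC_OFFSET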

You are right that a pure read benchmark will "benefit" from
read-ahead.  cache=none helps for writes (compared to the default
cache=writethrough) and has less complicated performance behavior when
there is a lot of I/O going on (because it bypasses the page cache).

It would be interesting to compare the block I/O requests during a
bare metal run with your guest run.  Normally they should be identical
for the benchmark to be fair.  I'm not sure whether the I/O request
pattern is identical in your case (I haven't looked at what hdparm -tT
does exactly).
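
One way to capture the request pattern on the host (assuming blktrace is
available and /dev/sda is the backing disk) is something like:

  blktrace -d /dev/sda -o - | blkparse -i -

run once during the bare metal benchmark and once during the guest run,
then compare the request sizes and offsets.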

Stefan