I think I have found the reason:
There's a cache in QEMU (the qcow2 L2 table cache) that accelerates the 
translation of a virtual LBA to a cluster offset in the qcow2 image.
The cache has a fixed size of 16x8192 = 128K entries in my configuration, 
which corresponds to an 8GB (128K * 64KB) mapping size. So when the 
"working set" of fio exceeds 8GB, the translation degrades to reading the 
L2 tables from disk, and performance becomes extremely poor.
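The arithmetic above can be sketched as follows (the 16-table / 8192-entries-per-table layout and the 8-byte L2 entry size are my reading of the numbers quoted in this thread):

```python
# Back-of-the-envelope sketch of the qcow2 L2 cache sizing described above.
cluster_size = 64 * 1024            # default qcow2 cluster size: 64 KiB
l2_entry_size = 8                   # each L2 entry is 8 bytes
entries_per_table = cluster_size // l2_entry_size   # 8192 entries per table
cached_tables = 16                  # default number of cached L2 tables

cached_entries = cached_tables * entries_per_table  # 16 * 8192 = 131072 (128K)
mapped_bytes = cached_entries * cluster_size        # guest data the cache maps
print(mapped_bytes // 2**30)        # 8 -> the 8 GiB limit hit by fio

# L2 cache needed to map the whole 100G test image from the commands below:
image_size = 100 * 2**30
needed_cache = image_size // cluster_size * l2_entry_size
print(needed_cache / 2**20)         # 12.5 -> about 12.5 MiB of L2 cache
```

For what it's worth, I believe newer QEMU releases (2.2 and later, if I remember correctly) let you enlarge this cache with an l2-cache-size drive option, e.g. -drive file=test.qcow2,format=qcow2,l2-cache-size=16M; the versions tested here (1.1.2 and 1.7.1) predate it.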

At 2014-06-23 11:22:37, "Fam Zheng" <f...@redhat.com> wrote:
>Cc'ing more qcow2 experts.
>
>On Mon, 06/23 11:14, lihuiba wrote:
>> >Did you prefill the image? Amplification could come from cluster allocation.
>> Yes! 
>> I forgot to mention that I created the qcow2 image with 
>> 'preallocation=metadata', and I allocated the data blocks with dd inside 
>> the VM.
>> 
>> 
>> Creating image in host:
>> qemu-img create -f qcow2 -o preallocation=metadata test.qcow2 100G
>> 
>> 
>> Allocating the blocks in VM:
>> dd if=/dev/zero of=/dev/vdb bs=1M
>> where vdb is the target image.
>> 
>> 
>> At 2014-06-23 11:01:20, "Fam Zheng" <f...@redhat.com> wrote:
>> >On Mon, 06/23 10:06, lihuiba wrote:
>> >> Hi, all
>> >> 
>> >> 
>> >> I'm using a qcow2 image stored on an SSD RAID1 (2 x Intel S3500), and I'm 
>> >> benchmarking the
>> >> system using fio. Although the throughput in the VM (with KVM and virtio 
>> >> enabled) is acceptable (67%
>> >> of the throughput in the host), the IOPS performance is extremely low ---- 
>> >> only 2% of the IOPS in the host.
>> >> 
>> >> 
>> >> I was initially using qemu-1.1.2, and I also tried qemu-1.7.1 for 
>> >> comparison. There was no significant
>> >> difference.
>> >> 
>> >> 
>> >> In contrast, a raw image and LVM perform very well. They usually achieve 
>> >> 90%+ of the throughput and
>> >> 60%+ of the IOPS. So the problem must lie in the QCOW2 image format.
>> >> 
>> >> 
>> >> And I observed that, when I perform a 4KB IOPS benchmark in the VM with a 
>> >> QCOW2 image, fio in the VM reports
>> >> it is reading 9.x MB/s, while iostat in the host reports the SSD is being 
>> >> read at 150+ MB/s. So QEMU or QCOW2
>> >> must be amplifying the amount of reads by nearly 16 times.
>> >> 
>> >> 
>> >> So, how can I fix or tune the performance issue of qcow2?
>> >
>> >Did you prefill the image? Amplification could come from cluster allocation.
>> >
>> >Fam
>> >
>> >> 
>> >> Thanks!
>> >> 
>> >> PS:
>> >> 1. qemu parameters:
>> >> -enable-kvm -cpu qemu64 -rtc base=utc,clock=host,driftfix=none -usb 
>> >> -device usb-tablet -nodefaults -nodefconfig -no-kvm-pit-reinjection 
>> >> -global kvm-pit.lost_tick_policy=discard -machine pc,accel=kvm -vga std 
>> >> -k en-us -smp 8 -m 4096 -boot order=cdn -vnc :1 -drive 
>> >> file=$1,if=none,id=drive_0,cache=none,aio=native -device 
>> >> virtio-blk-pci,drive=drive_0,bus=pci.0,addr=0x5 -drive 
>> >> file=$2,if=none,id=drive_2,cache=none,aio=native -device 
>> >> virtio-blk-pci,drive=drive_2,bus=pci.0,addr=0x7
>> >> 
>> >> 
>> >> 2. fio parameters for IOPS:
>> >> fio --filename=/dev/vdb --direct=1 --ioengine=libaio --iodepth 32 
>> >> --thread --numjobs=1 --rw=randread --bs=4k --size=100% --runtime=60s 
>> >> --group_reporting --name=test
>> >> 
>> >> 
>> >> 3. fio parameters for throughput:
>> >> fio --filename=/dev/vdb --direct=1 --ioengine=psync --thread --numjobs=3 
>> >> --rw=randread --bs=1024k --size=100% --runtime=60s --name=randread 
>> >> --group_reporting --name=test
>> >> 
>> >> 
>> >> 
>> >
>
