I create the attached disk with the following commands. The "qemu-img info"
call is just there to double-check the cluster size. I am now rerunning the
same experiments with "--cache none" added to the "virsh attach-disk" command.
Is this sufficient to avoid the host page cache?

qemu-img create -f qcow2 ${dir}/${disk}.qcow2 ${size} \
    -o preallocation=full,cluster_size=${cs}
qemu-img info ${dir}/${disk}.qcow2
virsh attach-disk --domain ${domain} ${dir}/${disk}.qcow2 --target ${disk} \
    --persistent --config --live
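
In other words, the attach step becomes something along these lines (same
variables as above, with only the cache mode added):

virsh attach-disk --domain ${domain} ${dir}/${disk}.qcow2 --target ${disk} \
    --cache none --persistent --config --live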

On Wed, Aug 26, 2020 at 9:18 AM Kevin Wolf <kw...@redhat.com> wrote:

> Am 26.08.2020 um 02:46 hat Yoonho Park geschrieben:
> > I have been measuring the performance of qcow2 overlays, and I am hoping
> > to get some help in understanding the data I collected. In my experiments,
> > I created a VM and attached a 16G qcow2 disk to it using "qemu-img create"
> > and "virsh attach-disk". I use fio to fill it. I create some number of
> > snapshots (overlays) using "virsh snapshot-create-as". To mimic user
> > activity between taking snapshots, I use fio to randomly write to 10% of
> > each overlay right after I create it. After creating the overlays, I use
> > fio to measure random read performance and random write performance with
> > 2 different block sizes, 4K and 64K. 64K is the qcow2 cluster size used by
> > the 16G qcow2 disk and the overlays (verified with "qemu-img info"). fio
> > is using the attached disk as a block device to avoid as much file system
> > overhead as possible. The VM, 16G disk, and snapshots (overlays) all
> > reside on local disk. Below are the measurements I collected for up to 5
> > overlays.
> >
> >
> >                      4K blocks                           64K blocks
> > overlays   rd bw rd iops   wr bw wr iops   rd bw rd iops   wr bw wr iops
> > 0           4510    1127  438028  109507   67854    1060  521808    8153
> > 1           4692    1173    2924     731   66801    1043  104297    1629
> > 2           4524    1131    2781     695   66801    1043  104297    1629
> > 3           4573    1143    3034     758   65500    1023   95627    1494
> > 4           4556    1139    2971     742   67973    1062  108099    1689
> > 5           4471    1117    2937     734   66615    1040   98472    1538
> >
> >
> > Read performance is not affected by overlays. However, write performance
> > drops even with a single overlay. My understanding is that writing 4K
> > blocks requires a read-modify-write because you must fetch a complete
> > cluster from deeper in the overlay chain before writing to the active
> > overlay. However, this does not explain the drop in performance when
> > writing 64K blocks. The performance drop is not as significant, but if
> > the write block size matches the cluster size then it seems that there
> > should not be any performance drop because the write can go directly to
> > the active overlay.
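
For reference, the fio invocations behind the write numbers above look
roughly like this; the guest device name (/dev/vdb), iodepth, and runtime
shown here are placeholders, not the exact parameters used:

fio --name=randwrite-4k --filename=/dev/vdb --rw=randwrite --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based
fio --name=randwrite-64k --filename=/dev/vdb --rw=randwrite --bs=64k \
    --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based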
>
> Can you share the QEMU command line you used?
>
> As you say, it is expected that layer 0 is a bit faster, though not to
> this degree. My guess would be that you used the default cache mode
> (which includes cache.direct=off), so your results are skewed because
> the first requests will only write to memory (the host page cache) and
> only later requests will actually hit the disk.
>
> For benchmarking, you should always use cache.direct=on (or an alias
> that contains it, such as cache=none).
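
If I understand correctly, on a plain QEMU command line this corresponds to
something like the following (the path and interface here are placeholders),
with cache=none expanding to cache.writeback=on,cache.direct=on,cache.no-flush=off:

qemu-system-x86_64 ... \
    -drive file=/path/to/disk.qcow2,format=qcow2,if=virtio,cache=none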
>
> > Another issue I hit is that I cannot set or change the cluster size of
> > overlays. Is this possible with "virsh snapshot-create-as"?
>
> That's a libvirt question. Peter, can you help?
>
> > I am using qemu-system-x86_64 version 4.2.0 and virsh version 6.0.0.
> >
> >
> > Thank you for any insights or advice you have.
>
> Kevin
>
>
