I create and attach the disk with the commands below; the "qemu-img info" is only there to double-check the cluster size. I am rerunning the same experiments now with "--cache none" added to the "virsh attach-disk" command. Is that sufficient to avoid the host page cache?
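For reference, the only change in the rerun is the attach step, which becomes roughly the following (a sketch; it simply adds the cache flag to the same attach command shown below):

virsh attach-disk --domain ${domain} ${dir}/${disk}.qcow2 --target ${disk} \
    --cache none --persistent --config --live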
qemu-img create -f qcow2 ${dir}/${disk}.qcow2 ${size} -o preallocation=full,cluster_size=${cs}
qemu-img info ${dir}/${disk}.qcow2
virsh attach-disk --domain ${domain} ${dir}/${disk}.qcow2 --target ${disk} --persistent --config --live

On Wed, Aug 26, 2020 at 9:18 AM Kevin Wolf <kw...@redhat.com> wrote:

> Am 26.08.2020 um 02:46 hat Yoonho Park geschrieben:
> > I have been measuring the performance of qcow2 overlays, and I am
> > hoping to get some help in understanding the data I collected. In my
> > experiments, I created a VM and attached a 16G qcow2 disk to it using
> > "qemu-img create" and "virsh attach-disk". I use fio to fill it. I
> > create some number of snapshots (overlays) using
> > "virsh snapshot-create-as". To mimic user activity between taking
> > snapshots, I use fio to randomly write to 10% of each overlay right
> > after I create it. After creating the overlays, I use fio to measure
> > random read performance and random write performance with 2 different
> > block sizes, 4K and 64K. 64K is the qcow2 cluster size used by the
> > 16G qcow2 disk and the overlays (verified with "qemu-img info"). fio
> > is using the attached disk as a block device to avoid as much file
> > system overhead as possible. The VM, 16G disk, and snapshots
> > (overlays) all reside on local disk. Below are the measurements I
> > collected for up to 5 overlays.
> >
> >                      4K blocks                           64K blocks
> > overlays    rd bw  rd iops    wr bw  wr iops    rd bw  rd iops    wr bw  wr iops
> >        0     4510     1127   438028   109507    67854     1060   521808     8153
> >        1     4692     1173     2924      731    66801     1043   104297     1629
> >        2     4524     1131     2781      695    66801     1043   104297     1629
> >        3     4573     1143     3034      758    65500     1023    95627     1494
> >        4     4556     1139     2971      742    67973     1062   108099     1689
> >        5     4471     1117     2937      734    66615     1040    98472     1538
> >
> > Read performance is not affected by overlays. However, write
> > performance drops even with a single overlay. My understanding is
> > that writing 4K blocks requires a read-modify-write because you must
> > fetch a complete cluster from deeper in the overlay chain before
> > writing to the active overlay. However, this does not explain the
> > drop in performance when writing 64K blocks. The performance drop is
> > not as significant, but if the write block size matches the cluster
> > size then it seems that there should not be any performance drop
> > because the write can go directly to the active overlay.
>
> Can you share the QEMU command line you used?
>
> As you say, it is expected that layer 0 is a bit faster, though not to
> this degree. My guess would be that you use the default cache mode
> (which includes cache.direct=off), so your results are skewed because
> the first requests will only write to memory (the host page cache) and
> only later requests will actually hit the disk.
>
> For benchmarking, you should always use cache.direct=on (or an alias
> that contains it, such as cache=none).
>
> > Another issue I hit is that I cannot set or change the cluster size
> > of overlays. Is this possible with "virsh snapshot-create-as"?
>
> That's a libvirt question. Peter, can you help?
>
> > I am using qemu-system-x86_64 version 4.2.0 and virsh version 6.0.0.
> >
> > Thank you for any insights or advice you have.
>
> Kevin
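For reference, my understanding of the suggestion is that on the QEMU side it corresponds to something like the drive option below (a sketch; if=virtio is an assumption, and with libvirt the same setting should end up as cache='none' on the disk's <driver> element):

-drive file=${dir}/${disk}.qcow2,format=qcow2,if=virtio,cache=none

i.e. cache=none is the alias that includes cache.direct=on, bypassing the host page cache.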