Below is the data with the cache disabled ("virsh attach-disk ... --cache
none"). I have included the previous data for reference. Overall, random read
performance was not affected significantly, which makes sense because a cache
is unlikely to help random reads much. BTW, how big is the cache by default?
Random write performance for 4K blocks looks more "sane" now. Random write
performance for 64K blocks is interesting because base-image (0 overlays)
performance is about 2x slower than with 1-5 overlays. We believe this is
because random writes to an overlay actually turn into sequential writes
(appends to the overlay file). Does this make sense?
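For completeness, the create/attach step was along these lines (the domain
name, image path, and target device below are placeholders, not the exact
values from my setup):

  # illustrative only: create the 16G qcow2 disk and attach it with the
  # host page cache bypassed (cache=none implies cache.direct=on)
  qemu-img create -f qcow2 /var/lib/libvirt/images/disk.qcow2 16G
  virsh attach-disk myvm /var/lib/libvirt/images/disk.qcow2 vdb \
        --subdriver qcow2 --cache none --persistent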


NO CACHE (bandwidth in KB/s)

      4K blocks                           64K blocks
olays rd bw    rd iops  wr bw    wr iops  rd bw    rd iops  wr bw    wr iops
0     4478     1119     4684     1171     57001    890      42050    657
1     4490     1122     2503     625      56656    885      93483    1460
2     4385     1096     2425     606      56055    875      94445    1475
3     4334     1083     2307     576      55422    865      95826    1497
4     4356     1089     2168     542      56070    876      95957    1499
5     4234     1058     2308     577      54039    844      92936    1452


DEFAULT CACHE (WRITEBACK; bandwidth in KB/s)

      4K blocks                           64K blocks
olays rd bw    rd iops  wr bw    wr iops  rd bw    rd iops  wr bw    wr iops
0     4510     1127     438028   109507   67854    1060     521808   8153
1     4692     1173     2924     731      66801    1043     104297   1629
2     4524     1131     2781     695      66801    1043     104297   1629
3     4573     1143     3034     758      65500    1023     95627    1494
4     4556     1139     2971     742      67973    1062     108099   1689
5     4471     1117     2937     734      66615    1040     98472    1538

On Wed, Aug 26, 2020 at 9:18 AM Kevin Wolf <kw...@redhat.com> wrote:

> On 26.08.2020 at 02:46, Yoonho Park wrote:
> > I have been measuring the performance of qcow2 overlays, and I am hoping
> > to get some help in understanding the data I collected. In my experiments,
> > I created a VM and attached a 16G qcow2 disk to it using "qemu-img create"
> > and "virsh attach-disk". I use fio to fill it. I create some number of
> > snapshots (overlays) using "virsh snapshot-create-as". To mimic user
> > activity between taking snapshots, I use fio to randomly write to 10% of
> > each overlay right after I create it. After creating the overlays, I use
> > fio to measure random read performance and random write performance with
> > 2 different block sizes, 4K and 64K. 64K is the qcow2 cluster size used by
> > the 16G qcow2 disk and the overlays (verified with "qemu-img info"). fio
> > is using the attached disk as a block device to avoid as much file system
> > overhead as possible. The VM, 16G disk, and snapshots (overlays) all
> > reside on local disk. Below are the measurements I collected for up to 5
> > overlays.
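
For concreteness, the measurement jobs were along these lines; the fio
parameters shown here are illustrative, not the exact ones from my scripts
(the read runs use --rw=randread, and the 64K runs use --bs=64k):

  # illustrative: random 4K writes straight to the attached block device,
  # with direct I/O so the guest page cache is bypassed
  fio --name=randwrite-4k --filename=/dev/vdb --rw=randwrite --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based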
> >
> >
> >           4K blocks                64K blocks
> >
> > olays rd bw rd iops wr bw  wr iops rd bw rd iops wr bw  wr iops
> >
> > 0     4510  1127    438028 109507  67854 1060    521808 8153
> >
> > 1     4692  1173    2924   731     66801 1043    104297 1629
> >
> > 2     4524  1131    2781   695     66801 1043    104297 1629
> >
> > 3     4573  1143    3034   758     65500 1023    95627  1494
> >
> > 4     4556  1139    2971   742     67973 1062    108099 1689
> >
> > 5     4471  1117    2937   734     66615 1040    98472  1538
> >
> >
> > Read performance is not affected by overlays. However, write performance
> > drops even with a single overlay. My understanding is that writing 4K
> > blocks requires a read-modify-write because you must fetch a complete
> > cluster from deeper in the overlay chain before writing to the active
> > overlay. However, this does not explain the drop in performance when
> > writing 64K blocks. The performance drop is not as significant, but if
> > the write block size matches the cluster size then it seems that there
> > should not be any performance drop because the write can go directly to
> > the active overlay.
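
To put rough numbers on the 4K case: an allocating write into a 64K cluster
has to copy the untouched part of the cluster from the backing chain, so a
single 4K guest write can turn into roughly a 60K read plus a 64K write at
the qcow2 level, i.e. around 30x the payload. That is a back-of-the-envelope
figure, not something I measured separately.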
>
> Can you share the QEMU command line you used?
>
> As you say, it is expected that layer 0 is a bit faster, however not to
> this degree. My guess would be that you use the default cache mode
> (which includes cache.direct=off), so your results are skewed because
> the first requests will only write to memory (the host page cache) and
> only later requests will actually hit the disk.
>
> For benchmarking, you should always use cache.direct=on (or an alias
> that contains it, such as cache=none).
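
For reference, on the QEMU command line this would look something like the
-drive option below; it is a generic illustration, not the command line from
my setup:

  # illustrative -drive syntax: cache=none expands to cache.direct=on,
  # cache.writeback=on, cache.no-flush=off
  -drive file=disk.qcow2,format=qcow2,if=virtio,cache=none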
>
> > Another issue I hit is that I cannot set or change the cluster size of
> > overlays. Is this possible with "virsh snapshot-create-as"?
>
> That's a libvirt question. Peter, can you help?
>
> > I am using qemu-system-x86_64 version 4.2.0 and virsh version 6.0.0.
> >
> >
> > Thank you for any insights or advice you have.
>
> Kevin
>
>
