Thanks. Dbench does not logically allocate new disk space all the time,
because it is an FS-level benchmark that creates files and deletes them.
The result therefore also depends on the guest FS: a btrfs guest, for
example, allocates about 1.8x the space that ext4 does, due to its COW
nature. It does make the FS allocate new space during roughly a third of
the test duration, I think. But that does not soften the impact much,
because a FS often writes in strides rather than consecutively, which
causes write amplification at allocation time.
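
For reference (the parameters here are illustrative, not the exact
invocation I used), a dbench run against a directory on the guest FS looks
roughly like

  dbench -D /mnt/test -t 600 4

where -D selects the working directory on the FS under test, -t the
duration in seconds, and the trailing number the client count.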

So I also tested with qemu-img convert, using a 400M raw file as the source
(~/qemu-sync-test/bin/qemu-img is the patched build, plain qemu-img the
unpatched one):
zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi

real 0m0.402s
user 0m0.206s
sys 0m0.202s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi

real 0m8.678s
user 0m0.169s
sys 0m0.500s

zheq-PC sdb # time qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi

real 0m4.320s
user 0m0.148s
sys 0m0.471s

zheq-PC sdb # time qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi

real 0m0.489s
user 0m0.173s
sys 0m0.325s

zheq-PC sdb # time qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi

real 0m0.515s
user 0m0.168s
sys 0m0.357s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi

real 0m0.431s
user 0m0.192s
sys 0m0.248s

Although 400M is not a huge file, it does show the trend.
As you can see, when there is heavy allocation pressure and no extra layer
of buffering from a virtualization host, throughput drops by about 50%.
But it still has no effect in "unsafe" mode, as predicted. Also, I believe
wanting to use a half-converted image is rarely the use case, while a host
crash or power loss is not so unimaginable.
It looks like qemu-img convert uses "unsafe" as its default as well, so
even novice "qemu-img convert" users are unlikely to notice any
performance degradation.
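
(For anyone reproducing this: the destination cache mode can be forced
explicitly with -t, e.g.

  qemu-img convert -f raw -O vdi -t writeback /run/shm/rand 1.vdi

or -t unsafe / -t none; leaving -t off gives the default behaviour shown in
the last two runs above.)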

I have not yet tried a guest OS installation on top, but I guess a new flag
just for a one-time faster OS installation is not likely to be useful,
since "cache=unsafe" already does the trick.


On Sat, May 9, 2015 at 5:26 AM Stefan Weil <s...@weilnetz.de> wrote:

> Am 08.05.2015 um 15:55 schrieb Kevin Wolf:
> > Am 08.05.2015 um 15:14 hat Max Reitz geschrieben:
> >> On 07.05.2015 17:16, Zhe Qiu wrote:
> >>> In reference to b0ad5a45...078a458e, metadata writes to
> >>> qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes.
> >>>
> >>> bdrv_flush is called only when the write is successful.
> >>>
> >>> Signed-off-by: Zhe Qiu <phoea...@gmail.com>
> >>> ---
> >>>   block/vdi.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >> I missed Kevin's arguments before, but I think that adding this is
> >> more correct than not having it; and when thinking about speed, this
> >> is vdi, a format supported for compatibility.
> > If you use it only as a convert target, you probably care more about
> > speed than about leaks in case of a host crash.
> >
> >> So if we wanted to optimize it, we'd probably have to cache multiple
> >> allocations, do them at once and then flush afterwards (like the
> >> metadata cache we have in qcow2?)
> > That would defeat the purpose of this patch which aims at having
> > metadata and data written out almost at the same time. On the other
> > hand, fully avoiding the problem instead of just making the window
> > smaller would require a journal, which VDI just doesn't have.
> >
> > I'm not convinced of this patch, but I'll defer to Stefan Weil as the
> > VDI maintainer.
> >
> > Kevin
>
> Thanks for asking. I share your concerns regarding reduced performance
> caused by bdrv_flush. Conversions to VDI will take longer (how much?),
> and also installation of an OS on a new VDI disk image will be slower
> because those are the typical scenarios where disk usage grows.
>
> @phoeagon: Did the benchmark which you used allocate additional disk
> storage? If not, or if it only allocated once and then spent some time
> on already allocated blocks, that benchmark was not valid for this case.
>
> On the other hand I don't see a need for the flushing because the kind
> of failures (power failure) and their consequences seem to be acceptable
> for typical VDI usage, namely either image conversion or tests with
> existing images.
>
> That's why I'd prefer not to use bdrv_flush here. Could we make
> bdrv_flush optional (either generally or for cases like this one) so
> both people who prefer speed and people who would want
> bdrv_flush to decrease the likelihood of inconsistencies can be
> satisfied?
>
> Stefan
>
>
