2011/9/14 Kevin Wolf <[email protected]>:
... omissis ...
>>>>>> To optimize REF+ I mark a range as allocated and use this range to
>>>>>> get new ones (avoiding writing refcount to disk). When a flush is
>>>>>> requested or in some situations (like snapshot) this cache is disabled
>>>>>> and flushed (written as REF-).
>>>>>> I do not consider this patch ready; it works and passes all io-tests,
>>>>>> but for instance I would avoid allocating new clusters for refcounts
>>>>>> during preallocation.
>>>>>
>>>>> The only question here is whether improving cache=writethrough cluster
>>>>> allocation performance is worth the additional complexity in the already
>>>>> complex refcounting code.
>>>>>
>>>>
>>>> I didn't see this optimization as a second-level cache, but yes, for
>>>> REF- it is a second cache.
>>>>
>>>>> The alternative that was discussed before is the dirty bit approach that
>>>>> is used in QED and would allow us to use writeback for all refcount
>>>>> blocks, regardless of REF- or REF+. It would be an easier approach
>>>>> requiring fewer code changes, but it comes with the cost of requiring an
>>>>> fsck after a qemu crash.
>>>>>
>>>>
>>>> I was thinking about changing the header magic the first time we change
>>>> a refcount, in order to mark the image as dirty, so that a newer QEMU
>>>> recognizes the flag while an older one does not recognize the image.
>>>> Obviously reverting the magic on image close.
>>>
>>> We've discussed this idea before and I think it wasn't considered a
>>> great idea to automagically change the header in an incompatible way.
>>> But we can always say that for improved performance you need to upgrade
>>> your image to qcow2 v3.
>>>
>>
>> I don't understand why there is no wiki page detailing the qcow3
>> changes. I saw your post in May. I have followed this ML since August, so
>> I think I missed a lot of the discussion on qcow improvements.
>
> Unfortunately there have been almost no comments, so you can consider
> RFC v2 as the current proposal.
>
:(

>>>>>> The end speedup is quite visible when allocating clusters (more than 20%).
>>>>>
>>>>> What benchmark do you use for testing this?
>>>>>
>>>>> Kevin
>>>>>
>>>>
>>>> Currently I'm using bonnie++, but I noted similar improvements with iozone.
>>>> The test script formats an image, then launches a Linux machine which runs
>>>> a script and saves the result to a file.
>>>> The test image is seen by this virtual machine as a separate disk.
>>>> The file on the host resides in a separate LV.
>>>> I got quite consistent results (of course not working on the machine
>>>> while testing; it is not actually dedicated to this job).
>>>>
>>>> Actually I'm running the test again (I added a test working in a snapshot image).
>>>
>>> Okay. Let me guess the remaining variables: The image is on an ext4 host
>>> filesystem, you use cache=writethrough and virtio-blk. You don't use
>>> backing files, compression and encryption. For your tests with internal
>>> snapshots you have exactly one internal snapshot that is taken
>>> immediately before the benchmark. Oh, and not to forget, KVM is enabled.
>>>
>>> Are these assumptions correct?
>>>
>>
>> Change ext4 to xfs and the assumptions are correct. Yes, I use internal
>> snapshots (REF- is useful only in this case). To produce "qcow2s" I
>> use these commands:
>>
>> $QEMU_IMG create -f qcow2 -o preallocation=metadata $FILE 15g
>> $QEMU_IMG snapshot -c test $FILE
>
> Ah, using preallocation for this is a nice trick. :-)
>
> Anyway, thanks, I think I understand now what you're measuring.
>

Yes, and the real performance gap is even worse: preallocating only the
metadata means you get many fewer physical reads from disk.

>> I ran the tests again (yesterday's results were not that consistent; today
>> I stopped all unneeded services and avoided the X session tunneled over
>> ssh).
>>
>> without patches
>> run   raw     qcow2   qcow2s
>> 1     22758    4206    1054
>> 2     28401   21745   17997
>>
>> with patches (ref-, ref+)
>> run   raw     qcow2   qcow2s
>> 1     22563    4965    4382
>> 2     23823   21043   20904
>>
>> Besides, I still don't understand the difference in the second runs for
>> raw (about 20%!!!). This confirms the huge improvement with snapshots.
>
> Measuring with cache=writethrough means that you use the host page
> cache, so I think that could be where you get the difference for raw.
> Numbers should be more consistent for cache=none, but that's not what we
> want to test...
>
> Did you also test how each patch performs if applied on its own, without
> the other one? It would be interesting to see how much the REF- optimisation
> contributes and how much REF+.
>
> Kevin
>

Here you are, only ref-:

run   raw     qcow2   qcow2s
1     22579    4072    4053
2     30322   19667   16830

So it is mostly REF- for the snapshot case.

Frediano
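[Editor's note: the image-preparation commands quoted in the thread can be collected into a small runnable sketch. This is an illustration only; `$QEMU_IMG` and `$FILE` are placeholders from the original mail, here given hypothetical defaults, and the script skips gracefully if qemu-img is not installed.]

```shell
#!/bin/sh
# Sketch of the "qcow2s" test-image setup described in the thread.
# Assumptions (not from the original mail): default binary name "qemu-img"
# and a temporary image path under /tmp.
set -e

QEMU_IMG=${QEMU_IMG:-qemu-img}
FILE=${FILE:-/tmp/refcount-test.qcow2}

if command -v "$QEMU_IMG" >/dev/null 2>&1; then
    # Preallocate only metadata: L1/L2 tables and refcounts are written, but
    # no data clusters, which isolates the cluster-allocation cost being measured.
    "$QEMU_IMG" create -f qcow2 -o preallocation=metadata "$FILE" 15g
    # Exactly one internal snapshot, taken immediately before the benchmark
    # (the "qcow2s" case, where the REF- path matters).
    "$QEMU_IMG" snapshot -c test "$FILE"
    "$QEMU_IMG" snapshot -l "$FILE"   # list snapshots to confirm the setup
    RESULT=created
else
    echo "qemu-img not available; skipping image creation" >&2
    RESULT=skipped
fi
echo "$RESULT"
```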
