Re: [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty()

Max Reitz Wed, 09 Jul 2014 16:24:29 -0700

On 30.06.2014 13:33, Kevin Wolf wrote:

On 07.06.2014 at 20:51, Max Reitz wrote:

bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by creating an empty L1
table and dropping all data clusters at once by recreating the refcount
structure accordingly, instead of discarding all clusters the normal way.


If there are snapshots, fall back to the simple implementation (discard
all clusters).

Signed-off-by: Max Reitz <mre...@redhat.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
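The choice between the two paths described in the commit message comes down to a single check. A minimal sketch, assuming a hypothetical `Qcow2StateSketch` struct and `choose_make_empty_path()` helper (neither is QEMU code):

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for the qcow2 driver state;
 * QEMU's real per-image state has many more fields. */
typedef struct {
    int nb_snapshots; /* number of internal snapshots in the image */
} Qcow2StateSketch;

typedef enum {
    MAKE_EMPTY_REBUILD_METADATA, /* fast path: new empty L1 + fresh refcounts */
    MAKE_EMPTY_DISCARD_ALL       /* fallback: discard every cluster */
} MakeEmptyPath;

/* Rebuilding the refcount structure from scratch only accounts for the new
 * metadata, so an internal snapshot's L1 table and data clusters would be
 * left with a refcount of zero. Take the fast path only without snapshots. */
static MakeEmptyPath choose_make_empty_path(const Qcow2StateSketch *s)
{
    assert(s->nb_snapshots >= 0);
    return s->nb_snapshots == 0 ? MAKE_EMPTY_REBUILD_METADATA
                                : MAKE_EMPTY_DISCARD_ALL;
}
```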

This approach looks a bit too complicated to me, and calculating the
required metadata size seems error-prone.

How about this:

1. Set the dirty flag in the header so we can mess with the L1 table
    without keeping the refcounts consistent

2. Overwrite the L1 table with zeros

3. Overwrite the first n clusters after the header with zeros
    (n = 2 + l1_clusters).

4. Update the header:
    refcount_table_offset = cluster_size
    refcount_table_clusters = 1
    l1_table_offset = 3 * cluster_size

6. bdrv_truncate to n + 1 clusters

7. Now update the first 8 bytes at cluster_size (the first new refcount
    table entry) to point to 2 * cluster_size (new refcount block)

8. Reset refcount block and L2 cache

9. Allocate n + 1 clusters (the header, too) and make sure you get
    offset 0

10. Remove the dirty flag
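Steps 3, 4, 6 and 7 fix the layout of the emptied image: header in cluster 0, refcount table in cluster 1, refcount block in cluster 2 and the L1 table from cluster 3 on. The arithmetic can be sketched as follows; `EmptyLayout` and `empty_layout()` are hypothetical names, and the sketch assumes 8-byte L1 entries and 16-bit refcount entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the layout produced by the steps above.
 * Cluster 0: header, 1: refcount table, 2: refcount block,
 * 3 .. 3 + l1_clusters - 1: L1 table. */
typedef struct {
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint64_t refblock_offset;  /* value written into reftable[0] in step 7 */
    uint64_t l1_table_offset;
    uint64_t n;                /* clusters zeroed after the header (step 3) */
    uint64_t file_size;        /* bdrv_truncate target (step 6) */
} EmptyLayout;

static EmptyLayout empty_layout(uint64_t cluster_size, uint64_t l1_size)
{
    EmptyLayout lay;
    /* Each L1 entry is a 64-bit offset; round up to whole clusters. */
    uint64_t l1_clusters =
        (l1_size * sizeof(uint64_t) + cluster_size - 1) / cluster_size;

    lay.n = 2 + l1_clusters;                    /* reftable + refblock + L1 */
    lay.refcount_table_offset = cluster_size;
    lay.refcount_table_clusters = 1;
    lay.refblock_offset = 2 * cluster_size;
    lay.l1_table_offset = 3 * cluster_size;
    lay.file_size = (lay.n + 1) * cluster_size; /* header plus n clusters */

    /* Step 9 allocates n + 1 clusters through the single refcount block,
     * which (assuming 16-bit refcounts) covers cluster_size / 2 clusters. */
    assert(lay.n + 1 <= cluster_size / 2);
    return lay;
}
```

For example, with 64 KiB clusters and a 16-entry L1 table, l1_clusters is 1, so n = 3 and the image is truncated to 4 clusters (256 KiB).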

Okay, after fixing a few things and getting it to work, I noticed a (seemingly, to me) rather big problem: if something bad happens between 3 and 7 (especially between 4 and 7), the image cannot be repaired. The reason is that the refcount table is empty and a new refcount block cannot be allocated, because the consistency checks correctly signal an overlap with the refcount table (I guess; I would have expected the image header instead, but well...). This is because nothing is allocated, so the first cluster offset returned by an allocation will probably be zero (the image header) or $cluster_size (where the reftable resides).
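The failure mode can be illustrated with a toy model. Once the header points at a refcount table whose first entry is zero, every cluster has a refcount of 0, so a first-fit allocator hands out offset 0 and the overlap check has to refuse it. `ToyImage`, `toy_refcount()`, `toy_alloc_cluster()` and `toy_overlaps_metadata()` are hypothetical and far simpler than QEMU's allocator and overlap checks; in particular the toy lands on the header rather than the reftable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not QEMU code: only the first refcount table entry is modelled,
 * and 0 there means "no refcount block allocated", as in the qcow2 format. */
typedef struct {
    uint64_t cluster_size;
    uint64_t refcount_table_offset; /* header field */
    uint64_t reftable0;             /* first refcount table entry */
    const uint16_t *refblock;       /* refcounts in that block, if any */
    uint64_t refblock_entries;
} ToyImage;

static uint16_t toy_refcount(const ToyImage *img, uint64_t cluster)
{
    if (img->reftable0 == 0 || img->refblock == NULL ||
        cluster >= img->refblock_entries) {
        return 0; /* no refcount information: cluster looks unallocated */
    }
    return img->refblock[cluster];
}

/* Naive first-fit allocation: return the offset of the first free cluster. */
static uint64_t toy_alloc_cluster(const ToyImage *img)
{
    uint64_t cluster = 0;
    while (toy_refcount(img, cluster) != 0) {
        cluster++;
    }
    return cluster * img->cluster_size;
}

/* Overlap check against the image header and a one-cluster refcount table. */
static bool toy_overlaps_metadata(const ToyImage *img, uint64_t offset)
{
    bool in_header = offset < img->cluster_size;
    bool in_reftable = offset >= img->refcount_table_offset &&
                       offset < img->refcount_table_offset + img->cluster_size;
    return in_header || in_reftable;
}
```

With the first reftable entry zeroed, `toy_alloc_cluster()` returns 0 and `toy_overlaps_metadata()` rejects it, so there is no way to place the new refcount block that a repair would need.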

So I think we absolutely have to make sure that whenever the refcount_table_offset is changed on disk, the reftable it points to already contains a valid offset. We could pull 7 before 4, but then we'd have to guarantee that 3 did not already overwrite the old reftable (which it probably does). Well, maybe we could change 3 so that it checks whether the reftable is already part of that area and, if it is, overwrites its first entry not with zero but with 2 * cluster_size, unless the reftable itself is located at 2 * cluster_size, in which case we'd have to take some other offset. Then we could either try to write a new reftable anyway or just place everything behind that old reftable, ignoring the "lost" space.
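The suggested change to step 3 amounts to two decisions: does zeroing clusters 1 to n hit the reftable the header still points to, and which refcount block offset should its first entry receive. A minimal sketch of the "place everything behind the old reftable" variant, with hypothetical helpers `ranges_overlap()`, `step3_clobbers_reftable()` and `pick_refblock_offset()`. In the fallback case the block lands outside the first n + 1 clusters, so the truncation in step 6 and the allocation in step 9 would have to grow accordingly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t a_len,
                           uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Would step 3 (zeroing clusters 1..n) clobber the reftable the header
 * still points to? If so, its first entry must receive a valid refcount
 * block offset instead of zero. */
static bool step3_clobbers_reftable(uint64_t cluster_size, uint64_t old_rt_offset,
                                    uint64_t old_rt_clusters, uint64_t n)
{
    return ranges_overlap(cluster_size, n * cluster_size,
                          old_rt_offset, old_rt_clusters * cluster_size);
}

/* Use 2 * cluster_size for the refcount block unless the old reftable
 * occupies that cluster; otherwise put the block behind both the old
 * reftable and the zeroed area, accepting the "lost" space. */
static uint64_t pick_refblock_offset(uint64_t cluster_size, uint64_t old_rt_offset,
                                     uint64_t old_rt_clusters, uint64_t n)
{
    uint64_t rt_len = old_rt_clusters * cluster_size;
    uint64_t preferred = 2 * cluster_size;

    if (!ranges_overlap(preferred, cluster_size, old_rt_offset, rt_len)) {
        return preferred;
    }
    uint64_t behind_rt = old_rt_offset + rt_len;
    uint64_t behind_zeroed = (n + 1) * cluster_size;
    return behind_rt > behind_zeroed ? behind_rt : behind_zeroed;
}
```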

In any case, I doubt it'll be much shorter overall with these additional checks. The current code has 340 LOC with extremely verbose commentary; my new code (failing to address the problem described above) has 100 LOC without any comments.

So I guess the main issue is how *complicated* the code actually is; in my opinion, the most complicated and hardest-to-review piece of code in this patch (patch v8 3/14) is minimal_blob_size(), which, as far as I can tell, we will need in one form or another eventually anyway. create_refcount_l1() is pretty long, but thanks to the commentary it should be easy to follow.
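To show why sizing the metadata is the tricky part, here is a sketch of one such calculation. It is not the patch's minimal_blob_size(); `minimal_metadata_clusters()` is a hypothetical name, and it assumes 16-bit refcounts and 8-byte L1 and refcount table entries. The refcount blocks have to cover themselves and the refcount table, so the counts are iterated until they stop changing:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* Hypothetical sketch: clusters needed for header + L1 table + refcount
 * table + refcount blocks, where the refcount blocks must also hold the
 * refcounts of the refcount table and of themselves. */
static uint64_t minimal_metadata_clusters(uint64_t cluster_size, uint64_t l1_size)
{
    assert(cluster_size >= 512);
    uint64_t refs_per_block = cluster_size / 2;     /* 16-bit refcounts */
    uint64_t entries_per_rt_cluster = cluster_size / 8; /* 64-bit offsets */
    uint64_t fixed = 1 + div_round_up(l1_size * 8, cluster_size); /* hdr + L1 */
    uint64_t refblocks = 0, rt_clusters = 0, total;

    /* Adding refcount blocks may require more reftable clusters, which in
     * turn need refcounts; the counts only grow, so this reaches a fixpoint. */
    for (;;) {
        total = fixed + rt_clusters + refblocks;
        uint64_t new_refblocks = div_round_up(total, refs_per_block);
        uint64_t new_rt = div_round_up(new_refblocks, entries_per_rt_cluster);
        if (new_refblocks == refblocks && new_rt == rt_clusters) {
            break;
        }
        refblocks = new_refblocks;
        rt_clusters = new_rt;
    }
    return total;
}
```

With 64 KiB clusters and a 16-entry L1 table this yields 4 clusters, matching the layout in Kevin's proposal. With 512-byte clusters and 254 L1 clusters, the 255 fixed clusters plus one reftable cluster exceed the 256 refcounts of a single block, so a second refcount block is needed and the result is 258.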

In any case, I still have the code for your proposal here and I'd be absolutely fine with working on it further. So if you think it'll be worth it anyway (I personally don't have an opinion on that), I'll continue with it.

Max
