On 30.06.2014 13:33, Kevin Wolf wrote:
On 07.06.2014 at 20:51, Max Reitz wrote:
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by creating an empty L1
table and dropping all data clusters at once by recreating the refcount
structure accordingly instead of normally discarding all clusters.
If there are snapshots, fall back to the simple implementation (discard
all clusters).
Signed-off-by: Max Reitz <mre...@redhat.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
This approach looks a bit too complicated to me, and calculating the
required metadata size seems error-prone.
How about this:
1. Set the dirty flag in the header so we can mess with the L1 table
without keeping the refcounts consistent
2. Overwrite the L1 table with zeros
3. Overwrite the first n clusters after the header with zeros
(n = 2 + l1_clusters).
4. Update the header:
refcount_table_offset = cluster_size
refcount_table_clusters = 1
l1_table_offset = 3 * cluster_size
6. bdrv_truncate to n + 1 clusters
7. Now update the first 8 bytes at cluster_size (the first new refcount
table entry) to point to 2 * cluster_size (new refcount block)
8. Reset refcount block and L2 cache
9. Allocate n + 1 clusters (the header, too) and make sure you get
offset 0
10. Remove the dirty flag
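
For illustration, the cluster layout implied by steps 3 to 6 can be
written down as a small C helper. This is only a sketch: EmptyLayout and
empty_layout() are hypothetical names that do not exist in QEMU, and the
code only computes offsets; it performs no I/O and does not touch the
dirty flag or the caches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of QEMU. */
typedef struct EmptyLayout {
    uint64_t zeroed_clusters;         /* n = 2 + l1_clusters (step 3) */
    uint64_t refcount_table_offset;   /* header field (step 4) */
    uint64_t refcount_table_clusters; /* header field (step 4) */
    uint64_t refcount_block_offset;   /* target of first reftable entry (step 7) */
    uint64_t l1_table_offset;         /* header field (step 4) */
    uint64_t image_clusters;          /* n + 1, the truncated size (step 6) */
} EmptyLayout;

/*
 * Compute the layout of the emptied image:
 *   cluster 0: header
 *   cluster 1: refcount table
 *   cluster 2: refcount block
 *   clusters 3 .. 3 + l1_clusters - 1: L1 table
 */
static EmptyLayout empty_layout(uint64_t cluster_size, uint64_t l1_clusters)
{
    EmptyLayout l;

    assert(cluster_size > 0 && l1_clusters > 0);

    l.zeroed_clusters = 2 + l1_clusters;
    l.refcount_table_offset = cluster_size;
    l.refcount_table_clusters = 1;
    l.refcount_block_offset = 2 * cluster_size;
    l.l1_table_offset = 3 * cluster_size;
    l.image_clusters = l.zeroed_clusters + 1;
    return l;
}
```

With this layout the L1 table ends exactly at the truncated image size,
i.e. l1_table_offset + l1_clusters * cluster_size equals
image_clusters * cluster_size.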
Okay, after some fixing and getting it to work, I noticed what seems to
me a rather big problem: If something bad happens between 3 and 7
(especially between 4 and 7), the image cannot be repaired. The reason is
that the refcount table is empty and a new refcount block cannot be
allocated because the consistency checks correctly signal an overlap with
the refcount table (I guess; I would have expected the image header
instead, but well...). This is because nothing is allocated, so the first
cluster offset returned by an allocation will probably be zero (the image
header) or $cluster_size (where the reftable resides).
So I think we absolutely have to make sure that whenever the
refcount_table_offset is changed on disk, the reftable it points to
already contains a valid offset. We could pull 7 before 4, but then we'd
have to guarantee that 3 did not already overwrite the reftable (which
it probably does). Well, maybe we could change 3 so it checks whether
the reftable is already part of that area, and if it is, overwrite its
first entry not with zero, but with 2 * cluster_size (unless the reftable
itself is located at 2 * cluster_size, in which case we'd have to take
some other offset). Then we could either try to write a new reftable
anyway or just place everything behind that old reftable, just ignoring
the "lost" space.
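
To make the extra checks concrete, here is a rough C sketch of the two
questions step 3 would have to answer. The function names are
hypothetical (they are not QEMU functions), and generalizing "the
reftable is at 2 * cluster_size" to "the old reftable covers the cluster
at 2 * cluster_size" is an assumption on top of what is described above.
Choosing the "other offset" is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the byte ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) share at least one byte. */
static bool ranges_overlap(uint64_t a_off, uint64_t a_len,
                           uint64_t b_off, uint64_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Does the old reftable lie (partly) inside the n = 2 + l1_clusters
 * clusters after the header that step 3 overwrites with zeros? */
static bool reftable_in_zeroed_area(uint64_t cluster_size,
                                    uint64_t l1_clusters,
                                    uint64_t old_reftable_offset,
                                    uint64_t old_reftable_clusters)
{
    uint64_t n = 2 + l1_clusters;

    return ranges_overlap(cluster_size, n * cluster_size,
                          old_reftable_offset,
                          old_reftable_clusters * cluster_size);
}

/* Can the first reftable entry point to 2 * cluster_size, or would the
 * new refcount block then sit on top of the old reftable? */
static bool default_refblock_offset_usable(uint64_t cluster_size,
                                           uint64_t old_reftable_offset,
                                           uint64_t old_reftable_clusters)
{
    return !ranges_overlap(2 * cluster_size, cluster_size,
                           old_reftable_offset,
                           old_reftable_clusters * cluster_size);
}
```

If reftable_in_zeroed_area() returns true and
default_refblock_offset_usable() returns false, this is the case where
some other refcount block offset would have to be picked.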
In any case, I doubt it'll be much shorter overall with these additional
checks. The current code has 340 LOC with extremely verbose commentary;
my new code (failing to address the problem described above) has 100 LOC
without any comments.
So I guess the main issue is how *complicated* the code actually is; in
my opinion, the most complicated and hardest-to-review piece of code in
this patch (patch v8 3/14) is minimal_blob_size(), which, as far as I
can tell, we will need in one form or another eventually anyway.
create_refcount_l1() is pretty long, but thanks to the commentary it
should be easy to follow.
In any case, I still have the code for your proposal here and I'd be
absolutely fine with working further on it. So if you think it'll be
worth it anyway (which I personally don't have any opinion on), I'll
continue on it.
Max