Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-18 Thread Eric Blake
On 11/18/2014 11:58 AM, Max Reitz wrote:
> On 18.11.2014 18:55, Eric Blake wrote:
>> On 11/14/2014 06:06 AM, Max Reitz wrote:
>>> Add a function qcow2_change_refcount_order() which allows changing the
>>> refcount order of a qcow2 image.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---

>>> +if (new_allocation) {
>>> +if (new_reftable_offset) {
>>> +qcow2_free_clusters(bs, new_reftable_offset,
>>> +allocated_reftable_size *
>>> sizeof(uint64_t),
>>> +QCOW2_DISCARD_NEVER);
>> Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?
> 
> Ah, discarding is always interesting... Last year I used
> QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to
> use ALWAYS unless one is really sure about it. I could have used
> QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like
> this is that the clusters may get picked up by the following allocation,
> in which case having discarded them is not a good idea (there are some
> other places in the qcow2 code which use NEVER for the same reason).
> 
> So, in this case, I think NEVER is good.

Makes sense.  Yes, for THIS case, we are probably going to reuse the
just-discarded cluster on the very next walk, so it's not worth punching
a hole just to reinstate it.


>>> +done:
>>> +if (new_reftable) {
>>> +/* On success, new_reftable actually points to the old
>>> reftable (and
>>> + * new_reftable_size is the old reftable's size); but that
>>> is just
>>> + * fine */
>>> +for (i = 0; i < new_reftable_size; i++) {
>>> +uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
>>> +if (offset) {
>>> +qcow2_free_clusters(bs, offset, s->cluster_size,
>>> +QCOW2_DISCARD_NEVER);
>> Again, why the QCOW2_DISCARD_NEVER policy?
> 
> Here, I have nothing to justify it. I'll use QCOW2_DISCARD_OTHER in v3.

Thanks, and now I know a bit more about discard policy.

> 
>> Fix the MIN vs. MAX bug, and the two dead assignment statements, and you
>> can have:
>>
>> Reviewed-by: Eric Blake 
> 
> I'll also use QCOW2_DISCARD_OTHER for freeing the refblocks and the
> reftable after the "done" label, if you're fine with that.

Yes, works for me.

> 
> Once again, thanks a lot!

And thank you for a mentally engaging review :)  I'm still in the middle
of an email on a possible test you can write to provoke a different
3-pass scenario thanks to all-zero refblocks, so you may want to wait
for that before posting v3...

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-18 Thread Max Reitz

On 18.11.2014 18:55, Eric Blake wrote:

On 11/14/2014 06:06 AM, Max Reitz wrote:

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz 
---
  block/qcow2-refcount.c | 457 +
  block/qcow2.h  |   4 +
  2 files changed, 461 insertions(+)

+static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
+
+status_cb(bs, (uint64_t)index * s->refcount_table_size + 
reftable_index,
+  (uint64_t)total * s->refcount_table_size, cb_opaque);

Not sure if the casts are needed (isn't s->refcount_table_size already
uint64_t,


Surprise, it isn't. I thought otherwise, too, but then got told by 
clang_complete (it's uint32_t).



and 'int * uint64_t' does the right thing); but I guess it
doesn't hurt to leave them.


+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+BlockDriverAmendStatusCB *status_cb,
+void *cb_opaque, Error **errp)
+{
+do {
+int total_walks;
+
+new_allocation = false;
+
+/* At least we have to do this walk and the one which writes the
+ * refblocks; also, at least we have to do this loop here at least
+ * twice (normally), first to do the allocations, and second to
+ * determine that everything is correctly allocated, this then makes
+ * three walks in total */
+total_walks = MIN(walk_index + 2, 3);

This feels wrong...


Yes, I noticed already when preparing v3 and it's already fixed in my 
local v3 branch. *cough*



+
+/* First, allocate the structures so they are present in the refcount
+ * structures */
+ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+ &new_reftable_size, NULL, new_refblock_size,
+ new_refcount_bits, &alloc_refblock,
+ &new_allocation, NULL, status_cb, cb_opaque,
+ walk_index++, total_walks, errp);

...In the common case of just two iterations of the do loop (second
iteration confirms no allocations needed), you call with index 0/2, 1/3,
and then the later non-allocation walk is index 2/3.

In the rare case of three iterations of the do loop, you call with index
0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.

I highly doubt that it is possible to trigger four iterations of the do
loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.

I think you instead want to have:

total_walks = MAX(walk_index + 2, 3)

then the common case will call with 0/3, 1/3, and the later walk as 2/3

the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
walk as 3/4

the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
the later walk as 4/5.


+
+new_reftable_index = 0;
+
+if (new_allocation) {
+if (new_reftable_offset) {
+qcow2_free_clusters(bs, new_reftable_offset,
+allocated_reftable_size * sizeof(uint64_t),
+QCOW2_DISCARD_NEVER);

Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?


Ah, discarding is always interesting... Last year I used 
QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to 
use ALWAYS unless one is really sure about it. I could have used 
QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like 
this is that the clusters may get picked up by the following allocation, 
in which case having discarded them is not a good idea (there are some 
other places in the qcow2 code which use NEVER for the same reason).


So, in this case, I think NEVER is good.


Why not punch holes in the file when throwing out a failed too-small
new table, or when cleaning up the old table once the new table is good?


+}
+
+new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
+   sizeof(uint64_t));
+if (new_reftable_offset < 0) {
+error_setg_errno(errp, -new_reftable_offset,
+ "Failed to allocate the new reftable");
+ret = new_reftable_offset;
+goto done;
+}
+allocated_reftable_size = new_reftable_size;
+
+new_allocation = true;

This assignment is dead code (it already occurs inside an 'if
(new_allocation)' condition).


Right. Though I somehow like its explicitness... I'll remove it.


+}
+} while (new_allocation);
+
+/* Second, write the new refblocks */
+new_allocation = false;

This assignment is dead code (it can only be reached if the earlier do
loop ended, which is only possible when no allocations are recorded).


Right again.


+ret = wa

Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-18 Thread Eric Blake
On 11/14/2014 06:06 AM, Max Reitz wrote:
> Add a function qcow2_change_refcount_order() which allows changing the
> refcount order of a qcow2 image.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 457 
> +
>  block/qcow2.h  |   4 +
>  2 files changed, 461 insertions(+)
> 

> +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,

> +
> +status_cb(bs, (uint64_t)index * s->refcount_table_size + 
> reftable_index,
> +  (uint64_t)total * s->refcount_table_size, cb_opaque);

Not sure if the casts are needed (isn't s->refcount_table_size already
uint64_t, and 'int * uint64_t' does the right thing); but I guess it
doesn't hurt to leave them.

> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
> +BlockDriverAmendStatusCB *status_cb,
> +void *cb_opaque, Error **errp)
> +{

> +do {
> +int total_walks;
> +
> +new_allocation = false;
> +
> +/* At least we have to do this walk and the one which writes the
> + * refblocks; also, at least we have to do this loop here at least
> + * twice (normally), first to do the allocations, and second to
> + * determine that everything is correctly allocated, this then makes
> + * three walks in total */
> +total_walks = MIN(walk_index + 2, 3);

This feels wrong...

> +
> +/* First, allocate the structures so they are present in the refcount
> + * structures */
> +ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> + &new_reftable_size, NULL, new_refblock_size,
> + new_refcount_bits, &alloc_refblock,
> + &new_allocation, NULL, status_cb, cb_opaque,
> + walk_index++, total_walks, errp);

...In the common case of just two iterations of the do loop (second
iteration confirms no allocations needed), you call with index 0/2, 1/3,
and then the later non-allocation walk is index 2/3.

In the rare case of three iterations of the do loop, you call with index
0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.

I highly doubt that it is possible to trigger four iterations of the do
loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.

I think you instead want to have:

total_walks = MAX(walk_index + 2, 3)

then the common case will call with 0/3, 1/3, and the later walk as 2/3

the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
walk as 3/4

the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
the later walk as 4/5.

> +
> +new_reftable_index = 0;
> +
> +if (new_allocation) {
> +if (new_reftable_offset) {
> +qcow2_free_clusters(bs, new_reftable_offset,
> +allocated_reftable_size * 
> sizeof(uint64_t),
> +QCOW2_DISCARD_NEVER);

Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?
 Why not punch holes in the file when throwing out a failed too-small
new table, or when cleaning up the old table once the new table is good?

> +}
> +
> +new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size 
> *
> +   sizeof(uint64_t));
> +if (new_reftable_offset < 0) {
> +error_setg_errno(errp, -new_reftable_offset,
> + "Failed to allocate the new reftable");
> +ret = new_reftable_offset;
> +goto done;
> +}
> +allocated_reftable_size = new_reftable_size;
> +
> +new_allocation = true;

This assignment is dead code (it already occurs inside an 'if
(new_allocation)' condition).

> +}
> +} while (new_allocation);
> +
> +/* Second, write the new refblocks */
> +new_allocation = false;

This assignment is dead code (it can only be reached if the earlier do
loop ended, which is only possible when no allocations are recorded).

> +ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> + &new_reftable_size, new_refblock,
> + new_refblock_size, new_refcount_bits,
> + &flush_refblock, &new_allocation, 
> new_set_refcount,
> + status_cb, cb_opaque, walk_index, walk_index + 
> 1,
> + errp);
> +if (ret < 0) {
> +goto done;
> +}
> +assert(!new_allocation);
> +

Correct.

> +done:
> +if (new_reftable) {
> +/* On success, new_reftable actually points to the old reftable (and
> + * new_reftable_size is the old reftable's size); but that is just
> + * fine */
> +

[Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-14 Thread Max Reitz
Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz 
---
 block/qcow2-refcount.c | 457 +
 block/qcow2.h  |   4 +
 2 files changed, 461 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2e13a9c..b59a028 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2484,3 +2484,460 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, 
int ign, int64_t offset,
 
 return 0;
 }
+
+/* A pointer to a function of this type is given to walk_over_reftable(). That
+ * function will create refblocks and pass them to a RefblockFinishOp once they
+ * are completed (@refblock). @refblock_empty is set if the refblock is
+ * completely empty.
+ *
+ * Along with the refblock, a corresponding reftable entry is passed, in the
+ * reftable @reftable (which may be reallocated) at @reftable_index.
+ *
+ * @allocated should be set to true if a new cluster has been allocated.
+ */
+typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable,
+   uint64_t reftable_index, uint64_t 
*reftable_size,
+   void *refblock, bool refblock_empty,
+   bool *allocated, Error **errp);
+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+
+if (!refblock_empty && reftable_index >= *reftable_size) {
+uint64_t *new_reftable;
+uint64_t new_reftable_size;
+
+new_reftable_size = ROUND_UP(reftable_index + 1,
+ s->cluster_size / sizeof(uint64_t));
+if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+error_setg(errp,
+   "This operation would make the refcount table grow "
+   "beyond the maximum size supported by QEMU, aborting");
+return -ENOTSUP;
+}
+
+new_reftable = g_try_realloc(*reftable, new_reftable_size *
+sizeof(uint64_t));
+if (!new_reftable) {
+error_setg(errp, "Failed to increase reftable buffer size");
+return -ENOMEM;
+}
+
+memset(new_reftable + *reftable_size, 0,
+   (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+*reftable  = new_reftable;
+*reftable_size = new_reftable_size;
+}
+
+if (!refblock_empty && !(*reftable)[reftable_index]) {
+offset = qcow2_alloc_clusters(bs, s->cluster_size);
+if (offset < 0) {
+error_setg_errno(errp, -offset, "Failed to allocate refblock");
+return offset;
+}
+(*reftable)[reftable_index] = offset;
+*allocated = true;
+}
+
+return 0;
+}
+
+/**
+ * This "operation" for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, bool *allocated,
+  Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+int ret;
+
+if (reftable_index < *reftable_size && (*reftable)[reftable_index]) {
+offset = (*reftable)[reftable_index];
+
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Overlap check failed");
+return ret;
+}
+
+ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write refblock");
+return ret;
+}
+} else {
+assert(refblock_empty);
+}
+
+return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced 
refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status 
callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the 
total
+ * number of walk_ove