Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
On 11/18/2014 11:58 AM, Max Reitz wrote: > On 18.11.2014 18:55, Eric Blake wrote: >> On 11/14/2014 06:06 AM, Max Reitz wrote: >>> Add a function qcow2_change_refcount_order() which allows changing the >>> refcount order of a qcow2 image. >>> >>> Signed-off-by: Max Reitz >>> --- >>> +if (new_allocation) { >>> +if (new_reftable_offset) { >>> +qcow2_free_clusters(bs, new_reftable_offset, >>> +allocated_reftable_size * >>> sizeof(uint64_t), >>> +QCOW2_DISCARD_NEVER); >> Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy? > > Ah, discarding is always interesting... Last year I used > QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to > use ALWAYS unless one is really sure about it. I could have used > QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like > this is that the clusters may get picked up by the following allocation, > in which case having discarded them is not a good idea (there are some > other places in the qcow2 code which use NEVER for the same reason). > > So, in this case, I think NEVER is good. Makes sense. Yes, for THIS case, we are probably going to reuse the just-discarded cluster on the very next walk, so it's not worth punching a hole just to reinstate it. >>> +done: >>> +if (new_reftable) { >>> +/* On success, new_reftable actually points to the old >>> reftable (and >>> + * new_reftable_size is the old reftable's size); but that >>> is just >>> + * fine */ >>> +for (i = 0; i < new_reftable_size; i++) { >>> +uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK; >>> +if (offset) { >>> +qcow2_free_clusters(bs, offset, s->cluster_size, >>> +QCOW2_DISCARD_NEVER); >> Again, why the QCOW2_DISCARD_NEVER policy? > > Here, I have nothing to justify it. I'll use QCOW2_DISCARD_OTHER in v3. Thanks, and now I know a bit more about discard policy. > >> Fix the MIN vs. MAX bug, and the two dead assignment statements, and you >> can have: >> >> Reviewed-by: Eric Blake > > I'll also use QCOW2_DISCARD_OTHER for freeing the refblocks and the > reftable after the "done" label, if you're fine with that. Yes, works for me. > > Once again, thanks a lot! And thank you for a mentally engaging review :) I'm still in the middle of an email on a possible test you can write to provoke a different 3-pass scenario thanks to all-zero refblocks, so you may want to wait for that before posting v3... -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
On 18.11.2014 18:55, Eric Blake wrote: On 11/14/2014 06:06 AM, Max Reitz wrote: Add a function qcow2_change_refcount_order() which allows changing the refcount order of a qcow2 image. Signed-off-by: Max Reitz --- block/qcow2-refcount.c | 457 + block/qcow2.h | 4 + 2 files changed, 461 insertions(+) +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable, + +status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index, + (uint64_t)total * s->refcount_table_size, cb_opaque); Not sure if the casts are needed (isn't s->refcount_table_size already uint64_t, Surprise, it isn't. I thought otherwise, too, but then got told by clang_complete (it's uint32_t). and 'int * uint64_t' does the right thing); but I guess it doesn't hurt to leave them. +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order, +BlockDriverAmendStatusCB *status_cb, +void *cb_opaque, Error **errp) +{ +do { +int total_walks; + +new_allocation = false; + +/* At least we have to do this walk and the one which writes the + * refblocks; also, at least we have to do this loop here at least + * twice (normally), first to do the allocations, and second to + * determine that everything is correctly allocated, this then makes + * three walks in total */ +total_walks = MIN(walk_index + 2, 3); This feels wrong... Yes, I noticed already when preparing v3 and it's already fixed in my local v3 branch. *cough* + +/* First, allocate the structures so they are present in the refcount + * structures */ +ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index, + &new_reftable_size, NULL, new_refblock_size, + new_refcount_bits, &alloc_refblock, + &new_allocation, NULL, status_cb, cb_opaque, + walk_index++, total_walks, errp); ...In the common case of just two iterations of the do loop (second iteration confirms no allocations needed), you call with index 0/2, 1/3, and then the later non-allocation walk is index 2/3. In the rare case of three iterations of the do loop, you call with index 0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4. I highly doubt that it is possible to trigger four iterations of the do loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5. I think you instead want to have: total_walks = MAX(walk_index + 2, 3) then the common case will call with 0/3, 1/3, and the later walk as 2/3 the three-iteration loop will call with 0/3, 1/3, 2/4, and the later walk as 3/4 the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and the later walk as 4/5. + +new_reftable_index = 0; + +if (new_allocation) { +if (new_reftable_offset) { +qcow2_free_clusters(bs, new_reftable_offset, +allocated_reftable_size * sizeof(uint64_t), +QCOW2_DISCARD_NEVER); Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy? Ah, discarding is always interesting... Last year I used QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to use ALWAYS unless one is really sure about it. I could have used QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like this is that the clusters may get picked up by the following allocation, in which case having discarded them is not a good idea (there are some other places in the qcow2 code which use NEVER for the same reason). So, in this case, I think NEVER is good. Why not punch holes in the file when throwing out a failed too-small new table, or when cleaning up the old table once the new table is good? +} + +new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size * + sizeof(uint64_t)); +if (new_reftable_offset < 0) { +error_setg_errno(errp, -new_reftable_offset, + "Failed to allocate the new reftable"); +ret = new_reftable_offset; +goto done; +} +allocated_reftable_size = new_reftable_size; + +new_allocation = true; This assignment is dead code (it already occurs inside an 'if (new_allocation)' condition). Right. Though I somehow like its explicitness... I'll remove it. +} +} while (new_allocation); + +/* Second, write the new refblocks */ +new_allocation = false; This assignment is dead code (it can only be reached if the earlier do loop ended, which is only possible when no allocations are recorded). Right again. +ret = wa
Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
On 11/14/2014 06:06 AM, Max Reitz wrote: > Add a function qcow2_change_refcount_order() which allows changing the > refcount order of a qcow2 image. > > Signed-off-by: Max Reitz > --- > block/qcow2-refcount.c | 457 > + > block/qcow2.h | 4 + > 2 files changed, 461 insertions(+) > > +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable, > + > +status_cb(bs, (uint64_t)index * s->refcount_table_size + > reftable_index, > + (uint64_t)total * s->refcount_table_size, cb_opaque); Not sure if the casts are needed (isn't s->refcount_table_size already uint64_t, and 'int * uint64_t' does the right thing); but I guess it doesn't hurt to leave them. > +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order, > +BlockDriverAmendStatusCB *status_cb, > +void *cb_opaque, Error **errp) > +{ > +do { > +int total_walks; > + > +new_allocation = false; > + > +/* At least we have to do this walk and the one which writes the > + * refblocks; also, at least we have to do this loop here at least > + * twice (normally), first to do the allocations, and second to > + * determine that everything is correctly allocated, this then makes > + * three walks in total */ > +total_walks = MIN(walk_index + 2, 3); This feels wrong... > + > +/* First, allocate the structures so they are present in the refcount > + * structures */ > +ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index, > + &new_reftable_size, NULL, new_refblock_size, > + new_refcount_bits, &alloc_refblock, > + &new_allocation, NULL, status_cb, cb_opaque, > + walk_index++, total_walks, errp); ...In the common case of just two iterations of the do loop (second iteration confirms no allocations needed), you call with index 0/2, 1/3, and then the later non-allocation walk is index 2/3. In the rare case of three iterations of the do loop, you call with index 0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4. I highly doubt that it is possible to trigger four iterations of the do loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5. I think you instead want to have: total_walks = MAX(walk_index + 2, 3) then the common case will call with 0/3, 1/3, and the later walk as 2/3 the three-iteration loop will call with 0/3, 1/3, 2/4, and the later walk as 3/4 the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and the later walk as 4/5. > + > +new_reftable_index = 0; > + > +if (new_allocation) { > +if (new_reftable_offset) { > +qcow2_free_clusters(bs, new_reftable_offset, > +allocated_reftable_size * > sizeof(uint64_t), > +QCOW2_DISCARD_NEVER); Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy? Why not punch holes in the file when throwing out a failed too-small new table, or when cleaning up the old table once the new table is good? > +} > + > +new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size > * > + sizeof(uint64_t)); > +if (new_reftable_offset < 0) { > +error_setg_errno(errp, -new_reftable_offset, > + "Failed to allocate the new reftable"); > +ret = new_reftable_offset; > +goto done; > +} > +allocated_reftable_size = new_reftable_size; > + > +new_allocation = true; This assignment is dead code (it already occurs inside an 'if (new_allocation)' condition). > +} > +} while (new_allocation); > + > +/* Second, write the new refblocks */ > +new_allocation = false; This assignment is dead code (it can only be reached if the earlier do loop ended, which is only possible when no allocations are recorded). > +ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index, > + &new_reftable_size, new_refblock, > + new_refblock_size, new_refcount_bits, > + &flush_refblock, &new_allocation, > new_set_refcount, > + status_cb, cb_opaque, walk_index, walk_index + > 1, > + errp); > +if (ret < 0) { > +goto done; > +} > +assert(!new_allocation); > + Correct. > +done: > +if (new_reftable) { > +/* On success, new_reftable actually points to the old reftable (and > + * new_reftable_size is the old reftable's size); but that is just > + * fine */ > +
[Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
Add a function qcow2_change_refcount_order() which allows changing the refcount order of a qcow2 image. Signed-off-by: Max Reitz --- block/qcow2-refcount.c | 457 + block/qcow2.h | 4 + 2 files changed, 461 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 2e13a9c..b59a028 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -2484,3 +2484,460 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset, return 0; } + +/* A pointer to a function of this type is given to walk_over_reftable(). That + * function will create refblocks and pass them to a RefblockFinishOp once they + * are completed (@refblock). @refblock_empty is set if the refblock is + * completely empty. + * + * Along with the refblock, a corresponding reftable entry is passed, in the + * reftable @reftable (which may be reallocated) at @reftable_index. + * + * @allocated should be set to true if a new cluster has been allocated. + */ +typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable, + uint64_t reftable_index, uint64_t *reftable_size, + void *refblock, bool refblock_empty, + bool *allocated, Error **errp); + +/** + * This "operation" for walk_over_reftable() allocates the refblock on disk (if + * it is not empty) and inserts its offset into the new reftable. The size of + * this new reftable is increased as required. + */ +static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable, + uint64_t reftable_index, uint64_t *reftable_size, + void *refblock, bool refblock_empty, bool *allocated, + Error **errp) +{ +BDRVQcowState *s = bs->opaque; +int64_t offset; + +if (!refblock_empty && reftable_index >= *reftable_size) { +uint64_t *new_reftable; +uint64_t new_reftable_size; + +new_reftable_size = ROUND_UP(reftable_index + 1, + s->cluster_size / sizeof(uint64_t)); +if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) { +error_setg(errp, + "This operation would make the refcount table grow " + "beyond the maximum size supported by QEMU, aborting"); +return -ENOTSUP; +} + +new_reftable = g_try_realloc(*reftable, new_reftable_size * +sizeof(uint64_t)); +if (!new_reftable) { +error_setg(errp, "Failed to increase reftable buffer size"); +return -ENOMEM; +} + +memset(new_reftable + *reftable_size, 0, + (new_reftable_size - *reftable_size) * sizeof(uint64_t)); + +*reftable = new_reftable; +*reftable_size = new_reftable_size; +} + +if (!refblock_empty && !(*reftable)[reftable_index]) { +offset = qcow2_alloc_clusters(bs, s->cluster_size); +if (offset < 0) { +error_setg_errno(errp, -offset, "Failed to allocate refblock"); +return offset; +} +(*reftable)[reftable_index] = offset; +*allocated = true; +} + +return 0; +} + +/** + * This "operation" for walk_over_reftable() writes the refblock to disk at the + * offset specified by the new reftable's entry. It does not modify the new + * reftable or change any refcounts. + */ +static int flush_refblock(BlockDriverState *bs, uint64_t **reftable, + uint64_t reftable_index, uint64_t *reftable_size, + void *refblock, bool refblock_empty, bool *allocated, + Error **errp) +{ +BDRVQcowState *s = bs->opaque; +int64_t offset; +int ret; + +if (reftable_index < *reftable_size && (*reftable)[reftable_index]) { +offset = (*reftable)[reftable_index]; + +ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size); +if (ret < 0) { +error_setg_errno(errp, -ret, "Overlap check failed"); +return ret; +} + +ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size); +if (ret < 0) { +error_setg_errno(errp, -ret, "Failed to write refblock"); +return ret; +} +} else { +assert(refblock_empty); +} + +return 0; +} + +/** + * This function walks over the existing reftable and every referenced refblock; + * if @new_set_refcount is non-NULL, it is called for every refcount entry to + * create an equal new entry in the passed @new_refblock. Once that + * @new_refblock is completely filled, @operation will be called. + * + * @status_cb and @cb_opaque are used for the amend operation's status callback. + * @index is the index of the walk_over_reftable() calls and @total is the total + * number of walk_ove