Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 10/16/2014 09:27 AM, Max Reitz wrote:
>>> +        memset(*refcount_table + old_nb_clusters, 0,
>>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>>
>> Is this calculation unnecessarily hard-coded to refcount_order==4?
>
> While now in the process of editing this patch, no, this is not
> hard-coded to that refcount_order. It's hard-coded to *refcount_table
> being uint16_t *. I think this is fine.

Correct - our choice of uint16_t * is what hard-codes our current
dependence on refcount_order==4, and we assert that is the case
elsewhere.

> Making the in-memory refcount table support variable refcount orders
> would be pretty hard, and in fact we do not need it before we actually
> support other refcount_orders.

Agreed. Particularly for refcount orders < 3, where you pack more than
one refcount into a single byte.

> When doing that, I (or anyone else) will probably add some function
> read_refcount() which reads a refcount from a refcount block or a
> concatenation of refblocks (such as this in-memory refcount table)
> while taking into account a variable refcount_order. When that is
> done, we can rework this code.

Fair enough. Don't hold up this series for that future improvement.

> But for now it's fine to make the in-memory refcount table entries
> uint16_t, which is the reason for all the sizeof(uint16_t).
>
> Max

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
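For the archives: a variable-order read_refcount() along the lines Max describes might look like the sketch below. This is purely hypothetical (not part of this series); it assumes the concatenated refblocks live in a plain byte array, that multi-byte refcounts are big-endian as elsewhere in qcow2, and that the sub-byte orders (< 3) Eric mentions pack their entries LSB-first within each byte.

```c
#include <stdint.h>

/* Hypothetical sketch (not patch code): read refcount number `index` from
 * a concatenation of refblocks, for any refcount_order from 0 (1-bit
 * refcounts) to 6 (64-bit refcounts). Assumes big-endian multi-byte
 * refcounts and LSB-first packing for the sub-byte orders. */
uint64_t read_refcount(const uint8_t *refblocks, uint64_t index,
                       unsigned refcount_order)
{
    if (refcount_order < 3) {
        /* Several refcounts share one byte */
        unsigned bits = 1u << refcount_order;   /* 1, 2 or 4 */
        unsigned per_byte = 8 / bits;           /* 8, 4 or 2 */
        uint8_t byte = refblocks[index / per_byte];
        unsigned shift = (index % per_byte) * bits;
        return (byte >> shift) & ((1u << bits) - 1);
    } else {
        /* One refcount spans 1, 2, 4 or 8 bytes, big-endian */
        unsigned nbytes = 1u << (refcount_order - 3);
        uint64_t value = 0;
        for (unsigned i = 0; i < nbytes; i++) {
            value = (value << 8) | refblocks[index * nbytes + i];
        }
        return value;
    }
}
```

With refcount_order == 4 this degenerates to the current big-endian uint16_t case, which is exactly what the sizeof(uint16_t) occurrences above hard-code.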
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 09.10.2014 01:09, Eric Blake wrote:
> On 08/29/2014 03:41 PM, Max Reitz wrote:
>> +         * cluster_count clusters; therefore, we have to allocate
>> +         * cluster_count - contiguous_free_clusters new clusters at the end of
>> +         * the image (which is the current value of cluster; note that cluster
>> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
>> +         * image end) */
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>
> Is this calculation unnecessarily hard-coded to refcount_order==4?

While now in the process of editing this patch, no, this is not
hard-coded to that refcount_order. It's hard-coded to *refcount_table
being uint16_t *. I think this is fine.

Making the in-memory refcount table support variable refcount orders
would be pretty hard, and in fact we do not need it before we actually
support other refcount_orders. When doing that, I (or anyone else) will
probably add some function read_refcount() which reads a refcount from
a refcount block or a concatenation of refblocks (such as this
in-memory refcount table) while taking into account a variable
refcount_order. When that is done, we can rework this code.

But for now it's fine to make the in-memory refcount table entries
uint16_t, which is the reason for all the sizeof(uint16_t).

Max
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 11.10.2014 20:56, Benoît Canet wrote:
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>
> Every time a few days pass and I read rb_ofs and rt_ofs again, I find
> these names obfuscated. I know Linus says that C is a spartan
> language, but these look odd. Maybe refcount_block_offset,
> refcount_block_* and refcount_table_offset would be better.

I would use longer names if there was no line length limit. ;-)

I'll try and see how it looks.

Max
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;

Every time a few days pass and I read rb_ofs and rt_ofs again, I find
these names obfuscated. I know Linus says that C is a spartan language,
but these look odd. Maybe refcount_block_offset, refcount_block_* and
refcount_table_offset would be better.

I'll try to read this patch for real before you repost it.

Best regards

Benoît
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 10.10.2014 14:44, Benoît Canet wrote:
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>
> Something tells me that these sizeof(uint16_t) are connected to
> s->refcount_order and indirectly to REFCOUNT_SHIFT, and that this code
> could benefit from this relationship, thus probably saving you work in
> the future.

Yes, see the answer to Eric. I'm totally in favor of variable
refcounts; see my "qcow2: Drop REFCOUNT_SHIFT" series. ;-)

Once again, thank you for your reviews!

Max

>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +    }
>> +
>> +    /* Go back to the first free cluster */
>> +    cluster -= contiguous_free_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
>> +    uint16_t *on_disk_rb;
>> +    int i, ret = 0;
>> +    struct {
>> +        uint64_t rt_offset;
>> +        uint32_t rt_clusters;
>> +    } QEMU_PACKED rt_offset_and_clusters;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        rb_index = cluster >> s->refcount_block_bits;
>> +        rb_start = rb_index << s->refcount_block_bits;
>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < rb_start) {
>> +            first_free_cluster = rb_start;
>> +        }
>> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rb_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n",
>> +                    strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rb_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= rb_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            rt_ofs = -1;
>> +        }
>> +        reftable[rb_index] = rb_ofs;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in */
>> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
>> +            rt_ofs < 0)
>> +        {
>> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                              sizeof(uint64_t)),
>> +                                         refcount_table, nb_clusters,
>> +                                         &first_free_cluster);
>> +            if (rt_ofs < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-ret));
>> +                res->check_errors++;
>> +                ret = rt_ofs;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        on_disk_rb = g_malloc0(s->cluster_size);
>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    rb_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_rb[i] = cpu_to_be16((*ref
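For readers following along: the rb_index/rb_start arithmetic in the loop above is just integer division by the number of refcount entries per refblock. A minimal standalone illustration, using hypothetical example values (64 kB clusters with 16-bit refcounts, so one refblock holds 65536 / 2 = 32768 entries and refcount_block_bits == 15):

```c
#include <stdint.h>

/* Illustration only (not patch code): which refblock covers a given
 * cluster, and which cluster that refblock starts at. Example values:
 * 64 kB clusters, 16-bit refcounts -> 32768 entries per refblock,
 * i.e. refcount_block_bits == 15. */
enum { REFCOUNT_BLOCK_BITS = 15 };

/* rb_index in the patch: refblock covering `cluster` */
int64_t refblock_index(int64_t cluster)
{
    return cluster >> REFCOUNT_BLOCK_BITS;
}

/* rb_start in the patch: first cluster covered by that refblock */
int64_t refblock_start(int64_t cluster)
{
    return refblock_index(cluster) << REFCOUNT_BLOCK_BITS;
}
```

This is why the loop bumps first_free_cluster to rb_start before allocating: any cluster below rb_start belongs to a refblock that has already been written out.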
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 09.10.2014 01:09, Eric Blake wrote:
> On 08/29/2014 03:41 PM, Max Reitz wrote:
>> The previous commit introduced the "rebuild" variable to qcow2's
>> implementation of the image consistency check. Now make use of this by
>> adding a function which creates a completely new refcount structure
>> based solely on the in-memory information gathered before.
>>
>> The old refcount structure will be leaked, however.
>
> Might be worth mentioning in the commit message that a later commit
> will deal with the leak.
>
>> Signed-off-by: Max Reitz
>> ---
>>  block/qcow2-refcount.c | 286 -
>>  1 file changed, 283 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 6300cec..318c152 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>  }
>>
>>  /*
>> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
>> + * the on-disk refcount structures.
>> + *
>> + * *first_free_cluster does not necessarily point to the first free cluster, but
>> + * may point to one cluster as close as possible before it. The offset returned
>> + * will never be before that cluster.
>
> Took me a couple reads of the comment and code to understand that. If
> I'm correct, this alternative wording may be better:
>
> On input, *first_free_cluster tells where to start looking, and need
> not actually be a free cluster; the returned offset will not be before
> that cluster. On output, *first_free_cluster points to the actual
> first free cluster found.
>
> Or, depending on the semantics you intended [1]:
>
> On input, *first_free_cluster tells where to start looking, and need
> not actually be a free cluster; the returned offset will not be before
> that cluster. On output, *first_free_cluster points to the first gap
> found, even if that gap was too small to be used as the returned
> offset.

Yes, *first_free_cluster has nothing to do with the allocated cluster
range. The offset of that allocated range will be returned by the
function. *first_free_cluster should merely always point somewhere
before or at the first gap (of any size), just so alloc_clusters_imrt()
does not have to start searching at the beginning of the IMRT the next
time it is called.

However, this also lets the caller limit the search to the end of the
IMRT (or even past the end) by setting it to some arbitrary value, so
it has a dual use.

>> + *
>> + * Note that *first_free_cluster is a cluster index whereas the return value is
>> + * an offset.
>> + */
>> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>> +                                   int cluster_count,
>> +                                   uint16_t **refcount_table,
>> +                                   int64_t *nb_clusters,
>> +                                   int64_t *first_free_cluster)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t cluster = *first_free_cluster, i;
>> +    bool first_gap = true;
>> +    int contiguous_free_clusters;
>> +
>> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
>> +     * continuously free clusters */
>> +    for (contiguous_free_clusters = 0;
>> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
>> +         cluster++)
>> +    {
>> +        if (!(*refcount_table)[cluster]) {
>> +            contiguous_free_clusters++;
>> +            if (first_gap) {
>> +                /* If this is the first free cluster found, update
>> +                 * *first_free_cluster accordingly */
>> +                *first_free_cluster = cluster;
>> +                first_gap = false;
>> +            }
>> +        } else if (contiguous_free_clusters) {
>> +            contiguous_free_clusters = 0;
>> +        }
>
> [1] Should you be resetting first_gap in the 'else'? If you don't,
> then *first_free_cluster is NOT the start of the cluster just
> allocated, but the first free cluster encountered on the way to the
> eventual allocation. I guess it depends on how the callers are using
> the information; since the function is static, I guess I'll find out
> later in my review.

Yes, *first_free_cluster is not that allocated cluster. I don't keep
the offset in a dedicated variable, because it'll always be
cluster - contiguous_free_clusters (see the next comment).

>> +    }
>> +
>> +    /* If contiguous_free_clusters is greater than zero, it contains the number
>> +     * of continuously free clusters until the current cluster; the first free
>> +     * cluster in the current "gap" is therefore
>> +     * cluster - contiguous_free_clusters */
>> +
>> +    /* If no such range could be found, grow the in-memory refcount table
>> +     * accordingly to append free clusters at the end of the image */
>> +    if (contiguous_free_clusters < cluster_count) {
>> +        int64_t old_nb_clusters = *nb_clusters;
>> +
>> +        /* There already is a gap of contiguous_free_clusters; we need
>
> s/gap/tail/, since we are at the end of the table?

Well, tail doesn't imply that it's e
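To make the semantics under discussion concrete, here is a stripped-down, self-contained model of the search loop (illustrative only, not the patch code; the table-growing fallback is omitted). It demonstrates Eric's point [1]: *first_free_cluster ends up at the first gap encountered, not necessarily at the start of the range actually found, while the return value is cluster - contiguous_free_clusters:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of alloc_clusters_imrt()'s search loop. Returns the index of the
 * first cluster of a free range of cluster_count entries, or -1 if none
 * exists (the real function would grow the IMRT instead). On return,
 * *first_free_cluster points at the first free cluster seen during the
 * search, which may precede the returned range. */
int64_t find_free_range(const uint16_t *refcounts, int64_t nb_clusters,
                        int cluster_count, int64_t *first_free_cluster)
{
    int64_t cluster = *first_free_cluster;
    bool first_gap = true;
    int contiguous_free_clusters = 0;

    for (; cluster < nb_clusters && contiguous_free_clusters < cluster_count;
         cluster++) {
        if (!refcounts[cluster]) {
            contiguous_free_clusters++;
            if (first_gap) {
                /* Remember the first gap, even if it turns out too small */
                *first_free_cluster = cluster;
                first_gap = false;
            }
        } else {
            contiguous_free_clusters = 0;
        }
    }

    if (contiguous_free_clusters < cluster_count) {
        return -1;
    }
    /* The range found starts at cluster - contiguous_free_clusters */
    return cluster - contiguous_free_clusters;
}
```

For example, with refcounts {1, 0, 1, 0, 0, 1} and a request for 2 clusters, the range found starts at index 3, while *first_free_cluster is left at index 1 (the too-small first gap), matching Eric's second proposed wording.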
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
> +    *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +    *refcount_table = g_try_realloc(*refcount_table,
> +                                    *nb_clusters * sizeof(uint16_t));

Something tells me that these sizeof(uint16_t) are connected to
s->refcount_order and indirectly to REFCOUNT_SHIFT, and that this code
could benefit from this relationship, thus probably saving you work in
the future.

> +    if (!*refcount_table) {
> +        return -ENOMEM;
> +    }
> +
> +    memset(*refcount_table + old_nb_clusters, 0,
> +           (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> +    }
> +
> +    /* Go back to the first free cluster */
> +    cluster -= contiguous_free_clusters;
> +    for (i = 0; i < cluster_count; i++) {
> +        (*refcount_table)[cluster + i] = 1;
> +    }
> +
> +    return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +                                      BdrvCheckResult *res,
> +                                      uint16_t **refcount_table,
> +                                      int64_t *nb_clusters)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;
> +    uint32_t reftable_size = 0;
> +    uint64_t *reftable = NULL;
> +    uint16_t *on_disk_rb;
> +    int i, ret = 0;
> +    struct {
> +        uint64_t rt_offset;
> +        uint32_t rt_clusters;
> +    } QEMU_PACKED rt_offset_and_clusters;
> +
> +    qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +    for (; cluster < *nb_clusters; cluster++) {
> +        if (!(*refcount_table)[cluster]) {
> +            continue;
> +        }
> +
> +        rb_index = cluster >> s->refcount_block_bits;
> +        rb_start = rb_index << s->refcount_block_bits;
> +
> +        /* Don't allocate a cluster in a refblock already written to disk */
> +        if (first_free_cluster < rb_start) {
> +            first_free_cluster = rb_start;
> +        }
> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
> +                                     &first_free_cluster);
> +        if (rb_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating refblock: %s\n",
> +                    strerror(-ret));
> +            res->check_errors++;
> +            ret = rb_ofs;
> +            goto fail;
> +        }
> +
> +        if (reftable_size <= rb_index) {
> +            uint32_t old_rt_size = reftable_size;
> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
> +                                     s->cluster_size) / sizeof(uint64_t);
> +            reftable = g_try_realloc(reftable,
> +                                     reftable_size * sizeof(uint64_t));
> +            if (!reftable) {
> +                res->check_errors++;
> +                ret = -ENOMEM;
> +                goto fail;
> +            }
> +
> +            memset(reftable + old_rt_size, 0,
> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +            /* The offset we have for the reftable is now no longer valid;
> +             * this will leak that range, but we can easily fix that by running
> +             * a leak-fixing check after this rebuild operation */
> +            rt_ofs = -1;
> +        }
> +        reftable[rb_index] = rb_ofs;
> +
> +        /* If this is apparently the last refblock (for now), try to squeeze the
> +         * reftable in */
> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +            rt_ofs < 0)
> +        {
> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                              sizeof(uint64_t)),
> +                                         refcount_table, nb_clusters,
> +                                         &first_free_cluster);
> +            if (rt_ofs < 0) {
> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                        strerror(-ret));
> +                res->check_errors++;
> +                ret = rt_ofs;
> +                goto fail;
> +            }
> +        }
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        on_disk_rb = g_malloc0(s->cluster_size);
> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +                    rb_start + i < *nb_clusters; i++)
> +        {
> +
Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
On 08/29/2014 03:41 PM, Max Reitz wrote:
> The previous commit introduced the "rebuild" variable to qcow2's
> implementation of the image consistency check. Now make use of this by
> adding a function which creates a completely new refcount structure
> based solely on the in-memory information gathered before.
>
> The old refcount structure will be leaked, however.

Might be worth mentioning in the commit message that a later commit
will deal with the leak.

> Signed-off-by: Max Reitz
> ---
>  block/qcow2-refcount.c | 286 -
>  1 file changed, 283 insertions(+), 3 deletions(-)
>
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 6300cec..318c152 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>  }
>
>  /*
> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
> + * the on-disk refcount structures.
> + *
> + * *first_free_cluster does not necessarily point to the first free cluster, but
> + * may point to one cluster as close as possible before it. The offset returned
> + * will never be before that cluster.

Took me a couple reads of the comment and code to understand that. If
I'm correct, this alternative wording may be better:

On input, *first_free_cluster tells where to start looking, and need
not actually be a free cluster; the returned offset will not be before
that cluster. On output, *first_free_cluster points to the actual first
free cluster found.

Or, depending on the semantics you intended [1]:

On input, *first_free_cluster tells where to start looking, and need
not actually be a free cluster; the returned offset will not be before
that cluster. On output, *first_free_cluster points to the first gap
found, even if that gap was too small to be used as the returned
offset.

> + *
> + * Note that *first_free_cluster is a cluster index whereas the return value is
> + * an offset.
> + */
> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
> +                                   int cluster_count,
> +                                   uint16_t **refcount_table,
> +                                   int64_t *nb_clusters,
> +                                   int64_t *first_free_cluster)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t cluster = *first_free_cluster, i;
> +    bool first_gap = true;
> +    int contiguous_free_clusters;
> +
> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
> +     * continuously free clusters */
> +    for (contiguous_free_clusters = 0;
> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
> +         cluster++)
> +    {
> +        if (!(*refcount_table)[cluster]) {
> +            contiguous_free_clusters++;
> +            if (first_gap) {
> +                /* If this is the first free cluster found, update
> +                 * *first_free_cluster accordingly */
> +                *first_free_cluster = cluster;
> +                first_gap = false;
> +            }
> +        } else if (contiguous_free_clusters) {
> +            contiguous_free_clusters = 0;
> +        }

[1] Should you be resetting first_gap in the 'else'? If you don't, then
*first_free_cluster is NOT the start of the cluster just allocated, but
the first free cluster encountered on the way to the eventual
allocation. I guess it depends on how the callers are using the
information; since the function is static, I guess I'll find out later
in my review.

> +    }
> +
> +    /* If contiguous_free_clusters is greater than zero, it contains the number
> +     * of continuously free clusters until the current cluster; the first free
> +     * cluster in the current "gap" is therefore
> +     * cluster - contiguous_free_clusters */
> +
> +    /* If no such range could be found, grow the in-memory refcount table
> +     * accordingly to append free clusters at the end of the image */
> +    if (contiguous_free_clusters < cluster_count) {
> +        int64_t old_nb_clusters = *nb_clusters;
> +
> +        /* There already is a gap of contiguous_free_clusters; we need

s/gap/tail/, since we are at the end of the table?

> +         * cluster_count clusters; therefore, we have to allocate
> +         * cluster_count - contiguous_free_clusters new clusters at the end of
> +         * the image (which is the current value of cluster; note that cluster
> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
> +         * image end) */
> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +        *refcount_table = g_try_realloc(*refcount_table,
> +                                        *nb_clusters * sizeof(uint16_t));
> +        if (!*refcount_table) {
> +            return -ENOMEM;
> +        }
> +
> +        memset(*refcount_tab