Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-16 Thread Eric Blake
On 10/16/2014 09:27 AM, Max Reitz wrote:

>>> +memset(*refcount_table + old_nb_clusters, 0,
>>> +   (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> Is this calculation unnecessarily hard-coded to refcount_order==4?
> 
> While now in the process of editing this patch: no, this is not
> hard-coded to that refcount_order. It's hard-coded to *refcount_table
> being uint16_t *. I think this is fine.

Correct - our choice of uint16_t* is what hard-codes our current
dependence on refcount_order==4, and we assert that is the case elsewhere.

> Making the in-memory refcount table
> support variable refcount orders would be pretty hard and in fact we do
> not need it before we actually support other refcount_orders.

Agreed. Particularly for refcount orders < 3, where you pack more than
one refcount in a single byte.
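For illustration, a variable-order read helper of the kind Max describes below could look roughly like this. This is a sketch, not QEMU code; the function name and layout assumptions (sub-byte refcounts packed low-bits-first within a byte, multi-byte refcounts big-endian as on disk) are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch (not QEMU's actual API): read the index-th refcount
 * from a raw refblock buffer for a variable refcount_order. For orders < 3,
 * several refcounts are packed into one byte; for orders >= 3, each entry
 * spans 1 << (refcount_order - 3) bytes, stored big-endian. */
static uint64_t read_refcount_sketch(const uint8_t *refblock,
                                     uint64_t index, int refcount_order)
{
    if (refcount_order < 3) {
        int bits = 1 << refcount_order;   /* width of one refcount */
        int per_byte = 8 / bits;          /* refcounts packed per byte */
        uint8_t byte = refblock[index / per_byte];
        int shift = (index % per_byte) * bits;
        return (byte >> shift) & ((1u << bits) - 1);
    } else {
        int nbytes = 1 << (refcount_order - 3);
        uint64_t value = 0;
        for (int i = 0; i < nbytes; i++) {    /* big-endian byte order */
            value = (value << 8) | refblock[index * nbytes + i];
        }
        return value;
    }
}
```

With such a helper, the in-memory table could stay a raw byte buffer and the sizeof(uint16_t) arithmetic would disappear; until then, hard-coding uint16_t is the simpler choice.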

> When doing
> that, I (or anyone else) will probably add some function read_refcount()
> which reads a refcount from a refcount block or a concatenation of
> refblocks (such as this in-memory refcount table) while taking into
> account a variable refcount_order. When that is done, we can rework this
> code.

Fair enough. Don't hold up this series for that future improvement.

> 
> But for now it's fine to make the in-memory refcount table entries
> uint16_t, which is the reason for all the sizeof(uint16_t).
> 
> Max
> 
> 

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-16 Thread Max Reitz

Am 09.10.2014 um 01:09 schrieb Eric Blake:

On 08/29/2014 03:41 PM, Max Reitz wrote:

+ * cluster_count clusters; therefore, we have to allocate
+ * cluster_count - contiguous_free_clusters new clusters at the end of
+ * the image (which is the current value of cluster; note that cluster
+ * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
+ * image end) */
+*nb_clusters = cluster + cluster_count - contiguous_free_clusters;
+*refcount_table = g_try_realloc(*refcount_table,
+*nb_clusters * sizeof(uint16_t));
+if (!*refcount_table) {
+return -ENOMEM;
+}
+
+memset(*refcount_table + old_nb_clusters, 0,
+   (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));

Is this calculation unnecessarily hard-coded to refcount_order==4?


While now in the process of editing this patch: no, this is not 
hard-coded to that refcount_order. It's hard-coded to *refcount_table 
being uint16_t *. I think this is fine. Making the in-memory refcount table 
support variable refcount orders would be pretty hard and in fact we do 
not need it before we actually support other refcount_orders. When doing 
that, I (or anyone else) will probably add some function read_refcount() 
which reads a refcount from a refcount block or a concatenation of 
refblocks (such as this in-memory refcount table) while taking into 
account a variable refcount_order. When that is done, we can rework this 
code.


But for now it's fine to make the in-memory refcount table entries 
uint16_t, which is the reason for all the sizeof(uint16_t).
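The grow-and-zero pattern the patch applies to this uint16_t table can be sketched outside of QEMU like so (illustrative only; plain libc realloc() stands in for g_try_realloc(), and the function name is made up). realloc() leaves the appended entries uninitialized, which is why the patch follows it with a memset():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: grow an in-memory refcount table from *nb_clusters to
 * new_nb_clusters entries, zeroing the newly appended range. */
static int grow_refcount_table(uint16_t **refcount_table,
                               int64_t *nb_clusters, int64_t new_nb_clusters)
{
    uint16_t *new_table = realloc(*refcount_table,
                                  new_nb_clusters * sizeof(uint16_t));
    if (!new_table) {
        /* like g_try_realloc(), the old block stays valid on failure */
        return -ENOMEM;
    }
    /* zero only the entries past the old size */
    memset(new_table + *nb_clusters, 0,
           (new_nb_clusters - *nb_clusters) * sizeof(uint16_t));
    *refcount_table = new_table;
    *nb_clusters = new_nb_clusters;
    return 0;
}
```

Note that the sketch assigns the result to a temporary before overwriting *refcount_table, so the caller can still free the old table on ENOMEM; the patch assigns the g_try_realloc() result directly, which drops the old pointer on failure.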


Max



Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-12 Thread Max Reitz

Am 11.10.2014 um 20:56 schrieb Benoît Canet:

+int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
+int64_t rb_ofs, rb_start, rb_index;

Every time a few days pass and I read rb_ofs and rt_ofs again, I find these
names obfuscated. I know Linus says that C is a spartan language, but these
look odd. Maybe refcount_block_offset, refcount_block_* and
refcount_table_offset would be better.


I would use longer names if there were no line length limit. ;-)

I'll try and see how it looks.

Max



Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-11 Thread Benoît Canet

> +int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +int64_t rb_ofs, rb_start, rb_index;

Every time a few days pass and I read rb_ofs and rt_ofs again, I find these
names obfuscated. I know Linus says that C is a spartan language, but these
look odd. Maybe refcount_block_offset, refcount_block_* and
refcount_table_offset would be better.

I'll try to read this patch for real before you repost it.

Best regards

Benoît



Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-11 Thread Max Reitz

Am 10.10.2014 um 14:44 schrieb Benoît Canet:

+*nb_clusters = cluster + cluster_count - contiguous_free_clusters;
+*refcount_table = g_try_realloc(*refcount_table,
+*nb_clusters * sizeof(uint16_t));

Something tells me that these sizeof(uint16_t) are connected to
s->refcount_order, and indirectly to REFCOUNT_SHIFT, and that this code could
benefit from that relationship, probably saving you work in the future.


Yes, see the answer to Eric. I'm totally in favor of variable refcounts, 
see my "qcow2: Drop REFCOUNT_SHIFT" series. ;-)


Once again, thank you for your reviews!

Max


+if (!*refcount_table) {
+return -ENOMEM;
+}
+
+memset(*refcount_table + old_nb_clusters, 0,
+   (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+}
+
+/* Go back to the first free cluster */
+cluster -= contiguous_free_clusters;
+for (i = 0; i < cluster_count; i++) {
+(*refcount_table)[cluster + i] = 1;
+}
+
+return cluster << s->cluster_bits;
+}
+
+/*
+ * Creates a new refcount structure based solely on the in-memory information
+ * given through *refcount_table. All necessary allocations will be reflected
+ * in that array.
+ *
+ * On success, the old refcount structure is leaked (it will be covered by the
+ * new refcount structure).
+ */
+static int rebuild_refcount_structure(BlockDriverState *bs,
+  BdrvCheckResult *res,
+  uint16_t **refcount_table,
+  int64_t *nb_clusters)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
+int64_t rb_ofs, rb_start, rb_index;
+uint32_t reftable_size = 0;
+uint64_t *reftable = NULL;
+uint16_t *on_disk_rb;
+int i, ret = 0;
+struct {
+uint64_t rt_offset;
+uint32_t rt_clusters;
+} QEMU_PACKED rt_offset_and_clusters;
+
+qcow2_cache_empty(bs, s->refcount_block_cache);
+
+write_refblocks:
+for (; cluster < *nb_clusters; cluster++) {
+if (!(*refcount_table)[cluster]) {
+continue;
+}
+
+rb_index = cluster >> s->refcount_block_bits;
+rb_start = rb_index << s->refcount_block_bits;
+
+/* Don't allocate a cluster in a refblock already written to disk */
+if (first_free_cluster < rb_start) {
+first_free_cluster = rb_start;
+}
+rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
+ &first_free_cluster);
+if (rb_ofs < 0) {
+fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
+res->check_errors++;
+ret = rb_ofs;
+goto fail;
+}
+
+if (reftable_size <= rb_index) {
+uint32_t old_rt_size = reftable_size;
+reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
+ s->cluster_size) / sizeof(uint64_t);
+reftable = g_try_realloc(reftable,
+ reftable_size * sizeof(uint64_t));
+if (!reftable) {
+res->check_errors++;
+ret = -ENOMEM;
+goto fail;
+}
+
+memset(reftable + old_rt_size, 0,
+   (reftable_size - old_rt_size) * sizeof(uint64_t));
+
+/* The offset we have for the reftable is now no longer valid;
+ * this will leak that range, but we can easily fix that by running
+ * a leak-fixing check after this rebuild operation */
+rt_ofs = -1;
+}
+reftable[rb_index] = rb_ofs;
+
+/* If this is apparently the last refblock (for now), try to squeeze the
+ * reftable in */
+if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
+rt_ofs < 0)
+{
+rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
+                                                  sizeof(uint64_t)),
+ refcount_table, nb_clusters,
+ &first_free_cluster);
+if (rt_ofs < 0) {
+fprintf(stderr, "ERROR allocating reftable: %s\n",
+strerror(-ret));
+res->check_errors++;
+ret = rt_ofs;
+goto fail;
+}
+}
+
+ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
+if (ret < 0) {
+fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+goto fail;
+}
+
+on_disk_rb = g_malloc0(s->cluster_size);
+for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
+rb_start + i < *nb_clusters; i++)
+{
+on_disk_rb[i] = cpu_to_be16((*ref

Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-11 Thread Max Reitz

Am 09.10.2014 um 01:09 schrieb Eric Blake:

On 08/29/2014 03:41 PM, Max Reitz wrote:

The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.

The old refcount structure will be leaked, however.

Might be worth mentioning in the commit message that a later commit will
deal with the leak.


Signed-off-by: Max Reitz 
---
  block/qcow2-refcount.c | 286 -
  1 file changed, 283 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6300cec..318c152 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
  }
  
  /*

+ * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
+ * the on-disk refcount structures.
+ *
+ * *first_free_cluster does not necessarily point to the first free cluster, but
+ * may point to one cluster as close as possible before it. The offset returned
+ * will never be before that cluster.

Took me a couple reads of the comment and code to understand that.  If
I'm correct, this alternative wording may be better:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the actual first free
cluster found.

Or, depending on the semantics you intended [1]:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the first gap found,
even if that gap was too small to be used as the returned offset.


Yes, *first_free_cluster has nothing to do with the allocated cluster 
range. The offset of that allocated range is returned by the function. 
*first_free_cluster should merely always point somewhere at or before 
the first gap (of any size), just so alloc_clusters_imrt() does not have 
to start searching at the beginning of the IMRT the next time it is 
called. However, this also makes it possible for the caller to limit the 
search to the end of the IMRT (or even past the end) by setting it to 
some arbitrary value, so it has a dual use.



+ *
+ * Note that *first_free_cluster is a cluster index whereas the return value is
+ * an offset.
+ */
+static int64_t alloc_clusters_imrt(BlockDriverState *bs,
+   int cluster_count,
+   uint16_t **refcount_table,
+   int64_t *nb_clusters,
+   int64_t *first_free_cluster)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t cluster = *first_free_cluster, i;
+bool first_gap = true;
+int contiguous_free_clusters;
+
+/* Starting at *first_free_cluster, find a range of at least cluster_count
+ * continuously free clusters */
+for (contiguous_free_clusters = 0;
+ cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
+ cluster++)
+{
+if (!(*refcount_table)[cluster]) {
+contiguous_free_clusters++;
+if (first_gap) {
+/* If this is the first free cluster found, update
+ * *first_free_cluster accordingly */
+*first_free_cluster = cluster;
+first_gap = false;
+}
+} else if (contiguous_free_clusters) {
+contiguous_free_clusters = 0;
+}

[1] Should you be resetting first_gap in the 'else'?  If you don't, then
*first_free_cluster is NOT the start of the cluster just allocated, but
the first free cluster encountered on the way to the eventual
allocation.  I guess it depends on how the callers are using the
information; since the function is static, I guess I'll find out later
in my review.


Yes, *first_free_cluster is not that allocated cluster. I don't keep the 
offset in a dedicated variable, because it'll always be cluster - 
contiguous_free_clusters (see the next comment).
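As a standalone model of that scan (simplified from the patch, with a made-up function name and no table growth), the relationship between the returned range and *first_free_cluster looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the IMRT scan: look for cluster_count contiguous
 * zero entries in refcounts[], starting at *first_free_cluster. On
 * return, *first_free_cluster points at the first gap encountered,
 * which need not be the range that was found. */
static int64_t find_free_range(const uint16_t *refcounts, int64_t nb_clusters,
                               int cluster_count, int64_t *first_free_cluster)
{
    int64_t cluster = *first_free_cluster;
    int contiguous_free_clusters = 0;
    bool first_gap = true;

    for (; cluster < nb_clusters && contiguous_free_clusters < cluster_count;
         cluster++) {
        if (!refcounts[cluster]) {
            contiguous_free_clusters++;
            if (first_gap) {
                /* remember the first gap as a hint for the next call */
                *first_free_cluster = cluster;
                first_gap = false;
            }
        } else {
            contiguous_free_clusters = 0;
        }
    }

    if (contiguous_free_clusters < cluster_count) {
        return -1; /* the real code grows the IMRT here instead */
    }
    /* go back to the first cluster of the range just found */
    return cluster - contiguous_free_clusters;
}
```

Running this on a table like {1, 0, 0, 1, 0, 0, 0} with cluster_count == 3 returns index 4, while *first_free_cluster ends up at 1, i.e. the first (too small) gap, matching Max's description.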



+}
+
+/* If contiguous_free_clusters is greater than zero, it contains the number
+ * of continuously free clusters until the current cluster; the first free
+ * cluster in the current "gap" is therefore
+ * cluster - contiguous_free_clusters */
+
+/* If no such range could be found, grow the in-memory refcount table
+ * accordingly to append free clusters at the end of the image */
+if (contiguous_free_clusters < cluster_count) {
+int64_t old_nb_clusters = *nb_clusters;
+
+/* There already is a gap of contiguous_free_clusters; we need

s/gap/tail/, since we are at the end of the table?


Well, tail doesn't imply that it's e

Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-10 Thread Benoît Canet
> +*nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +*refcount_table = g_try_realloc(*refcount_table,
> +*nb_clusters * sizeof(uint16_t));

Something tells me that these sizeof(uint16_t) are connected to
s->refcount_order, and indirectly to REFCOUNT_SHIFT, and that this code could
benefit from that relationship, probably saving you work in the future.


> +if (!*refcount_table) {
> +return -ENOMEM;
> +}
> +
> +memset(*refcount_table + old_nb_clusters, 0,
> +   (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> +}
> +
> +/* Go back to the first free cluster */
> +cluster -= contiguous_free_clusters;
> +for (i = 0; i < cluster_count; i++) {
> +(*refcount_table)[cluster + i] = 1;
> +}
> +
> +return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by 
> the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +  BdrvCheckResult *res,
> +  uint16_t **refcount_table,
> +  int64_t *nb_clusters)
> +{
> +BDRVQcowState *s = bs->opaque;
> +int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +int64_t rb_ofs, rb_start, rb_index;
> +uint32_t reftable_size = 0;
> +uint64_t *reftable = NULL;
> +uint16_t *on_disk_rb;
> +int i, ret = 0;
> +struct {
> +uint64_t rt_offset;
> +uint32_t rt_clusters;
> +} QEMU_PACKED rt_offset_and_clusters;
> +
> +qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +for (; cluster < *nb_clusters; cluster++) {
> +if (!(*refcount_table)[cluster]) {
> +continue;
> +}
> +
> +rb_index = cluster >> s->refcount_block_bits;
> +rb_start = rb_index << s->refcount_block_bits;
> +
> +/* Don't allocate a cluster in a refblock already written to disk */
> +if (first_free_cluster < rb_start) {
> +first_free_cluster = rb_start;
> +}
> +rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
> + &first_free_cluster);
> +if (rb_ofs < 0) {
> +fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
> +res->check_errors++;
> +ret = rb_ofs;
> +goto fail;
> +}
> +
> +if (reftable_size <= rb_index) {
> +uint32_t old_rt_size = reftable_size;
> +reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
> + s->cluster_size) / sizeof(uint64_t);
> +reftable = g_try_realloc(reftable,
> + reftable_size * sizeof(uint64_t));
> +if (!reftable) {
> +res->check_errors++;
> +ret = -ENOMEM;
> +goto fail;
> +}
> +
> +memset(reftable + old_rt_size, 0,
> +   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +/* The offset we have for the reftable is now no longer valid;
> + * this will leak that range, but we can easily fix that by running
> + * a leak-fixing check after this rebuild operation */
> +rt_ofs = -1;
> +}
> +reftable[rb_index] = rb_ofs;
> +
> +/* If this is apparently the last refblock (for now), try to squeeze the
> + * reftable in */
> +if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +rt_ofs < 0)
> +{
> +rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                  sizeof(uint64_t)),
> + refcount_table, nb_clusters,
> + &first_free_cluster);
> +if (rt_ofs < 0) {
> +fprintf(stderr, "ERROR allocating reftable: %s\n",
> +strerror(-ret));
> +res->check_errors++;
> +ret = rt_ofs;
> +goto fail;
> +}
> +}
> +
> +ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
> +if (ret < 0) {
> +fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +goto fail;
> +}
> +
> +on_disk_rb = g_malloc0(s->cluster_size);
> +for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +rb_start + i < *nb_clusters; i++)
> +{
> +

Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check

2014-10-08 Thread Eric Blake
On 08/29/2014 03:41 PM, Max Reitz wrote:
> The previous commit introduced the "rebuild" variable to qcow2's
> implementation of the image consistency check. Now make use of this by
> adding a function which creates a completely new refcount structure
> based solely on the in-memory information gathered before.
> 
> The old refcount structure will be leaked, however.

Might be worth mentioning in the commit message that a later commit will
deal with the leak.

> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2-refcount.c | 286 -
>  1 file changed, 283 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 6300cec..318c152 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>  }
>  
>  /*
> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
> + * the on-disk refcount structures.
> + *
> + * *first_free_cluster does not necessarily point to the first free cluster, but
> + * may point to one cluster as close as possible before it. The offset returned
> + * will never be before that cluster.

Took me a couple reads of the comment and code to understand that.  If
I'm correct, this alternative wording may be better:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the actual first free
cluster found.

Or, depending on the semantics you intended [1]:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the first gap found,
even if that gap was too small to be used as the returned offset.

> + *
> + * Note that *first_free_cluster is a cluster index whereas the return value is
> + * an offset.
> + */
> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
> +   int cluster_count,
> +   uint16_t **refcount_table,
> +   int64_t *nb_clusters,
> +   int64_t *first_free_cluster)
> +{
> +BDRVQcowState *s = bs->opaque;
> +int64_t cluster = *first_free_cluster, i;
> +bool first_gap = true;
> +int contiguous_free_clusters;
> +
> +/* Starting at *first_free_cluster, find a range of at least cluster_count
> + * continuously free clusters */
> +for (contiguous_free_clusters = 0;
> + cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
> + cluster++)
> +{
> +if (!(*refcount_table)[cluster]) {
> +contiguous_free_clusters++;
> +if (first_gap) {
> +/* If this is the first free cluster found, update
> + * *first_free_cluster accordingly */
> +*first_free_cluster = cluster;
> +first_gap = false;
> +}
> +} else if (contiguous_free_clusters) {
> +contiguous_free_clusters = 0;
> +}

[1] Should you be resetting first_gap in the 'else'?  If you don't, then
*first_free_cluster is NOT the start of the cluster just allocated, but
the first free cluster encountered on the way to the eventual
allocation.  I guess it depends on how the callers are using the
information; since the function is static, I guess I'll find out later
in my review.

> +}
> +
> +/* If contiguous_free_clusters is greater than zero, it contains the number
> + * of continuously free clusters until the current cluster; the first free
> + * cluster in the current "gap" is therefore
> + * cluster - contiguous_free_clusters */
> +
> +/* If no such range could be found, grow the in-memory refcount table
> + * accordingly to append free clusters at the end of the image */
> +if (contiguous_free_clusters < cluster_count) {
> +int64_t old_nb_clusters = *nb_clusters;
> +
> +/* There already is a gap of contiguous_free_clusters; we need

s/gap/tail/, since we are at the end of the table?

> + * cluster_count clusters; therefore, we have to allocate
> + * cluster_count - contiguous_free_clusters new clusters at the end of
> + * the image (which is the current value of cluster; note that cluster
> + * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
> + * image end) */
> +*nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +*refcount_table = g_try_realloc(*refcount_table,
> +*nb_clusters * sizeof(uint16_t));
> +if (!*refcount_table) {
> +return -ENOMEM;
> +}
> +
> +memset(*refcount_tab