Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-20 Thread Kevin Wolf
Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
 On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
 There is one intended change in functionality in this patch, which is
 that it allocates new clusters even when it could satisfy the first part
 of the request with already allocated clusters. In order to check if
 there is a problem with this scenario, the following patch should revert
 to the old behaviour:

 --- a/block/qcow2-cluster.c
 +++ b/block/qcow2-cluster.c
 @@ -847,7 +847,7 @@ again:
  keep_clusters = count_contiguous_clusters(nb_clusters,
 s-cluster_size,
l2_table[l2_index],
 0, 0);
  assert(keep_clusters = nb_clusters);
 -nb_clusters -= keep_clusters;
 +nb_clusters = 0;
  } else {
  /* For the moment, overwrite compressed clusters one by one */
  if (cluster_offset  QCOW_OFLAG_COMPRESSED) {

 The rest is meant to be a functionally equivalent rewrite of the old
 code that was required in order to allow this change.

 Testing.
 
 Corruption gone with patch above.

Okay, so something must be wrong only with the new code paths, which
makes things a bit easier. I'll have another closer look. Stefan, can
you re-review 250196f1 as well?

Kevin



Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-20 Thread Stefan Hajnoczi
On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
 On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
 There is one intended change in functionality in this patch, which is
 that it allocates new clusters even when it could satisfy the first part
 of the request with already allocated clusters. In order to check if
 there is a problem with this scenario, the following patch should revert
 to the old behaviour:

 --- a/block/qcow2-cluster.c
 +++ b/block/qcow2-cluster.c
 @@ -847,7 +847,7 @@ again:
          keep_clusters = count_contiguous_clusters(nb_clusters,
 s-cluster_size,
                                                    l2_table[l2_index],
 0, 0);
          assert(keep_clusters = nb_clusters);
 -        nb_clusters -= keep_clusters;
 +        nb_clusters = 0;
      } else {
          /* For the moment, overwrite compressed clusters one by one */
          if (cluster_offset  QCOW_OFLAG_COMPRESSED) {

 The rest is meant to be a functionally equivalent rewrite of the old
 code that was required in order to allow this change.

 Testing.

 Corruption gone with patch above.

 Okay, so something must be wrong only with the new code paths, which
 makes things a bit easier. I'll have another closer look. Stefan, can
 you re-review 250196f1 as well?

I'm taking a look.

Stefan



Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-20 Thread Stefan Hajnoczi
On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
 On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
 There is one intended change in functionality in this patch, which is
 that it allocates new clusters even when it could satisfy the first part
 of the request with already allocated clusters. In order to check if
 there is a problem with this scenario, the following patch should revert
 to the old behaviour:

 --- a/block/qcow2-cluster.c
 +++ b/block/qcow2-cluster.c
 @@ -847,7 +847,7 @@ again:
          keep_clusters = count_contiguous_clusters(nb_clusters,
 s-cluster_size,
                                                    l2_table[l2_index],
 0, 0);
          assert(keep_clusters = nb_clusters);
 -        nb_clusters -= keep_clusters;
 +        nb_clusters = 0;
      } else {
          /* For the moment, overwrite compressed clusters one by one */
          if (cluster_offset  QCOW_OFLAG_COMPRESSED) {

 The rest is meant to be a functionally equivalent rewrite of the old
 code that was required in order to allow this change.

 Testing.

 Corruption gone with patch above.

 Okay, so something must be wrong only with the new code paths, which
 makes things a bit easier. I'll have another closer look. Stefan, can
 you re-review 250196f1 as well?

 I'm taking a look.

Just looking at the qemu-img check output that Marcelo posted I'm
interpreting that as OFLAG_COPIED was set on the offset but its
refcount was 2.

The refcount shouldn't be 2 unless internal snapshots were used.

So we probably incremented the refcount either too often or for the
wrong block (which is more likely since the guest sees corruption).

Stefan



Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-20 Thread Kevin Wolf
Am 20.04.2012 15:06, schrieb Stefan Hajnoczi:
 On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
 On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
 There is one intended change in functionality in this patch, which is
 that it allocates new clusters even when it could satisfy the first part
 of the request with already allocated clusters. In order to check if
 there is a problem with this scenario, the following patch should revert
 to the old behaviour:

 --- a/block/qcow2-cluster.c
 +++ b/block/qcow2-cluster.c
 @@ -847,7 +847,7 @@ again:
  keep_clusters = count_contiguous_clusters(nb_clusters,
 s-cluster_size,
l2_table[l2_index],
 0, 0);
  assert(keep_clusters = nb_clusters);
 -nb_clusters -= keep_clusters;
 +nb_clusters = 0;
  } else {
  /* For the moment, overwrite compressed clusters one by one */
  if (cluster_offset  QCOW_OFLAG_COMPRESSED) {

 The rest is meant to be a functionally equivalent rewrite of the old
 code that was required in order to allow this change.

 Testing.

 Corruption gone with patch above.

 Okay, so something must be wrong only with the new code paths, which
 makes things a bit easier. I'll have another closer look. Stefan, can
 you re-review 250196f1 as well?

 I'm taking a look.
 
 Just looking at the qemu-img check output that Marcelo posted I'm
 interpreting that as OFLAG_COPIED was set on the offset but its
 refcount was 2.
 
 The refcount shouldn't be 2 unless internal snapshots were used.
 
 So we probably incremented the refcount either too often or for the
 wrong block (which is more likely since the guest sees corruption).

I've looked at the same thing now and I think the same cluster was used
both for a regular data allocation and for a new refcount block. This
may happen because alloc_refcount_block() uses s-free_cluster_index
which is updated by qcow2_alloc_cluster_noref(), but not by
qcow2_alloc_cluster_at(). Just a theory so far, though, and not yet
tried out in practice. If this is it, a patch would look like this
(completely untested as well):

--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -587,6 +587,7 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
 {
 BDRVQcowState *s = bs-opaque;
 uint64_t cluster_index;
+uint64_t old_free_cluster_index;
 int i, refcount, ret;

 /* Check how many clusters there are free */
@@ -602,11 +603,16 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
 }

 /* And then allocate them */
+old_free_cluster_index = s-free_cluster_index;
+s-free_cluster_index = cluster_index + i;
+
 ret = update_refcount(bs, offset, i  s-cluster_bits, 1);
 if (ret  0) {
 return ret;
 }

+s-free_cluster_index = old_free_cluster_index;
+
 return i;
 }

Kevin



Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-19 Thread Kevin Wolf
Am 19.04.2012 04:44, schrieb Marcelo Tosatti:
 On Mon, Mar 12, 2012 at 04:19:45PM +0100, Kevin Wolf wrote:
 Signed-off-by: Kevin Wolf kw...@redhat.com
 Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  block/qcow2-cluster.c |   55 
 -
  1 files changed, 36 insertions(+), 19 deletions(-)
 
 Kevin,
 
 Autotest installed Fedora.8.64 guests boot with a corrupt disk 
 (see screenshot).
 
 Reverting
 
 qcow2: Reduce number of I/O requests
 qcow2: Add qcow2_alloc_clusters_at()
 qcow2: Factor out count_cow_clusters
 
 Fixes the problem.

Do you really need to revert the latter two? They should be really
harmless, the first is adding yet dead code, the second one is trivial
code motion.

 Let me know if there is a fix available or you need further information.

No, this is the first report of such corruption, so any further
information would be very helpful. Does qemu-img check find any problems?

There is one intended change in functionality in this patch, which is
that it allocates new clusters even when it could satisfy the first part
of the request with already allocated clusters. In order to check if
there is a problem with this scenario, the following patch should revert
to the old behaviour:

--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -847,7 +847,7 @@ again:
 keep_clusters = count_contiguous_clusters(nb_clusters,
s-cluster_size,
   l2_table[l2_index],
0, 0);
 assert(keep_clusters = nb_clusters);
-nb_clusters -= keep_clusters;
+nb_clusters = 0;
 } else {
 /* For the moment, overwrite compressed clusters one by one */
 if (cluster_offset  QCOW_OFLAG_COMPRESSED) {

The rest is meant to be a functionally equivalent rewrite of the old
code that was required in order to allow this change.

Kevin



Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-19 Thread Marcelo Tosatti
On Thu, Apr 19, 2012 at 09:38:29AM +0200, Kevin Wolf wrote:
 Am 19.04.2012 04:44, schrieb Marcelo Tosatti:
  On Mon, Mar 12, 2012 at 04:19:45PM +0100, Kevin Wolf wrote:
  Signed-off-by: Kevin Wolf kw...@redhat.com
  Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
  ---
   block/qcow2-cluster.c |   55 
  -
   1 files changed, 36 insertions(+), 19 deletions(-)
  
  Kevin,
  
  Autotest installed Fedora.8.64 guests boot with a corrupt disk 
  (see screenshot).
  
  Reverting
  
  qcow2: Reduce number of I/O requests
  qcow2: Add qcow2_alloc_clusters_at()
  qcow2: Factor out count_cow_clusters
  
  Fixes the problem.
 
 Do you really need to revert the latter two? They should be really
 harmless, the first is adding yet dead code, the second one is trivial
 code motion.
 
  Let me know if there is a fix available or you need further information.
 
 No, this is the first report of such corruption, so any further
 information would be very helpful. Does qemu-img check find any problems?

[root@nehalem kvm]# qemu-img info
/root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2 
image: /root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 4.0G
cluster_size: 65536
[root@nehalem kvm]# qemu-img check
/root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2 
ERROR OFLAG_COPIED: offset=80008000 refcount=2
ERROR refcount block 1 refcount=2

2 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

 There is one intended change in functionality in this patch, which is
 that it allocates new clusters even when it could satisfy the first part
 of the request with already allocated clusters. In order to check if
 there is a problem with this scenario, the following patch should revert
 to the old behaviour:
 
 --- a/block/qcow2-cluster.c
 +++ b/block/qcow2-cluster.c
 @@ -847,7 +847,7 @@ again:
  keep_clusters = count_contiguous_clusters(nb_clusters,
 s-cluster_size,
l2_table[l2_index],
 0, 0);
  assert(keep_clusters = nb_clusters);
 -nb_clusters -= keep_clusters;
 +nb_clusters = 0;
  } else {
  /* For the moment, overwrite compressed clusters one by one */
  if (cluster_offset  QCOW_OFLAG_COMPRESSED) {
 
 The rest is meant to be a functionally equivalent rewrite of the old
 code that was required in order to allow this change.

Testing.




Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-19 Thread Marcelo Tosatti
On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
  There is one intended change in functionality in this patch, which is
  that it allocates new clusters even when it could satisfy the first part
  of the request with already allocated clusters. In order to check if
  there is a problem with this scenario, the following patch should revert
  to the old behaviour:
  
  --- a/block/qcow2-cluster.c
  +++ b/block/qcow2-cluster.c
  @@ -847,7 +847,7 @@ again:
   keep_clusters = count_contiguous_clusters(nb_clusters,
  s-cluster_size,
 l2_table[l2_index],
  0, 0);
   assert(keep_clusters = nb_clusters);
  -nb_clusters -= keep_clusters;
  +nb_clusters = 0;
   } else {
   /* For the moment, overwrite compressed clusters one by one */
   if (cluster_offset  QCOW_OFLAG_COMPRESSED) {
  
  The rest is meant to be a functionally equivalent rewrite of the old
  code that was required in order to allow this change.
 
 Testing.

Corruption gone with patch above.




Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-04-18 Thread Marcelo Tosatti
On Mon, Mar 12, 2012 at 04:19:45PM +0100, Kevin Wolf wrote:
 Signed-off-by: Kevin Wolf kw...@redhat.com
 Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  block/qcow2-cluster.c |   55 
 -
  1 files changed, 36 insertions(+), 19 deletions(-)

Kevin,

Autotest installed Fedora.8.64 guests boot with a corrupt disk 
(see screenshot).

Reverting

qcow2: Reduce number of I/O requests
qcow2: Add qcow2_alloc_clusters_at()
qcow2: Factor out count_cow_clusters

Fixes the problem.

Let me know if there is a fix available or you need further information.
attachment: fedora8-64.png

[Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters

2012-03-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 block/qcow2-cluster.c |   55 -
 1 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a791bbe..903454d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -677,6 +677,41 @@ err:
  }
 
 /*
+ * Returns the number of contiguous clusters that can be used for an allocating
+ * write, but require COW to be performed (this includes yet unallocated space,
+ * which must copy from the backing file)
+ */
+static int count_cow_clusters(BDRVQcowState *s, int nb_clusters,
+uint64_t *l2_table, int l2_index)
+{
+int i = 0;
+uint64_t cluster_offset;
+
+while (i  nb_clusters) {
+i += count_contiguous_clusters(nb_clusters - i, s-cluster_size,
+l2_table[l2_index], i, 0);
+if ((i = nb_clusters) || be64_to_cpu(l2_table[l2_index + i])) {
+break;
+}
+
+i += count_contiguous_free_clusters(nb_clusters - i,
+l2_table[l2_index + i]);
+if (i = nb_clusters) {
+break;
+}
+
+cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
+
+if ((cluster_offset  QCOW_OFLAG_COPIED) ||
+(cluster_offset  QCOW_OFLAG_COMPRESSED))
+break;
+}
+
+assert(i = nb_clusters);
+return i;
+}
+
+/*
  * alloc_cluster_offset
  *
  * For a given offset of the disk image, return cluster offset in qcow2 file.
@@ -739,25 +774,7 @@ again:
 
 /* how many available clusters ? */
 
-while (i  nb_clusters) {
-i += count_contiguous_clusters(nb_clusters - i, s-cluster_size,
-l2_table[l2_index], i, 0);
-if ((i = nb_clusters) || be64_to_cpu(l2_table[l2_index + i])) {
-break;
-}
-
-i += count_contiguous_free_clusters(nb_clusters - i,
-l2_table[l2_index + i]);
-if (i = nb_clusters) {
-break;
-}
-
-cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
-
-if ((cluster_offset  QCOW_OFLAG_COPIED) ||
-(cluster_offset  QCOW_OFLAG_COMPRESSED))
-break;
-}
+i = count_cow_clusters(s, nb_clusters, l2_table, l2_index);
 assert(i = nb_clusters);
 nb_clusters = i;
 
-- 
1.7.6.5