On 14.08.2014 14:11, Benoît Canet wrote:
The Wednesday 13 Aug 2014 à 23:01:45 (+0200), Max Reitz wrote :
If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.

Signed-off-by: Max Reitz <mre...@redhat.com>
---
  block/qcow2-refcount.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
  1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d1da8d5..a1d93e5 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1504,7 +1504,8 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
                             int64_t *nb_clusters)
  {
      BDRVQcowState *s = bs->opaque;
-    int64_t i;
+    int64_t i, size;
+    int ret;
for (i = 0; i < s->refcount_table_size; i++) {
          uint64_t offset, cluster;
@@ -1520,9 +1521,50 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
          }
if (cluster >= *nb_clusters) {
-            fprintf(stderr, "ERROR refcount block %" PRId64
-                    " is outside image\n", i);
-            res->corruptions++;
+            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
+                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+
+            if (fix & BDRV_FIX_ERRORS) {
+                int64_t old_nb_clusters = *nb_clusters;
+
+                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
+                if (ret < 0) {
+                    goto resize_fail;
+                }
+                size = bdrv_getlength(bs->file);
+                if (size < 0) {
+                    ret = size;
+                    goto resize_fail;
+                }
+
+                *nb_clusters = size_to_clusters(s, size);
+                assert(*nb_clusters >= old_nb_clusters);
+
+                *refcount_table = g_try_realloc(*refcount_table,
+                        *nb_clusters * sizeof(uint16_t));
+                if (!*refcount_table) {
+                    res->check_errors++;
+                    return -ENOMEM;
So you really want to make sure the code is not trying anything more
by directly returning -ENOMEM and not doing goto resize_fail.

This makes sense though.

+                }
+
+                memset(*refcount_table + old_nb_clusters, 0,
+                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+
+                if (cluster >= *nb_clusters) {
+                    ret = -EINVAL;
+                    goto resize_fail;
+                }
+
+                res->corruptions_fixed++;
+                continue;
+
+resize_fail:
+                res->corruptions++;
+                fprintf(stderr, "ERROR could not resize image: %s\n",
+                        strerror(-ret));
Isn't a "return ret;" missing here ?
the code will fall in the continue statement without it.

And that it should. A corruption is reported to stderr, res->corruptions is incremented and that's it - just as it was without this patch. The only reason I see why we should completely abort here is because resizing the file should always work; if it doesn't, something may be completely wrong. But even that is no real reason to jump the shark; we can still continue with the check and if everything is indeed completely broken, we'll receive EIOs soon enough.

Perhaps I should add a *rebuild = true; here and in the else branch in the next patch, though.

Max

+            } else {
+                res->corruptions++;
+            }
              continue;
          }
--
2.0.3




Reply via email to