Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity

Paolo Bonzini Fri, 18 Jan 2013 08:23:10 -0800

Il 18/01/2013 16:13, Kevin Wolf ha scritto:
> Am 16.01.2013 18:31, schrieb Paolo Bonzini:
>> When mirroring runs, the backing files for the target may not yet be
>> ready.  However, this means that a copy-on-write operation on the target
>> would fill the missing sectors with zeros.  Copy-on-write only happens
>> if the granularity of the dirty bitmap is smaller than the cluster size
>> (and only for clusters that are allocated in the source after the job
>> has started copying).  So far, the granularity was fixed to 1MB; to avoid
>> the problem we detected the situation and required the backing files to
>> be available in that case only.
>>
>> However, we want to lower the granularity for efficiency, so we need
>> a better solution.  The solution is to always copy a whole cluster the
>> first time it is touched.  The code keeps a bitmap of clusters that
>> have already been allocated by the mirroring job, and only does "manual"
>> copy-on-write if the chunk being copied is zero in the bitmap.
>>
>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> ---
>>         v1->v2: rebased for moved include files
>>
>>  block/mirror.c             |   60 
>> +++++++++++++++++++++++++++++++++++++------
>>  blockdev.c                 |   15 ++---------
>>  tests/qemu-iotests/041     |   21 +++++++++++++++
>>  tests/qemu-iotests/041.out |    4 +-
>>  trace-events               |    1 +
>>  5 files changed, 78 insertions(+), 23 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 20cb1e7..ee45e2e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -15,6 +15,7 @@
>>  #include "block/blockjob.h"
>>  #include "block/block_int.h"
>>  #include "qemu/ratelimit.h"
>> +#include "qemu/bitmap.h"
>>  
>>  enum {
>>      /*
>> @@ -36,6 +37,8 @@ typedef struct MirrorBlockJob {
>>      bool synced;
>>      bool should_complete;
>>      int64_t sector_num;
>> +    size_t buf_size;
>> +    unsigned long *cow_bitmap;
>>      HBitmapIter hbi;
>>      uint8_t *buf;
>>  } MirrorBlockJob;
>> @@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
>>      BlockDriverState *target = s->target;
>>      QEMUIOVector qiov;
>>      int ret, nb_sectors;
>> -    int64_t end;
>> +    int64_t end, sector_num, cluster_num;
>>      struct iovec iov;
>>  
>>      s->sector_num = hbitmap_iter_next(&s->hbi);
>> @@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob 
>> *s,
>>          assert(s->sector_num >= 0);
>>      }
>>  
>> +    /* If we have no backing file yet in the destination, and the cluster 
>> size
>> +     * is very large, we need to do COW ourselves.  The first time a 
>> cluster is
>> +     * copied, copy it entirely.
>> +     *
>> +     * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
>> +     * powers of two, the number of sectors to copy cannot exceed one 
>> cluster.
>> +     */
>> +    sector_num = s->sector_num;
>> +    nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>> +    cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
>> +    if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
>> +        trace_mirror_cow(s, sector_num);
>> +        bdrv_round_to_clusters(s->target,
>> +                               sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
>> +                               &sector_num, &nb_sectors);
>> +        bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
>> +                   nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
> 
> Here the bit in the cow_bitmap is set before the COW has actually been
> performed. It could still fail.
> 
>> +    }
>> +
>>      end = s->common.len >> BDRV_SECTOR_BITS;
>> -    nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
>> -    bdrv_reset_dirty(source, s->sector_num, nb_sectors);
>> +    nb_sectors = MIN(nb_sectors, end - sector_num);
>> +    bdrv_reset_dirty(source, sector_num, nb_sectors);
>>  
>>      /* Copy the dirty cluster.  */
>>      iov.iov_base = s->buf;
>>      iov.iov_len  = nb_sectors * 512;
>>      qemu_iovec_init_external(&qiov, &iov, 1);
>>  
>> -    trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
>> -    ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
>> +    trace_mirror_one_iteration(s, sector_num, nb_sectors);
>> +    ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
>>      if (ret < 0) {
>>          *p_action = mirror_error_action(s, true, -ret);
>>          goto fail;
>>      }
>> -    ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
>> +    ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
>>      if (ret < 0) {
>>          *p_action = mirror_error_action(s, false, -ret);
>>          s->synced = false;
>> @@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob 
>> *s,
>>  
>>  fail:
>>      /* Try again later.  */
>> -    bdrv_set_dirty(source, s->sector_num, nb_sectors);
>> +    bdrv_set_dirty(source, sector_num, nb_sectors);
> 
> If it does, we mark the whole cluster dirty now, but in the cow_bitmap
> it's still marked at present on the target. When restarting the job,
> wouldn't it copy only the start of the cluster next time and corrupt the
> rest of it?


Yes, very good catch.

I think this should fix it.

diff --git a/block/mirror.c b/block/mirror.c
index 82abc2f..0fc140a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -87,6 +87,9 @@ static void mirror_iteration_done(MirrorOp *op)
     cluster_num = op->sector_num / s->granularity;
     nb_chunks = op->nb_sectors / s->granularity;
     bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks);
+    if (s->cow_bitmap) {
+        bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
+    }
 
     trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
     g_slice_free(MirrorOp, op);
@@ -217,9 +220,6 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 
         /* We have enough free space to copy these sectors.  */
         bitmap_set(s->in_flight_bitmap, next_cluster, added_chunks);
-        if (s->cow_bitmap) {
-            bitmap_set(s->cow_bitmap, next_cluster, added_chunks);
-        }
 
         nb_sectors += added_sectors;
         nb_chunks += added_chunks;

I haven't written a testcase for it, it's tricky but should be doable.
Do you want me to respin, or can it be done as a followup?

I would prefer a followup also because it will give a better pointer when
we backport this fix to the RHEL6 code.

Paolo

Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity

Reply via email to