Il 18/01/2013 16:13, Kevin Wolf ha scritto: > Am 16.01.2013 18:31, schrieb Paolo Bonzini: >> When mirroring runs, the backing files for the target may not yet be >> ready. However, this means that a copy-on-write operation on the target >> would fill the missing sectors with zeros. Copy-on-write only happens >> if the granularity of the dirty bitmap is smaller than the cluster size >> (and only for clusters that are allocated in the source after the job >> has started copying). So far, the granularity was fixed to 1MB; to avoid >> the problem we detected the situation and required the backing files to >> be available in that case only. >> >> However, we want to lower the granularity for efficiency, so we need >> a better solution. The solution is to always copy a whole cluster the >> first time it is touched. The code keeps a bitmap of clusters that >> have already been allocated by the mirroring job, and only does "manual" >> copy-on-write if the chunk being copied is zero in the bitmap. >> >> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> >> --- >> v1->v2: rebased for moved include files >> >> block/mirror.c | 60 >> +++++++++++++++++++++++++++++++++++++------ >> blockdev.c | 15 ++--------- >> tests/qemu-iotests/041 | 21 +++++++++++++++ >> tests/qemu-iotests/041.out | 4 +- >> trace-events | 1 + >> 5 files changed, 78 insertions(+), 23 deletions(-) >> >> diff --git a/block/mirror.c b/block/mirror.c >> index 20cb1e7..ee45e2e 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -15,6 +15,7 @@ >> #include "block/blockjob.h" >> #include "block/block_int.h" >> #include "qemu/ratelimit.h" >> +#include "qemu/bitmap.h" >> >> enum { >> /* >> @@ -36,6 +37,8 @@ typedef struct MirrorBlockJob { >> bool synced; >> bool should_complete; >> int64_t sector_num; >> + size_t buf_size; >> + unsigned long *cow_bitmap; >> HBitmapIter hbi; >> uint8_t *buf; >> } MirrorBlockJob; >> @@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, >> BlockDriverState *target = s->target; >> QEMUIOVector qiov; >> int ret, nb_sectors; >> - int64_t end; >> + int64_t end, sector_num, cluster_num; >> struct iovec iov; >> >> s->sector_num = hbitmap_iter_next(&s->hbi); >> @@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob >> *s, >> assert(s->sector_num >= 0); >> } >> >> + /* If we have no backing file yet in the destination, and the cluster >> size >> + * is very large, we need to do COW ourselves. The first time a >> cluster is >> + * copied, copy it entirely. >> + * >> + * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are >> + * powers of two, the number of sectors to copy cannot exceed one >> cluster. >> + */ >> + sector_num = s->sector_num; >> + nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; >> + cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK; >> + if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) { >> + trace_mirror_cow(s, sector_num); >> + bdrv_round_to_clusters(s->target, >> + sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK, >> + §or_num, &nb_sectors); >> + bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK, >> + nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK); > > Here the bit in the cow_bitmap is set before the COW has actually been > performed. It could still fail. > >> + } >> + >> end = s->common.len >> BDRV_SECTOR_BITS; >> - nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num); >> - bdrv_reset_dirty(source, s->sector_num, nb_sectors); >> + nb_sectors = MIN(nb_sectors, end - sector_num); >> + bdrv_reset_dirty(source, sector_num, nb_sectors); >> >> /* Copy the dirty cluster. */ >> iov.iov_base = s->buf; >> iov.iov_len = nb_sectors * 512; >> qemu_iovec_init_external(&qiov, &iov, 1); >> >> - trace_mirror_one_iteration(s, s->sector_num, nb_sectors); >> - ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov); >> + trace_mirror_one_iteration(s, sector_num, nb_sectors); >> + ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov); >> if (ret < 0) { >> *p_action = mirror_error_action(s, true, -ret); >> goto fail; >> } >> - ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov); >> + ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov); >> if (ret < 0) { >> *p_action = mirror_error_action(s, false, -ret); >> s->synced = false; >> @@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob >> *s, >> >> fail: >> /* Try again later. */ >> - bdrv_set_dirty(source, s->sector_num, nb_sectors); >> + bdrv_set_dirty(source, sector_num, nb_sectors); > > If it does, we mark the whole cluster dirty now, but in the cow_bitmap > it's still marked at present on the target. When restarting the job, > wouldn't it copy only the start of the cluster next time and corrupt the > rest of it?
Yes, very good catch. I think this should fix it. diff --git a/block/mirror.c b/block/mirror.c index 82abc2f..0fc140a 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -87,6 +87,9 @@ static void mirror_iteration_done(MirrorOp *op) cluster_num = op->sector_num / s->granularity; nb_chunks = op->nb_sectors / s->granularity; bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks); + if (s->cow_bitmap) { + bitmap_set(s->cow_bitmap, cluster_num, nb_chunks); + } trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors); g_slice_free(MirrorOp, op); @@ -217,9 +220,6 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s) /* We have enough free space to copy these sectors. */ bitmap_set(s->in_flight_bitmap, next_cluster, added_chunks); - if (s->cow_bitmap) { - bitmap_set(s->cow_bitmap, next_cluster, added_chunks); - } nb_sectors += added_sectors; nb_chunks += added_chunks; I haven't written a testcase for it, it's tricky but should be doable. Do you want me to respin, or can it be done as a followup? I would prefer a followup also because it will give a better pointer when we backport this fix to the RHEL6 code. Paolo