Currently, the dirty bitmap is disabled too early and the following
bad scenario is possible:

1. Dirty bitmap is disabled in mirror_start_job()
2. Some request are started in mirror_top_bs while s->job == NULL
3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because
   the request hasn't completed yet, the block isn't allocated
4. The request completes, still sees s->job == NULL and skips the
   bitmap, and nothing else will mark it dirty either

One ingredient is that mirror_top_opaque->job is only set after the
job is fully initialized. For the rationale, see commit 32125b1460
("mirror: Fix access of uninitialised fields during start").

Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write()
sees that the job is set, because then:

1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap
   will be set by bdrv_mirror_top_do_write().

2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
   synchronously to the target.

At least with virtio-blk using iothread-vq-mapping, mirror_run() and
bdrv_mirror_top_do_write() might be called in different threads.
bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap
mutex, so the memory is synchronized between threads.

Many thanks to Kevin Wolf for discussing the issue and solutions with
me! [0]

[0]: 
https://lore.kernel.org/qemu-devel/[email protected]/T/

Cc: [email protected]
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273
Signed-off-by: Fiona Ebner <[email protected]>
---
 block/mirror.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b344182c74..eadd4501e8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1123,6 +1123,21 @@ static int coroutine_fn mirror_run(Job *job, Error 
**errp)
      */
     mirror_top_opaque->job = s;
 
+    /*
+     * Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write() sees
+     * that the job is set, because then:
+     *
+     * 1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap will
+     * be set by bdrv_mirror_top_do_write().
+     *
+     * 2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
+     * synchronously to the target.
+     *
+     * bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap 
mutex,
+     * so the memory is synchronized between threads.
+     */
+    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+
     assert(!s->dbi);
     s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
     for (;;) {
@@ -2014,12 +2029,6 @@ static BlockJob *mirror_start_job(
         goto fail;
     }
 
-    /*
-     * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
-     * mode.
-     */
-    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
-
     bdrv_graph_wrlock_drained();
     ret = block_job_add_bdrv(&s->common, "source", bs, 0,
                              BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
-- 
2.47.3



Reply via email to