Am 16.02.26 um 11:25 AM schrieb Kevin Wolf:
> Am 12.02.2026 um 13:02 hat Fiona Ebner geschrieben:
>> Currently, the dirty bitmap is disabled too early and the following
>> bad scenario is possible:
>>
>> 1. Dirty bitmap is disabled in mirror_start_job()
>> 2. Some request are started in mirror_top_bs while s->job == NULL
>> 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because
>>    the request hasn't completed yet, the block isn't allocated
>> 4. The request completes, still sees s->job == NULL and skips the
>>    bitmap, and nothing else will mark it dirty either
>>
>> One ingredient is that mirror_top_opaque->job is only set after the
>> job is fully initialized. For the rationale, see commit 32125b1460
>> ("mirror: Fix access of uninitialised fields during start").
>>
>> Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write()
>> sees that the job is set, because then:
>>
>> 1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap
>>    will be set by bdrv_mirror_top_do_write().
>>
>> 2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
>>    synchronously to the target.
>>
>> At least with virtio-blk using iothread-vq-mapping, mirror_run() and
>> bdrv_mirror_top_do_write() might be called in different threads.
>> bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap
>> mutex, so the memory is synchronized between threads.
>>
>> Many thanks to Kevin Wolf for discussing the issue and solutions with
>> me! [0]
>>
>> [0]: 
>> https://lore.kernel.org/qemu-devel/[email protected]/T/
>>
>> Cc: [email protected]
>> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273
>> Signed-off-by: Fiona Ebner <[email protected]>
>> ---
>>  block/mirror.c | 21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index b344182c74..eadd4501e8 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1123,6 +1123,21 @@ static int coroutine_fn mirror_run(Job *job, Error 
>> **errp)
>>       */
>>      mirror_top_opaque->job = s;
>>  
>> +    /*
>> +     * Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write() 
>> sees
>> +     * that the job is set, because then:
>> +     *
>> +     * 1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap 
>> will
>> +     * be set by bdrv_mirror_top_do_write().
>> +     *
>> +     * 2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
>> +     * synchronously to the target.
>> +     *
>> +     * bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap 
>> mutex,
>> +     * so the memory is synchronized between threads.
>> +     */
>> +    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>> +
>>      assert(!s->dbi);
>>      s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
>>      for (;;) {
>> @@ -2014,12 +2029,6 @@ static BlockJob *mirror_start_job(
>>          goto fail;
>>      }
>>  
>> -    /*
>> -     * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in 
>> active
>> -     * mode.
>> -     */
>> -    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>> -
>>      bdrv_graph_wrlock_drained();
>>      ret = block_job_add_bdrv(&s->common, "source", bs, 0,
>>                               BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
> 
> The thing I meant in the other thread is if we don't need something like
> this additionally:
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index b344182c747..159954158ba 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1672,9 +1672,17 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, 
> MirrorMethod method,
>          abort();
>      }
>  
> -    if (!copy_to_target && s->job && s->job->dirty_bitmap) {
> +    if (!copy_to_target) {
>          qatomic_set(&s->job->actively_synced, false);
> -        bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
> +        if (s->job && s->job->dirty_bitmap) {
> +            bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
> +        } else {
> +            /*
> +             * Avoid race in the case that mirror_run() disables the bitmap
> +             * between here and bdrv_co_write_req_finish().
> +             */
> +            bdrv_set_dirty(bs, offset, bytes);
> +        }
>      }
>  
>      if (ret < 0) {

You're right! It's not enough to ensure that the job is set before the
bitmap is disabled, because both could happen after the check for job here.

Isn't this change enough by itself? After the change,
bdrv_mirror_top_do_write() ensures that:

1. When copy_to_target == true, the write is done synchronously to the
target, no need to set the dirty bitmap.

2. When copy_to_target == false, the dirty bitmap is set.

So it doesn't really matter at which point the dirty bitmap is disabled?
Or am I missing something again?

> By the way, can it ever happen that s->job && !s->job->dirty_bitmap or
> is that second part of the condition redundant?

In mirror_exit_common(), we do call bdrv_release_dirty_bitmap() before
the drained section replacing the node and before setting
bs_opaque->job = NULL; But AFAICS, we do not actually set
bs_opaque->job->dirty_bitmap = NULL anywhere. So yes, the second part of
the condition seems redundant to me too. I think the fact that we don't
set bs_opaque->job->dirty_bitmap = NULL can't lead to a use-after-free
either, because mirror_exit_common() happens while drained, because
mirror_run() drains the source (and thus also the filter on top) before
returning.

Best Regards,
Fiona


Reply via email to