Am 16.02.26 um 1:48 PM schrieb Fiona Ebner:
> Am 16.02.26 um 11:25 AM schrieb Kevin Wolf:
>> Am 12.02.2026 um 13:02 hat Fiona Ebner geschrieben:
>>> Currently, the dirty bitmap is disabled too early and the following
>>> bad scenario is possible:
>>>
>>> 1. Dirty bitmap is disabled in mirror_start_job()
>>> 2. Some request are started in mirror_top_bs while s->job == NULL
>>> 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because
>>>    the request hasn't completed yet, the block isn't allocated
>>> 4. The request completes, still sees s->job == NULL and skips the
>>>    bitmap, and nothing else will mark it dirty either
>>>
>>> One ingredient is that mirror_top_opaque->job is only set after the
>>> job is fully initialized. For the rationale, see commit 32125b1460
>>> ("mirror: Fix access of uninitialised fields during start").
>>>
>>> Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write()
>>> sees that the job is set, because then:
>>>
>>> 1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap
>>>    will be set by bdrv_mirror_top_do_write().
>>>
>>> 2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
>>>    synchronously to the target.
>>>
>>> At least with virtio-blk using iothread-vq-mapping, mirror_run() and
>>> bdrv_mirror_top_do_write() might be called in different threads.
>>> bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap
>>> mutex, so the memory is synchronized between threads.
>>>
>>> Many thanks to Kevin Wolf for discussing the issue and solutions with
>>> me! [0]
>>>
>>> [0]: 
>>> https://lore.kernel.org/qemu-devel/[email protected]/T/
>>>
>>> Cc: [email protected]
>>> Closes: https://gitlab.com/qemu-project/qemu/-/issues/3273
>>> Signed-off-by: Fiona Ebner <[email protected]>
>>> ---
>>>  block/mirror.c | 21 +++++++++++++++------
>>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index b344182c74..eadd4501e8 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -1123,6 +1123,21 @@ static int coroutine_fn mirror_run(Job *job, Error 
>>> **errp)
>>>       */
>>>      mirror_top_opaque->job = s;
>>>  
>>> +    /*
>>> +     * Disabling the dirty bitmap is safe once bdrv_mirror_top_do_write() 
>>> sees
>>> +     * that the job is set, because then:
>>> +     *
>>> +     * 1. When not using MIRROR_COPY_MODE_WRITE_BLOCKING, the dirty bitmap 
>>> will
>>> +     * be set by bdrv_mirror_top_do_write().
>>> +     *
>>> +     * 2. When using MIRROR_COPY_MODE_WRITE_BLOCKING, writes will be done
>>> +     * synchronously to the target.
>>> +     *
>>> +     * bdrv_disable_dirty_bitmap() acquires and releases the dirty bitmap 
>>> mutex,
>>> +     * so the memory is synchronized between threads.
>>> +     */
>>> +    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>>> +
>>>      assert(!s->dbi);
>>>      s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
>>>      for (;;) {
>>> @@ -2014,12 +2029,6 @@ static BlockJob *mirror_start_job(
>>>          goto fail;
>>>      }
>>>  
>>> -    /*
>>> -     * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in 
>>> active
>>> -     * mode.
>>> -     */
>>> -    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>>> -
>>>      bdrv_graph_wrlock_drained();
>>>      ret = block_job_add_bdrv(&s->common, "source", bs, 0,
>>>                               BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
>>
>> The thing I meant in the other thread is if we don't need something like
>> this additionally:
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index b344182c747..159954158ba 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1672,9 +1672,17 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, 
>> MirrorMethod method,
>>          abort();
>>      }
>>  
>> -    if (!copy_to_target && s->job && s->job->dirty_bitmap) {
>> +    if (!copy_to_target) {
>>          qatomic_set(&s->job->actively_synced, false);
>> -        bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
>> +        if (s->job && s->job->dirty_bitmap) {
>> +            bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
>> +        } else {
>> +            /*
>> +             * Avoid race in the case that mirror_run() disables the bitmap
>> +             * between here and bdrv_co_write_req_finish().
>> +             */
>> +            bdrv_set_dirty(bs, offset, bytes);
>> +        }
>>      }
>>  
>>      if (ret < 0) {
> 
> You're right! It's not enough to ensure that the job is set before the
> bitmap is disabled, because both could happen after the check for job here.
> 
> Isn't this change enough by itself? After the change,
> bdrv_mirror_top_do_write() ensures that:
> 
> 1. When copy_to_target == true, the write is done synchronously to the
> target, no need to set the dirty bitmap.
> 
> 2. When copy_to_target == false, the dirty bitmap is set.
> 
> So it doesn't really matter at which point the dirty bitmap is disabled?
> Or am I missing something again?

Ah right, bdrv_set_dirty() only sets it if still enabled :)

> 
>> By the way, can it ever happen that s->job && !s->job->dirty_bitmap or
>> is that second part of the condition redundant?
> 
> In mirror_exit_common(), we do call bdrv_release_dirty_bitmap() before
> the drained section replacing the node and before setting
> bs_opaque->job = NULL; But AFAICS, we do not actually set
> bs_opaque->job->dirty_bitmap = NULL anywhere. So yes, the second part of
> the condition seems redundant to me too. I think the fact that we don't
> set bs_opaque->job->dirty_bitmap = NULL can't lead to a use-after-free
> either, because mirror_exit_common() happens while drained, because
> mirror_run() drains the source (and thus also the filter on top) before
> returning.
> 
> Best Regards,
> Fiona



Reply via email to