On 10.02.26 at 10:18 PM, Kevin Wolf wrote:
> On 09.02.2026 at 12:42, Fiona Ebner wrote:
>> On 05.02.26 at 5:26 PM, Kevin Wolf wrote:
>>> On 05.02.2026 at 11:45, Fiona Ebner wrote:
>>>> On 05.02.26 at 10:28 AM, Kevin Wolf wrote:
>>>>> On 05.02.2026 at 09:59, Fiona Ebner wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 03.02.26 at 3:24 PM, Jean-Louis Dupond wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> For some months we have been observing disk corruption within the VM
>>>>>>> when enabling backups (which triggers snapshots).
>>>>>>> After a lot of troubleshooting, we were able to track down the commit
>>>>>>> that caused it:
>>>>>>> https://gitlab.com/qemu-project/qemu/-/commit/058cfca5645a9ed7cb2bdb77d15f2eacaf343694
>>>>>>>
>>>>>>> More info in the issue:
>>>>>>> https://gitlab.com/qemu-project/qemu/-/issues/3273
>>>>>>>
>>>>>>> Now this seems to be caused by a race between disabling the
>>>>>>> dirty_bitmaps and the tracking implemented in the mirror top layer.
>>>>>>> Kevin shared a possible solution with me:
>>>>>>>
>>>>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>>>>> index b344182c747..f76e43f22c1 100644
>>>>>>> --- a/block/mirror.c
>>>>>>> +++ b/block/mirror.c
>>>>>>> @@ -1122,6 +1122,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>>>>       * accessing it.
>>>>>>>       */
>>>>>>>      mirror_top_opaque->job = s;
>>>>>>> +    if (s->copy_mode != MIRROR_COPY_MODE_WRITE_BLOCKING) {
>>>>>>> +        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>>>>>>> +    }
>>>>>>>  
>>>>>>>      assert(!s->dbi);
>>>>>>>      s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
>>>>>>> @@ -2018,7 +2021,9 @@ static BlockJob *mirror_start_job(
>>>>>>>       * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
>>>>>>>       * mode.
>>>>>>>       */
>>>>>>> -    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>>>>>>> +    if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
>>>>>>> +        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
>>>>>>> +    }
>>>>>>>  
>>>>>>>      bdrv_graph_wrlock_drained();
>>>>>>>      ret = block_job_add_bdrv(&s->common, "source", bs, 0,
>>>>>>>
>>>>>>>
>>>>>>> I've been running this for some hours already, and it seems to fix the issue.
>>>>>>>
>>>>>>> Let's open up the discussion on whether this is the proper way to fix
>>>>>>> it, or if there are better alternatives :)
>>>>>>
>>>>>> Yes, I do think this is the proper solution. I ended up with essentially
>>>>>> the same yesterday (see the GitLab issue). I moved the disabling
>>>>>> unconditionally, even though there should be no need to delay it when
>>>>>> using write-blocking mode. I would suggest moving the comment on top of
>>>>>> the changed hunk in mirror_start_job() along to the new hunk in
>>>>>> mirror_run().
>>>>>
>>>>> I didn't actually write this as a proper patch, but just as a quick
>>>>> thing Jean-Louis could test, so yes, all of the details are up for
>>>>> discussion.
>>>>>
>>>>> You're right that there is no need to delay disabling the bitmap in
>>>>> active mode, but it probably also doesn't hurt to keep it enabled for a
>>>>> little while? Maybe we should do it like you did just to keep the code a
>>>>> little simpler.
>>>>
>>>> Yes, it shouldn't hurt. Looking at the code again,
>>>> should_copy_to_target() will evaluate to false before the job is set, so
>>>> delaying the disabling might be necessary for write-blocking mode too?
>>>
>>> I didn't check that part yet, but that is actually a very good point!
>>>
>>> Yes, I agree, so your version of the fix is even more correct. It seems
>>> write-blocking mode is not actually write-blocking until the very same
>>> point in the code. (We should mention that in the comment as well.)
>>>
>>>>> There is another approach I had in mind, but wasn't as sure about, so I
>>>>> didn't suggest it for the testing to see if this race is even the
>>>>> problem we're seeing in practice: Is there a reason why populating the
>>>>> initial dirty bitmap (i.e. the second part of mirror_dirty_init()) can't
>>>>> just come after setting 'mirror_top_opaque->job'? Then we could simply
>>>>> leave the bitmap disabled all the time and rely solely on the mirror
>>>>> job's own tracking. That would feel a little more consistent than using
>>>>> the block layer just to plug a small race window during startup.
>>>>
>>>> Interesting idea! In write-blocking mode, writes during dirty bitmap
>>>> initialization would already be synced to the target, leading to a bit
>>>> of duplicate work.
>>>
>>> That's true, and I don't see an easy way to avoid it if we do things in
>>> this order. Probably not a big deal in practice, but also a bit ugly.
>>
>> We do support changing the copy mode after the job is started (see
>> mirror_change()). If we really want to avoid the duplicate work, we
>> could always start in background mode and switch to write-blocking only
>> after initializing the bitmap, if that's what the user wanted. There is
>> a comment in mirror_change() stating:
>>
>> The implementation relies on the fact that copy_mode is only written
>> under the BQL. Otherwise, further synchronization would be required.
>>
>> so we'd need to be careful if going for this. Discussion from back then:
>> https://lore.kernel.org/qemu-devel/[email protected]/
> 
> Yes, I guess that could be done.
> 
> I'm not sure if that's really worthwhile, though, when we know that just
> disabling the bitmap later does the trick without adding much code or
> having to be careful.
> 
> So I'm currently inclined to just take your patch and file my thought
> away as an interesting idea without practical use.

Okay, I can send it as a proper patch, but one question below.
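
For concreteness, I'd move the comment along and end up with roughly the
following in mirror_run() (just a sketch, the exact comment wording is still
open), while dropping the bdrv_disable_dirty_bitmap() call from
mirror_start_job() entirely:

    mirror_top_opaque->job = s;

    /*
     * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in
     * active mode, and write-blocking only takes effect once s->job is
     * set, so the bitmap has to stay enabled until this point.
     */
    bdrv_disable_dirty_bitmap(s->dirty_bitmap);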

>>>> Git history tells me why setting the job ended up where it is now:
>>>>
>>>>> commit 32125b14606a454ed109ea6d9da32c747e94926f
>>>>> Author: Kevin Wolf <[email protected]>
>>>>> Date:   Fri Feb 3 16:21:41 2023 +0100
>>>>>
>>>>>     mirror: Fix access of uninitialised fields during start
>>>>>     
>>>>>     bdrv_mirror_top_pwritev() accesses the job object when active mirroring
>>>>>     is enabled. It disables this code during early initialisation while
>>>>>     s->job isn't set yet.
>>>>>     
>>>>>     However, s->job is still set way too early when the job object isn't
>>>>>     fully initialised. For example, &s->ops_in_flight isn't initialised yet
>>>>>     and the in_flight bitmap doesn't exist yet. This causes crashes when a
>>>>>     write request comes in too early.
>>>>>     
>>>>>     Move the assignment of s->job to when the mirror job is actually fully
>>>>>     initialised to make sure that the mirror_top driver doesn't access it
>>>>>     too early.
>>>>
>>>> Right before initializing the dirty bitmap, the other fields of the job
>>>> object are already initialized, so the problem described in that commit
>>>> message should not resurface.
>>>>
>>>> That said, I'm not sure I'm not missing something.
>>>
>>> Yes, I think it requires splitting mirror_dirty_init() as I said, but
>>> otherwise I don't see a problem at the moment. I could still be missing
>>> something, of course.
>>>
>>>> Something else I'm wondering about: do we need to use atomics for
>>>> setting and accessing MirrorBDSOpaque.job? At least with virtio-blk
>>>> using iothread-vq-mapping, mirror_run() and bdrv_mirror_top_do_write()
>>>> might be called in different threads.
>>>
>>> This might not even be enough. At first I thought that barriers might be
>>> enough to make sure that MirrorBDSOpaque.job is set before the bitmap is
>>> disabled. But bdrv_set_dirty(), with its check whether the dirty bitmap is
>>> enabled, happens only in bdrv_co_write_req_finish() after the request
>>> completes. So we have a race where the switch could happen while a
>>> request is in flight between mirror_top_bs and bdrv_set_dirty(), and
>>> neither place would mark the request dirty.
>>
>> This is for the approach with setting job and then disabling the bitmap
>> in mirror_run().
> 
> Yes.
> 
>> With your suggestion of having the bitmap disabled from the beginning
>> and setting the job before initializing the bitmap, would using barriers
>> be enough?
> 
> It's not obvious what the barrier would synchronise with. Correct me if
> I'm wrong, but the idea is that the two possible cases are:
> 
> 1. Some request started in mirror_top_bs while s->job == NULL. This
>    implies copy_to_target == false.
> 2. Set mirror_top_opaque->job
> 3. The request completes, sees s->job and calls bdrv_set_dirty_bitmap()
> 
> 1. Some request started in mirror_top_bs while s->job == NULL
> 2. Set mirror_top_opaque->job
> 3. The request completes, still sees s->job == NULL and skips the bitmap
> 4. mirror_dirty_init() -> bdrv_co_is_allocated_above() picks up the
>    change and dirties the bitmap
> 
> The important question is whether this bad case is possible, too:
> 
> 1. Some request started in mirror_top_bs while s->job == NULL
> 2. Set mirror_top_opaque->job
> 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and because
>    the request hasn't completed yet, the block isn't allocated
> 4. The request completes, still sees s->job == NULL and skips the
>    bitmap, and nothing else will mark it dirty either
> 
> And that is where atomics/barriers would make sure that the new s->job is
> seen in bdrv_mirror_top_do_write(). Ok, fair enough, but it doesn't feel
> obvious, and I'm still not sure what the best qemu/atomic.h operation for
> this is.
> 
> Let me try another angle:
> 
> The whole trouble starts with the possibility of having !copy_to_target
> && !s->job in bdrv_mirror_top_do_write() and then we neither copy nor
> mark something dirty. What if we still have the dirty bitmap enabled on
> the block layer level and explicitly handle the !copy_to_target &&
> !s->job case by calling bdrv_set_dirty() in bdrv_mirror_top_do_write()?
> bdrv_set_dirty_bitmap() is then just the slightly optimised version for
> once the job has finished initialisation.
> 
> During job startup we would then usually call bdrv_set_dirty() twice,
> but who cares? And it would fix the race because then we're guaranteed
> that bdrv_mirror_top_do_write() either copies or sets a dirty bit, but
> it never does neither.

Thanks! Makes sense to me :)
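
So after the nested write, bdrv_mirror_top_do_write() would do roughly the
following (sketch only; 'src' is a placeholder for the source node the bitmap
lives on, and the variable names and exact bdrv_set_dirty() signature are
assumed):

    if (!copy_to_target) {
        if (s->job) {
            /* Job fully initialised: use the job's own tracking. */
            bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
        } else {
            /*
             * Startup window: neither active mirroring nor the job's own
             * tracking is in effect yet. Marking the range dirty at the
             * block layer is usually redundant with the still-enabled
             * bitmap, but it guarantees we never do neither.
             */
            bdrv_set_dirty(src, offset, bytes);
        }
    }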

With my patch, what we do need is to ensure that
1. set job
2. disable bitmap
are not re-ordered.

Otherwise, the following bad case could happen:

1. Some request started in mirror_top_bs while s->job == NULL
2. bdrv_co_write_req_finish() sees the bitmap as disabled
3. bdrv_mirror_top_do_write() still sees s->job == NULL and also does not
   dirty the bitmap

But IIUC, we are already protected implicitly by the fact that
qemu_mutex_lock() is used in bdrv_disable_dirty_bitmap(), which means
that memory is synchronized (according to the POSIX documentation [0]).
Is this correct?

[0]:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap04.html#tag_04_15_02
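
(Purely illustrative, in case we'd rather not rely on the mutex: an explicit
pairing with the existing qemu/atomic.h barriers should also do, along these
lines:

    /* mirror_run(): make the new job visible before the bitmap is disabled */
    mirror_top_opaque->job = s;
    smp_wmb();
    bdrv_disable_dirty_bitmap(s->dirty_bitmap);

    /*
     * bdrv_mirror_top_do_write(), after the nested write has completed,
     * i.e. after bdrv_co_write_req_finish() has looked at the bitmap:
     * pair with the smp_wmb() above before re-checking s->job.
     */
    smp_rmb();
    if (!copy_to_target && s->job) {
        bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
    }

but of course the simpler the better.)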

