On 12.02.2026 at 10:55, Fiona Ebner wrote:
> On 10.02.26 at 10:18 PM, Kevin Wolf wrote:
> > On 09.02.2026 at 12:42, Fiona Ebner wrote:
> >> On 05.02.26 at 5:26 PM, Kevin Wolf wrote:
> >>> On 05.02.2026 at 11:45, Fiona Ebner wrote:
> >>>> On 05.02.26 at 10:28 AM, Kevin Wolf wrote:
> >>>>> On 05.02.2026 at 09:59, Fiona Ebner wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 03.02.26 at 3:24 PM, Jean-Louis Dupond wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> For some months we had been observing disk corruption within the VM
> >>>>>>> when enabling backups (which triggers snapshots).
> >>>>>>> After a lot of troubleshooting, we were able to track down the commit
> >>>>>>> that caused it:
> >>>>>>> https://gitlab.com/qemu-project/qemu/-/commit/058cfca5645a9ed7cb2bdb77d15f2eacaf343694
> >>>>>>>
> >>>>>>> More info in the issue:
> >>>>>>> https://gitlab.com/qemu-project/qemu/-/issues/3273
> >>>>>>>
> >>>>>>> Now this seems to be caused by a race between disabling the
> >>>>>>> dirty bitmaps and the tracking implemented in the mirror top layer.
> >>>>>>> Kevin shared a possible solution with me:
> >>>>>>>
> >>>>>>> diff --git a/block/mirror.c b/block/mirror.c
> >>>>>>> index b344182c747..f76e43f22c1 100644
> >>>>>>> --- a/block/mirror.c
> >>>>>>> +++ b/block/mirror.c
> >>>>>>> @@ -1122,6 +1122,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> >>>>>>>       * accessing it.
> >>>>>>>       */
> >>>>>>>      mirror_top_opaque->job = s;
> >>>>>>> +    if (s->copy_mode != MIRROR_COPY_MODE_WRITE_BLOCKING) {
> >>>>>>> +        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
> >>>>>>> +    }
> >>>>>>>
> >>>>>>>      assert(!s->dbi);
> >>>>>>>      s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
> >>>>>>> @@ -2018,7 +2021,9 @@ static BlockJob *mirror_start_job(
> >>>>>>>       * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
> >>>>>>>       * mode.
> >>>>>>>       */
> >>>>>>> -    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
> >>>>>>> +    if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
> >>>>>>> +        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
> >>>>>>> +    }
> >>>>>>>
> >>>>>>>      bdrv_graph_wrlock_drained();
> >>>>>>>      ret = block_job_add_bdrv(&s->common, "source", bs, 0,
> >>>>>>>
> >>>>>>>
> >>>>>>> I have been running this for some hours already, and it seems to fix
> >>>>>>> the issue.
> >>>>>>>
> >>>>>>> Let's open up the discussion on whether this is the proper way to fix
> >>>>>>> it, or if there are better alternatives :)
> >>>>>>
> >>>>>> Yes, I do think this is the proper solution. I ended up with essentially
> >>>>>> the same yesterday (see the Gitlab issue). I moved the disabling
> >>>>>> unconditionally, but there should be no need to delay the disabling if
> >>>>>> using write-blocking mode. I would suggest moving the comment on top of
> >>>>>> the changed hunk in mirror_start_job() along to the new hunk in
> >>>>>> mirror_run().
> >>>>>
> >>>>> I didn't actually write this as a proper patch, but just as a quick
> >>>>> thing Jean-Louis could test, so yes, all of the details are up for
> >>>>> discussion.
> >>>>>
> >>>>> You're right that there is no need to delay disabling the bitmap in
> >>>>> active mode, but it probably also doesn't hurt to keep it enabled for a
> >>>>> little while? Maybe we should do it like you did just to keep the code a
> >>>>> little simpler.
> >>>>
> >>>> Yes, it shouldn't hurt. Looking at the code again,
> >>>> should_copy_to_target() will evaluate to false before the job is set, so
> >>>> it might be necessary for write-blocking mode too?
> >>>
> >>> I didn't check that part yet, but that is actually a very good point!
> >>>
> >>> Yes, I agree, so your version of the fix is even more correct. It seems
> >>> write-blocking mode is not actually write-blocking until the very same
> >>> point in the code. (We should mention that in the comment as well.)
> >>>
> >>>>> There is another approach I had in mind, but wasn't as sure about, so I
> >>>>> didn't suggest it for the testing to see if this race is even the
> >>>>> problem we're seeing in practice: Is there a reason why populating the
> >>>>> initial dirty bitmap (i.e. the second part of mirror_dirty_init()) can't
> >>>>> just come after setting 'mirror_top_opaque->job'? Then we could simply
> >>>>> leave the bitmap disabled all the time and rely solely on the mirror
> >>>>> job's own tracking. That would feel a little more consistent than using
> >>>>> the block layer just to plug a small race window during startup.
> >>>>
> >>>> Interesting idea! In write-blocking mode, it will lead to writes during
> >>>> dirty bitmap initialization already being synced to the target, so
> >>>> leading to a bit of duplicate work.
> >>>
> >>> That's true, and I don't see an easy way to avoid it if we do things in
> >>> this order. Probably not a big deal in practice, but also a bit ugly.
> >>
> >> We do support changing the copy mode after the job is started (see
> >> mirror_change()). If we really want to avoid the duplicate work, we
> >> could always start in background mode and switch to write-blocking only
> >> after initializing the bitmap, if that's what the user wanted. There is
> >> a comment in mirror_change() stating:
> >>
> >>     The implementation relies on the fact that copy_mode is only written
> >>     under the BQL. Otherwise, further synchronization would be required.
> >>
> >> so we'd need to be careful if going for this. Discussion from back then:
> >> https://lore.kernel.org/qemu-devel/[email protected]/
> >
> > Yes, I guess that could be done.
> >
> > I'm not sure if that's really worthwhile, though, when we know that just
> > disabling the bitmap later does the trick without adding much code and
> > having to be careful.
> >
> > So I'm currently inclined to just take your patch and file my thought
> > away as an interesting idea without practical use.
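[Editorial aside: the copy-mode switch Fiona refers to is the one exposed
externally by the QMP command block-job-change. As a hedged illustration
only (the job id "mirror-job0" is made up), switching a running mirror job
to write-blocking would look roughly like this:]

```json
{ "execute": "block-job-change",
  "arguments": { "id": "mirror-job0",
                 "type": "mirror",
                 "copy-mode": "write-blocking" } }
```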
>
> Okay, I can send it as a proper patch, but one question below.
>
> >>>> Git history tells me why setting the job ended up where it is now:
> >>>>
> >>>>> commit 32125b14606a454ed109ea6d9da32c747e94926f
> >>>>> Author: Kevin Wolf <[email protected]>
> >>>>> Date:   Fri Feb 3 16:21:41 2023 +0100
> >>>>>
> >>>>>     mirror: Fix access of uninitialised fields during start
> >>>>>
> >>>>>     bdrv_mirror_top_pwritev() accesses the job object when active mirroring
> >>>>>     is enabled. It disables this code during early initialisation while
> >>>>>     s->job isn't set yet.
> >>>>>
> >>>>>     However, s->job is still set way too early when the job object isn't
> >>>>>     fully initialised. For example, &s->ops_in_flight isn't initialised yet
> >>>>>     and the in_flight bitmap doesn't exist yet. This causes crashes when a
> >>>>>     write request comes in too early.
> >>>>>
> >>>>>     Move the assignment of s->job to when the mirror job is actually fully
> >>>>>     initialised to make sure that the mirror_top driver doesn't access it
> >>>>>     too early.
> >>>>
> >>>> Right before initializing the dirty bitmap, the other fields of the job
> >>>> object are already initialized, so the problem described in that commit
> >>>> message should not resurface.
> >>>>
> >>>> That said, I'm not sure I'm not missing something.
> >>>
> >>> Yes, I think it requires splitting mirror_dirty_init() as I said, but
> >>> otherwise I don't see a problem at the moment. I could still be missing
> >>> something, of course.
> >>>
> >>>> Something else I'm wondering about: do we need to use atomics for
> >>>> setting and accessing MirrorBDSOpaque.job? At least with virtio-blk
> >>>> using iothread-vq-mapping, mirror_run() and bdrv_mirror_top_do_write()
> >>>> might be called in different threads.
> >>>
> >>> This might not even be enough. At first I thought that barriers might be
> >>> enough to make sure that MirrorBDSOpaque.job is set before the bitmap is
> >>> disabled.
> >>> But bdrv_set_dirty(), with its check whether the dirty bitmap is
> >>> enabled, happens only in bdrv_co_write_req_finish() after the request
> >>> completes. So we have a race where the switch could happen while a
> >>> request is in the middle between mirror_top_bs and bdrv_set_dirty() and
> >>> neither place would mark the request dirty.
> >>
> >> This is for the approach with setting job and then disabling the bitmap
> >> in mirror_run().
> >
> > Yes.
> >
> >> With your suggestion of having the bitmap disabled from the beginning
> >> and setting the job before initializing the bitmap, using barriers would
> >> be enough, or?
> >
> > It's not obvious what the barrier would synchronise with. Correct me if
> > I'm wrong, but the idea is that the two possible cases are:
> >
> > 1. Some request started in mirror_top_bs while s->job == NULL. This
> >    implies copy_to_target == false.
> > 2. Set mirror_top_opaque->job
> > 3. The request completes, sees s->job and calls bdrv_set_dirty_bitmap()
> >
> > 1. Some request started in mirror_top_bs while s->job == NULL
> > 2. Set mirror_top_opaque->job
> > 3. The request completes, still sees s->job == NULL and skips the bitmap
> > 4. mirror_dirty_init() -> bdrv_co_is_allocated_above() picks up the
> >    change and dirties the bitmap
> >
> > The important question is whether this bad case is possible, too:
> >
> > 1. Some request started in mirror_top_bs while s->job == NULL
> > 2. Set mirror_top_opaque->job
> > 3. mirror_dirty_init() -> bdrv_co_is_allocated_above() runs and, because
> >    the request hasn't completed yet, the block isn't allocated
> > 4. The request completes, still sees s->job == NULL and skips the
> >    bitmap, and nothing else will mark it dirty either
> >
> > And that is where atomics/barriers will make sure that the new s->job is
> > seen in bdrv_mirror_top_do_write(). Ok, fair enough, but it doesn't feel
> > obvious and I'm still confused about what the best qemu/atomic.h
> > operation for this is.
> >
> > Let me try another angle:
> >
> > The whole trouble starts with the possibility of having !copy_to_target
> > && !s->job in bdrv_mirror_top_do_write() and then we neither copy nor
> > mark something dirty. What if we still have the dirty bitmap enabled on
> > the block layer level and explicitly handle the !copy_to_target &&
> > !s->job case by calling bdrv_set_dirty() in bdrv_mirror_top_do_write()?
> > bdrv_set_dirty_bitmap() is then just the slightly optimised version for
> > once the job has finished initialisation.
> >
> > During job startup we would then usually call bdrv_set_dirty() twice,
> > but who cares? And it would fix the race because then we're guaranteed
> > that bdrv_mirror_top_do_write() either copies or sets a dirty bit, but
> > it never does neither.
>
> Thanks! Makes sense to me :)
>
> With my patch, what we do need is to ensure that
> 1. set job
> 2. disable bitmap
> are not re-ordered.
>
> Otherwise, the following bad case could happen:
>
> 1. Some request started in mirror_top_bs while s->job == NULL
> 2. bdrv_co_write_req_finish() sees the bitmap as disabled
> 3. bdrv_mirror_top_do_write() still sees job == NULL and also does not
>    dirty the bitmap
>
> But IIUC, we are already protected implicitly by the fact that
> qemu_mutex_lock() is used in bdrv_disable_dirty_bitmap(), which means
> that memory is synchronized (according to the POSIX documentation [0]).
> Is this correct?
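[Editorial aside: the ordering requirement stated above — the job pointer
must become visible to other threads no later than the bitmap appears
disabled — can be sketched with standard C11 atomics. This is a
hypothetical, self-contained model, not QEMU's actual code: the struct,
field, and function names are invented, and the release fence stands in
for QEMU's smp_wmb() (the mutex inside bdrv_disable_dirty_bitmap() would
provide equivalent ordering).]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for MirrorBDSOpaque.job and the dirty bitmap's enabled flag. */
typedef struct {
    void *_Atomic job;
    atomic_bool bitmap_enabled;
} MirrorStartup;

/* Writer side, what mirror_run() does: set the job, then disable the
 * bitmap. The release fence forbids reordering the two stores, so no
 * other thread can observe "bitmap disabled" without also seeing the
 * job pointer. */
static void finish_job_init(MirrorStartup *s, void *job)
{
    atomic_store_explicit(&s->job, job, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* models smp_wmb() */
    atomic_store_explicit(&s->bitmap_enabled, false, memory_order_relaxed);
}

/* Reader side, what the write path checks: a write is tracked either by
 * the block-layer bitmap (still enabled) or by the job's own tracking
 * (job pointer set). With the fence above, this is never false. */
static bool write_is_tracked(MirrorStartup *s)
{
    bool enabled = atomic_load_explicit(&s->bitmap_enabled,
                                        memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);   /* pairs with the wmb */
    void *job = atomic_load_explicit(&s->job, memory_order_relaxed);
    return enabled || job != NULL;
}
```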
I think that's right. But if we don't want to rely on the internals of
bdrv_disable_dirty_bitmap(), I suppose we can stick an smp_wmb() in
between to be extra sure.

Kevin
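[Editorial aside: the invariant behind Kevin's alternative approach
earlier in the thread — a write through mirror_top either gets copied to
the target or marks the bitmap, but never does neither — can be modeled
as a tiny decision function. All names here are illustrative, not QEMU's
actual code:]

```c
#include <stdbool.h>

typedef enum {
    WRITE_COPIED,       /* active mode: synchronously copied to target */
    WRITE_MARKED_DIRTY  /* background mode or early startup: bitmap set */
} WriteOutcome;

/* Toy model of the decision in bdrv_mirror_top_do_write() if the
 * block-layer bitmap stays enabled and the !copy_to_target && !job case
 * calls bdrv_set_dirty() explicitly. */
static WriteOutcome handle_guest_write(bool job_set, bool write_blocking)
{
    /* should_copy_to_target() evaluates to false until the job is set. */
    bool copy_to_target = job_set && write_blocking;

    if (copy_to_target) {
        return WRITE_COPIED;
    }
    /* Covers both the normal background path (bdrv_set_dirty_bitmap())
     * and the startup window with job unset (explicit bdrv_set_dirty());
     * either way the write is recorded, never lost. */
    return WRITE_MARKED_DIRTY;
}
```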
