On Tue, Jan 20, 2026 at 09:57:42AM +0100, Fiona Ebner wrote:
> Am 19.01.26 um 9:10 PM schrieb Stefan Hajnoczi:
> > On Mon, Jan 12, 2026 at 04:23:51PM +0100, Fiona Ebner wrote:
> >> Some Proxmox users reported an occasional assertion failure [0][1] in
> >> busy VMs when using drive mirror with active mode. In particular, the
> >> failure may occur for zero writes shorter than the job granularity:
> >>
> >>> #0  0x00007b421154b507 in abort ()
> >>> #1  0x00007b421154b420 in ?? ()
> >>> #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
> >>> #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
> >>>       method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, 
> >>> flags=0)
> >>> #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
> >>         method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
> >>         bytes=4096, qiov=0x0, flags=0)
> >>> #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes 
> >>> (bs=0x641c7e62e1f0,
> >>         offset=852480, bytes=4096, flags=0)
> >>
> >> The range for the dirty bitmap described by dirty_bitmap_offset and
> >> dirty_bitmap_end is narrower than the original range and in fact,
> >> dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
> >> already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
> >> resetting the dirty bitmap. Add such a check for setting the zero
> >> bitmap too, which uses the same narrower range.
> >>
> >> [0]: https://forum.proxmox.com/threads/177981/
> >> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222
> >>
> >> Cc: [email protected]
> >> Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already 
> >> zero")
> >> Signed-off-by: Fiona Ebner <[email protected]>
> >> ---
> >>  block/mirror.c | 9 ++++++---
> >>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/block/mirror.c b/block/mirror.c
> >> index b344182c74..bc982cb99a 100644
> >> --- a/block/mirror.c
> >> +++ b/block/mirror.c
> >> @@ -1514,9 +1514,12 @@ do_sync_target_write(MirrorBlockJob *job, 
> >> MirrorMethod method,
> >>          assert(!qiov);
> >>          ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
> >>          if (job->zero_bitmap && ret >= 0) {
> >> -            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / 
> >> job->granularity,
> >> -                       (dirty_bitmap_end - dirty_bitmap_offset) /
> >> -                       job->granularity);
> >> +            if (dirty_bitmap_offset < dirty_bitmap_end) {
> >> +                bitmap_set(job->zero_bitmap,
> >> +                           dirty_bitmap_offset / job->granularity,
> >> +                           (dirty_bitmap_end - dirty_bitmap_offset) /
> >> +                           job->granularity);
> >> +            }
> > 
> > Why does this case clause use dirty_bitmap_offset and dirty_bitmap_end
> > instead of zero_bitmap_offset and zero_bitmap_end like the other
> > zero_bitmap operations in this switch statement?
> > 
> >    if (job->zero_bitmap && ret >= 0) {
> > -      bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> > -                 (dirty_bitmap_end - dirty_bitmap_offset) /
> > -                 job->granularity);
> > +      bitmap_set(job->zero_bitmap, zero_bitmap_offset,
> > +                 zero_bitmap_end - zero_bitmap_offset);
> >    }
> > 
> > I'm probably missing something, but it's not obvious to me :).
> 
> Sorry, I could've mentioned this in the commit message. There is a
> comment explaining it further up in the function:
> 
> >     /*
> >      * Tails are either clean or shrunk, so for dirty bitmap resetting
> >      * we safely align the range narrower.  But for zero bitmap, round
> >      * range wider for checking or clearing, and narrower for setting.
> >      */

Thanks!

Stefan

Attachment: signature.asc
Description: PGP signature

Reply via email to