On Thu, May 28, 2026 at 06:01:57PM -0300, Fabiano Rosas wrote:
> Kevin Wolf <[email protected]> writes:
> 
> > Am 28.05.2026 um 15:55 hat Fabiano Rosas geschrieben:
> >> It's not possible to access the image file while there is an incoming
> >> migration in progress, the QEMU process doesn't hold any locks to the
> >> storage at this point so nodes are inactive. Attempting to drain leads
> >> to an assert at bdrv_co_write_req_prepare():
> >> 
> >>    assert(!(bs->open_flags & BDRV_O_INACTIVE))
> >> 
> >> The issue is reproducible by running iotest 181 on a host under cpu
> >> load. The migration must coincide with the header already containing
> >> the QED_F_NEED_CHECK flag.
> >> 
> >> The sequence of events is as follows, with the respective call stacks
> >> referenced below:
> >> 
> >> During block device init, bdrv_qed_attach_aio_context() starts the
> >> 'need_check' timer. The timer will not fire during incoming migration
> >> as it uses QEMU_CLOCK_VIRTUAL (to avoid this very issue, as the code
> >> comment indicates).                                                   (0)
> >> 
> >> However, there's still bdrv_qed_drain_begin() which uses the fact that
> >> the timer is live to decide whether to start the
> >> qed_need_check_timer_entry() directly.                                (1)
> >> 
> >> The qed_need_check_timer_entry() eventually calls into
> >> qed_write_header() -> bdrv_co_pwrite() leading to the assert.         (2)
> >> 
> >> Since we don't have an API for checking if a timer is enabled, an
> >> alternative is to skip this logic whenever the runstate is
> >> INMIGRATE. This actually matches the setting of the BDRV_O_INACTIVE
> >> flag at blockdev.c.
> >
> > I'm not sure what you mean by "checking if a timer is enabled", because
> > bdrv_qed_drain_begin() has a timer_pending() check that sounds exactly
> > what you say doesn't exist.
> >
> 
> Sorry, I meant to say checking if a *clock* is enabled. 
> 
> The code comment at qed_start_need_check_timer() says:
> 
>     /* Use QEMU_CLOCK_VIRTUAL so we don't alter the image file while 
> suspended for
>      * migration.
>      */
> 
> It seems to assume that since QEMU_CLOCK_VIRTUAL is disabled when vcpus
> are paused then the timer function will never execute while the image is
> inactive. If bdrv_qed_drain_begin() could check clock->enabled, then it
> would be under the same assumption.
> 
> > But it's also not clear to me why s->need_check_timer is true on an
> > inactive image. I don't think this is supposed to be the case?
> 
> bdrv_qed_do_open() reads the header during QEMU startup. Whatever is set
> in the header at that moment will be picked up, including
> QED_F_NEED_CHECK, which is what the code uses to determine the timer
> should be started.
> 
> > It looks like qed is missing a .bdrv_inactivate implementation that
> > cancels the timer after flushing everything (including the header) to
> > disk.
> >
> 
> Hmm, but bdrv_inactivate happens on the migration source. The
> destination machine could still start at any moment while the image is
> dirty and see the QED_F_NEED_CHECK flag. Even doing something on the
> destination machine wouldn't help I think given how early the crash
> happens.

bdrv_qed_do_open() has flags & BDRV_O_INACTIVE checks but not when it
calls bdrv_qed_attach_aio_context(). The destination could avoid
creating the timer when BDRV_O_INACTIVE is set. The existing
bdrv_qed_co_invalidate_cache() code will create the timer when
BDRV_O_INACTIVE is cleared via bdrv_activate().

Stefan

> 
> >> 
> >> The stacks:
> >> 
> >> (0) == issues timer_mod ==
> >>  #6  in qed_start_need_check_timer       at ../block/qed.c:340
> >>  #7  in bdrv_qed_attach_aio_context      at ../block/qed.c:373
> >>  #8  in bdrv_qed_do_open                 at ../block/qed.c:556
> >>  #9  in bdrv_qed_open_entry              at ../block/qed.c:582
> >>  #10 in coroutine_trampoline             at 
> >> ../util/coroutine-ucontext.c:175
> >>  #0  in qemu_coroutine_switch<+120>      at 
> >> ../util/coroutine-ucontext.c:321
> >>  #1  in qemu_aio_coroutine_enter<+356>   at ../util/qemu-coroutine.c:293
> >>  #2  in aio_co_enter<+179>               at ../util/async.c:710
> >>  #3  in aio_co_wake<+53>                 at ../util/async.c:695
> >>  #4  in thread_pool_co_cb<+47>           at ../util/thread-pool.c:283
> >>  #5  in thread_pool_completion_bh<+241>  at ../util/thread-pool.c:202
> >>  #6  in aio_bh_call<+109>                at ../util/async.c:173
> >>  #7  in aio_bh_poll<+299>                at ../util/async.c:220
> >>  #8  in aio_poll<+690>                   at ../util/aio-posix.c:745
> >>  #9  in bdrv_qed_open<+392>              at ../block/qed.c:607
> >>  #10 in bdrv_open_driver<+327>           at ../block.c:1678
> >>  #11 in bdrv_open_common<+1619>          at ../block.c:2008
> >>  #12 in bdrv_open_inherit<+2556>         at ../block.c:4191
> >>  #13 in bdrv_open<+118>                  at ../block.c:4286
> >>  #14 in blk_new_open<+199>               at ../block/block-backend.c:458
> >>  #15 in blockdev_init<+2011>             at ../blockdev.c:612
> >>  #16 in drive_new<+3008>                 at ../blockdev.c:1008
> >>  #17 in drive_init_func<+51>             at ../system/vl.c:662
> >>  #18 in qemu_opts_foreach<+227>          at ../util/qemu-option.c:1148
> >>  #19 in configure_blockdev<+350>         at ../system/vl.c:721
> >>  #20 in qemu_create_early_backends<+343> at ../system/vl.c:2076
> >>  #21 in qemu_init<+12483>                at ../system/vl.c:3778
> >>  #22 in main<+46>                        at ../system/main.c:71
> >> 
> >> (1) == sees timer_pending ==
> >>  #6  in bdrv_qed_drain_begin             at ../block/qed.c:391
> >>  #7  in bdrv_do_drained_begin            at ../block/io.c:366
> >>  #8  in bdrv_do_drained_begin_quiesce    at ../block/io.c:386
> >>  #9  in bdrv_child_cb_drained_begin      at ../block.c:1207
> >>  #10 in bdrv_parent_drained_begin_single at ../block/io.c:133
> >>  #11 in bdrv_parent_drained_begin        at ../block/io.c:64
> >>  #12 in bdrv_do_drained_begin            at ../block/io.c:364
> >>  #13 in bdrv_drained_begin               at ../block/io.c:393
> >>  #14 in blk_drain                        at ../block/block-backend.c:2101
> >>  #15 in blk_unref                        at ../block/block-backend.c:544
> >>  #16 in bdrv_open_inherit                at ../block.c:4197
> >>  #17 in bdrv_open                        at ../block.c:4286
> >>  #18 in blk_new_open                     at ../block/block-backend.c:458
> >>  #19 in blockdev_init                    at ../blockdev.c:612
> >>  #20 in drive_new                        at ../blockdev.c:1008
> >>  #21 in drive_init_func                  at ../system/vl.c:662
> >>  #22 in qemu_opts_foreach                at ../util/qemu-option.c:1148
> >>  #23 in configure_blockdev               at ../system/vl.c:721
> >>  #24 in qemu_create_early_backends       at ../system/vl.c:2076
> >>  #25 in qemu_init                        at ../system/vl.c:3778
> >>  #26 in main                             at ../system/main.c:71
> >> 
> >> (2) == crashes ==
> >>  #5  in __assert_fail (assertion="!(bs->open_flags & BDRV_O_INACTIVE)", 
> >> file="../block/io.c", line=1977
> >>  #6  in bdrv_co_write_req_prepare  at ../block/io.c:1977
> >>  #7  in bdrv_aligned_pwritev       at ../block/io.c:2099
> >>  #8  in bdrv_co_pwritev_part       at ../block/io.c:2316
> >>  #9  in bdrv_co_pwritev            at ../block/io.c:2233
> >>  #10 in bdrv_co_pwrite             at ../include/block/block_int-io.h:77
> >>  #11 in qed_write_header           at ../block/qed.c:128
> >>  #12 in qed_need_check_timer       at ../block/qed.c:305
> >>  #13 in qed_need_check_timer_entry at ../block/qed.c:319
> >> 
> >> Note that this issue is not exactly the same as what's been reported
> >> in Gitlab, but given how easily this reproduces, I imagine it has to
> >> be happening in that setup as well.
> >> 
> >> Link: https://gitlab.com/qemu-project/qemu/-/work_items/3515
> >> Signed-off-by: Fabiano Rosas <[email protected]>
> >> ---
> >> CI run: https://gitlab.com/farosas/qemu/-/pipelines/2557314306
> >> ---
> >>  block/qed.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >> 
> >> diff --git a/block/qed.c b/block/qed.c
> >> index da23a83d62..ccf32bb4ae 100644
> >> --- a/block/qed.c
> >> +++ b/block/qed.c
> >> @@ -21,6 +21,7 @@
> >>  #include "qemu/module.h"
> >>  #include "qemu/option.h"
> >>  #include "qemu/memalign.h"
> >> +#include "system/runstate.h"
> >>  #include "trace.h"
> >>  #include "qed.h"
> >>  #include "system/block-backend.h"
> >> @@ -373,6 +374,11 @@ static void bdrv_qed_drain_begin(BlockDriverState *bs)
> >>  {
> >>      BDRVQEDState *s = bs->opaque;
> >>  
> >> +    /* Nodes are inactive while waiting for an incoming migration. */
> >> +    if (runstate_check(RUN_STATE_INMIGRATE)) {
> >
> > If you're interested in whether the node is inactive, this is what you
> > should check. RUN_STATE_INMIGRATE is only one of the conditions in which
> > it might (but doesn't have to) be true. There is bdrv_is_inactive() to
> > check this.
> >
> > Of course, if we're fixing the root cause (that QED_F_NEED_CHECK and
> > s->need_check_timer are set on an inactive image), this hunk might not
> > be needed at all.
> >
> >> +        return;
> >> +    }
> >> +
> >>      /* Fire the timer immediately in order to start doing I/O as soon as 
> >> the
> >>       * header is flushed.
> >>       */
> >
> > Kevin
> 

Attachment: signature.asc
Description: PGP signature

Reply via email to