On Wed, Jun 3, 2026 at 3:39 PM Fabiano Rosas <[email protected]> wrote:
>
> It's not possible to access the image file while an incoming
> migration is in progress: the QEMU process doesn't hold any locks on
> the storage at this point, so nodes are inactive. Attempting to flush leads
> to an assert at bdrv_co_write_req_prepare():
>
>    assert(!(bs->open_flags & BDRV_O_INACTIVE))
>
> The issue is reproducible by running iotest 181 on a host under CPU
> load. The migration must coincide with the header already containing
> the QED_F_NEED_CHECK flag.
>
> The sequence of events is as follows, with the respective call stacks
> referenced below:
>
> During block device init, bdrv_qed_attach_aio_context() starts the
> 'need_check' timer. The timer will not fire during incoming migration
> as it uses QEMU_CLOCK_VIRTUAL (to avoid this very issue, as the code
> comment indicates).                                                   (0)
>
> However, there's still bdrv_qed_drain_begin() which uses the fact that
> the timer is live to decide whether to start the
> qed_need_check_timer_entry() directly.                                (1)
>
> The qed_need_check_timer_entry() eventually calls into
> qed_write_header() -> bdrv_co_pwrite() leading to the assert.         (2)
>
> Skip creating the 'need_check' timer whenever the image is inactive.

I was concerned that some parts of the code might call
qed_start_need_check_timer() or qed_cancel_need_check_timer() without
checking s->need_check_timer != NULL. However, such unchecked calls
only occur in the write code path, which is never taken while
BDRV_O_INACTIVE is set (see the assert(!(bs->open_flags &
BDRV_O_INACTIVE)) in bdrv_co_write_req_prepare()). So this patch looks
good.
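
To illustrate the reasoning, here is a toy model of the invariant, not
the actual QEMU code. Every name in it (ToyQEDState, TOY_O_INACTIVE,
toy_attach_aio_context(), toy_drain_begin(), toy_write_header()) is a
hypothetical stand-in, and the real functions do considerably more. It
only sketches why a NULL timer on an inactive image keeps the drain
path away from the write path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the real QEMU definitions. */
#define TOY_O_INACTIVE 0x1

typedef struct ToyTimer {
    bool pending;
} ToyTimer;

typedef struct ToyQEDState {
    int open_flags;
    ToyTimer timer_storage;
    ToyTimer *need_check_timer;  /* stays NULL while the image is inactive */
    int header_writes;
} ToyQEDState;

/* Models the patched attach step: no timer is created for an inactive image. */
static void toy_attach_aio_context(ToyQEDState *s)
{
    if (s->open_flags & TOY_O_INACTIVE) {
        s->need_check_timer = NULL;
        return;
    }
    s->timer_storage.pending = true;
    s->need_check_timer = &s->timer_storage;
}

/* Models the write path: it must never run on an inactive image. */
static void toy_write_header(ToyQEDState *s)
{
    assert(!(s->open_flags & TOY_O_INACTIVE));
    s->header_writes++;
}

/* Models drain_begin: it writes the header only if a timer exists and is pending. */
static void toy_drain_begin(ToyQEDState *s)
{
    if (s->need_check_timer && s->need_check_timer->pending) {
        s->need_check_timer->pending = false;
        toy_write_header(s);
    }
}
```

In this model, draining an inactive image never reaches
toy_write_header(), so the assert cannot trigger; an active image still
writes the header once on drain.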

Reviewed-by: Stefan Hajnoczi <[email protected]>
