On 3/13/26 12:55, Denis V. Lunev wrote:
> When libvirtd reconnects to a running QEMU process that had an
> in-progress migration, qemuProcessReconnect first connects the
> monitor and only later recovers the migration job. During this window
> the async job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events
> from QEMU are silently dropped by qemuProcessHandleMigrationStatus.
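
The following is a minimal C model of the dropped-event window described above. The enums, `job_state`, and `handle_migration_status()` are hypothetical simplifications, not libvirt's real types or functions. The only behaviour taken from the description is that a status event arriving while the async job is still NONE is discarded instead of being recorded.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT libvirt's real enums. */
typedef enum { ASYNC_JOB_NONE, ASYNC_JOB_MIGRATION_OUT } async_job;
typedef enum { MIG_ACTIVE, MIG_CANCELLING, MIG_CANCELLED } mig_status;

typedef struct {
    async_job asyncJob;  /* stays NONE until the job is recovered after reconnect */
    mig_status status;   /* last migration status recorded for the job */
} job_state;

/* Models the filter in the migration-status event handler: the event is
 * applied only when an async job exists. Returns true if the event was
 * recorded, false if it was silently dropped. */
static bool handle_migration_status(job_state *job, mig_status status)
{
    if (job->asyncJob == ASYNC_JOB_NONE)
        return false;   /* reconnect window: the event is lost */
    job->status = status;
    return true;
}
```

If you feed it `MIG_CANCELLED` while `asyncJob` is `ASYNC_JOB_NONE`, it returns false and leaves the cached status at `MIG_ACTIVE`. That is the stale state the wait loop later trusts.
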
>
> If the migration was already cancelled or completed by QEMU during
> this window, no further events will be emitted. When
> qemuMigrationSrcCancelUnattended later restores the async job and
> calls qemuMigrationSrcCancel with wait=true, the wait loop calls
> qemuDomainObjWait (virCondWait with no timeout) and blocks forever
> waiting for an event that will never arrive.
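
The hang itself can be sketched with plain pthreads, assuming a simplified job structure. `job_state`, `mig_status`, and `wait_for_terminal()` are hypothetical names. The real loop waits with no timeout. This sketch adds a deadline via `pthread_cond_timedwait()` only so that a lost event shows up as a timeout and does not block the caller forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins; NOT libvirt's real types. */
typedef enum { MIG_ACTIVE, MIG_CANCELLING, MIG_CANCELLED,
               MIG_COMPLETED, MIG_FAILED } mig_status;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* would be signalled by the event handler */
    mig_status status;     /* cached status; stale if events were dropped */
} job_state;

static bool is_terminal(mig_status s)
{
    return s == MIG_CANCELLED || s == MIG_COMPLETED || s == MIG_FAILED;
}

/* Waits until the cached status is terminal. Unlike the real loop, it gives
 * up after timeout_ms. Returns true if a terminal state was observed. */
static bool wait_for_terminal(job_state *job, long timeout_ms)
{
    struct timespec deadline;
    bool done;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&job->lock);
    while (!is_terminal(job->status)) {
        /* Nobody signals cond if the terminal event was dropped. */
        if (pthread_cond_timedwait(&job->cond, &job->lock, &deadline) == ETIMEDOUT)
            break;
    }
    done = is_terminal(job->status);
    pthread_mutex_unlock(&job->lock);
    return done;
}
```

Suppose the cached status is stuck at `MIG_ACTIVE` and nothing remains to signal the condition. The call then returns false, and only because of the deadline. Without a deadline, as with virCondWait, it would never return.
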
>
> Fix this by querying the QEMU migration status with query-migrate
> immediately after sending migrate_cancel, while still inside the
> monitor session. This ensures the job's migration status is up to
> date before entering the wait loop, so if QEMU already reached a
> terminal state (cancelled/completed/error), the loop exits
> immediately.
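
The ordering the patch relies on is: cancel, synchronously refresh the status, then wait. It can be modelled as follows. `fake_qemu`, `send_cancel()`, `query_status()`, and `cancel_and_wait()` are hypothetical stand-ins for migrate_cancel, query-migrate, and the cancel path. They are not libvirt or QEMU APIs. The caller is assumed to hold `job->lock`, in the same way the domain object lock is held around the real wait.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT libvirt or QEMU types. */
typedef enum { MIG_ACTIVE, MIG_CANCELLING, MIG_CANCELLED,
               MIG_COMPLETED, MIG_FAILED } mig_status;

typedef struct {
    mig_status actual;   /* QEMU's real state, which may have moved on */
} fake_qemu;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    mig_status status;   /* libvirt's cached view; may be stale */
} job_state;

static bool is_terminal(mig_status s)
{
    return s == MIG_CANCELLED || s == MIG_COMPLETED || s == MIG_FAILED;
}

/* Models migrate_cancel: an active migration starts cancelling; one that
 * already finished is left alone. */
static int send_cancel(fake_qemu *qemu)
{
    if (qemu->actual == MIG_ACTIVE)
        qemu->actual = MIG_CANCELLING;
    return 0;
}

/* Models query-migrate: reports QEMU's current state synchronously. */
static int query_status(const fake_qemu *qemu, mig_status *out)
{
    *out = qemu->actual;
    return 0;
}

/* Cancel, refresh the cached status, then wait. Caller holds job->lock. */
static int cancel_and_wait(fake_qemu *qemu, job_state *job)
{
    mig_status now;

    if (send_cancel(qemu) < 0)
        return -1;
    if (query_status(qemu, &now) == 0)
        job->status = now;   /* catch up on any events dropped earlier */

    /* Exits at once if QEMU already reached a terminal state. */
    while (!is_terminal(job->status))
        pthread_cond_wait(&job->cond, &job->lock);
    return 0;
}
```

The cached status is overwritten with whatever QEMU reports. So if a migration already reached `MIG_CANCELLED` (or `MIG_COMPLETED`) during the reconnect window, the loop exits without ever calling `pthread_cond_wait()`. If QEMU is still cancelling, the loop waits as before and depends on a later status event to wake it.
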
>
> Signed-off-by: Denis V. Lunev <[email protected]>
> CC: Peter Krempa <[email protected]>
> CC: Michal Privoznik <[email protected]>
> ---
> src/qemu/qemu_migration.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index fec808ccfb..3a9185f65c 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -4876,6 +4876,21 @@ qemuMigrationSrcCancel(virDomainObj *vm,
>          return -1;
>
>      rc = qemuMonitorMigrateCancel(priv->mon);
> +
> +    if (rc == 0 && wait) {
> +        virDomainJobData *jobData = vm->job->current;
> +        qemuDomainJobDataPrivate *privJob = jobData->privateData;
> +        qemuMonitorMigrationStats stats;
> +
> +        /* During reconnect, migration events from QEMU can arrive before
> +         * the async job is restored, and
> +         * qemuProcessHandleMigrationStatus() drops them. QEMU won't send
> +         * any further events in that case, so the wait loop would block
> +         * forever without this refresh. */
> +        if (qemuMonitorGetMigrationStats(priv->mon, &stats, NULL) == 0)
> +            privJob->stats.mig.status = stats.status;
> +    }
> +
>      qemuDomainObjExitMonitor(vm);
>
>      if (rc < 0)
ping