On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
>
> Do you want me to pick this patch up (with modifications) and repost
> together with this series? It'll also work if you want to send a separate
> test patch. Let me know!

It turns out I found more bugs when I was reworking that test case based on
yours. E.g., currently we'll crash the dest QEMU if we really fail during
recovery, because we miss:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

So in the double-failure case we'll not move RECOVER to PAUSED, and we'll
crash right afterwards, as we'll skip the semaphore:

    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { <--- not true, continue
        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
    }

Now within the new test case I am 100% sure I can kick both sides into the
RECOVER state (one trick is still needed along the way; the test patch will
tell soon), then kick them back, then proceed with a successful migration.

Let me just repost everything with the new test case.

Thanks,

-- 
Peter Xu
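
A minimal sketch of why the hard-coded old state matters, assuming that migrate_set_state() acts as a compare-and-swap that only updates the state when it currently equals the old state passed in. The enum values and the helpers set_state(), pause_before_fix() and pause_after_fix() are hypothetical stand-ins for illustration, not QEMU code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in states; the real MIGRATION_STATUS_* values live in QEMU. */
enum mig_status {
    POSTCOPY_ACTIVE,
    POSTCOPY_PAUSED,
    POSTCOPY_RECOVER,
};

/*
 * Simplified model of a compare-and-swap state setter: switch to
 * new_state only if the current state equals old_state.
 * Returns true if the state was changed.
 */
bool set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;

    assert(state != NULL);
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Before the fix: the expected old state is hard-coded to ACTIVE. */
int pause_before_fix(int start)
{
    atomic_int state;

    atomic_init(&state, start);
    set_state(&state, POSTCOPY_ACTIVE, POSTCOPY_PAUSED);
    return atomic_load(&state);
}

/* After the fix: whatever the current state is, move it to PAUSED. */
int pause_after_fix(int start)
{
    atomic_int state;

    atomic_init(&state, start);
    set_state(&state, atomic_load(&state), POSTCOPY_PAUSED);
    return atomic_load(&state);
}
```

In this model, starting from POSTCOPY_RECOVER the pre-fix helper leaves the state untouched, so a loop that waits only while the state is PAUSED would fall straight through; the post-fix helper reaches PAUSED from either starting state. The sketch is single-threaded and ignores the locking and event generation around the real call.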