When a send thread encounters an error (as is the case with yank), it
sets multifd_send_state->exiting and the other threads exit too. This
races with multifd_send_sync_main(), which then hangs at
qemu_sem_wait(&p->sem_sync) (line 647) while waiting for threads that
have already exited.
Fix this by kicking the semaphores when exiting the send threads.

I encountered this hang when stress testing the colo unit test, though
I was unable to write a migration test to reliably hit this.

Signed-off-by: Lukas Straub <[email protected]>
---
 migration/multifd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 220ed8564960fdabc58e4baa069dd252c8ad293c..e8c85cb6c48deaee2c9bda7b821a976166d78c9c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -677,6 +677,7 @@ static void *multifd_send_thread(void *opaque)
         qemu_sem_wait(&p->sem);
 
         if (multifd_send_should_exit()) {
+            multifd_send_kick_main(p);
             break;
         }
 
-- 
2.39.5
