On Wed, Feb 18, 2026 at 10:29:38PM +0100, Lukas Straub wrote:
> When a send thread encounters an error (as is the case with yank),
> it sets multifd_send_state->exiting and the other threads exit too.
> This races with multifd_send_sync_main() which now hangs at
> qemu_sem_wait(&p->sem_sync) in multifd_send_sync_main() line 647
> as it waits for threads that have exited.
> 
> Fix this by kicking the semaphores when exiting the send threads.
> 
> I encountered this hang when stress testing the colo unit test,
> though I was unable to write a migration test to reliably hit this.
> 
> Signed-off-by: Lukas Straub <[email protected]>
> ---
>  migration/multifd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 220ed8564960fdabc58e4baa069dd252c8ad293c..e8c85cb6c48deaee2c9bda7b821a976166d78c9c 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -677,6 +677,7 @@ static void *multifd_send_thread(void *opaque)
>          qemu_sem_wait(&p->sem);
>  
>          if (multifd_send_should_exit()) {
> +            multifd_send_kick_main(p);
>              break;
>          }

It looks like a normal migration cancellation only errors out the main
channel, not the multifd ones, hence the main sync is always done properly
via sem_sync.  So maybe yank does behave differently, and fewer people use
yank in multifd migrations.  It looks fine to do the extra kick on this path,
as long as we destroy the two semaphores later for each migration attempt.

That said, special-casing this path looks weird.

Could we move the kick of main at the end of the thread out of the "err"
case, so that we always kick it?  We can add a comment explaining that.
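
To illustrate the "always kick on exit" idea outside of QEMU, here is a
minimal standalone sketch using POSIX threads and unnamed semaphores (so it
assumes Linux or similar).  Channel, send_thread() and run_sync() are
hypothetical names made up for this example, they only mimic the roles of
p->sem, p->sem_sync and multifd_send_state->exiting and are not the actual
code in migration/multifd.c.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N_CHANNELS 4

/* Hypothetical stand-in for the per-channel state; not QEMU's real struct. */
typedef struct {
    sem_t sem;        /* main -> thread: work or sync request pending */
    sem_t sem_sync;   /* thread -> main: sync done, or thread is gone */
    pthread_t thread;
} Channel;

/* Plays the role of multifd_send_state->exiting in this sketch. */
static atomic_bool exiting;

/* Evaluate the call unconditionally, then check its return code. */
static void must(int rc) { assert(rc == 0); (void)rc; }

static void *send_thread(void *opaque)
{
    Channel *c = opaque;

    for (;;) {
        sem_wait(&c->sem);
        if (atomic_load(&exiting)) {
            break;
        }
        /* Normal path: acknowledge the sync request from main. */
        sem_post(&c->sem_sync);
    }

    /*
     * Always kick main on the way out, error or not, so that a main
     * thread blocked on sem_sync never waits for a thread that is gone.
     * A surplus post is harmless because the semaphores are destroyed
     * and re-created for every attempt.
     */
    sem_post(&c->sem_sync);
    return NULL;
}

/* Returns how many channels main managed to sync with (no hang expected). */
static int run_sync(bool fail_before_sync)
{
    Channel ch[N_CHANNELS];
    int synced = 0;

    atomic_store(&exiting, false);
    for (int i = 0; i < N_CHANNELS; i++) {
        must(sem_init(&ch[i].sem, 0, 0));
        must(sem_init(&ch[i].sem_sync, 0, 0));
        must(pthread_create(&ch[i].thread, NULL, send_thread, &ch[i]));
    }

    if (fail_before_sync) {
        /* Simulate an error (e.g. yank): all threads are told to exit. */
        atomic_store(&exiting, true);
        for (int i = 0; i < N_CHANNELS; i++) {
            sem_post(&ch[i].sem);
        }
    }

    /* The sync itself: request it on every channel, then wait for all. */
    for (int i = 0; i < N_CHANNELS; i++) {
        sem_post(&ch[i].sem);
    }
    for (int i = 0; i < N_CHANNELS; i++) {
        sem_wait(&ch[i].sem_sync);   /* would hang without the exit kick */
        synced++;
    }

    /* Teardown: stop the threads and destroy both semaphores. */
    atomic_store(&exiting, true);
    for (int i = 0; i < N_CHANNELS; i++) {
        sem_post(&ch[i].sem);
        must(pthread_join(ch[i].thread, NULL));
        must(sem_destroy(&ch[i].sem));
        must(sem_destroy(&ch[i].sem_sync));
    }
    return synced;
}
```

Calling run_sync(true) models the failing case from the commit message:
the threads have already left their loop when main starts waiting, and
only the unconditional post at the end of send_thread() lets the wait
return.  Build with -pthread.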

-- 
Peter Xu

