Hello Dave,

> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 377da78f5b..744a180dfe 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -1040,6 +1040,17 @@ void multifd_recv_sync_main(void)
> >      trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
> >  }
> >
> > +void multifd_shutdown(void)
> > +{
> > +    if (!migrate_use_multifd()) {
> > +        return;
> > +    }
> > +
> > +    if (multifd_send_state) {
> > +        multifd_send_terminate_threads(NULL);
> > +    }
>
> That calls :
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
>
>         qemu_mutex_lock(&p->mutex);
>         p->quit = true;
>         qemu_sem_post(&p->sem);
>         qemu_mutex_unlock(&p->mutex);
>     }
>
> so why doesn't this also get stuck in the same mutex you're trying to
> fix?

You are right, I got confused over the locks.
I need to take a closer look at the code and truly understand why this
patch fixes (?) the issue.

>
> Does the qio_channel_shutdown actually cause a shutdown on all fd's
> for the multifd?

As far as I tested, it shuts down a single fd, but whenever this fd fails
in its first sendmsg, the migration fails and all the other fds get shut
down as well.

>
> (I've just seen the multifd/cancel test fail stuck in multifd_send_sync_main
> waiting on one of the locks).
>
> Dave
>
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

I will do a little more reading / debugging in this code.

Thanks Dave!