On Tue, Feb 09, 2021 at 03:52:51PM +0800, Hao Wang wrote: > If any error happens during multifd send thread creating (e.g. channel broke > because new domain is destroyed by the dst), multifd_tls_handshake_thread > may exit silently, leaving main migration thread hanging (ram_save_setup -> > multifd_send_sync_main -> qemu_sem_wait(&p->sem_sync)). > Fix that by adding error handling in multifd_tls_handshake_thread. > > Signed-off-by: Hao Wang <wanghao...@huawei.com> > --- > migration/multifd.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index 45c690aa11..df29cdd26a 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -736,7 +736,16 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, > } else { > trace_multifd_tls_outgoing_handshake_complete(ioc); > } > - multifd_channel_connect(p, ioc, err); > + > + if (multifd_channel_connect(p, ioc, err)) {
Urgh, this method is rather confusing usng "true" to indicate failure. In include/qapi/error.h we document • bool-valued functions return true on success / false on failure, Admittedly this problem is pre-existing before this patch, but I would none the less like to see a patch that fixes thse inverted semantics first, and then this as a subsequent patch on top. > + /* > + * Error happen, mark multifd_send_thread status as 'quit' although > it > + * is not created, and then tell who pay attention to me. > + */ > + p->quit = true; > + qemu_sem_post(&multifd_send_state->channels_ready); > + qemu_sem_post(&p->sem_sync); > + } > } > > static void *multifd_tls_handshake_thread(void *opaque) Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|