On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> When doing live migration with multifd channels 8, 16 or larger number,
> the guest hangs in the presence of the network errors such as missing TCP
> ACKs.
>
> At sender's side:
> The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> is called because one thread fails on qio_channel_write_all when
> the network problem happens and other send threads are blocked on sendmsg.
> They could not be terminated. So the main thread is blocked on
> qemu_thread_join to wait for the threads terminated.

Isn't the right answer here to ensure we've called 'shutdown' on all the
FDs, so that the threads get kicked out of sendmsg, before trying to join
the threads?

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
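
To illustrate the shutdown-before-join ordering suggested above, here is a
minimal standalone sketch. It is not QEMU code: it uses plain POSIX threads
and an AF_UNIX socketpair instead of multifd channels, and the names
send_worker and shutdown_then_join_demo are made up for the example. It
assumes Linux behaviour (MSG_NOSIGNAL, and shutdown() waking a thread that
is blocked in send()).

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread state, standing in for a multifd send channel. */
struct worker {
    int fd;   /* socket the worker writes to */
    int err;  /* errno that made the worker exit */
};

/* Writes until send() fails. Nobody reads the peer, so it soon blocks. */
static void *send_worker(void *opaque)
{
    struct worker *w = opaque;
    char buf[4096];

    memset(buf, 0, sizeof(buf));
    for (;;) {
        /* MSG_NOSIGNAL: report EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(w->fd, buf, sizeof(buf), MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            w->err = errno;
            break;
        }
    }
    return NULL;
}

/*
 * Returns the errno the worker exited with (typically EPIPE on Linux),
 * or -1 if the setup failed.
 */
int shutdown_then_join_demo(void)
{
    int sv[2];
    pthread_t tid;
    struct worker w = { .fd = -1, .err = 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    w.fd = sv[0];
    if (pthread_create(&tid, NULL, send_worker, &w) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Give the worker time to fill the socket buffer and block in send(). */
    poll(NULL, 0, 200);

    /* Kick the worker out of send() first ... */
    shutdown(sv[0], SHUT_RDWR);
    /* ... and only then join it, so the join cannot hang on a stuck send. */
    pthread_join(tid, NULL);

    close(sv[0]);
    close(sv[1]);
    return w.err;
}
```

Without the shutdown() call the pthread_join() here would block forever,
which mirrors the hang described in the patch. In QEMU the analogous step
would presumably be calling qio_channel_shutdown() on every multifd channel
before qemu_thread_join(), but where exactly that belongs in the cleanup
path is for the patch to work out.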