On Fri, Feb 02, 2024 at 04:11:25PM -0300, Fabiano Rosas wrote: > We currently only need p->running to avoid calling qemu_thread_join() > on a non existent thread if the thread has never been created. > > However, there are at least two bugs in this logic: > > 1) On the sending side, p->running is set too early and > qemu_thread_create() can be skipped due to an error during TLS > handshake, leaving the flag set and leading to a crash when > multifd_save_cleanup() calls qemu_thread_join(). > > 2) During exit, the multifd thread clears the flag while holding the > channel lock. The counterpart at multifd_save_cleanup() reads the flag > outside of the lock and might free the mutex while the multifd thread > still has it locked. > > Fix the first issue by setting the flag right before creating the > thread. Rename it from p->running to p->thread_created to clarify its > usage. > > Fix the second issue by not clearing the flag at the multifd thread > exit. We don't have any use for that. > > Note that these bugs are straight-forward logic issues and not race > conditions. There is still a gap for races to affect this code due to > multifd_save_cleanup() being allowed to run concurrently with the > thread creation loop. This issue is solved in the next patch. >
Cc: qemu-stable <qemu-sta...@nongnu.org> > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake") > Reported-by: Avihai Horon <avih...@nvidia.com> > Reported-by: <chenyuh...@huawei.com> > Signed-off-by: Fabiano Rosas <faro...@suse.de> Reviewed-by: Peter Xu <pet...@redhat.com> -- Peter Xu