On Thu, Aug 03, 2023 at 11:15:41AM -0400, Peter Xu wrote:
> On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> > Peter Xu <pet...@redhat.com> writes:
> >
> > > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> > >> Peter Xu <pet...@redhat.com> writes:
> > >>
> > >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> > >> >> This function currently has a straight-forward part which is waiting
> > >> >> for the thread to join and a complicated part which is doing a
> > >> >> qemu_file_shutdown() on the return path file.
> > >> >>
> > >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> > >> >> f->last_error to -EIO, which means we can never know if an error is an
> > >> >> actual error or if we cleanly shutdown the file previously.
> > >> >>
> > >> >> This is particularly bothersome for postcopy because it would send the
> > >> >> return path thread into the retry routine which would wait on the
> > >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> > >> >> haven't had reports of this so I must presume we never reach here with
> > >> >> postcopy.
> > >> >>
> > >> >> The shutdown call is also racy because since it doesn't take the
> > >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> > >> >> happens to be in the middle of the critical region at
> > >> >> migration_release_dst_files().
> > >> >
> > >> > After you rework the thread model on resume, shall we move
> > >> > migration_release_dst_files() into the migration thread to be after the
> > >> > pthread_join()? I assume then we don't even need a mutex to protect
> > >> > it?
> > >> >
> > >>
> > >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> > >> cleanup along. No idea why it is there in the first place. I see you
> > >> moved it from postcopy_pause and we're about to move it back to the
> > >> exact same place =D
> > >
> > > It was there because the old postcopy-preempt was sending data via
> > > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > > also the migration thread context.
> > >
> > > Then we had 9358982744 ("migration: Send requested page directly in
> > > rp-return thread") where we moved that "send page" operation into the
> > > return path thread to reduce latencies. After moving there it also means
> > > the file handle can be accessed in >1 threads, so I just moved it over to
> > > operate that always in the return path thread, then no race should happen.
> > >
> >
> > Thanks for the context.
> >
> > > With your change, return path will vanish before migration thread accesses
> > > it later (so as mentioned above, it must be after pthread_join()
> > > succeeded), then I assume it'll be fine too to have it back in migration
> > > thread.
> > >
> > > Or perhaps just take the file lock?
> > >
> >
> > There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> > these files. We might need to lock anyway, let's see.
>
> The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
> That's based on the assumption that qemu_file_shutdown() is actually thread
> safe (say, shutdown() syscall is thread-safe for sockets).

The shutdown() syscall and qio_channel_shutdown() method are intended
to be safe to call from any thread *PROVIDED* you can ensure no other
thread is concurrently going to call close() on the FD (or unref the
QIOChannel object).

There is no locking in qemu_file_shutdown() to guarantee this, but
maybe something else in migration code is guaranteeing that the
QIOChannel object is not going to be closed (or unref'd), while a
thread is invoking qemu_file_shutdown().

IOW, in theory qemu_file_shutdown() could be safe to use, but I'm not
seeing a clearly expressed guarantee of safety in the code. If it is
safe, the reasons are very subtle and the rationale ought to be
documented in the comment for qemu_file_shutdown().

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
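
To make the shutdown()-versus-close() proviso concrete, here is a minimal
sketch of one way such a guarantee could be expressed in code. It is not
QEMU code: GuardedSock and the guarded_*() helpers are hypothetical names
invented for this illustration, and it assumes a plain socket FD protected
by a pthread mutex rather than a QIOChannel with reference counting.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper (not a QEMU type): an fd plus the lock that
 * serialises shutdown() against close(). */
typedef struct {
    pthread_mutex_t lock;
    int fd;                 /* set to -1 once the fd has been closed */
} GuardedSock;

static void guarded_init(GuardedSock *s, int fd)
{
    pthread_mutex_init(&s->lock, NULL);
    s->fd = fd;
}

/* Callable from any thread. Returns -1 without touching the fd if it
 * has already been closed, otherwise the result of shutdown(). */
static int guarded_shutdown(GuardedSock *s)
{
    int ret = -1;

    pthread_mutex_lock(&s->lock);
    if (s->fd >= 0) {
        ret = shutdown(s->fd, SHUT_RDWR);
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Closes the fd under the same lock, so a concurrent guarded_shutdown()
 * can never act on a closed (and possibly reused) fd number. */
static void guarded_close(GuardedSock *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->fd >= 0) {
        close(s->fd);
        s->fd = -1;
    }
    pthread_mutex_unlock(&s->lock);
}
```

The point of the sketch is only that the "no concurrent close()" condition
is visible in the code itself: both operations take the same lock and the
fd is invalidated under it. Whether the migration code already provides an
equivalent guarantee by other means is exactly the open question above.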