Peter Xu <pet...@redhat.com> writes: > On Tue, Aug 15, 2023 at 07:31:28PM -0300, Fabiano Rosas wrote: >> Peter Xu <pet...@redhat.com> writes: >> >> > On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote: >> >> We currently have a pattern for cleaning up a migration QEMUFile: >> >> >> >> qemu_mutex_lock(&s->qemu_file_lock); >> >> file = s->file_name; >> >> s->file_name = NULL; >> >> qemu_mutex_unlock(&s->qemu_file_lock); >> >> >> >> migration_ioc_unregister_yank_from_file(file); >> >> qemu_file_shutdown(file); >> >> qemu_fclose(file); >> >> >> >> There are some considerations for this sequence: >> >> >> >> - we must clear the pointer under the lock, to avoid TOC/TOU bugs; >> >> - the shutdown() and close() expect be given a non-null parameter; >> >> - a close() in one thread should not race with a shutdown() in another; >> >> >> >> Create a wrapper function to make sure everything works correctly. >> >> >> >> Note: the return path did not used to call >> >> migration_ioc_unregister_yank_from_file(), but I added it >> >> nonetheless for uniformity. >> >> >> >> Signed-off-by: Fabiano Rosas <faro...@suse.de> >> > >> > This definitely looks cleaner. Probably can be squashed together with >> > previous patch? If you could double check whether we can just drop the >> > shutdown() all over the places when close() altogether, it'll be even >> > nicer (I hope I didn't miss any real reasons to explicitly do that). >> > >> >> diff --git a/util/yank.c b/util/yank.c >> >> index abf47c346d..4b6afbf589 100644 >> >> --- a/util/yank.c >> >> +++ b/util/yank.c >> >> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance >> >> *instance, >> >> return; >> >> } >> >> } >> >> - >> >> - abort(); >> > >> > I think we can't silently do this. This check is very strict and I guess >> > you removed it because you hit a crash. What's the crash? Can we just >> > pair the yank reg/unreg? >> > >> >> Well, the abort() is the crash. It just means that we looped and didn't >> find the handler to unregister. It looks harmless to me. I should have >> mentioned this in the commit message. > > Yeah, trust me I wanted to remove that for quite a few times. :) But then I > normally decided to try harder to find what's missing; and so far indeed I > found that the cleanest way is always pair the reg/unreg. > >> >> I could certainly add a yank handler to the rp_state.from_dst_file. But >> then I have no idea what will happen if we try to yank the return path >> at a random moment. > > I think the idea was it should be registered always when the channel is > created, and then unregistered when the channel is destroyed. They should > just pair, alongside with the channel's lifecycle? > >> >> Side note: I see that yank does a qio_channel_shutdown() without the >> controversial setting of -EIO. Which means it is probably succeptible to >> the same race described in the qemu_file_shutdown() code. > > Are you looking outside migration code (I saw nbd_teardown_connection() > does have one)? > > For migration IIUC it's always via migration_ioc_unregister_yank().
I'm talking about the actual yank action, not the unregister. migration_yank_iochannel() calls qio_channel_shutdown() in the same way as qemu_file_shutdown(), but unlike the latter, it doesn't set f->last_error = -EIO. Which means that in theory, we could yank and still try to use the QEMUFile. In other words, what commit a555b8092a ("qemu-file: Don't do IO after shutdown") did does not apply to yank because yank didn't exit at the time.