Peter Xu <pet...@redhat.com> writes:

> On Tue, Aug 15, 2023 at 07:31:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <pet...@redhat.com> writes:
>> 
>> > On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote:
>> >> We currently have a pattern for cleaning up a migration QEMUFile:
>> >> 
>> >>   qemu_mutex_lock(&s->qemu_file_lock);
>> >>   file = s->file_name;
>> >>   s->file_name = NULL;
>> >>   qemu_mutex_unlock(&s->qemu_file_lock);
>> >> 
>> >>   migration_ioc_unregister_yank_from_file(file);
>> >>   qemu_file_shutdown(file);
>> >>   qemu_fclose(file);
>> >> 
>> >> There are some considerations for this sequence:
>> >> 
>> >> - we must clear the pointer under the lock, to avoid TOCTOU bugs;
>> >> - the shutdown() and close() expect to be given a non-null parameter;
>> >> - a close() in one thread should not race with a shutdown() in another;
>> >> 
>> >> Create a wrapper function to make sure everything works correctly.
>> >> 
>> >> Note: the return path did not previously call
>> >>       migration_ioc_unregister_yank_from_file(), but I added it
>> >>       nonetheless for uniformity.
>> >> 
>> >> Signed-off-by: Fabiano Rosas <faro...@suse.de>
>> >
>> > This definitely looks cleaner.  Probably can be squashed together with the
>> > previous patch?  If you could double check whether we can just drop the
>> > shutdown() everywhere it is followed by a close(), it'll be even
>> > nicer (I hope I didn't miss any real reasons to explicitly do that).
>> >
>> >> diff --git a/util/yank.c b/util/yank.c
>> >> index abf47c346d..4b6afbf589 100644
>> >> --- a/util/yank.c
>> >> +++ b/util/yank.c
>> >> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
>> >>              return;
>> >>          }
>> >>      }
>> >> -
>> >> -    abort();
>> >
>> > I think we can't silently do this.  This check is very strict and I guess
>> > you removed it because you hit a crash.  What's the crash?  Can we just
>> > pair the yank reg/unreg?
>> >
>> 
>> Well, the abort() is the crash. It just means that we looped and didn't
>> find the handler to unregister. It looks harmless to me. I should have
>> mentioned this in the commit message.
>
> Yeah, trust me I wanted to remove that quite a few times. :) But then I
> normally decided to try harder to find what's missing; and so far indeed I
> found that the cleanest way is always to pair the reg/unreg.
>
>> 
>> I could certainly add a yank handler to the rp_state.from_dst_file. But
>> then I have no idea what will happen if we try to yank the return path
>> at a random moment.
>
> I think the idea was it should be registered always when the channel is
> created, and then unregistered when the channel is destroyed.  They should
> just pair, along with the channel's lifecycle?
>
>> 
>> Side note: I see that yank does a qio_channel_shutdown() without the
>> controversial setting of -EIO. Which means it is probably susceptible to
>> the same race described in the qemu_file_shutdown() code.
>
> Are you looking outside migration code (I saw nbd_teardown_connection()
> does have one)?
>
> For migration IIUC it's always via migration_ioc_unregister_yank().

I'm talking about the actual yank action, not the unregister.

migration_yank_iochannel() calls qio_channel_shutdown() in the same way
as qemu_file_shutdown(), but unlike the latter, it doesn't set
f->last_error = -EIO. Which means that in theory, we could yank and
still try to use the QEMUFile.

In other words, what commit a555b8092a ("qemu-file: Don't do IO after
shutdown") did does not apply to yank because yank didn't exist at the
time.
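
To make the difference concrete, here is a toy model of the two paths. It
is only a sketch: ToyChannel, ToyFile and the toy_* functions are
hypothetical stand-ins, not the real QEMUFile/QIOChannel code, and the
-EPIPE return is just an assumed placeholder for "the IO was attempted on
a dead channel".

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins (hypothetical), NOT the real QEMU structures. */
typedef struct ToyChannel {
    bool shut_down;          /* set once the channel has been shut down */
} ToyChannel;

typedef struct ToyFile {
    ToyChannel *ioc;
    int last_error;          /* 0 while the file is considered usable */
    int io_after_shutdown;   /* IO attempts that reached a dead channel */
} ToyFile;

/* Models the qemu_file_shutdown() path: shut down AND poison the file. */
void toy_file_shutdown(ToyFile *f)
{
    f->ioc->shut_down = true;
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/* Models the yank action: only the channel is touched, the file is not. */
void toy_yank_iochannel(ToyChannel *ioc)
{
    ioc->shut_down = true;
}

/* Models a write path that refuses to do IO once an error is recorded. */
int toy_file_write(ToyFile *f)
{
    if (f->last_error) {
        return f->last_error;        /* bail out before touching the channel */
    }
    if (f->ioc->shut_down) {
        f->io_after_shutdown++;      /* the IO was still attempted */
        return -EPIPE;               /* assumed placeholder error */
    }
    return 0;
}
```

After toy_file_shutdown() a later toy_file_write() stops at the
last_error check. After toy_yank_iochannel() nothing has poisoned the
file, so the write still reaches the already shut down channel.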
