On Fri, Feb 02, 2024 at 05:53:39PM +0800, Peter Xu wrote:
> On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote:
> > Hello,
> 
> Hi, Cédric,
> 
> Thanks for the patches.
> 
> > 
> > Today, close_return_path_on_source() can perform a shutdown to exit
> > the return-path thread if an error occured. However, migrate_fd_cleanup()
> > does cleanups too early and the shutdown in close_return_path_on_source()
> > fails, leaving the source and destination waiting for an event to occur.
> > 
> > This little series tries to fix that. Comments welcome !  
> 
> One thing I do agree is that relying on qemu_file_get_error(to_dst_file) in
> close_return_path_on_source() is weird: IMHO we have better way to detect
> "whether the migration has error" now, which is migrate_has_error().
> 
> For this specific issue, I think one long standing issue that might be
> relevant is we have two QEMUFile (from_dst_file, to_dst_file) that share
> the same QIOChannel now.  Logically the two QEMUFile should be able to be
> managed separately, say, close() of to_dst_file shouldn't affect the other.
> 
> However I don't think it's the case now, as qemu_fclose(to_dst_file) will
> do qio_channel_close() already, which means there will be a side effect to
> the other QEMUFile that its backing IOC is already closed.
> 
> Is this the issue we're facing?  IOW, the close() of to_dst_file will not
> properly kick the other thread who is blocked at reading from_dst_file,
> while the shutdown() will kick it out?
> 
> If so, not sure whether we can somehow relay the real qio_channel_close()
> to until the last user releases it? IOW, conditionally close() the channel
> in qio_channel_finalize(), if the channel is still open?  Would that make
> sense?

IMHO the problem described above is a result of the design mistake of
having 2 separate QEMUFile instances for what is ultimately the same
channel. This was a convenient approach to take originally, but it has
likely outlived its purpose.

In the ideal world IMHO, QEMUFile would not exist at all, and we would
have a QIOChannelCached that adds the read/write buffering above the
base QIOChannel.

That's doable, but bigger than a quick fix. A natural stepping stone
to get there though is to move from 2 QEMUFile objs down to 1 QEMUFile,
which might be more practical as a quick fix.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to