On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote:
> Hello,

Hi, Cédric,

Thanks for the patches.

> 
> Today, close_return_path_on_source() can perform a shutdown to exit
> the return-path thread if an error occured. However, migrate_fd_cleanup()
> does cleanups too early and the shutdown in close_return_path_on_source()
> fails, leaving the source and destination waiting for an event to occur.
> 
> This little series tries to fix that. Comments welcome !  

One thing I do agree is that relying on qemu_file_get_error(to_dst_file) in
close_return_path_on_source() is weird: IMHO we have better way to detect
"whether the migration has error" now, which is migrate_has_error().

For this specific issue, I think one long standing issue that might be
relevant is we have two QEMUFile (from_dst_file, to_dst_file) that share
the same QIOChannel now.  Logically the two QEMUFile should be able to be
managed separately, say, close() of to_dst_file shouldn't affect the other.

However I don't think it's the case now, as qemu_fclose(to_dst_file) will
do qio_channel_close() already, which means there will be a side effect to
the other QEMUFile that its backing IOC is already closed.

Is this the issue we're facing?  IOW, the close() of to_dst_file will not
properly kick the other thread who is blocked at reading from_dst_file,
while the shutdown() will kick it out?

If so, not sure whether we can somehow relay the real qio_channel_close()
to until the last user releases it? IOW, conditionally close() the channel
in qio_channel_finalize(), if the channel is still open?  Would that make
sense?

Copy Dan too.

Thanks,

-- 
Peter Xu


Reply via email to