On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote: > Hello,
Hi, Cédric, Thanks for the patches. > > Today, close_return_path_on_source() can perform a shutdown to exit > the return-path thread if an error occured. However, migrate_fd_cleanup() > does cleanups too early and the shutdown in close_return_path_on_source() > fails, leaving the source and destination waiting for an event to occur. > > This little series tries to fix that. Comments welcome ! One thing I do agree is that relying on qemu_file_get_error(to_dst_file) in close_return_path_on_source() is weird: IMHO we have better way to detect "whether the migration has error" now, which is migrate_has_error(). For this specific issue, I think one long standing issue that might be relevant is we have two QEMUFile (from_dst_file, to_dst_file) that share the same QIOChannel now. Logically the two QEMUFile should be able to be managed separately, say, close() of to_dst_file shouldn't affect the other. However I don't think it's the case now, as qemu_fclose(to_dst_file) will do qio_channel_close() already, which means there will be a side effect to the other QEMUFile that its backing IOC is already closed. Is this the issue we're facing? IOW, the close() of to_dst_file will not properly kick the other thread who is blocked at reading from_dst_file, while the shutdown() will kick it out? If so, not sure whether we can somehow relay the real qio_channel_close() to until the last user releases it? IOW, conditionally close() the channel in qio_channel_finalize(), if the channel is still open? Would that make sense? Copy Dan too. Thanks, -- Peter Xu