On Fri, Feb 02, 2024 at 05:53:39PM +0800, Peter Xu wrote: > On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote: > > Hello, > > Hi, Cédric, > > Thanks for the patches. > > > > > Today, close_return_path_on_source() can perform a shutdown to exit > > the return-path thread if an error occured. However, migrate_fd_cleanup() > > does cleanups too early and the shutdown in close_return_path_on_source() > > fails, leaving the source and destination waiting for an event to occur. > > > > This little series tries to fix that. Comments welcome ! > > One thing I do agree is that relying on qemu_file_get_error(to_dst_file) in > close_return_path_on_source() is weird: IMHO we have better way to detect > "whether the migration has error" now, which is migrate_has_error(). > > For this specific issue, I think one long standing issue that might be > relevant is we have two QEMUFile (from_dst_file, to_dst_file) that share > the same QIOChannel now. Logically the two QEMUFile should be able to be > managed separately, say, close() of to_dst_file shouldn't affect the other. > > However I don't think it's the case now, as qemu_fclose(to_dst_file) will > do qio_channel_close() already, which means there will be a side effect to > the other QEMUFile that its backing IOC is already closed. > > Is this the issue we're facing? IOW, the close() of to_dst_file will not > properly kick the other thread who is blocked at reading from_dst_file, > while the shutdown() will kick it out? > > If so, not sure whether we can somehow relay the real qio_channel_close() > to until the last user releases it? IOW, conditionally close() the channel > in qio_channel_finalize(), if the channel is still open? Would that make > sense?
IMHO the problem described above is a result of the design mistake of having 2 separate QEMUFile instances for what is ultimately the same channel. This was a convenient approach to take originally, but it has likely outlived its purpose. In the ideal world IMHO, QEMUFile would not exist at all, and we would have a QIOChannelCached that adds the read/write buffering above the base QIOChannel. That's doable, but bigger than a quick fix. A natural stepping stone to get there though is to move from 2 QEMUFile objs down to 1 QEMUFile, which might be more practical as a quick fix. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|