Peter Xu <pet...@redhat.com> writes: > On Fri, Jul 28, 2023 at 09:15:14AM -0300, Fabiano Rosas wrote: >> When waiting for the return path (RP) thread to finish, there is >> really nothing wrong in the RP if the destination end of the migration >> stops responding, leaving it stuck. >> >> Stop returning an error at that point and leave it to other parts of >> the code to catch. One such part is the very next routine run by >> migration_completion() which checks 'to_dst_file' for an error and fails >> the migration. Another is the RP thread itself when the recvmsg() >> returns an error. >> >> With this we stop marking RP bad from outside of the thread and can >> reuse await_return_path_close_on_source() in the next patches to wait >> on the thread during a paused migration. >> >> Signed-off-by: Fabiano Rosas <faro...@suse.de> >> --- >> migration/migration.c | 1 - >> 1 file changed, 1 deletion(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 91bba630a8..051067f8c5 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -2049,7 +2049,6 @@ static int >> await_return_path_close_on_source(MigrationState *ms) >> * waiting for the destination. >> */ >> qemu_file_shutdown(ms->rp_state.from_dst_file); >> - mark_source_rp_bad(ms); >> } >> trace_await_return_path_close_on_source_joining(); >> qemu_thread_join(&ms->rp_state.rp_thread); > > The retval of await_return_path_close_on_source() relies on > ms->rp_state.error. If mark_source_rp_bad() is dropped, is it possible > that it'll start to return succeed where it used to return failure?
Yep, as described in the commit message, I think it's ok to do that. The critical part is doing the shutdown. Other instances of mark_source_rp_bad() continue existing and we continue returning rp_state.error. > > Maybe not a big deal: I see migration_completion() also has another > qemu_file_get_error() later to catch errors, but I don't know how solid > that is. That is the instance I refer to in the commit message. At await_return_path_close_on_source() we only call mark_source_rp_bad() if to_dst_file has an error. That will be caught by this qemu_file_get_error() anyway.