On Wed, Aug 02, 2023 at 05:04:45PM -0300, Fabiano Rosas wrote: > Peter Xu <pet...@redhat.com> writes: > > >> + if (await_return_path_close_on_source(s)) { > >> + trace_migration_return_path_pause_err(); > >> + return MIG_THR_ERR_FATAL; > >> + } > > > > I see that here on return path failures we'll bail out, and actually it's > > against the instinction (that when pause it should have failed, so it's > > weird why it's returning 0). > > > > So how about above suggestion, plus here we just call > > await_return_path_close_on_source(), without caring about the retval? > > So you are suggesting to remove the knowledge of the retry entirely from > the thread. It just reports the error and the postcopy_pause takes the > responsibility of ignoring it when we want to retry... It could be > clearer that way indeed.
That error doesn't really important IMHO here, because the to-dst-file should have already errored out anyway. I just think it cleaner if we reset rp_error only until the new thread created. > > >> + > >> migrate_set_state(&s->state, s->state, > >> MIGRATION_STATUS_POSTCOPY_PAUSED); > >> > >> @@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s) > >> if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { > >> /* Woken up by a recover procedure. Give it a shot */ > >> > >> - /* > >> - * Firstly, let's wake up the return path now, with a new > >> - * return path channel. > >> - */ > >> - qemu_sem_post(&s->postcopy_pause_rp_sem); > >> - > >> /* Do the resume logic */ > >> if (postcopy_do_resume(s) == 0) { > >> /* Let's continue! */ > >> @@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error > >> *error_in) > >> * QEMU uses the return path. > >> */ > >> if (migrate_postcopy_ram() || migrate_return_path()) { > >> - if (open_return_path_on_source(s, !resume)) { > >> + if (open_return_path_on_source(s)) { > >> error_report("Unable to open return-path for postcopy"); > >> migrate_set_state(&s->state, s->state, > >> MIGRATION_STATUS_FAILED); > >> migrate_fd_cleanup(s); > >> @@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj) > >> qemu_sem_destroy(&ms->rate_limit_sem); > >> qemu_sem_destroy(&ms->pause_sem); > >> qemu_sem_destroy(&ms->postcopy_pause_sem); > >> - qemu_sem_destroy(&ms->postcopy_pause_rp_sem); > >> qemu_sem_destroy(&ms->rp_state.rp_sem); > >> qemu_sem_destroy(&ms->rp_state.rp_pong_acks); > >> qemu_sem_destroy(&ms->postcopy_qemufile_src_sem); > >> @@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj) > >> migrate_params_init(&ms->parameters); > >> > >> qemu_sem_init(&ms->postcopy_pause_sem, 0); > >> - qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); > >> qemu_sem_init(&ms->rp_state.rp_sem, 0); > >> qemu_sem_init(&ms->rp_state.rp_pong_acks, 0); > >> qemu_sem_init(&ms->rate_limit_sem, 0); > >> diff --git a/migration/migration.h b/migration/migration.h > >> index b7c8b67542..e78db5361c 100644 > >> --- a/migration/migration.h > >> +++ b/migration/migration.h > >> @@ -382,7 +382,6 @@ struct MigrationState { > >> > >> /* Needed by postcopy-pause state */ > >> QemuSemaphore postcopy_pause_sem; > >> - QemuSemaphore postcopy_pause_rp_sem; > >> /* > >> * Whether we abort the migration if decompression errors are > >> * detected at the destination. It is left at false for qemu > >> diff --git a/migration/trace-events b/migration/trace-events > >> index 5259c1044b..19ec649d1d 100644 > >> --- a/migration/trace-events > >> +++ b/migration/trace-events > >> @@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms" > >> migration_rate_limit_post(int urgent) "urgent: %d" > >> migration_return_path_end_before(void) "" > >> migration_return_path_end_after(int rp_error) "%d" > >> +migration_return_path_pause_err(void) "" > > > > If it should never trigger, it shouldn't need a tracepoint. It needs an > > assertion if we're 100% confident, or error_report_once() perhaps would be > > more suitable. > > It would trigger when a rp error happened that wasn't related to the > QEMUFile. If we go with your suggestion above, then this goes away. With your current patch where rp_error seems to be always reset when thread quit, if that's true then it'll 100% happen that this will not trigger. But yeah this is a trivial spot, feel free to choose the best if you plan to reorganize this patch a bit. Thanks. -- Peter Xu