Peter Xu <pet...@redhat.com> writes: >> + if (await_return_path_close_on_source(s)) { >> + trace_migration_return_path_pause_err(); >> + return MIG_THR_ERR_FATAL; >> + } > > I see that here on return path failures we'll bail out, and actually it's > against the instinction (that when pause it should have failed, so it's > weird why it's returning 0). > > So how about above suggestion, plus here we just call > await_return_path_close_on_source(), without caring about the retval?
So you are suggesting to remove the knowledge of the retry entirely from the thread. It just reports the error and the postcopy_pause takes the responsibility of ignoring it when we want to retry... It could be clearer that way indeed. >> + >> migrate_set_state(&s->state, s->state, >> MIGRATION_STATUS_POSTCOPY_PAUSED); >> >> @@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s) >> if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { >> /* Woken up by a recover procedure. Give it a shot */ >> >> - /* >> - * Firstly, let's wake up the return path now, with a new >> - * return path channel. >> - */ >> - qemu_sem_post(&s->postcopy_pause_rp_sem); >> - >> /* Do the resume logic */ >> if (postcopy_do_resume(s) == 0) { >> /* Let's continue! */ >> @@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> * QEMU uses the return path. >> */ >> if (migrate_postcopy_ram() || migrate_return_path()) { >> - if (open_return_path_on_source(s, !resume)) { >> + if (open_return_path_on_source(s)) { >> error_report("Unable to open return-path for postcopy"); >> migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); >> migrate_fd_cleanup(s); >> @@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj) >> qemu_sem_destroy(&ms->rate_limit_sem); >> qemu_sem_destroy(&ms->pause_sem); >> qemu_sem_destroy(&ms->postcopy_pause_sem); >> - qemu_sem_destroy(&ms->postcopy_pause_rp_sem); >> qemu_sem_destroy(&ms->rp_state.rp_sem); >> qemu_sem_destroy(&ms->rp_state.rp_pong_acks); >> qemu_sem_destroy(&ms->postcopy_qemufile_src_sem); >> @@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj) >> migrate_params_init(&ms->parameters); >> >> qemu_sem_init(&ms->postcopy_pause_sem, 0); >> - qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); >> qemu_sem_init(&ms->rp_state.rp_sem, 0); >> qemu_sem_init(&ms->rp_state.rp_pong_acks, 0); >> qemu_sem_init(&ms->rate_limit_sem, 0); >> diff --git a/migration/migration.h b/migration/migration.h >> index b7c8b67542..e78db5361c 100644 >> --- a/migration/migration.h >> +++ b/migration/migration.h >> @@ -382,7 +382,6 @@ struct MigrationState { >> >> /* Needed by postcopy-pause state */ >> QemuSemaphore postcopy_pause_sem; >> - QemuSemaphore postcopy_pause_rp_sem; >> /* >> * Whether we abort the migration if decompression errors are >> * detected at the destination. It is left at false for qemu >> diff --git a/migration/trace-events b/migration/trace-events >> index 5259c1044b..19ec649d1d 100644 >> --- a/migration/trace-events >> +++ b/migration/trace-events >> @@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms" >> migration_rate_limit_post(int urgent) "urgent: %d" >> migration_return_path_end_before(void) "" >> migration_return_path_end_after(int rp_error) "%d" >> +migration_return_path_pause_err(void) "" > > If it should never trigger, it shouldn't need a tracepoint. It needs an > assertion if we're 100% confident, or error_report_once() perhaps would be > more suitable. It would trigger when a rp error happened that wasn't related to the QEMUFile. If we go with your suggestion above, then this goes away.