Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
On 3/28/24 16:50, Avihai Horon wrote:
> On 28/03/2024 17:21, Cédric Le Goater wrote:
>> Hello Avihai,
>>
>> On 3/28/24 15:02, Avihai Horon wrote:
>>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>>> close_return_path_on_source()"), close_return_path_on_source() assumes
>>> that migration error is set if an error occurs during migration.
>>>
>>> This may not be true if migration errors in migration_completion(). For
>>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>>> will not be set.
>>
>> Out of curiosity, could you describe a bit more the context ? Did
>> vfio_save_complete_precopy() fail ? why ?
>
> Yep, vfio_save_complete_precopy() failed (but it failed while I was
> experimenting with an unofficial debug FW).
>
>> We should propagate errors of .save_live_complete_precopy() handlers as
>> it was done .save_setup handlers(). For 9.1.
>
> Agreed.
>
>>> This in turn, will cause a migration hang bug, similar to the bug that
>>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>>> thread before releasing to_dst_file"), as shutdown() will not be issued
>>> for the return-path channel.
>>
>> yes, but this test :
>>
>>     if (ret < 0) {
>>         goto fail;
>>     }
>>
>> will skip the close_return_path_on_source() call. Won't it ? So I don't
>> understand how it can be an issue. Am I missing something ?
>
> It will skip the close_return_path_on_source() call in
> migration_completion(), but there is another
> close_return_path_on_source() call in migrate_fd_cleanup().

OK. Found it. This is a code path I hadn't explored yet.

Acked-by: Cédric Le Goater

Thanks,

C.

>>> Fix it by ensuring migration error is set in case of error in
>>> migration_completion().
>>
>> Why didn't you add a reference to commit 9425ef3f990a ?
>
> I thought this commit didn't introduce this bug, but looking again in
> the mailing list [1], it kinda did:
> The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the
> return path thread before releasing to_dst_file") and then 9425ef3f990a
> re-introduced the bug, but only for migration_completion() case.
>
> So, you are right, a fixes line with 9425ef3f990a should be added.
>
> Thanks.
>
> [1] https://lore.kernel.org/all/20240226203122.22894-1-faro...@suse.de/
>
>>> Signed-off-by: Avihai Horon
>>> ---
>>>  migration/migration.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9fe8fd2afd7..b73ae3a72c4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>>  {
>>>      int ret = 0;
>>>      int current_active_state = s->state;
>>> +    Error *local_err = NULL;
>>>
>>>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>          ret = migration_completion_precopy(s, &current_active_state);
>>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>>      return;
>>>
>>>  fail:
>>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    } else if (ret) {
>>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>>
>> The 'ret = -1' case could be improved with error_setg(). As a followup.
>>
>> Thanks,
>>
>> C.
>>
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +    }
>>> +
>>>      migration_completion_failed(s, current_active_state);
>>>  }
Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
On 28/03/2024 17:09, Peter Xu wrote:
> On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set.
>>
>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>>
>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>>
>> Signed-off-by: Avihai Horon
>
> Reviewed-by: Peter Xu
>
> I'll attach this if it looks all right to you:
>
> Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Yes, sure, go ahead.

Thanks.

> Thanks,
>
>> ---
>>  migration/migration.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>  {
>>      int ret = 0;
>>      int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>>          ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>      return;
>>
>>  fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>      migration_completion_failed(s, current_active_state);
>>  }
>>
>> --
>> 2.26.3
>>
>
> --
> Peter Xu
Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
On 28/03/2024 17:21, Cédric Le Goater wrote:
> Hello Avihai,
>
> On 3/28/24 15:02, Avihai Horon wrote:
>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>> close_return_path_on_source()"), close_return_path_on_source() assumes
>> that migration error is set if an error occurs during migration.
>>
>> This may not be true if migration errors in migration_completion(). For
>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>> will not be set.
>
> Out of curiosity, could you describe a bit more the context ? Did
> vfio_save_complete_precopy() fail ? why ?

Yep, vfio_save_complete_precopy() failed (but it failed while I was
experimenting with an unofficial debug FW).

> We should propagate errors of .save_live_complete_precopy() handlers as
> it was done .save_setup handlers(). For 9.1.

Agreed.

>> This in turn, will cause a migration hang bug, similar to the bug that
>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>> thread before releasing to_dst_file"), as shutdown() will not be issued
>> for the return-path channel.
>
> yes, but this test :
>
>     if (ret < 0) {
>         goto fail;
>     }
>
> will skip the close_return_path_on_source() call. Won't it ? So I don't
> understand how it can be an issue. Am I missing something ?

It will skip the close_return_path_on_source() call in
migration_completion(), but there is another
close_return_path_on_source() call in migrate_fd_cleanup().

>> Fix it by ensuring migration error is set in case of error in
>> migration_completion().
>
> Why didn't you add a reference to commit 9425ef3f990a ?

I thought this commit didn't introduce this bug, but looking again in
the mailing list [1], it kinda did:
The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the
return path thread before releasing to_dst_file") and then 9425ef3f990a
re-introduced the bug, but only for migration_completion() case.

So, you are right, a fixes line with 9425ef3f990a should be added.

Thanks.

[1] https://lore.kernel.org/all/20240226203122.22894-1-faro...@suse.de/

>> Signed-off-by: Avihai Horon
>> ---
>>  migration/migration.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9fe8fd2afd7..b73ae3a72c4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>  {
>>      int ret = 0;
>>      int current_active_state = s->state;
>> +    Error *local_err = NULL;
>>
>>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>>          ret = migration_completion_precopy(s, &current_active_state);
>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>      return;
>>
>>  fail:
>> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    } else if (ret) {
>> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
>
> The 'ret = -1' case could be improved with error_setg(). As a followup.
>
> Thanks,
>
> C.
>
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +    }
>> +
>>      migration_completion_failed(s, current_active_state);
>>  }
Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
Hello Avihai,

On 3/28/24 15:02, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
>
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.

Out of curiosity, could you describe a bit more the context ? Did
vfio_save_complete_precopy() fail ? why ?

We should propagate errors of .save_live_complete_precopy() handlers as
it was done .save_setup handlers(). For 9.1.

> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.

yes, but this test :

    if (ret < 0) {
        goto fail;
    }

will skip the close_return_path_on_source() call. Won't it ? So I don't
understand how it can be an issue. Am I missing something ?

> Fix it by ensuring migration error is set in case of error in
> migration_completion().

Why didn't you add a reference to commit 9425ef3f990a ?

> Signed-off-by: Avihai Horon
> ---
>  migration/migration.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>  {
>      int ret = 0;
>      int current_active_state = s->state;
> +    Error *local_err = NULL;
>
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>      return;
>
>  fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");

The 'ret = -1' case could be improved with error_setg(). As a followup.

Thanks,

C.

> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>      migration_completion_failed(s, current_active_state);
>  }
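The follow-up raised in the review (treating the 'ret = -1' case with error_setg()) can be illustrated outside QEMU. The fragment below is a toy sketch, not QEMU code: toy_completion_error() is a hypothetical helper, and it assumes that -1 is used as a generic failure value rather than as a negative errno, which is only one reading of the remark. Under that assumption, decoding -ret as an errno would describe errno 1 instead of the real cause, so the plain message looks like the better fit for that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): formats the completion error
 * into buf and returns buf.
 */
static const char *toy_completion_error(int ret, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (ret == -1) {
        /* Assumed generic failure: no errno to decode, plain message only
         * (the error_setg()-style branch). */
        snprintf(buf, len, "Error in migration completion");
    } else {
        /* Assumed negative errno: append its description
         * (the error_setg_errno()-style branch). */
        snprintf(buf, len, "Error in migration completion: %s",
                 strerror(-ret));
    }
    return buf;
}
```

Calling toy_completion_error(-1, buf, sizeof(buf)) yields the bare message, while any other negative value gets the strerror() text appended after a colon.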
Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
On Thu, Mar 28, 2024 at 04:02:51PM +0200, Avihai Horon wrote:
> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
> close_return_path_on_source()"), close_return_path_on_source() assumes
> that migration error is set if an error occurs during migration.
>
> This may not be true if migration errors in migration_completion(). For
> example, if qemu_savevm_state_complete_precopy() errors, migration error
> will not be set.
>
> This in turn, will cause a migration hang bug, similar to the bug that
> was fixed by commit 22b04245f0d5 ("migration: Join the return path
> thread before releasing to_dst_file"), as shutdown() will not be issued
> for the return-path channel.
>
> Fix it by ensuring migration error is set in case of error in
> migration_completion().
>
> Signed-off-by: Avihai Horon

Reviewed-by: Peter Xu

I'll attach this if it looks all right to you:

Fixes: 9425ef3f990a ("migration: Use migrate_has_error() in close_return_path_on_source()")

Thanks,

> ---
>  migration/migration.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 9fe8fd2afd7..b73ae3a72c4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>  {
>      int ret = 0;
>      int current_active_state = s->state;
> +    Error *local_err = NULL;
>
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          ret = migration_completion_precopy(s, &current_active_state);
> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>      return;
>
>  fail:
> +    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    } else if (ret) {
> +        error_setg_errno(&local_err, -ret, "Error in migration completion");
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +    }
> +
>      migration_completion_failed(s, current_active_state);
>  }
>
> --
> 2.26.3
>

--
Peter Xu
[PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
After commit 9425ef3f990a ("migration: Use migrate_has_error() in
close_return_path_on_source()"), close_return_path_on_source() assumes
that migration error is set if an error occurs during migration.

This may not be true if migration errors in migration_completion(). For
example, if qemu_savevm_state_complete_precopy() errors, migration error
will not be set.

This in turn, will cause a migration hang bug, similar to the bug that
was fixed by commit 22b04245f0d5 ("migration: Join the return path
thread before releasing to_dst_file"), as shutdown() will not be issued
for the return-path channel.

Fix it by ensuring migration error is set in case of error in
migration_completion().

Signed-off-by: Avihai Horon
---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9fe8fd2afd7..b73ae3a72c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
 {
     int ret = 0;
     int current_active_state = s->state;
+    Error *local_err = NULL;

     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s, &current_active_state);
@@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
     return;

 fail:
+    if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    } else if (ret) {
+        error_setg_errno(&local_err, -ret, "Error in migration completion");
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+    }
+
     migration_completion_failed(s, current_active_state);
 }

--
2.26.3
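
The failure mode described in the commit message can be modelled with a small stand-alone fragment. This is a toy sketch, not QEMU code: ToyMigration and the toy_* functions are hypothetical, and the model assumes that the return-path thread can only be joined once shutdown() has been issued on its channel. It only mirrors the dependency the patch relies on: cleanup issues the shutdown only if a migration error was recorded, so the fail path has to record one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the migration state; not the QEMU MigrationState. */
typedef struct {
    bool has_error;        /* models what migrate_has_error() reports */
    bool rp_shutdown_sent; /* models shutdown() on the return-path channel */
} ToyMigration;

/*
 * Models the behaviour described in the thread: the shutdown is issued
 * only when a migration error has been recorded. Returns true if the
 * return-path thread could be joined without hanging (assumption: it can
 * only be joined after the shutdown).
 */
static bool toy_close_return_path(ToyMigration *s)
{
    assert(s != NULL);

    if (s->has_error) {
        s->rp_shutdown_sent = true;
    }
    return s->rp_shutdown_sent;
}

/*
 * Models the fail: path of the completion step followed by cleanup.
 * record_error selects between the behaviour before the patch (false)
 * and after it (true).
 */
static bool toy_completion_fails(ToyMigration *s, int ret, bool record_error)
{
    if (record_error && ret) {
        /* The effect migrate_set_error() has in the patch. */
        s->has_error = true;
    }
    /* Cleanup later closes the return path regardless of how we failed. */
    return toy_close_return_path(s);
}
```

With record_error set to false and a non-zero ret, toy_completion_fails() returns false, which corresponds to the hang; with record_error set to true it returns true because the shutdown is issued.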