On Fri, May 08, 2026 at 04:01:43PM +0300, Avihai Horon wrote: > > On 5/7/2026 11:03 AM, Cédric Le Goater wrote: > > External email: Use caution opening links or attachments > > > > > > On 5/5/26 10:14, Avihai Horon wrote: > > > migration_completion_precopy() doesn't propagate errors to migration > > > core which leads to error information loss. Fix that. > > > > > > This prepares for a follow-up where migration_switchover_start() can > > > fail on switchover-ack and still report a useful error. Errors from > > > qemu_savevm_state_complete_precopy() are not propagated yet as it > > > requires more plumbing. > > > > > > Signed-off-by: Avihai Horon <[email protected]> > > > --- > > > migration/migration.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > index 6fd89995a2..a5c7ca6796 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -2780,23 +2780,28 @@ static bool > > > migration_switchover_start(MigrationState *s, Error **errp) > > > static int migration_completion_precopy(MigrationState *s) > > > { > > > int ret; > > > + Error *local_err = NULL; > > > > > > bql_lock(); > > > > > > if (!migrate_mode_is_cpr()) { > > > ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE); > > > if (ret < 0) { > > > + error_setg_errno(&local_err, -ret, "Failed to stop the > > > VM"); > > > goto out_unlock; > > > } > > > } > > > > > > - if (!migration_switchover_start(s, NULL)) { > > > + if (!migration_switchover_start(s, &local_err)) { > > > ret = -EFAULT; > > > goto out_unlock; > > > } > > > > > > ret = qemu_savevm_state_complete_precopy(s); > > > out_unlock: > > > + if (local_err) { > > > + migrate_error_propagate(s, local_err); > > > + } > > > bql_unlock(); > > > return ret; > > > } > > > > Instead, I would modify migration_completion_precopy() to use the Error > > variable in migration_completion() : > > > > static void migration_completion(MigrationState *s) > > { > > int ret = 0; > > Error *local_err = NULL; > > > > if (s->state == MIGRATION_STATUS_ACTIVE) { > > ret = migration_completion_precopy(s); > > > > ... > > > I'd rather keep this change limited and not involve migration_completion(). > The error reporting in this path is a bit convoluted (mixing error reporting > via qemu_file) and I think it deserves a separate series cleaning things up > there. > > Unless I am missing something here and the above should be easy?
Not easy to keep everything as before, but I tend to agree with Cedric. The hard part is to maintain the same error when something failed in migration_completion(), but IMHO that's a legacy problem we'll need to tackle with, sooner or later. We can do it now, facing risk that some error message might change: I think it's worthwhile to try. So it also avoids introducing yet another migrate_error_propagate() call deep in the stack.. Ideally we move it upper and upper so the invokation should be less as time goes. The old priority to handle errors in migration_completion() is: 1. if qemu_file_get_error_obj() succeeded, use it first, otherwise, 2. if ret!=0, generate an error for retval Side note: (1) is currently slightly off when qemu_file_get_error_obj() returns non-zero but without an Error attached.. but let's ignore it for now. After this patch, we could prioritize Error* whenever set, hence: 1. if error non-null, use it directly, 2. if qemu_file_get_error_obj() succeeded, use it first, otherwise, 3. if ret!=0, generate an error for retval I think this order makes sense because neither qemufile error nor retcode is better than a literal error passed over. Thanks, -- Peter Xu
