On 5/15/2026 6:20 PM, Peter Xu wrote:
On Fri, May 08, 2026 at 04:01:43PM +0300, Avihai Horon wrote:
On 5/7/2026 11:03 AM, Cédric Le Goater wrote:
On 5/5/26 10:14, Avihai Horon wrote:
migration_completion_precopy() doesn't propagate errors to the migration
core, which loses error information. Fix that.
This prepares for a follow-up where migration_switchover_start() can
fail on switchover-ack and still report a useful error. Errors from
qemu_savevm_state_complete_precopy() are not propagated yet, as that
requires more plumbing.
Signed-off-by: Avihai Horon <[email protected]>
---
migration/migration.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6fd89995a2..a5c7ca6796 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2780,23 +2780,28 @@ static bool migration_switchover_start(MigrationState *s, Error **errp)
static int migration_completion_precopy(MigrationState *s)
{
int ret;
+ Error *local_err = NULL;
bql_lock();
if (!migrate_mode_is_cpr()) {
ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
if (ret < 0) {
+ error_setg_errno(&local_err, -ret, "Failed to stop the VM");
goto out_unlock;
}
}
- if (!migration_switchover_start(s, NULL)) {
+ if (!migration_switchover_start(s, &local_err)) {
ret = -EFAULT;
goto out_unlock;
}
ret = qemu_savevm_state_complete_precopy(s);
out_unlock:
+ if (local_err) {
+ migrate_error_propagate(s, local_err);
+ }
bql_unlock();
return ret;
}
Instead, I would modify migration_completion_precopy() to use the Error
variable in migration_completion():
static void migration_completion(MigrationState *s)
{
int ret = 0;
Error *local_err = NULL;
if (s->state == MIGRATION_STATUS_ACTIVE) {
ret = migration_completion_precopy(s);
...
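A minimal sketch of the shape suggested above, where the callee reports
failure through an Error out-parameter and the caller is the only place
that consumes it. All names here (SketchError, sketch_error_set(),
sketch_completion_precopy(), sketch_completion()) are hypothetical
stand-ins and not QEMU's Error API or the actual migration code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of an Error object; QEMU's real Error is richer. */
typedef struct SketchError {
    char msg[96];
} SketchError;

/* Set *errp only if the caller asked for an error and none is set yet. */
void sketch_error_set(SketchError **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Callee: reports failure through errp instead of propagating it itself. */
int sketch_completion_precopy(int stop_vm_ret, SketchError **errp)
{
    if (stop_vm_ret < 0) {
        sketch_error_set(errp, "Failed to stop the VM");
        return stop_vm_ret;
    }
    return 0;
}

/* Caller: owns the error and is the single place that consumes it. */
int sketch_completion(int stop_vm_ret, char *out, size_t outlen)
{
    SketchError *local_err = NULL;
    int ret = sketch_completion_precopy(stop_vm_ret, &local_err);

    if (local_err) {
        /* Stand-in for handing the error to the migration core. */
        snprintf(out, outlen, "%s", local_err->msg);
        free(local_err);
    } else if (outlen > 0) {
        out[0] = '\0';
    }
    return ret;
}
```

With this shape the propagate step lives in one function near the top of
the call chain, rather than in each helper.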
I'd rather keep this change limited and not involve migration_completion().
The error reporting in this path is a bit convoluted (mixing error reporting
via qemu_file) and I think it deserves a separate series cleaning things up
there.
Unless I am missing something here and the above should be easy?
Not easy to keep everything as before, but I tend to agree with Cedric.
The hard part is preserving the same error message when something fails in
migration_completion(), but IMHO that's a legacy problem we'll need to
tackle sooner or later. We can do it now, at the risk that some error
messages might change: I think it's worthwhile to try.
It also avoids introducing yet another migrate_error_propagate() call
deep in the stack. Ideally we keep moving that call further up, so there
are fewer invocations as time goes on.
The old priority to handle errors in migration_completion() is:
1. if qemu_file_get_error_obj() reports an error, use it first, otherwise,
2. if ret!=0, generate an error from the return value
Side note: (1) is currently slightly off when qemu_file_get_error_obj()
returns non-zero but without an Error attached, but let's ignore it for
now.
After this patch, we could prioritize Error* whenever set, hence:
1. if the Error is non-NULL, use it directly, otherwise,
2. if qemu_file_get_error_obj() reports an error, use it, otherwise,
3. if ret!=0, generate an error from the return value
I think this order makes sense because neither the QEMUFile error nor the
return code is better than an explicit Error passed up from the callee.
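The three-step order above could be sketched roughly as follows. This is
only an illustration under stated assumptions: SketchError and
pick_error() are hypothetical names, the "file error" argument stands in
for whatever qemu_file_get_error_obj() would hand back, and the message
text in step 3 is made up for the example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an Error object; not QEMU's real type. */
typedef struct {
    char msg[128];
} SketchError;

/*
 * Pick the message to report, in the priority order discussed above:
 *   1. an explicit error set by the callee,
 *   2. an error attached to the stream (the QEMUFile error in the thread),
 *   3. a generic message built from the non-zero return code.
 * Returns NULL when nothing failed.
 */
const char *pick_error(const SketchError *explicit_err,
                       const SketchError *file_err,
                       int ret, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (explicit_err) {
        return explicit_err->msg;
    }
    if (file_err) {
        return file_err->msg;
    }
    if (ret != 0) {
        /* Return codes are negative errno values in this sketch. */
        snprintf(buf, buflen, "completion failed: %s",
                 strerror(ret < 0 ? -ret : ret));
        return buf;
    }
    return NULL;
}
```

The explicit error wins even when a stream error and a return code are
also present, which matches the reasoning that the callee's own message
is the most specific one available.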
Sounds reasonable, I will change it.
Thanks.