On 5/15/2026 6:20 PM, Peter Xu wrote:
External email: Use caution opening links or attachments


On Fri, May 08, 2026 at 04:01:43PM +0300, Avihai Horon wrote:
On 5/7/2026 11:03 AM, Cédric Le Goater wrote:
On 5/5/26 10:14, Avihai Horon wrote:
migration_completion_precopy() doesn't propagate errors to migration
core which leads to error information loss. Fix that.

This prepares for a follow-up where migration_switchover_start() can
fail on switchover-ack and still report a useful error. Errors from
qemu_savevm_state_complete_precopy() are not propagated yet as it
requires more plumbing.

Signed-off-by: Avihai Horon <[email protected]>
---
   migration/migration.c | 7 ++++++-
   1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6fd89995a2..a5c7ca6796 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2780,23 +2780,28 @@ static bool migration_switchover_start(MigrationState *s, Error **errp)
   static int migration_completion_precopy(MigrationState *s)
   {
       int ret;
+    Error *local_err = NULL;

       bql_lock();

       if (!migrate_mode_is_cpr()) {
           ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
           if (ret < 0) {
+            error_setg_errno(&local_err, -ret, "Failed to stop the VM");
               goto out_unlock;
           }
       }

-    if (!migration_switchover_start(s, NULL)) {
+    if (!migration_switchover_start(s, &local_err)) {
           ret = -EFAULT;
           goto out_unlock;
       }

       ret = qemu_savevm_state_complete_precopy(s);
   out_unlock:
+    if (local_err) {
+        migrate_error_propagate(s, local_err);
+    }
       bql_unlock();
       return ret;
   }

Instead, I would modify migration_completion_precopy() to use the Error
variable in migration_completion():

   static void migration_completion(MigrationState *s)
   {
       int ret = 0;
       Error *local_err = NULL;

       if (s->state == MIGRATION_STATUS_ACTIVE) {
           ret = migration_completion_precopy(s);

       ...

I'd rather keep this change limited and not involve migration_completion().
The error reporting in this path is a bit convoluted (mixing error reporting
via qemu_file) and I think it deserves a separate series cleaning things up
there.

Unless I am missing something here and the above should be easy?

Not easy to keep everything as before, but I tend to agree with Cedric.

The hard part is to maintain the same error when something fails in
migration_completion(), but IMHO that's a legacy problem we'll need to
tackle sooner or later.  We can do it now, at the risk that some error
messages might change: I think it's worthwhile to try.

It also avoids introducing yet another migrate_error_propagate() call
deep in the stack.  Ideally we keep moving it higher up the stack so that
there are fewer invocations as time goes on.

The old priority to handle errors in migration_completion() is:

   1. if qemu_file_get_error_obj() succeeded, use it first, otherwise,
   2. if ret!=0, generate an error for retval

Side note: (1) is currently slightly off when qemu_file_get_error_obj()
returns non-zero without an Error attached, but let's ignore that for
now.

After this patch, we could prioritize Error* whenever set, hence:

   1. if error non-null, use it directly,
   2. if qemu_file_get_error_obj() succeeded, use it, otherwise,
   3. if ret!=0, generate an error for retval

I think this order makes sense because neither qemufile error nor retcode
is better than a literal error passed over.
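
To make the proposed order concrete, here is a minimal standalone sketch
in C. It is not QEMU code: SketchError and sketch_pick_error() are
hypothetical stand-ins for Error and for the selection logic in
migration_completion(), and the generated message text is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object: only a message. */
typedef struct {
    char msg[128];
} SketchError;

/*
 * Pick the error to report, following the proposed priority:
 *   1. err:      error passed up by the callee, if any
 *   2. file_err: error attached to the QEMUFile, if any
 *   3. ret:      non-zero return code, turned into a generated error
 * 'out' is only used as storage for case 3.
 * Returns NULL when nothing failed.
 */
static const SketchError *sketch_pick_error(const SketchError *err,
                                            const SketchError *file_err,
                                            int ret, SketchError *out)
{
    if (err) {
        return err;
    }
    if (file_err) {
        return file_err;
    }
    if (ret != 0) {
        /* Assumes ret is a negative errno value, as in the patch. */
        snprintf(out->msg, sizeof(out->msg), "completion failed: %s",
                 strerror(-ret));
        return out;
    }
    return NULL;
}
```

With this shape, an explicit error always wins, so a callee that sets one
never has its message replaced by a qemufile error or a generic retcode
message.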

Sounds reasonable, I will change it.

Thanks.

