On Fri, May 08, 2026 at 04:01:43PM +0300, Avihai Horon wrote:
> 
> On 5/7/2026 11:03 AM, Cédric Le Goater wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On 5/5/26 10:14, Avihai Horon wrote:
> > > migration_completion_precopy() doesn't propagate errors to migration
> > > core which leads to error information loss. Fix that.
> > > 
> > > This prepares for a follow-up where migration_switchover_start() can
> > > fail on switchover-ack and still report a useful error. Errors from
> > > qemu_savevm_state_complete_precopy() are not propagated yet as it
> > > requires more plumbing.
> > > 
> > > Signed-off-by: Avihai Horon <[email protected]>
> > > ---
> > >   migration/migration.c | 7 ++++++-
> > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 6fd89995a2..a5c7ca6796 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2780,23 +2780,28 @@ static bool
> > > migration_switchover_start(MigrationState *s, Error **errp)
> > >   static int migration_completion_precopy(MigrationState *s)
> > >   {
> > >       int ret;
> > > +    Error *local_err = NULL;
> > > 
> > >       bql_lock();
> > > 
> > >       if (!migrate_mode_is_cpr()) {
> > >           ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
> > >           if (ret < 0) {
> > > +            error_setg_errno(&local_err, -ret, "Failed to stop the
> > > VM");
> > >               goto out_unlock;
> > >           }
> > >       }
> > > 
> > > -    if (!migration_switchover_start(s, NULL)) {
> > > +    if (!migration_switchover_start(s, &local_err)) {
> > >           ret = -EFAULT;
> > >           goto out_unlock;
> > >       }
> > > 
> > >       ret = qemu_savevm_state_complete_precopy(s);
> > >   out_unlock:
> > > +    if (local_err) {
> > > +        migrate_error_propagate(s, local_err);
> > > +    }
> > >       bql_unlock();
> > >       return ret;
> > >   }
> > 
> > Instead, I would modify migration_completion_precopy() to use the Error
> > variable in migration_completion() :
> > 
> >   static void migration_completion(MigrationState *s)
> >   {
> >       int ret = 0;
> >       Error *local_err = NULL;
> > 
> >       if (s->state == MIGRATION_STATUS_ACTIVE) {
> >           ret = migration_completion_precopy(s);
> > 
> >       ...
> > 
> I'd rather keep this change limited and not involve migration_completion().
> The error reporting in this path is a bit convoluted (mixing error reporting
> via qemu_file) and I think it deserves a separate series cleaning things up
> there.
> 
> Unless I am missing something here and the above should be easy?

Not easy to keep everything as before, but I tend to agree with Cedric.

The hard part is to maintain the same error when something failed in
migration_completion(), but IMHO that's a legacy problem we'll need to
tackle with, sooner or later.  We can do it now, facing risk that some
error message might change: I think it's worthwhile to try.

So it also avoids introducing yet another migrate_error_propagate() call
deep in the stack..  Ideally we move it upper and upper so the invokation
should be less as time goes.

The old priority to handle errors in migration_completion() is:

  1. if qemu_file_get_error_obj() succeeded, use it first, otherwise,
  2. if ret!=0, generate an error for retval

Side note: (1) is currently slightly off when qemu_file_get_error_obj()
returns non-zero but without an Error attached.. but let's ignore it for
now.

After this patch, we could prioritize Error* whenever set, hence:

  1. if error non-null, use it directly,
  2. if qemu_file_get_error_obj() succeeded, use it first, otherwise,
  3. if ret!=0, generate an error for retval

I think this order makes sense because neither qemufile error nor retcode
is better than a literal error passed over.

Thanks,

-- 
Peter Xu


Reply via email to