On Wed, May 03, 2023 at 08:31:16PM +0000, tejus.gk wrote:
> There are places in the code where the migration is marked failed with
> MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence
> libvirt doesn't know why the migration failed when it queries for it.
> 
> Signed-off-by: tejus.gk <tejus...@nutanix.com>
> ---
>  migration/migration.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index feb5ab7493..0d7d34bf4d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1665,8 +1665,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>          }
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
>                     "a valid migration protocol");
> +        error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
> +                   "a valid migration protocol");
>          migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>                            MIGRATION_STATUS_FAILED);
> +        migrate_set_error(s, local_err);
>          block_cleanup_parameters();
>          return;

Most of this "} else {" block duplicates what is done in
the following "if (local_err)" block. As such, I think it
should be deleted and replaced with merely

   } else {
        error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
                   "a valid migration protocol");
        block_cleanup_parameters();
   }

...so we just fall through to the local_err cleanup block.
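
To illustrate the fall-through shape outside of QEMU, here is a minimal
standalone sketch. Everything in it (Error, MigState, set_error,
start_migration, the "tcp:" check and the message text) is a hypothetical
stand-in, not the real QEMU API. The failing branch only fills local_err,
and one shared block does the state change and records the reason.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the QEMU types; not the real definitions. */
typedef struct Error { char msg[128]; } Error;
typedef enum { STATUS_SETUP, STATUS_FAILED } Status;

typedef struct {
    Status state;
    Error *error;   /* failure reason, owned by the state object */
} MigState;

/* Allocate an error; like error_setg, refuses to overwrite an existing one. */
static void set_error(Error **errp, const char *msg)
{
    assert(errp && *errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Returns 0 on success, -1 on failure. */
static int start_migration(MigState *s, const char *uri)
{
    Error *local_err = NULL;

    if (strncmp(uri, "tcp:", 4) == 0) {
        /* a real implementation would start the transport here */
    } else {
        /* only describe the failure; no state handling in this branch */
        set_error(&local_err, "Parameter 'uri' expects a valid migration protocol");
    }

    /* the single cleanup block that every failing branch falls through to */
    if (local_err) {
        s->state = STATUS_FAILED;
        s->error = local_err;   /* simplification: ownership moves to s */
        return -1;
    }
    return 0;
}
```

With this shape, adding another protocol branch cannot forget to record the
failure reason, because the recording happens in exactly one place.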

>      }
> @@ -2059,6 +2062,7 @@ static int postcopy_start(MigrationState *ms)
>      int64_t bandwidth = migrate_max_postcopy_bandwidth();
>      bool restart_block = false;
>      int cur_state = MIGRATION_STATUS_ACTIVE;
> +    Error *local_err = NULL;
>  
>      if (migrate_postcopy_preempt()) {
>          migration_wait_main_channel(ms);
> @@ -2203,8 +2207,10 @@ static int postcopy_start(MigrationState *ms)
>      ret = qemu_file_get_error(ms->to_dst_file);
>      if (ret) {
>          error_report("postcopy_start: Migration stream errored");
> +        error_setg(&local_err, "postcopy_start: Migration stream errored");

There is an earlier place in this method which also calls
error_report which you've not changed to call migrate_set_error.

Even more crazy is that the caller of postcopy_start() also
calls error_report() but with a useless error message.

Also, nothing frees the local_err object once it is set.

IMHO, the postcopy_start() method should be changed to accept
an "Error **errp" parameter, and then the caller should be
responsible for calling error_report_err and migrate_set_error.
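
A rough standalone sketch of that split of responsibilities follows. The
names (Error, MigState, err_set, start_phase, run_phase) are made up for
illustration and only mimic the pattern; they are not the QEMU functions.
The callee only fills errp, and the caller records the reason, reports it
and frees it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real QEMU definitions. */
typedef struct Error { char msg[128]; } Error;

typedef struct {
    int failed;
    char reason[128];   /* copy of the failure reason kept in the state */
} MigState;

static void err_set(Error **errp, const char *msg)
{
    assert(errp && *errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Callee: reports nothing itself, it only describes the failure via errp. */
static int start_phase(int stream_error, Error **errp)
{
    if (stream_error) {
        err_set(errp, "migration stream errored");
        return -1;
    }
    return 0;
}

/* Caller: the one place that records, reports and frees the error. */
static int run_phase(MigState *s, int stream_error)
{
    Error *local_err = NULL;

    if (start_phase(stream_error, &local_err) < 0) {
        s->failed = 1;
        /* record a copy first (the role migrate_set_error plays) */
        snprintf(s->reason, sizeof(s->reason), "%s", local_err->msg);
        /* then report and free (the role error_report_err plays) */
        fprintf(stderr, "%s\n", local_err->msg);
        free(local_err);
        return -1;
    }
    return 0;
}
```

This way there is one message per failure, and the caller cannot end up
printing a second, less useful one.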


>          migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                                MIGRATION_STATUS_FAILED);
> +        migrate_set_error(ms, local_err);
>      }
>  
>      trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
> @@ -3233,7 +3239,9 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>      if (migrate_postcopy_ram() || migrate_return_path()) {
>          if (open_return_path_on_source(s, !resume)) {
>              error_report("Unable to open return-path for postcopy");
> +            error_setg(&local_err, "Unable to open return-path");

Having two different error messages is bad, and again nothing frees
the local_err object. Remove the error_report call and instead call
error_report_err(local_err), which does free the object.
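
The ownership side of this can be sketched standalone as below. All names
are hypothetical stand-ins, and the assumption (as far as I recall the
QEMU helpers) is that the state keeps its own copy of the error while the
reporting helper prints and frees the object it is given. That makes the
order matter: record first, report last.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real QEMU definitions. */
typedef struct Error { char msg[96]; } Error;
typedef struct { Error *error; } MigState;

static int live_errors;   /* Error objects currently allocated */

static Error *err_new(const char *msg)
{
    Error *e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    live_errors++;
    return e;
}

static void err_free(Error *e)
{
    if (e) {
        free(e);
        live_errors--;
    }
}

/* Stand-in for recording the reason: keeps a copy, first error wins. */
static void state_set_error(MigState *s, const Error *err)
{
    if (!s->error) {
        s->error = err_new(err->msg);
    }
}

/* Stand-in for report-and-free: prints the message, then frees the object. */
static void report_and_free(Error *err)
{
    fprintf(stderr, "%s\n", err->msg);
    err_free(err);
}

static void fail_return_path(MigState *s)
{
    Error *local_err = err_new("Unable to open return-path for postcopy");

    state_set_error(s, local_err);   /* copy while local_err is still valid */
    report_and_free(local_err);      /* consumes local_err; do not touch it after */
}
```

After fail_return_path() the only live object is the copy held by the
state, so the same text is both reported and queryable, and nothing leaks.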

>              migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> +            migrate_set_error(s, local_err);
>              migrate_fd_cleanup(s);
>              return;
>          }
> -- 
> 2.22.3
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

