On 04/05/23 1:46 pm, Daniel P. Berrangé wrote:
> On Wed, May 03, 2023 at 08:31:16PM +0000, tejus.gk wrote:
>> There are places in the code where the migration is marked failed with
>> MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence
>> libvirt doesn't know why the migration failed when it queries for it.
>>
>> Signed-off-by: tejus.gk <tejus...@nutanix.com>
>> ---
>>  migration/migration.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index feb5ab7493..0d7d34bf4d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1665,8 +1665,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>>          }
>>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
>>                     "a valid migration protocol");
>> +        error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
>> +                   "a valid migration protocol");
>>          migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>                            MIGRATION_STATUS_FAILED);
>> +        migrate_set_error(s, local_err);
>>          block_cleanup_parameters();
>>          return;
> 
> Most of this "} else {" block is duplicating what is done in
> the following "if (local_error)" block. As such I think this
> should be deleted and replaced with merely
> 
>    } else {
>         error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
>                    "a valid migration protocol");
>         block_cleanup_parameters();
>    }
> 
> ...so we just fall through to the local_error cleanup block.
Ack. Will modify this in the next patch.
> 
>>      }
>> @@ -2059,6 +2062,7 @@ static int postcopy_start(MigrationState *ms)
>>      int64_t bandwidth = migrate_max_postcopy_bandwidth();
>>      bool restart_block = false;
>>      int cur_state = MIGRATION_STATUS_ACTIVE;
>> +    Error *local_err = NULL;
>>  
>>      if (migrate_postcopy_preempt()) {
>>          migration_wait_main_channel(ms);
>> @@ -2203,8 +2207,10 @@ static int postcopy_start(MigrationState *ms)
>>      ret = qemu_file_get_error(ms->to_dst_file);
>>      if (ret) {
>>          error_report("postcopy_start: Migration stream errored");
>> +        error_setg(&local_err, "postcopy_start: Migration stream errored");
> 
> There is an earlier place in this method which also calls
> error_report which you've not changed to call migrate_set_error.
> 
Ack, will fix this in the next patch. 
> Even more crazy is that the caller of postcopy_start() also
> calls error_report() but with a useless error message.
> 
> Also nothing is freeing the local_err object once set.
> 
> IMHO, the postcopy_start() method should be changed to accept
> an "Error **errp" parameter, and then the caller should be
> responsible for calling error_report_err and migrate_set_error
Ack, will modify this in the next patch. 
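
A minimal sketch of the shape suggested above, for illustration only. It
does not use QEMU's real headers: the Error struct, SketchMigrationState and
every sketch_* function below are hypothetical stand-ins for Error,
MigrationState, error_setg(), migrate_set_error() and error_report_err().
The point is only the ownership flow: the callee fills errp and returns,
while the caller records the error in the migration state and then reports
and frees it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object (not the real layout). */
typedef struct Error {
    char *msg;
} Error;

/* Hypothetical stand-in for MigrationState: only what the sketch needs. */
typedef struct SketchMigrationState {
    int failed;        /* set once the migration is marked failed */
    char *error_msg;   /* the state's own copy of the failure reason */
} SketchMigrationState;

/* Duplicate a string with plain C library calls. */
static char *sketch_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

/* Stand-in for error_setg(): allocate an error and hand it out via errp. */
static void sketch_error_set(Error **errp, const char *msg)
{
    if (errp == NULL) {
        return;                      /* caller is not interested */
    }
    Error *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->msg = sketch_strdup(msg);
    *errp = e;
}

/* Stand-in for migrate_set_error(): the state keeps its own copy. */
static void sketch_migrate_set_error(SketchMigrationState *s, const Error *err)
{
    if (s->error_msg == NULL) {
        s->error_msg = sketch_strdup(err->msg);
    }
}

/* Stand-in for error_report_err(): print the message, then free the error. */
static void sketch_error_report_err(Error *err)
{
    fprintf(stderr, "%s\n", err->msg);
    free(err->msg);
    free(err);
}

/* Callee: sets the error and returns; no reporting, no state change here. */
static int sketch_postcopy_start(int stream_error, Error **errp)
{
    if (stream_error) {
        sketch_error_set(errp, "postcopy_start: Migration stream errored");
        return -1;
    }
    return 0;
}

/* Caller: owns local_err, records it, then reports and frees it once. */
static int sketch_caller(SketchMigrationState *s, int stream_error)
{
    Error *local_err = NULL;

    if (sketch_postcopy_start(stream_error, &local_err) < 0) {
        s->failed = 1;
        sketch_migrate_set_error(s, local_err);
        sketch_error_report_err(local_err);
        return -1;
    }
    return 0;
}
```

With this split there is a single error message per failure, and the
local_err object is freed exactly once by the reporting call in the caller.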
> 
> 
>>          migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>>                                MIGRATION_STATUS_FAILED);
>> +        migrate_set_error(ms, local_err);
>>      }
>>  
>>      trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
>> @@ -3233,7 +3239,9 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>      if (migrate_postcopy_ram() || migrate_return_path()) {
>>          if (open_return_path_on_source(s, !resume)) {
>>              error_report("Unable to open return-path for postcopy");
>> +            error_setg(&local_err, "Unable to open return-path");
> 
> Having two different error messages is bad and again nothing frees
> the local_err object. Remove the error_report call and have it call
> error_report_err(&local_err) which does free the object.
My bad, missed this. Will fix this in the next patch. 
> 
>>              migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>> +            migrate_set_error(s, local_err);
>>              migrate_fd_cleanup(s);
>>              return;
>>          }
>> -- 
>> 2.22.3
>>
>>
> 
> With regards,
> Daniel

Hi, 
Thanks for the reviews. I'll be sending a revision with the fixes shortly. 
Meanwhile, I wanted to get something clarified. Apart from the places this
patch covers, there are also places in the code where the migration is
marked as failed, yet error_report() is either never called or is called
in a different file. An example of the latter can be seen in the
function migration_completion() in migration.c, where

        ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                         s->block_inactive);
            }
        }
        qemu_mutex_unlock_iothread();

        if (ret < 0) {
            goto fail;
        }

and if we take a look at fail:

        fail:
              migrate_set_state(&s->state, current_active_state,
                      MIGRATION_STATUS_FAILED);

In this instance, the error_report() call for a possible failure while saving 
the vmstate is being done in the file vmstate.c. I wanted to ask if doing a 
migrate_set_error() in a different file (vmstate.c in this case) is permissible?

regards,
tejus
