Re: [Qemu-devel] [Qemu-devel [RFC] [WIP] v2] Keeping the Source side alive incase of network failure (Migration recovery from network failure)

haris iqbal Sun, 12 Jun 2016 23:40:43 -0700

On Thu, Jun 9, 2016 at 2:11 AM, Eric Blake <ebl...@redhat.com> wrote:
> On 06/08/2016 12:13 PM, Md Haris Iqbal wrote:
>
> The subject line is long, and has a typo (s/incase/in case/).  Also, the
> mailing list automatically prepends [Qemu-devel], so you shouldn't
> repeat it manually.  Better might have been a short subject line then a
> longer commit body:
>
> migration: keep source alive on network failure
>
> Details about what was failing, and why this code improves it


Yes, I will add those details and will be more careful when writing commit messages.

>
> Missing a Signed-off-by: attribution; without that, we can't take it.

I will add it from the next version onward.

>
>> ---
>
> You marked this patch as v2, but in the same minute sent another email
> with subject line v1, and didn't say what changed to need a v2. Here
> after the --- divider is a good place for that.

The other patch is different from the one I posted as v1. There was
another patch, which I posted a few days back, that was v1 of this patch.
I should have pointed out what changed, though.

>
>>  include/migration/migration.h |  1 +
>>  migration/migration.c         | 76 ++++++++++++++++++++++++++++++++++++++++---
>>  qapi-schema.json              | 11 +++++--
>>  vl.c                          |  4 +++
>>  4 files changed, 85 insertions(+), 7 deletions(-)
>>
>
>> @@ -1726,11 +1755,32 @@ static void *migration_thread(void *opaque)
>>              }
>>          }
>>
>> -        if (qemu_file_get_error(s->to_dst_file)) {
>> -            migrate_set_state(&s->state, current_active_state,
>> -                              MIGRATION_STATUS_FAILED);
>> -            trace_migration_thread_file_err();
>> -            break;
>> +        if ((ret = qemu_file_get_error(s->to_dst_file))) {
>> +            fprintf(stderr, "1 : Error %s %d\n", strerror(-ret), -ret);
>
> fprintf() is rather awkward for errors; can we use qemu's Error mechanism?

Just something I am using to debug. It will be absent in the final
version of the patch.

>
>> +
>> +            /*  This check is based on how the error is set during the network
>> +             *  recv(). When recv() returns 0 (i.e. no data to read), the error
>> +             *  is set to -EIO. For all other network errors, it is set
>> +             *  according to the return value received.
>> +             */
>> +            if (ret != -EIO && s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +                /* Network Failure during postcopy */
>> +
>> +                current_active_state = MIGRATION_STATUS_POSTCOPY_RECOVERY;
>> +                runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY);
>> +                fprintf(stderr, "1.1 : Error %s %d\n", strerror(-ret), -ret);
>
> Does the end user really need to see "1.1 :"?

That is just debugging output as well.
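The condition in the hunk above separates an orderly close of the connection from other socket errors. The following standalone C sketch restates that logic under stated assumptions: the helpers `map_recv_result` and `next_state` and the local `enum mig_state` are hypothetical names, not QEMU APIs. It assumes only what the patch comment says (a `recv()` return of 0 is recorded as `-EIO`, other failures carry the negated errno) and that every other error case still falls through to the existing failure path.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for the relevant MigrationStatus values. */
enum mig_state {
    MIG_ACTIVE,
    MIG_POSTCOPY_ACTIVE,
    MIG_POSTCOPY_RECOVERY,
    MIG_FAILED,
};

/* Map a recv() result to the convention described in the patch comment:
 * 0 bytes (peer closed the connection) becomes -EIO, a failure becomes
 * the negated errno, and a positive byte count means "no error". */
static int map_recv_result(ssize_t n, int saved_errno)
{
    if (n == 0) {
        return -EIO;
    }
    if (n < 0) {
        return -saved_errno;
    }
    return 0;
}

/* Mirror of the patch's check: only a non-EIO error seen while postcopy
 * is active switches to recovery; any other error fails as before
 * (assumed behaviour of the unchanged code path). */
static enum mig_state next_state(enum mig_state cur, int ret)
{
    assert(ret <= 0); /* errors are reported as negative errno values */
    if (ret == 0) {
        return cur;
    }
    if (ret != -EIO && cur == MIG_POSTCOPY_ACTIVE) {
        return MIG_POSTCOPY_RECOVERY;
    }
    return MIG_FAILED;
}
```

For example, `next_state(MIG_POSTCOPY_ACTIVE, map_recv_result(-1, ECONNRESET))` yields `MIG_POSTCOPY_RECOVERY`, whereas an orderly close (`recv()` returning 0) maps to `-EIO` and still fails the migration.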

>
>
>> +++ b/qapi-schema.json
>> @@ -154,12 +154,14 @@
>>  # @watchdog: the watchdog action is configured to pause and has been triggered
>>  #
>>  # @guest-panicked: guest has been panicked as a result of guest OS panic
>> +#
>> +# @postmigrate-recovery: guest is paused for recovery after a network failure
>
> Not your fault that the overall enum is missing an overall line:
>
> # Since: 1.4
>
> nor that guest-panicked is missing a "(since 1.5)" hint, but at least
> your addition should have a "(since 2.7)" hint.

Added.

>
>>  ##
>>  { 'enum': 'RunState',
>>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
>> -            'guest-panicked' ] }
>> +            'guest-panicked', 'postmigrate-recovery' ] }
>
> Adding new enums can cause existing clients like libvirt to do weird
> things if they aren't expecting the new state. Are we sure we want to do
> it?

I think so. Without a new state, there would be no way to tell
that the VM is in recovery.

> Is it a state that cannot be entered by default, but only in
> response to a client request that proves the client is new enough to
> expect the new state?

I did not quite understand what you mean here.
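One possible reading of the question above is that the new state should only be reachable after a management client has explicitly opted in, so that older clients such as an unmodified libvirt never observe a value they do not know. The sketch below illustrates that idea with a hypothetical boolean `recovery_enabled` in a hypothetical `struct mig_caps`; neither is an existing QEMU capability, and the enum and function names are placeholders. Without the opt-in, a network error keeps the existing behaviour of failing the migration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder subset of migration states; not QEMU's actual enum. */
enum mig_state {
    MIG_POSTCOPY_ACTIVE,
    MIG_POSTCOPY_RECOVERY,
    MIG_FAILED,
};

/* Hypothetical per-migration settings requested by the client. */
struct mig_caps {
    bool recovery_enabled; /* set only by a client that knows the new state */
};

/* Decide the next state after a network error (ret < 0) during postcopy. */
static enum mig_state on_postcopy_error(const struct mig_caps *caps, int ret)
{
    assert(ret < 0);
    if (caps->recovery_enabled && ret != -EIO) {
        return MIG_POSTCOPY_RECOVERY; /* client opted in: new state is safe */
    }
    return MIG_FAILED;                /* legacy clients only see old states */
}
```

The design point is that the default path is unchanged, so adding the enum value cannot surprise a client that never asked for it.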

>
>>
>>  ##
>>  # @StatusInfo:
>> @@ -434,12 +436,15 @@
>>  #
>>  # @failed: some error occurred during migration process.
>>  #
>> +# @postcopy-recovery: in recovery mode, after a network failure.
>> +#
>
> Missing a "(since 2.7)" hint.

Added.

>
>>  # Since: 2.3
>>  #
>>  ##
>>  { 'enum': 'MigrationStatus',
>>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
>> -            'active', 'postcopy-active', 'completed', 'failed' ] }
>> +            'active', 'postcopy-active', 'completed', 'failed',
>> +            'postcopy-recovery' ] }
>>
>>  ##
>>  # @MigrationInfo
>> @@ -2058,6 +2063,8 @@
>>  #
>>  # @uri: the Uniform Resource Identifier of the destination VM
>>  #
>> +# @recover: #optional recover from a broken migration
>> +#
>
> I don't see any 'recover' parameter added to the 'migrate' command to
> match this added documentation.

That was a mistake. This hunk was supposed to go into the other patch I
posted in the same minute; I missed splitting it out.

>
>>  # @blk: #optional do block migration (full disk copy)
>>  #
>>  # @inc: #optional incremental disk copy migration
>> diff --git a/vl.c b/vl.c
>> index 5fd22cb..c237140 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -618,6 +618,10 @@ static const RunStateTransition runstate_transitions_def[] = {
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
>> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY },
>> +
>> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN },
>>
>>      { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>>      { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
>>
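The vl.c hunk only adds entries to a whitelist of allowed run-state changes. As an illustration of how such a table can be enforced, here is a standalone sketch that builds a lookup matrix from (from, to) pairs and rejects anything not listed. The `enum run_state` is a reduced, hypothetical subset and `transitions_init` / `transition_allowed` are placeholder names; this is not the QEMU code that consumes `runstate_transitions_def`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical subset of RunState values. */
enum run_state {
    RS_RUNNING,
    RS_FINISH_MIGRATE,
    RS_POSTMIGRATE,
    RS_POSTMIGRATE_RECOVERY,
    RS_SHUTDOWN,
    RS_MAX,
};

struct transition {
    enum run_state from;
    enum run_state to;
};

/* Entries mirroring some of the pairs in the hunk above. */
static const struct transition transitions_def[] = {
    { RS_FINISH_MIGRATE, RS_RUNNING },
    { RS_FINISH_MIGRATE, RS_POSTMIGRATE },
    { RS_FINISH_MIGRATE, RS_POSTMIGRATE_RECOVERY },
    { RS_POSTMIGRATE_RECOVERY, RS_FINISH_MIGRATE },
    { RS_POSTMIGRATE_RECOVERY, RS_SHUTDOWN },
};

static bool valid[RS_MAX][RS_MAX];

/* Fill the matrix once from the list of allowed pairs. */
static void transitions_init(void)
{
    size_t n = sizeof(transitions_def) / sizeof(transitions_def[0]);
    for (size_t i = 0; i < n; i++) {
        valid[transitions_def[i].from][transitions_def[i].to] = true;
    }
}

/* A transition is allowed only if it was explicitly listed. */
static bool transition_allowed(enum run_state from, enum run_state to)
{
    assert(from < RS_MAX && to < RS_MAX);
    return valid[from][to];
}
```

With only the entries the patch adds, a VM in postmigrate-recovery can leave that state only via finish-migrate or shutdown; a direct return to running would be rejected.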
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



-- 

With regards,

Md Haris Iqbal,
Placement Coordinator, MTech IT
NITK Surathkal,
Contact: +91 8861996962
