On Thu, Jun 9, 2016 at 2:11 AM, Eric Blake <ebl...@redhat.com> wrote:
> On 06/08/2016 12:13 PM, Md Haris Iqbal wrote:
>
> The subject line is long, and has a typo (s/incase/in case/). Also, the
> mailing list automatically prepends [Qemu-devel], so you shouldn't
> repeat it manually. Better might have been a short subject line then a
> longer commit body:
>
> migration: keep source alive on network failure
>
> Details about what was failing, and why this code improves it

Yes, I will add details and will take care while writing commit messages.

>
> Missing a Signed-off-by: attribution; without that, we can't take it.

I will add it from next time.

>
>> ---
>
> You marked this patch as v2, but in the same minute sent another email
> with subject line v1, and didn't say what changed to need a v2. Here
> after the --- divider is a good place for that.

The other patch is different from the one I posted with v1. There was
another patch, which I posted a few days back, which was v1 of this patch.
I should have pointed out what has changed, though.

>
>>  include/migration/migration.h |  1 +
>>  migration/migration.c         | 76 ++++++++++++++++++++++++++++++++++++++++---
>>  qapi-schema.json              | 11 +++++--
>>  vl.c                          |  4 +++
>>  4 files changed, 85 insertions(+), 7 deletions(-)
>>
>
>> @@ -1726,11 +1755,32 @@ static void *migration_thread(void *opaque)
>>              }
>>          }
>>
>> -        if (qemu_file_get_error(s->to_dst_file)) {
>> -            migrate_set_state(&s->state, current_active_state,
>> -                              MIGRATION_STATUS_FAILED);
>> -            trace_migration_thread_file_err();
>> -            break;
>> +        if ((ret = qemu_file_get_error(s->to_dst_file))) {
>> +            fprintf(stderr, "1 : Error %s %d\n", strerror(-ret), -ret);
>
> fprintf() is rather awkward for errors; can we use qemu's Error mechanism?

Just something I am using to debug. It will be absent in the final
version of the patch.

>
>> +
>> +            /* This check is based on how the error is set during the network
>> +             * recv(). When recv() returns 0 (i.e. no data to read), the error
>> +             * is set to -EIO. For all other network errors, it is set
>> +             * according to the return value received.
>> +             */
>> +            if (ret != -EIO && s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>> +                /* Network Failure during postcopy */
>> +
>> +                current_active_state = MIGRATION_STATUS_POSTCOPY_RECOVERY;
>> +                runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY);
>> +                fprintf(stderr, "1.1 : Error %s %d\n", strerror(-ret), -ret);
>
> Does the end user really need to see "1.1 :"

Just debugging output.

>
>
>> +++ b/qapi-schema.json
>> @@ -154,12 +154,14 @@
>>  # @watchdog: the watchdog action is configured to pause and has been triggered
>>  #
>>  # @guest-panicked: guest has been panicked as a result of guest OS panic
>> +#
>> +# @postmigrate-recovery: guest is paused for recovery after a network failure
>
> Not your fault that the overall enum is missing an overall line:
>
> # Since: 1.4
>
> nor that guest-panicked is missing a "(since 1.5)" hint, but at least
> your addition should have a "(since 2.7)" hint.

Added.

>
>>  ##
>>  { 'enum': 'RunState',
>>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
>> -            'guest-panicked' ] }
>> +            'guest-panicked', 'postmigrate-recovery' ] }
>
> Adding new enums can cause existing clients like libvirt to do weird
> things if they aren't expecting the new state. Are we sure we want to do
> it?

I think so. If we do not have a new state, then one would not know that
the VM is in recovery.

> Is it a state that cannot be entered by default, but only in
> response to a client request that proves the client is new enough to
> expect the new state?

I did not quite understand what you are trying to say.

>
>>
>>  ##
>>  # @StatusInfo:
>> @@ -434,12 +436,15 @@
>>  #
>>  # @failed: some error occurred during migration process.
>>  #
>> +# @postcopy-recovery: in recovery mode, after a network failure.
>> +#
>
> Missing a "(since 2.7)" hint.

Added.
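
For reference, the condition under which the quoted migration_thread() hunk enters postcopy-recovery can be sketched in isolation. This is only a minimal sketch: the SketchStatus enum and sketch_next_status() are hypothetical stand-ins, not QEMU's actual types or API, and it assumes the stream error arrives as a negative errno value, as in the quoted hunk.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the MigrationStatus values discussed above. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_POSTCOPY_ACTIVE,
    SKETCH_STATUS_POSTCOPY_RECOVERY,
    SKETCH_STATUS_FAILED
} SketchStatus;

/*
 * Pick the next status once the migration stream has reported an error.
 * `ret` must be a negative errno value. Following the comment in the
 * quoted hunk, -EIO is what gets recorded when recv() returns 0, so it is
 * not treated as a recoverable network failure; any other error seen
 * while postcopy is active moves the migration into recovery.
 */
static SketchStatus sketch_next_status(SketchStatus cur, int ret)
{
    assert(ret < 0); /* only called after an error has been detected */

    if (ret != -EIO && cur == SKETCH_STATUS_POSTCOPY_ACTIVE) {
        return SKETCH_STATUS_POSTCOPY_RECOVERY;
    }
    return SKETCH_STATUS_FAILED;
}
```

With this sketch, a connection reset (-ECONNRESET) during postcopy selects the recovery status, while -EIO, or any error outside postcopy, still ends in the failed status, which matches the behaviour before the patch.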
>
>>  # Since: 2.3
>>  #
>>  ##
>>  { 'enum': 'MigrationStatus',
>>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
>> -            'active', 'postcopy-active', 'completed', 'failed' ] }
>> +            'active', 'postcopy-active', 'completed', 'failed',
>> +            'postcopy-recovery' ] }
>>
>>  ##
>>  # @MigrationInfo
>> @@ -2058,6 +2063,8 @@
>>  #
>>  # @uri: the Uniform Resource Identifier of the destination VM
>>  #
>> +# @recover: #optional recover from a broken migration
>> +#
>
> I don't see any 'recover' parameter added to the 'migrate' command to
> match this added documentation.

That was a mistake. This was supposed to go into the other patch I posted
in the same minute. I just missed splitting it out.

>
>>  # @blk: #optional do block migration (full disk copy)
>>  #
>>  # @inc: #optional incremental disk copy migration
>> diff --git a/vl.c b/vl.c
>> index 5fd22cb..c237140 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -618,6 +618,10 @@ static const RunStateTransition runstate_transitions_def[] = {
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
>> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY },
>> +
>> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE },
>> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN },
>>
>>      { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>>      { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
>>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

--
With regards,
Md Haris Iqbal,
Placement Coordinator, MTech IT
NITK Surathkal,
Contact: +91 8861996962