> -----Original Message----- > From: Lukas Straub [mailto:lukasstra...@web.de] > Sent: Thursday, May 14, 2020 10:31 PM > To: Zhanghailiang <zhang.zhanghaili...@huawei.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Zhang Chen > <chen.zh...@intel.com>; Juan Quintela <quint...@redhat.com>; Dr. David > Alan Gilbert <dgilb...@redhat.com> > Subject: Re: [PATCH 6/6] migration/colo.c: Move > colo_notify_compares_event to the right place > > On Thu, 14 May 2020 13:27:30 +0000 > Zhanghailiang <zhang.zhanghaili...@huawei.com> wrote: > > > Cc: Zhang Chen <chen.zh...@intel.com> > > > > > > > > If the secondary has to failover during checkpointing, it still is > > > in the old state (i.e. different state than primary). Thus we can't > > > expose the primary state until after the checkpoint is sent. > > > > > > > Hmm, do you mean we should not flush the net packages to client > > connection until checkpointing Process almost success because it may fail > during checkpointing ? > > No. > If the primary fails/crashes during checkpointing, the secondary is still in > different state than the primary because it didn't receive the full > checkpoint. > We can release the miscompared packets only after both primary and > secondary are in the same state. > > Example: > 1. Client opens a TCP connection, sends SYN. > 2. Primary accepts the connection with SYN-ACK, but due to > nondeterministic execution the secondary is delayed. > 3. Checkpoint happens, primary releases the SYN-ACK packet but then > crashes while sending the checkpoint. > 4. The Secondary fails over. At this point it is still in the old state where > it > hasn't sent the SYN-ACK packet. > 5. The client responds with ACK to the SYN-ACK packet. > 6. Because it doesn't know the connection, the secondary responds with RST, > connection reset. >
Good example. For this patch, it is OK, I will add reviewed-by in your origin patch. > Regards, > Lukas Straub > > > > This fixes sporadic connection reset of client connections during > > > failover. > > > > > > Signed-off-by: Lukas Straub <lukasstra...@web.de> > > > --- > > > migration/colo.c | 12 ++++++------ > > > 1 file changed, 6 insertions(+), 6 deletions(-) > > > > > > diff --git a/migration/colo.c b/migration/colo.c index > > > a69782efc5..a3fc21e86e 100644 > > > --- a/migration/colo.c > > > +++ b/migration/colo.c > > > @@ -430,12 +430,6 @@ static int > > > colo_do_checkpoint_transaction(MigrationState *s, > > > goto out; > > > } > > > > > > - qemu_event_reset(&s->colo_checkpoint_event); > > > - colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, > > > &local_err); > > > - if (local_err) { > > > - goto out; > > > - } > > > - > > > /* Disable block migration */ > > > migrate_set_block_enabled(false, &local_err); > > > qemu_mutex_lock_iothread(); > > > @@ -494,6 +488,12 @@ static int > > > colo_do_checkpoint_transaction(MigrationState *s, > > > goto out; > > > } > > > > > > + qemu_event_reset(&s->colo_checkpoint_event); > > > + colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, > > > &local_err); > > > + if (local_err) { > > > + goto out; > > > + } > > > + > > > colo_receive_check_message(s->rp_state.from_dst_file, > > > COLO_MESSAGE_VMSTATE_LOADED, > &local_err); > > > if (local_err) { > > > -- > > > 2.20.1