* Peter Xu (pet...@redhat.com) wrote: > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote: > > * Peter Xu (pet...@redhat.com) wrote: > > > On Thu, Sep 21, 2017 at 08:29:03PM +0100, Dr. David Alan Gilbert wrote: > > > > * Peter Xu (pet...@redhat.com) wrote: > > > > > When there is IO error on the incoming channel (e.g., network down), > > > > > instead of bailing out immediately, we allow the dst vm to switch to > > > > > the > > > > > new POSTCOPY_PAUSE state. Currently it is still simple - it waits the > > > > > new semaphore, until someone poke it for another attempt. > > > > > > > > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > > > > --- > > > > > migration/migration.c | 1 + > > > > > migration/migration.h | 3 +++ > > > > > migration/savevm.c | 60 > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++-- > > > > > migration/trace-events | 2 ++ > > > > > 4 files changed, 64 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > > index 8d26ea8..80de212 100644 > > > > > --- a/migration/migration.c > > > > > +++ b/migration/migration.c > > > > > @@ -146,6 +146,7 @@ MigrationIncomingState > > > > > *migration_incoming_get_current(void) > > > > > memset(&mis_current, 0, sizeof(MigrationIncomingState)); > > > > > qemu_mutex_init(&mis_current.rp_mutex); > > > > > qemu_event_init(&mis_current.main_thread_load_event, false); > > > > > + qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0); > > > > > once = true; > > > > > } > > > > > return &mis_current; > > > > > diff --git a/migration/migration.h b/migration/migration.h > > > > > index 0c957c9..c423682 100644 > > > > > --- a/migration/migration.h > > > > > +++ b/migration/migration.h > > > > > @@ -60,6 +60,9 @@ struct MigrationIncomingState { > > > > > /* The coroutine we should enter (back) after failover */ > > > > > Coroutine *migration_incoming_co; > > > > > QemuSemaphore colo_incoming_sem; > > > > > + > > > > > + /* notify PAUSED postcopy incoming migrations to try to continue > > > > > */ > > > > > + QemuSemaphore postcopy_pause_sem_dst; > > > > > }; > > > > > > > > > > MigrationIncomingState *migration_incoming_get_current(void); > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > index 7172f14..3777124 100644 > > > > > --- a/migration/savevm.c > > > > > +++ b/migration/savevm.c > > > > > @@ -1488,8 +1488,8 @@ static int > > > > > loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis, > > > > > */ > > > > > static void *postcopy_ram_listen_thread(void *opaque) > > > > > { > > > > > - QEMUFile *f = opaque; > > > > > MigrationIncomingState *mis = migration_incoming_get_current(); > > > > > + QEMUFile *f = mis->from_src_file; > > > > > int load_res; > > > > > > > > > > migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, > > > > > @@ -1503,6 +1503,14 @@ static void *postcopy_ram_listen_thread(void > > > > > *opaque) > > > > > */ > > > > > qemu_file_set_blocking(f, true); > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > + > > > > > + /* > > > > > + * This is tricky, but, mis->from_src_file can change after it > > > > > + * returns, when postcopy recovery happened. In the future, we > > > > > may > > > > > + * want a wrapper for the QEMUFile handle. > > > > > + */ > > > > > + f = mis->from_src_file; > > > > > + > > > > > /* And non-blocking again so we don't block in any cleanup */ > > > > > qemu_file_set_blocking(f, false); > > > > > > > > > > @@ -1581,7 +1589,7 @@ static int > > > > > loadvm_postcopy_handle_listen(MigrationIncomingState *mis) > > > > > /* Start up the listening thread and wait for it to signal ready > > > > > */ > > > > > qemu_sem_init(&mis->listen_thread_sem, 0); > > > > > qemu_thread_create(&mis->listen_thread, "postcopy/listen", > > > > > - postcopy_ram_listen_thread, > > > > > mis->from_src_file, > > > > > + postcopy_ram_listen_thread, NULL, > > > > > QEMU_THREAD_DETACHED); > > > > > qemu_sem_wait(&mis->listen_thread_sem); > > > > > qemu_sem_destroy(&mis->listen_thread_sem); > > > > > @@ -1966,11 +1974,44 @@ void qemu_loadvm_state_cleanup(void) > > > > > } > > > > > } > > > > > > > > > > +/* Return true if we should continue the migration, or false. */ > > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis) > > > > > +{ > > > > > + trace_postcopy_pause_incoming(); > > > > > + > > > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > > > > > + MIGRATION_STATUS_POSTCOPY_PAUSED); > > > > > + > > > > > + assert(mis->from_src_file); > > > > > + qemu_file_shutdown(mis->from_src_file); > > > > > + qemu_fclose(mis->from_src_file); > > > > > + mis->from_src_file = NULL; > > > > > + > > > > > + assert(mis->to_src_file); > > > > > + qemu_mutex_lock(&mis->rp_mutex); > > > > > + qemu_file_shutdown(mis->to_src_file); > > > > > > > > Should you not do the shutdown() before the lock? > > > > For example if the other thread is stuck, with rp_mutex > > > > held, trying to write to to_src_file, then you'll block > > > > waiting for the mutex. If you call shutdown and then take > > > > the lock, the other thread will error and release the lock. > > > > > > The problem is that IMHO QEMUFile is not yet thread-safe itself. So > > > if we operate on it (even to shut it down) logically we need to have > > > the lock, right? > > > > That probably needs fixing for 'shutdown' under the assumption that > > a) No one has or is deleting/freeing the QEMUFile > > b) No one is closing the QEMUFile > > > > The whole point of using shutdown() is it forces any stuck send()'s or > > read()'s to fail rather than staying stuck. > > I see. I just noticed that actually qemu_file_shutdown() is > thread-safe itself - it boils down to the system shutdown() call (as > long as the above assumption is there). > > Let me call qemu_file_shutdown() first before taking the lock to make > sure send()/recv() hang won't happen. > > > > > > Then, IMHO the question would be: when will the send() be stuck in the > > > other thread? > > > > > > Normally the only case I can think of is that source didn't recv() > > > fast enough, and we even consumed all the write buffer in dst side (I > > > don't really know how kernel manages the buffers though, and e.g. how > > > the size of buffer is defined...). > > > > > > But when reach here, the channel (say, from_src_file and to_src_file, > > > since both of them are using the same channel behind the QEMUFile > > > interface) should already be broken in some way, then IIUC even there > > > is a send() in the other thread, it should return at some point with a > > > failure as well, just like how we reached here (possibly due to a > > > read() failure). > > > > We have to be careful about this; a network can fail in a way it > > gets stuck rather than fails - this can get stuck until a full TCP > > disconnection; and that takes about 30mins (from memory). > > The nice thing about using 'shutdown' is that you can kill the existing > > connection if it's hung. (Which then makes an interesting question; > > the rules in your migrate-incoming command become different if you > > want to declare it's failed!). Having said that, you're right that at > > this point stuff has already failed - so do we need the shutdown? > > (You might want to do the shutdown as part of the recovery earlier > > or as a separate command to force the failure) > > I assume if I call shutdown before the lock then we'll be good then.
The question is what happens if you only allow recovery if we're already in postcopy-paused state; in the case of a hung socket, since no IO has actually failed yet, you will still be in postcopy-active. Dave > > > > > > I'm not quite sure what will happen if we end up calling this > > > > before the main thread has been returned from postcopy and the > > > > device loading is complete. > > > > > > IIUC you mean the time starts from when we got MIG_CMD_PACKAGED until > > > main thread finishes handling that package? > > > > Yes. > > > > > Normally I think that should not matter much since during handling the > > > package it should hardly fail (we were reading from a buffer QIO > > > channel, no real IOs there)... > > > > Note that while the main thread is reading the package, the listener > > thread is receiving pages, so you can legally get a failure at that > > point when the fd fails as it's receiving pages at the same time > > as reading the devices. > > (There's an argument that if it fails before you've received all > > your devices then perhaps you can just restart the source) > > Yes. > > > > > > But I agree about the reasoning. How > > > about one more patch to postpone the "active" to "postcopy-active" > > > state change after the package is handled correctly? Like: > > > > > > -------------- > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > index b5c3214034..8317b2a7e2 100644 > > > --- a/migration/savevm.c > > > +++ b/migration/savevm.c > > > @@ -1573,8 +1573,6 @@ static void *postcopy_ram_listen_thread(void > > > *opaque) > > > > > > QEMUFile *f = mis->from_src_file; > > > int load_res; > > > > > > - migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, > > > - MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > qemu_sem_post(&mis->listen_thread_sem); > > > trace_postcopy_ram_listen_thread_start(); > > > > > > @@ -1817,6 +1815,9 @@ static int > > > loadvm_handle_cmd_packaged(MigrationIncomingState *mis) > > > > > > qemu_fclose(packf); > > > object_unref(OBJECT(bioc)); > > > > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > + > > > return ret; > > > } > > > -------------- > > > > > > This function will only be called with "postcopy-active" state. > > > > I *think* that's safe; you've got to be careful, but I can't see > > anyone on the destination that cares about the destinction. > > Indeed, but I'd say that's the best thing I can think of (and the > simplest). Even, not sure whether it'll be more clear if we set > postcopy-active state right before starting the VM on destination, > say, at the beginning of loadvm_postcopy_handle_run_bh(). > > > > > > > Also, at this point have we guaranteed no one else is about > > > > to do an op on mis->to_src_file and will seg? > > > > > > I think no? Since IMHO the main thread is playing with the buffer QIO > > > channel, rather than the real one? > > > > OK. > > > > > (btw, could I ask what's "seg"? :) > > > > just short for segmentation fault; sig 11. > > I see. Thanks! > > > > > Dave > > > > > -- > > > Peter Xu > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK