On Tue, Oct 15, 2019 at 11:50:08AM +0100, Dr. David Alan Gilbert wrote:
* Jens Freimann (jfreim...@redhat.com) wrote:
On Fri, Oct 11, 2019 at 06:11:33PM +0100, Dr. David Alan Gilbert wrote:
> * Jens Freimann (jfreim...@redhat.com) wrote:
> > This patch adds a new migration state called wait-unplug. It is entered
> > after the SETUP state and will transition into ACTIVE once all devices
> > were succesfully unplugged from the guest.
> >
> > So if a guest doesn't respond or takes long to honor the unplug request
> > the user will see the migration state 'wait-unplug'.
> >
> > In the migration thread we query failover devices if they're are still
> > pending the guest unplug. When all are unplugged the migration
> > continues. We give it a defined number of iterations including small
> > waiting periods before we proceed.
> >
> > Signed-off-by: Jens Freimann <jfreim...@redhat.com>
[..]
> > + while (i < FAILOVER_UNPLUG_RETRIES &&
> > + s->state == MIGRATION_STATUS_WAIT_UNPLUG) {
> > + i++;
> > + qemu_sem_timedwait(&s->wait_unplug_sem,
FAILOVER_GUEST_UNPLUG_WAIT);
> > + all_unplugged = qemu_savevm_state_guest_unplug_pending();
> > + if (all_unplugged) {
> > + break;
> > + }
> > + }
> > +
> > + if (all_unplugged) {
> > + migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > + MIGRATION_STATUS_ACTIVE);
> > + } else {
> > + migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
> > + MIGRATION_STATUS_CANCELLING);
> > + }
>
> I think you can get rid of both the timeout and the count and just make
> sure that migrate_cancel works at this point.
I see, I need to add the new state to migration_is_setup_or_active() or
a cancel won't work.
You probably need to do that anyway given all the other places
is_setup_or_active is called.
Yes, done.
> This pushes the problem up a layer, which I think is fine.
Seems good to me. To be clear, you're saying I should just poll on
the device unplugged state? Like
while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
!qemu_savevm_state_guest_unplug_pending()) {
_ /* This block intentionally left blank */
}
I'd keep the qemu_sem_timedwait in there, but with a short time out
(e.g. 250ms say); that way it doesn't eat cpu, but also the cancel still
happens quickly.
Yes, that's what I do now and it works fine.
Thanks!
regards,
Jens