Il 06/11/2013 02:50, Zhanghaoyu (A) ha scritto: >>>>> Avoid starting a new migration task while the previous one still >>>> exist. >>>> >>>> Can you explain how to reproduce the problem? >>>> >>> When network disconnection between source and destination happened, >>> the migration thread stuck at below stack, >>> Then I cancel the migration task, the migration state in qemu will be set >>> to MIG_STATE_CANCELLED, so the migration job in libvirt quits. >>> Then I perform migration again, at this time, the network reconnected >>> successfully, since the TCP timeout retransmission, above stack will not >>> return immediately, so two migration tasks exist at the same time. >>> And still worse, source qemu will crash, because of accessing the NULL >>> pointer in qemu_bh_schedule(s->cleanup_bh); statement in latter migration >>> task, since the "s->cleanup_bh" had been deleted by previous migration task. >> >> Thanks for explaining. CANCELLING looks like a useful addition. >> >> Why do you need both CANCELLING and COMPLETING? The COMPLETED state should >> be set only after all I/O is done. > > There is a period of time from the timing of setting COMPLETED state to that > of migration task exits, > so it's problematic in COMPLETED->CANCELLED transition, but if applying your > below proposal, the problem gone. > > do { > old_state = s->state; > if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) { > break; > } > migrate_set_state(s, old_state, MIG_STATE_CANCELLED); > } while (s->state != MIG_STATE_CANCELLED);
Ok. >> I agree with Eric that the CANCELLING state should not be exposed via QMP. >> "info migrate" and "query-migrate" can keep showing "active" for maximum >> backwards compatibility. >> >> More comments below. >> >> >>> - if (s->state != MIG_STATE_COMPLETED) { >>> + if (s->state != MIG_STATE_COMPLETING) { >>> qemu_savevm_state_cancel(); >>> + if (s->state == MIG_STATE_CANCELLING) { >>> + migrate_set_state(s, MIG_STATE_CANCELLING, >>> MIG_STATE_CANCELLED); >>> + } >> >> I think you can remove the "if" and unconditionally call migrate_set_state. > > Do you mean to remove the "if (s->state == MIG_STATE_CANCELLING)" ? > The s->state probably is MIG_STATE_ERROR here, is it okay to unconditionally > call migrate_set_state? migrate_set_state has atomic_cmpxchg so it has an "implicit" if, but you're right it's clearer this way. Paolo > Thanks, > Zhang Haoyu > >> >>> + }else { >>> + migrate_set_state(s, MIG_STATE_COMPLETING, >>> + MIG_STATE_COMPLETED); >>> } >>> >>> notifier_list_notify(&migration_state_notifiers, s); } >>> >>> -static void migrate_set_state(MigrationState *s, int old_state, int >>> new_state) -{ >>> - if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) { >>> - trace_migrate_set_state(new_state); >>> - } >>> -} >>> - >>> void migrate_fd_error(MigrationState *s) { >>> DPRINTF("setting error state\n"); @@ -328,7 +337,7 @@ static void >>> migrate_fd_cancel(MigrationState *s) { >>> DPRINTF("cancelling migration\n"); >>> >>> - migrate_set_state(s, s->state, MIG_STATE_CANCELLED); >>> + migrate_set_state(s, s->state, MIG_STATE_CANCELLING); >> >> Here probably we want something like >> >> do { >> old_state = s->state; >> if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) { >> break; >> } >> migrate_set_state(s, old_state, MIG_STATE_CANCELLING); >> } while (s->state != MIG_STATE_CANCELLING); >> >> to avoid a bogus COMPLETED->CANCELLED transition. Please separate the patch >> in two parts: >> >> (1) the first uses the above code, with CANCELLED instead of CANCELLING >> >> (2) the second, similar to the one you have posted, introduces the new >> CANCELLING state >> >> Thanks, >> >> Paolo