On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote: > [add migration maintainers] > > On 24.09.24 15:56, Andrey Drobyshev wrote: >> [...] > > I doubt that this a correct way to go. > > As far as I understand, "inactive" actually means that "storage is not > belong to qemu, but to someone else (another qemu process for example), > and may be changed transparently". In turn this means that Qemu should > do nothing with inactive disks. So the problem is that nobody called > bdrv_activate_all on target, and we shouldn't ignore that. > > Hmm, I see in process_incoming_migration_bh() we do call > bdrv_activate_all(), but only in some scenarios. May be, the condition > should be less strict here. > > Why we need any condition here at all? Don't we want to activate > block-layer on target after migration anyway? >
Hmm I'm not sure about the unconditional activation, since we at least have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it in such a case). In current libvirt upstream I see such code: > /* Migration capabilities which should always be enabled as long as they > > * are supported by QEMU. If the capability is supposed to be enabled on both > > * sides of migration, it won't be enabled unless both sides support it. > > */ > > static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = > { > > {QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER, > > QEMU_MIGRATION_SOURCE}, > > > > {QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, > > QEMU_MIGRATION_DESTINATION}, > > }; which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set. The code from process_incoming_migration_bh() you're referring to: > /* If capability late_block_activate is set: > > * Only fire up the block code now if we're going to restart the > > * VM, else 'cont' will do it. > > * This causes file locking to happen; so we don't want it to happen > > * unless we really are starting the VM. > > */ > > if (!migrate_late_block_activate() || > > (autostart && (!global_state_received() || > > runstate_is_live(global_state_get_runstate())))) { > > /* Make sure all file formats throw away their mutable metadata. > > > * If we get an error here, just don't restart the VM yet. */ > > bdrv_activate_all(&local_err); > > if (local_err) { > > error_report_err(local_err); > > local_err = NULL; > > autostart = false; > > } > > } It states explicitly that we're either going to start VM right at this point if (autostart == true), or we wait till "cont" command happens. None of this is going to happen if we start another migration while still being in PAUSED state. So I think it seems reasonable to take such case into account. For instance, this patch does prevent the crash: > diff --git a/migration/migration.c b/migration/migration.c > index ae2be31557..3222f6745b 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque) > */ > if (!migrate_late_block_activate() || > (autostart && (!global_state_received() || > - runstate_is_live(global_state_get_runstate())))) { > + runstate_is_live(global_state_get_runstate()))) || > + (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) { > /* Make sure all file formats throw away their mutable metadata. > * If we get an error here, just don't restart the VM yet. */ > bdrv_activate_all(&local_err); What are your thoughts on it? Andrey