On Thu, Feb 20, 2020 at 04:16:02PM +0100, David Hildenbrand wrote: > On 19.02.20 17:17, David Hildenbrand wrote: > > Resizing while migrating is dangerous and does not work as expected. > > The whole migration code works on the usable_length of ram blocks and does > > not expect this to change at random points in time. > > > > In the case of precopy, the ram block size must not change on the source, > > after syncing the RAM block list in ram_save_setup(), so as long as the > > guest is still running on the source. > > > > Resizing can be trigger *after* (but not during) a reset in > > ACPI code by the guest > > - hw/arm/virt-acpi-build.c:acpi_ram_update() > > - hw/i386/acpi-build.c:acpi_ram_update() > > > > Use the ram block notifier to get notified about resizes. Let's simply > > cancel migration and indicate the reason. We'll continue running on the > > source. No harm done. > > > > Update the documentation. Postcopy will be handled separately. > > > > Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > Cc: Juan Quintela <quint...@redhat.com> > > Cc: Eduardo Habkost <ehabk...@redhat.com> > > Cc: Paolo Bonzini <pbonz...@redhat.com> > > Cc: Igor Mammedov <imamm...@redhat.com> > > Cc: "Michael S. Tsirkin" <m...@redhat.com> > > Cc: Richard Henderson <richard.hender...@linaro.org> > > Cc: Shannon Zhao <shannon.z...@linaro.org> > > Cc: Alex Bennée <alex.ben...@linaro.org> > > Cc: Peter Xu <pet...@redhat.com> > > Signed-off-by: David Hildenbrand <da...@redhat.com> > > --- > > exec.c | 5 +++-- > > include/exec/memory.h | 10 ++++++---- > > migration/migration.c | 9 +++++++-- > > migration/migration.h | 1 + > > migration/ram.c | 41 +++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 58 insertions(+), 8 deletions(-) > > > > diff --git a/exec.c b/exec.c > > index b75250e773..8b015821d6 100644 > > --- a/exec.c > > +++ b/exec.c > > @@ -2120,8 +2120,9 @@ static int memory_try_enable_merging(void *addr, > > size_t len) > > return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE); > > } > > > > -/* Only legal before guest might have detected the memory size: e.g. on > > - * incoming migration, or right after reset. > > +/* > > + * Resizing RAM while migrating can result in the migration being canceled. > > + * Care has to be taken if the guest might have already detected the > > memory. > > * > > * As memory core doesn't know how is memory accessed, it is up to > > * resize callback to update device state and/or add assertions to detect > > diff --git a/include/exec/memory.h b/include/exec/memory.h > > index e85b7de99a..de111347e8 100644 > > --- a/include/exec/memory.h > > +++ b/include/exec/memory.h > > @@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier; > > #define RAM_SHARED (1 << 1) > > > > /* Only a portion of RAM (used_length) is actually used, and migrated. > > - * This used_length size can change across reboots. > > + * Resizing RAM while migrating can result in the migration being canceled. > > */ > > #define RAM_RESIZEABLE (1 << 2) > > > > @@ -843,7 +843,9 @@ void > > memory_region_init_ram_shared_nomigrate(MemoryRegion *mr, > > * RAM. Accesses into the region will > > * modify memory directly. Only an > > initial > > * portion of this RAM is actually > > used. > > - * The used size can change across > > reboots. > > + * Changing the size while migrating > > + * can result in the migration being > > + * canceled. > > * > > * @mr: the #MemoryRegion to be initialized. > > * @owner: the object that tracks the region's reference count > > @@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr); > > > > /* memory_region_ram_resize: Resize a RAM region. > > * > > - * Only legal before guest might have detected the memory size: e.g. on > > - * incoming migration, or right after reset. > > + * Resizing RAM while migrating can result in the migration being canceled. > > + * Care has to be taken if the guest might have already detected the > > memory. > > * > > * @mr: a memory region created with @memory_region_init_resizeable_ram. > > * @newsize: the new size the region > > diff --git a/migration/migration.c b/migration/migration.c > > index 8fb68795dc..ac9751dbe5 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -175,13 +175,18 @@ void migration_object_init(void) > > } > > } > > > > +void migration_cancel(void) > > +{ > > + migrate_fd_cancel(current_migration); > > +} > > + > > void migration_shutdown(void) > > { > > /* > > * Cancel the current migration - that will (eventually) > > * stop the migration using this structure > > */ > > - migrate_fd_cancel(current_migration); > > + migration_cancel(); > > object_unref(OBJECT(current_migration)); > > } > > > > @@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool > > blk, > > > > void qmp_migrate_cancel(Error **errp) > > { > > - migrate_fd_cancel(migrate_get_current()); > > + migration_cancel(); > > } > > > > void qmp_migrate_continue(MigrationStatus state, Error **errp) > > diff --git a/migration/migration.h b/migration/migration.h > > index 8473ddfc88..79fd74afa5 100644 > > --- a/migration/migration.h > > +++ b/migration/migration.h > > @@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, > > void *opaque); > > void migration_make_urgent_request(void); > > void migration_consume_urgent_request(void); > > bool migration_rate_limit(void); > > +void migration_cancel(void); > > > > #endif > > diff --git a/migration/ram.c b/migration/ram.c > > index ed23ed1c7c..57f32011a3 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -52,6 +52,7 @@ > > #include "migration/colo.h" > > #include "block.h" > > #include "sysemu/sysemu.h" > > +#include "sysemu/runstate.h" > > #include "savevm.h" > > #include "qemu/iov.h" > > #include "multifd.h" > > @@ -3710,8 +3711,48 @@ static SaveVMHandlers savevm_ram_handlers = { > > .resume_prepare = ram_resume_prepare, > > }; > > > > +static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host, > > + size_t old_size, size_t new_size) > > +{ > > + ram_addr_t offset; > > + Error *err = NULL; > > + RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset); > > + > > + if (ramblock_is_ignored(rb)) { > > + return; > > + } > > + > > + /* > > + * Some resizes are triggered on the migration target by precopy code, > > + * when synchronizing RAM block sizes. In these cases, the VM is not > > + * running and migration is not idle. We have to ignore these resizes, > > + * as we only care about resizes during precopy on the migration > > source. > > + * This handler is always registered, so ignore when migration is idle. > > + */ > > + if (migration_is_idle() || !runstate_is_running() ||
So I noticed that I mis-misread the code after chat with Dave... migration_is_idle() should only return false if on the source and only if during migration. Destination should still return true for that (destination VM reads state from MigrationIncomingState.state instead). With that, I think we can drop the confusing !runstate_is_running() check because migration_is_idle() will cover that (and touch up the comment too)? > > + postcopy_is_running()) { > > + return; > > + } > > + > > + /* > > + * Precopy code cannot deal with the size of ram blocks changing at > > + * random points in time. We're still running on the source, abort > > + * the migration and continue running here. Make sure to wait until > > + * migration was canceled. > > + */ > > + error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr); > > + migrate_set_error(migrate_get_current(), err); > > + error_free(err); > > + migration_cancel(); > > +} > > + > > +static RAMBlockNotifier ram_mig_ram_notifier = { > > + .ram_block_resized = ram_mig_ram_block_resized, > > +}; > > + > > void ram_mig_init(void) > > { > > qemu_mutex_init(&XBZRLE.lock); > > register_savevm_live("ram", 0, 4, &savevm_ram_handlers, &ram_state); > > + ram_block_notifier_add(&ram_mig_ram_notifier); > > } > > > > So, this seems to work very reliably when triggering a resize of a RAM > block during system reset (using my virtio-mem prototype): > > (qemu) info migrate > globals: > store-global-state: on > only-migratable: off > send-configuration: on > send-section-footer: on > decompress-error-check: on > clear-bitmap-shift: 18 > Migration status: cancelled > total time: 0 milliseconds > > > And from QEMU > > qemu-system-x86_64: RAM block '0000:00:03.0/mem1' resized during precopy. Good news! Thanks, -- Peter Xu