* David Hildenbrand (da...@redhat.com) wrote:
> Ordinary memory preallocation runs when QEMU starts up and creates the
> memory backends, before processing the incoming migration stream. With
> virtio-mem, we don't know which memory blocks to preallocate before
> migration started. Now that we migrate the virtio-mem bitmap early, before
> migrating any RAM content, we can safely preallocate memory for all plugged
> memory blocks before migrating any RAM content.
>
> This is especially relevant for the following cases:
>
> (1) User errors
>
> With hugetlb/files, if we don't have sufficient backend memory available on
> the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
> when running out of backend memory. Preallocating memory before actual
> RAM migration allows for failing gracefully and informing the user about
> the setup problem.
>
> (2) Excluded memory ranges during migration
>
> For example, virtio-balloon free page hinting will exclude some pages
> from getting migrated. In that case, we won't crash during RAM
> migration, but later, when running the VM on the destination, which is
> bad.
>
> To fix this for new QEMU machines that migrate the bitmap early,
> preallocate the memory early, before any RAM migration. Warn with old
> QEMU machines.
>
> Getting postcopy right is a bit tricky, but we essentially now implement
> the same (problematic) preallocation logic as ordinary preallocation:
> preallocate memory early and discard it again before precopy starts. During
> ordinary preallocation, discarding of RAM happens when postcopy is advised.
> As the state (bitmap) is loaded after postcopy was advised but before
> postcopy starts listening, we have to discard memory we preallocated
> immediately again ourselves.
>
> Note that nothing (not even hugetlb reservations) guarantees for postcopy
> that backend memory (especially, hugetlb pages) is still free after it
> was freed once while discarding RAM.
> Still, allocating that memory at least once helps catch some basic setup
> problems.
>
> Before this change, trying to restore a VM when insufficient hugetlb
> pages are around results in the process crashing due to a "Bus error"
> (SIGBUS). With this change, QEMU fails gracefully:
>
>   qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
>   qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
>   qemu-system-x86_64: load of migration failed: Cannot allocate memory
>
> And we can even introspect the early migration data, including the
> bitmap:
>   $ ./scripts/analyze-migration.py -f STATEFILE
>   {
>       "ram (2)": {
>           "section sizes": {
>               "0000:00:03.0/mem0": "0x0000000780000000",
>               "0000:00:04.0/mem1": "0x0000000780000000",
>               "pc.ram": "0x0000000100000000",
>               "/rom@etc/acpi/tables": "0x0000000000020000",
>               "pc.bios": "0x0000000000040000",
>               "0000:00:02.0/e1000.rom": "0x0000000000040000",
>               "pc.rom": "0x0000000000020000",
>               "/rom@etc/table-loader": "0x0000000000001000",
>               "/rom@etc/acpi/rsdp": "0x0000000000001000"
>           }
>       },
>       "0000:00:03.0/virtio-mem-device-early (51)": {
>           "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
>           "size": "0x0000000040000000",
>           "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
>       },
>       "0000:00:04.0/virtio-mem-device-early (53)": {
>           "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
>           "size": "0x00000001fa400000",
>           "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
>       },
>   [...]
>
> Reported-by: Jing Qi <ji...@redhat.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>

> ---
>  hw/virtio/virtio-mem.c | 87 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 51666baa01..4c3720249c 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
>      return ret;
>  }
>
> +static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
> +                                             virtio_mem_range_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    int ret = 0;
> +
> +    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
> +    while (first_bit < vmem->bitmap_size) {
> +        offset = first_bit * vmem->block_size;
> +        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * vmem->block_size;
> +
> +        ret = cb(vmem, arg, offset, size);
> +        if (ret) {
> +            break;
> +        }
> +        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
> +                                  last_bit + 2);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Adjust the memory section to cover the intersection with the given range.
>   *
> @@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>      RamDiscardListener *rdl;
>      int ret;
>
> +    if (vmem->prealloc && !vmem->early_migration) {
> +        warn_report("Proper preallocation with migration requires a newer QEMU machine");
> +    }
> +
>      /*
>       * We started out with all memory discarded and our memory region is mapped
>       * into an address space. Replay, now that we updated the bitmap.
> @@ -957,6 +985,64 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>      return virtio_mem_restore_unplugged(vmem);
>  }
>
> +static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
> +                                        uint64_t offset, uint64_t size)
> +{
> +    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
> +    int fd = memory_region_get_fd(&vmem->memdev->mr);
> +    Error *local_err = NULL;
> +
> +    qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -ENOMEM;
> +    }
> +    return 0;
> +}
> +
> +static int virtio_mem_post_load_early(void *opaque, int version_id)
> +{
> +    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
> +    RAMBlock *rb = vmem->memdev->mr.ram_block;
> +    int ret;
> +
> +    if (!vmem->prealloc) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We restored the bitmap and verified that the basic properties
> +     * match on source and destination, so we can go ahead and preallocate
> +     * memory for all plugged memory blocks, before actual RAM migration starts
> +     * touching this memory.
> +     */
> +    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
> +                                            virtio_mem_prealloc_range_cb);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * This is tricky: postcopy wants to start with a clean slate. On
> +     * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily
> +     * preallocated) RAM such that postcopy will work as expected later.
> +     *
> +     * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual
> +     * RAM migration. So let's discard all memory again. This looks like an
> +     * expensive NOP, but actually serves a purpose: we made sure that we
> +     * were able to allocate all required backend memory once. We cannot
> +     * guarantee that the backend memory we will free will remain free
> +     * until we need it during postcopy, but at least we can catch the
> +     * obvious setup issues this way.
> +     */
> +    if (migration_incoming_postcopy_advised()) {
> +        if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
> +            return -EBUSY;
> +        }
> +    }
> +    return 0;
> +}
> +
>  typedef struct VirtIOMEMMigSanityChecks {
>      VirtIOMEM *parent;
>      uint64_t addr;
> @@ -1068,6 +1154,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = {
>      .minimum_version_id = 1,
>      .version_id = 1,
>      .immutable = 1,
> +    .post_load = virtio_mem_post_load_early,
>      .fields = (VMStateField[]) {
>          VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
>                           vmstate_virtio_mem_sanity_checks),
> --
> 2.39.0
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
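
For anyone following the diff, the plugged-range walk in
virtio_mem_for_each_plugged_range() can be sketched outside QEMU roughly as
below. This is only an illustration under stated assumptions, not the patch's
code: bit_is_set(), next_bit_with_value(), for_each_plugged_range(),
struct range_log and log_range() are hypothetical stand-ins written for this
sketch (QEMU itself uses find_first_bit(), find_next_bit() and
find_next_zero_bit()), and the callback here drops the VirtIOMEM argument.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical callback type, loosely modelled on virtio_mem_range_cb. */
typedef int (*range_cb)(void *arg, uint64_t offset, uint64_t size);

static int bit_is_set(const unsigned long *map, size_t bit)
{
    return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/*
 * Naive stand-in for QEMU's bitmap helpers: return the first bit at or
 * after 'start' whose value equals 'want', or 'nbits' if there is none.
 */
static size_t next_bit_with_value(const unsigned long *map, size_t nbits,
                                  size_t start, int want)
{
    size_t bit;

    for (bit = start; bit < nbits; bit++) {
        if (bit_is_set(map, bit) == want) {
            return bit;
        }
    }
    return nbits;
}

/*
 * Invoke 'cb' once per maximal run of set (plugged) bits, translating bit
 * positions into byte offset and size. Stops at the first non-zero return.
 */
static int for_each_plugged_range(const unsigned long *map, size_t nbits,
                                  uint64_t block_size, void *arg, range_cb cb)
{
    size_t first = next_bit_with_value(map, nbits, 0, 1);
    int ret = 0;

    assert(block_size > 0);
    while (first < nbits) {
        /* 'end' is one past the last set bit of the current run. */
        size_t end = next_bit_with_value(map, nbits, first + 1, 0);

        ret = cb(arg, first * block_size, (end - first) * block_size);
        if (ret) {
            break;
        }
        /* Bit 'end' is known to be clear (or out of range): skip it. */
        first = next_bit_with_value(map, nbits, end + 1, 1);
    }
    return ret;
}

/* Hypothetical callback that records ranges instead of preallocating. */
struct range_log {
    uint64_t offset[8];
    uint64_t size[8];
    size_t count;
};

static int log_range(void *arg, uint64_t offset, uint64_t size)
{
    struct range_log *log = arg;

    if (log->count >= 8) {
        return -1;
    }
    log->offset[log->count] = offset;
    log->size[log->count] = size;
    log->count++;
    return 0;
}
```

With an 8-bit bitmap of 0x6e (blocks 1-3 and 5-6 plugged) and an assumed
block size of 0x200000, log_range() would be called twice: once with offset
0x200000 and size 0x600000, and once with offset 0xa00000 and size 0x400000.
In the patch, the callback at that point is virtio_mem_prealloc_range_cb(),
so each such run is preallocated with a single qemu_prealloc_mem() call.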