From: William Roche <william.ro...@oracle.com>

Problem:
--------
A QEMU VM can survive a memory error, as QEMU can relay the error to the VM
kernel, which can in turn deal with it by poisoning/off-lining the impacted
page. This situation creates a hole in the VM memory address space (an
unreadable page or set of pages).
A migration request of this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash QEMU when it
sequentially reads the memory address space and stumbles on the existing hole.

New fix proposal:
-----------------
Let's prevent the migration when we know that there is a poisoned page in the
VM address space.

History:
--------
My first fix proposal for this crash condition (latest version:
https://lore.kernel.org/all/20231106220319.456765-1-william.ro...@oracle.com/ )
relied on a well-behaved kernel to guarantee that a known poisoned page is not
accessed. It introduced an ARM platform specificity.

I haven't received any feedback about the ARM specificity introduced to avoid
a possible memory corruption after a migration transforming a poisoned page
into an all-zero page.

I also accept that when a memory error leads to memory poisoning, this
platform functionality has to be honored as long as a physical platform would
provide it.

Peter asked for a complete correction of this problem (transferring the memory
hole information with the migration and recreating these holes on the
destination platform).

In the meantime, this is a very small fix to avoid the current crash when
reading the poisoned memory pages. I'm simply preventing the migration when we
know that it would crash, that is, when there is a poisoned page in the VM
address space.

This is generic protection code, avoiding a crash condition and reporting the
following error message:
"Error: Can't migrate this vm with hardware poisoned memory, please reboot the vm and try again"
instead of crashing the VM.

This fix is scripts/checkpatch.pl clean.
Unit tested on ARM and x86.

William Roche (1):
  migration: prevent migration when VM has poisoned memory

 accel/kvm/kvm-all.c    | 10 ++++++++++
 accel/stubs/kvm-stub.c |  5 +++++
 include/sysemu/kvm.h   |  6 ++++++
 migration/migration.c  |  7 +++++++
 4 files changed, 28 insertions(+)

-- 
2.39.3
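
For readers who want a feel for the shape of the check described above, here
is a minimal standalone sketch in C. It is not the code from the patch: the
PoisonList type and the poison_list_add(), vm_has_poisoned_memory() and
migration_precheck() names are hypothetical, and the fixed-size table is an
assumption made only to keep the example self-contained. The idea it models is
the one stated in this letter: remember that a page was poisoned, and refuse
to start a migration while any such page exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: a small table of guest pages known to be poisoned. */
#define MAX_POISONED_PAGES 64

typedef struct {
    unsigned long addrs[MAX_POISONED_PAGES];
    size_t count;
} PoisonList;

/*
 * Record a poisoned page address.
 * Returns false if the (illustrative) fixed-size table is full.
 */
bool poison_list_add(PoisonList *pl, unsigned long addr)
{
    assert(pl != NULL);
    if (pl->count >= MAX_POISONED_PAGES) {
        return false;
    }
    pl->addrs[pl->count++] = addr;
    return true;
}

/* True when at least one poisoned page has been recorded. */
bool vm_has_poisoned_memory(const PoisonList *pl)
{
    assert(pl != NULL);
    return pl->count > 0;
}

/*
 * Pre-migration check: returns true when migration may proceed.
 * On refusal, writes a message into err instead of letting the
 * migration read the poisoned pages.
 */
bool migration_precheck(const PoisonList *pl, char *err, size_t errlen)
{
    assert(err != NULL && errlen > 0);
    err[0] = '\0';
    if (vm_has_poisoned_memory(pl)) {
        snprintf(err, errlen,
                 "Can't migrate this vm with hardware poisoned memory, "
                 "please reboot the vm and try again");
        return false;
    }
    return true;
}
```

In this sketch a caller would run migration_precheck() once, before any guest
memory is read, and report the message to the user on failure. Rejecting the
request up front trades availability of migration for the guarantee that the
sequential memory read never touches a known hole.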