On Thu, Sep 14, 2023 at 08:20:53PM +0000, “William Roche wrote:
> From: William Roche <william.ro...@oracle.com>
> 
> A Qemu VM can survive a memory error, as qemu can relay the error to the
> VM kernel which could also deal with it -- poisoning/off-lining the impacted
> page.
> This situation creates a hole in the VM memory address space that the VM 
> kernel
> knows about (an unreadable page or set of pages).
> 
> But the migration of this VM (live migration through the network or
> pseudo-migration with the creation of a state file) will crash Qemu when
> it sequentially reads the memory address space and stumbles on the
> existing hole.
> 
> In order to correct this problem, I suggest to treat the poisoned pages as if
> they were zero-pages for the migration copy.
> This fix also works with underlying large pages, taking into account the
> RAMBlock segment "page-size".
> This fix is scripts/checkpatch.pl clean.
> 
> v2:
>   - adding compressed transfer handling of poisoned pages
>  
> Testing: I could verify that migration now works with a poisoned page
> through standard and compressed migration with 4k and large (2M) pages.
> 
> The RDMA transfer is not considered by this patch.
> 
> William Roche (1):
>   migration: skip poisoned memory pages on "ram saving" phase

If there's a new version, please consider adding a TODO above
control_save_page() that poison page is probably broken there, so we can
still remember.

Reviewed-by: Peter Xu <pet...@redhat.com>

Copy:

lizhij...@fujitsu.com, lidongc...@tencent.com

Thanks,

-- 
Peter Xu


Reply via email to