On Wed, Sep 06, 2023 at 03:19:32PM +0100, Joao Martins wrote:
> On 06/09/2023 14:59, William Roche wrote:
> > From: William Roche <william.ro...@oracle.com>
> >
> > A memory page poisoned from the hypervisor level is no longer readable.
> > Thus, it is now treated as a zero-page for the ram saving migration phase.
> >
> > The migration of a VM will crash Qemu when it tries to read the
> > memory address space and stumbles on the poisoned page with a similar
> > stack trace:
> >
> > Program terminated with signal SIGBUS, Bus error.
> > #0  _mm256_loadu_si256
> > #1  buffer_zero_avx2
> > #2  select_accel_fn
> > #3  buffer_is_zero
> > #4  save_zero_page_to_file
> > #5  save_zero_page
> > #6  ram_save_target_page_legacy
> > #7  ram_save_host_page
> > #8  ram_find_and_save_block
> > #9  ram_save_iterate
> > #10 qemu_savevm_state_iterate
> > #11 migration_iteration_run
> > #12 migration_thread
> > #13 qemu_thread_start
> >
> > Fix it by considering poisoned pages as if they were zero-pages for
> > the migration copy. This fix also works with underlying large pages,
> > taking into account the RAMBlock segment "page-size".
> >
> > Signed-off-by: William Roche <william.ro...@oracle.com>
>
> You forgot to CC the maintainers; Adding them now
>
> ./scripts/get_maintainer.pl is your friend for the next version :)
>
> > ---
> >  accel/kvm/kvm-all.c    | 14 ++++++++++++++
> >  accel/stubs/kvm-stub.c |  5 +++++
> >  include/sysemu/kvm.h   | 10 ++++++++++
> >  migration/ram.c        |  3 ++-
> >  4 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 2ba7521695..24a7709495 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -1152,6 +1152,20 @@ static void kvm_unpoison_all(void *param)
> >      }
> >  }
> >
> > +bool kvm_hwpoisoned_page(RAMBlock *block, void *offset)
> > +{
> > +    HWPoisonPage *pg;
> > +    ram_addr_t ram_addr = (ram_addr_t) offset;
> > +
> > +    QLIST_FOREACH(pg, &hwpoison_page_list, list) {
> > +        if ((ram_addr >= pg->ram_addr) &&
> > +            (ram_addr - pg->ram_addr < block->page_size)) {
Just a note.. Probably fine for now to reuse the block page size, but IIUC
the right thing to do is to fetch it from the signal info (in QEMU's
sigbus_handler()), i.e. kernel_siginfo.si_addr_lsb.

At least for x86 I think that stores the "shift" of the covered poisoned
page (one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for
a huge page, though.. not aware of any man page for that).  It'll then work
naturally when Linux huge pages start to support sub-huge-page-size
poisoning someday.

We can definitely leave that for later.

> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> >  void kvm_hwpoison_page_add(ram_addr_t ram_addr)
> >  {
> >      HWPoisonPage *page;
> > diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> > index 235dc661bc..c0a31611df 100644
> > --- a/accel/stubs/kvm-stub.c
> > +++ b/accel/stubs/kvm-stub.c
> > @@ -133,3 +133,8 @@ uint32_t kvm_dirty_ring_size(void)
> >  {
> >      return 0;
> >  }
> > +
> > +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr)
> > +{
> > +    return false;
> > +}
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index ebdca41052..a2196e9e6b 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -580,4 +580,14 @@ bool kvm_arch_cpu_check_are_resettable(void);
> >  bool kvm_dirty_ring_enabled(void);
> >
> >  uint32_t kvm_dirty_ring_size(void);
> > +
> > +/**
> > + * kvm_hwpoisoned_page - indicate if the given page is poisoned
> > + * @block: memory block of the given page
> > + * @ram_addr: offset of the page
> > + *
> > + * Returns: true: page is poisoned
> > + *          false: page not yet poisoned
> > + */
> > +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr);
> >  #endif
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 9040d66e61..48d875b12d 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
> >      uint8_t *p = block->host + offset;
> >      int len = 0;
> >
> > -    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > +    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||

Can we move this out of zero page handling?  Zero detection is not
guaranteed to always be the first thing done when processing a guest page.
Currently it is already skipped with either rdma or when compression is
enabled, so it'll keep crashing there.

Perhaps at the entry of ram_save_target_page_legacy()?

> > +        buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> >          len += save_page_header(pss, file, block, offset |
> >                                  RAM_SAVE_FLAG_ZERO);
> >          qemu_put_byte(file, 0);
> >          len += 1;

--
Peter Xu