On 2011-02-09 04:00, Huang Ying wrote:
> In Linux kernel HWPoison processing implementation, the virtual
> address in processes mapping the error physical memory page is marked
> as HWPoison. So that, the further accessing to the virtual
> address will kill corresponding processes with SIGBUS.
>
> If the error physical memory page is used by a KVM guest, the SIGBUS
> will be sent to QEMU, and QEMU will simulate a MCE to report that
> memory error to the guest OS. If the guest OS can not recover from
> the error (for example, the page is accessed by kernel code), guest OS
> will reboot the system. But because the underlying host virtual
> address backing the guest physical memory is still poisoned, if the
> guest system accesses the corresponding guest physical memory even
> after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
> simulated. That is, guest system can not recover via rebooting.

Yeah, saw this already during my test...

>
> In fact, across rebooting, the contents of guest physical memory page
> need not to be kept. We can allocate a new host physical page to
> back the corresponding guest physical address.

I'm just wondering what would be architecturally suboptimal if we simply
remapped on SIGBUS directly. Would save us at least the bookkeeping.

>
> This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
> to clear the corresponding page table entry, so that make it possible
> to allocate a new page to recover the issue.
>
> Signed-off-by: Huang Ying <ying.hu...@intel.com>
> ---
>  target-i386/kvm.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -508,6 +508,42 @@ static int kvm_get_supported_msrs(KVMSta
>      return ret;
>  }
>
> +struct HWPoisonPage;
> +typedef struct HWPoisonPage HWPoisonPage;
> +struct HWPoisonPage
> +{
> +    ram_addr_t ram_addr;
> +    QLIST_ENTRY(HWPoisonPage) list;
> +};
> +
> +static QLIST_HEAD(hwpoison_page_list, HWPoisonPage) hwpoison_page_list =
> +    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
> +
> +static void kvm_unpoison_all(void *param)
> +{
> +    HWPoisonPage *page, *next_page;
> +
> +    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
> +        QLIST_REMOVE(page, list);
> +        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
> +        qemu_free(page);
> +    }
> +}
> +
> +static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
> +{
> +    HWPoisonPage *page;
> +
> +    QLIST_FOREACH(page, &hwpoison_page_list, list) {
> +        if (page->ram_addr == ram_addr)
> +            return;
> +    }
> +
> +    page = qemu_malloc(sizeof(HWPoisonPage));
> +    page->ram_addr = ram_addr;
> +    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
> +}
> +
>  int kvm_arch_init(KVMState *s)
>  {
>      uint64_t identity_base = 0xfffbc000;
> @@ -556,6 +592,7 @@ int kvm_arch_init(KVMState *s)
>          fprintf(stderr, "e820_add_entry() table is full\n");
>          return ret;
>      }
> +    qemu_register_reset(kvm_unpoison_all, NULL);
>
>      return 0;
>  }
> @@ -1882,6 +1919,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *en
>                  hardware_memory_error();
>              }
>          }
> +        kvm_hwpoison_page_add(ram_addr);
>
>          if (code == BUS_MCEERR_AR) {
>              /* Fake an Intel architectural Data Load SRAR UCR */
> @@ -1926,6 +1964,7 @@ int kvm_arch_on_sigbus(int code, void *a
>                      "QEMU itself instead of guest system!: %p\n", addr);
>              return 0;
>          }
> +        kvm_hwpoison_page_add(ram_addr);
>          kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
>      } else
>  #endif
>

Looks fine otherwise. Unless that simplification makes sense, I could
offer to include this into my MCE rework (there is some minor conflict).
If all goes well, that series should be posted during this week.

Jan
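
For readers following the thread, here is a rough standalone sketch of the bookkeeping discussed above: poisoned pages are recorded once, and a reset-time hook gives each of them fresh backing memory. It is not QEMU code. The names `PoisonedPage`, `poisoned_page_add`, `remap_page`, `unpoison_all`, `guest_ram_alloc` and `host_page_size` are hypothetical, a plain singly linked list stands in for the QLIST macros, and remapping via `mmap()` with `MAP_FIXED` is an assumption about what a remap helper might do, since the body of qemu_ram_remap() is not shown in this mail. Linux with anonymous mappings is assumed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's HWPoisonPage list entry. */
typedef struct PoisonedPage {
    size_t offset;               /* page-aligned offset into guest RAM */
    struct PoisonedPage *next;
} PoisonedPage;

static PoisonedPage *poisoned_pages;

static size_t host_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Anonymous private mapping standing in for guest RAM; NULL on failure. */
static unsigned char *guest_ram_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Record a page once: 1 if added, 0 if already listed, -1 if out of memory. */
static int poisoned_page_add(size_t offset)
{
    PoisonedPage *p;

    for (p = poisoned_pages; p; p = p->next) {
        if (p->offset == offset)
            return 0;
    }
    p = malloc(sizeof(*p));
    if (!p)
        return -1;
    p->offset = offset;
    p->next = poisoned_pages;
    poisoned_pages = p;
    return 1;
}

/* Map a fresh zero-filled page over base + offset, dropping the old one. */
static int remap_page(unsigned char *base, size_t offset, size_t len)
{
    void *want = base + offset;
    void *got;

    assert(len > 0);
    got = mmap(want, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return got == want ? 0 : -1;
}

/* Reset-time hook: remap and forget every recorded page.
 * Returns the number of pages remapped successfully. */
static int unpoison_all(unsigned char *base, size_t page_size)
{
    int remapped = 0;

    while (poisoned_pages) {
        PoisonedPage *p = poisoned_pages;

        poisoned_pages = p->next;
        if (remap_page(base, p->offset, page_size) == 0)
            remapped++;
        free(p);
    }
    return remapped;
}
```

Remapping straight from the SIGBUS path, as suggested above, would amount to calling something like `remap_page()` where the sketch calls `poisoned_page_add()`, with no list needed. Whether that is safe while the guest may still touch the page is the open question in this thread, and the sketch does not answer it.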