Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Hi gengdongjiu,

On 05/04/17 00:05, gengdongjiu wrote:
> thanks for the patch, have you considered telling Qemu or the KVM tools
> the reason for this bus error (SEA/SEI)?

They should never need to know. We should treat Qemu/kvmtool like any other
program. Programs should only need to know about the effect on them, not the
underlying reason or mechanism.

> when Qemu or the KVM tools get this SIGBUS signal, they do not know whether
> they received this SIGBUS due to an SEA or an SEI.

Why would this matter? Firmware signalled Linux that something bad happened.
Linux handles the problem and everything keeps running.

The interface with firmware has to be architecture specific. When signalling
user space it should be architecture agnostic, otherwise we can't write
portable user-space code.

If Qemu was affected by the error (currently only if some of its memory was
hwpoisoned) we send it SIGBUS as we would for any other program. Qemu can
choose if and how to signal the guest about this error; it doesn't have to use
the same interface as firmware and the host used. With TCG, Qemu may be
emulating a totally different architecture!

Looking at the list of errors in table 250 of UEFI 2.6, cache errors are the
only case I can imagine we would want to report to a guest; these are
effectively transient memory errors. SIGBUS is still appropriate here, but we
probably need a new si_code value to indicate that the error can be cleared
(unlike hwpoison, which appears to never re-use the affected page).


Thanks,

James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Hi James,

thanks for the patch, have you considered telling Qemu or the KVM tools the
reason for this bus error (SEA/SEI)? When Qemu or the KVM tools get this
SIGBUS signal, they do not know whether they received this SIGBUS due to an
SEA or an SEI. Or does KVM only send this SIGBUS when it encounters an SEA?
If so, for the SEI case, how can Qemu simulate generating CPER records for the
guest OS SEI?

2017-03-16 0:07 GMT+08:00 James Morse:
> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for
> broken memory can call memory_failure() in mm/memory-failure.c to deliver
> SIGBUS to any user space process using the page, and notify all the
> in-kernel users.
>
> If the page corresponded with guest memory, KVM will unmap this page
> from its stage2 page tables. The user space process that allocated
> this memory may have never touched this page in which case it may not
> be mapped meaning SIGBUS won't be delivered.
>
> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
> comes to process the stage2 fault.
>
> Do as x86 does, and deliver the SIGBUS when we discover
> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
> as this matches the user space mapping size.
>
> Signed-off-by: James Morse
> CC: gengdongjiu
> ---
> Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns
> EFAULT.
> QEMU: error: kvm run failed Bad address
> LKVM: KVM_RUN failed: Bad address
>
> With this patch both kvmtool and Qemu receive SIGBUS ... and then exit.
> In the future Qemu can use this signal to notify the guest, for more details
> see hwpoison[1].
>
> [0] https://www.spinics.net/lists/arm-kernel/msg560009.html
> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt
>
>  arch/arm/kvm/mmu.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..9d1aa294e88f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -20,8 +20,10 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>  	__coherent_cache_guest_page(vcpu, pfn, size);
>  }
>
> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb)
> +{
> +	siginfo_t info;
> +
> +	info.si_signo = SIGBUS;
> +	info.si_errno = 0;
> +	info.si_code  = BUS_MCEERR_AR;
> +	info.si_addr  = (void __user *)address;
> +
> +	if (hugetlb)
> +		info.si_addr_lsb = PMD_SHIFT;
> +	else
> +		info.si_addr_lsb = PAGE_SHIFT;
> +
> +	send_sig_info(SIGBUS, &info, current);
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> 			  struct kvm_memory_slot *memslot, unsigned long hva,
> 			  unsigned long fault_status)
> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	smp_rmb();
>
>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
> +	if (pfn == KVM_PFN_ERR_HWPOISON) {
> +		kvm_send_hwpoison_signal(hva, hugetlb);
> +		return 0;
> +	}
>  	if (is_error_noslot_pfn(pfn))
> 		return -EFAULT;
>
> --
> 2.10.1
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
On Tue, Mar 28, 2017 at 03:50:51PM +0100, Punit Agrawal wrote: > Christoffer Dallwrites: > > > On Mon, Mar 27, 2017 at 02:31:44PM +0100, Punit Agrawal wrote: > >> Christoffer Dall writes: > >> > >> > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: > >> >> Hi guys, > >> >> > >> >> On 27/03/17 12:20, Punit Agrawal wrote: > >> >> > Christoffer Dall writes: > >> >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: > >> >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], > >> >> >>> notifications for > >> >> >>> broken memory can call memory_failure() in mm/memory-failure.c to > >> >> >>> deliver > >> >> >>> SIGBUS to any user space process using the page, and notify all the > >> >> >>> in-kernel users. > >> >> >>> > >> >> >>> If the page corresponded with guest memory, KVM will unmap this page > >> >> >>> from its stage2 page tables. The user space process that allocated > >> >> >>> this memory may have never touched this page in which case it may > >> >> >>> not > >> >> >>> be mapped meaning SIGBUS won't be delivered. > >> >> >>> > >> >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it > >> >> >>> comes to process the stage2 fault. > >> >> >>> > >> >> >>> Do as x86 does, and deliver the SIGBUS when we discover > >> >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb > >> >> >>> as this matches the user space mapping size. 
> >> >> > >> >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >> >> >>> index 962616fd4ddd..9d1aa294e88f 100644 > >> >> >>> --- a/arch/arm/kvm/mmu.c > >> >> >>> +++ b/arch/arm/kvm/mmu.c > >> >> >>> @@ -20,8 +20,10 @@ > >> >> >>> #include > >> >> >>> #include > >> >> >>> #include > >> >> >>> +#include > >> >> >>> #include > >> >> >>> #include > >> >> >>> +#include > >> >> >>> #include > >> >> >>> #include > >> >> >>> #include > >> >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct > >> >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, > >> >> >>> __coherent_cache_guest_page(vcpu, pfn, size); > >> >> >>> } > >> >> >>> > >> >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool > >> >> >>> hugetlb) > >> >> >>> +{ > >> >> >>> + siginfo_t info; > >> >> >>> + > >> >> >>> + info.si_signo = SIGBUS; > >> >> >>> + info.si_errno = 0; > >> >> >>> + info.si_code= BUS_MCEERR_AR; > >> >> >>> + info.si_addr= (void __user *)address; > >> >> >>> + > >> >> >>> + if (hugetlb) > >> >> >>> + info.si_addr_lsb = PMD_SHIFT; > >> >> >>> + else > >> >> >>> + info.si_addr_lsb = PAGE_SHIFT; > >> >> >>> + > >> >> >>> + send_sig_info(SIGBUS, , current); > >> >> >>> +} > >> >> >>> + > >> >> >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t > >> >> >>> fault_ipa, > >> >> >>> struct kvm_memory_slot *memslot, unsigned > >> >> >>> long hva, > >> >> >>> unsigned long fault_status) > >> >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu > >> >> >>> *vcpu, phys_addr_t fault_ipa, > >> >> >>> smp_rmb(); > >> >> >>> > >> >> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); > >> >> >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { > >> >> >>> + kvm_send_hwpoison_signal(hva, hugetlb); > >> >> >> > >> >> >> The way this is called means that we'll only notify userspace of a > >> >> >> huge > >> >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2 > >> >> >> mapping may or may not have used transparent huge pages when 
the > >> >> >> error > >> >> >> was discovered. Is this the desired semantics? > >> >> > >> >> No, > >> >> > >> >> > >> >> > I think so. > >> >> > > >> >> > AFAIUI, transparent hugepages are split before being poisoned while > >> >> > all > >> >> > the underlying pages of a hugepage are poisoned together, i.e., no > >> >> > splitting. > >> >> > >> >> In which case I need to look into this some more! > >> >> > >> >> My thinking was we should report the size that was knocked out of the > >> >> stage2 to > >> >> avoid the guest repeatedly faulting until it has touched every > >> >> guest-page-size > >> >> in the stage2 hole. > >> > > >> > By signaling something at the fault path, I think it's going to be very > >> > hard to backtrack how the stage 2 page tables looked like when faults > >> > started happening, because I think these are completely decoupled events > >> > (the mmu notifier and the later fault). > >> > > >> >> > >> >> Reading the code in that kvm/mmu.c it looked like the mapping sizes > >> >> would always > >> >> be the same as those used by userspace. > >> > > >> > I think the mapping sizes should be the same between userspace and KVM, > >> > but the mapping size of a particular page (and associated pages) may > >> > vary over time. > >> > >> Stage 1 and Stage 2 support different hugepage sizes. A larger size > >>
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Christoffer Dallwrites: > On Mon, Mar 27, 2017 at 02:31:44PM +0100, Punit Agrawal wrote: >> Christoffer Dall writes: >> >> > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: >> >> Hi guys, >> >> >> >> On 27/03/17 12:20, Punit Agrawal wrote: >> >> > Christoffer Dall writes: >> >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >> >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], >> >> >>> notifications for >> >> >>> broken memory can call memory_failure() in mm/memory-failure.c to >> >> >>> deliver >> >> >>> SIGBUS to any user space process using the page, and notify all the >> >> >>> in-kernel users. >> >> >>> >> >> >>> If the page corresponded with guest memory, KVM will unmap this page >> >> >>> from its stage2 page tables. The user space process that allocated >> >> >>> this memory may have never touched this page in which case it may not >> >> >>> be mapped meaning SIGBUS won't be delivered. >> >> >>> >> >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >> >> >>> comes to process the stage2 fault. >> >> >>> >> >> >>> Do as x86 does, and deliver the SIGBUS when we discover >> >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >> >> >>> as this matches the user space mapping size. 
>> >> >> >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> >> >>> index 962616fd4ddd..9d1aa294e88f 100644 >> >> >>> --- a/arch/arm/kvm/mmu.c >> >> >>> +++ b/arch/arm/kvm/mmu.c >> >> >>> @@ -20,8 +20,10 @@ >> >> >>> #include >> >> >>> #include >> >> >>> #include >> >> >>> +#include >> >> >>> #include >> >> >>> #include >> >> >>> +#include >> >> >>> #include >> >> >>> #include >> >> >>> #include >> >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct >> >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >> >> >>> __coherent_cache_guest_page(vcpu, pfn, size); >> >> >>> } >> >> >>> >> >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool >> >> >>> hugetlb) >> >> >>> +{ >> >> >>> + siginfo_t info; >> >> >>> + >> >> >>> + info.si_signo = SIGBUS; >> >> >>> + info.si_errno = 0; >> >> >>> + info.si_code= BUS_MCEERR_AR; >> >> >>> + info.si_addr= (void __user *)address; >> >> >>> + >> >> >>> + if (hugetlb) >> >> >>> + info.si_addr_lsb = PMD_SHIFT; >> >> >>> + else >> >> >>> + info.si_addr_lsb = PAGE_SHIFT; >> >> >>> + >> >> >>> + send_sig_info(SIGBUS, , current); >> >> >>> +} >> >> >>> + >> >> >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t >> >> >>> fault_ipa, >> >> >>> struct kvm_memory_slot *memslot, unsigned >> >> >>> long hva, >> >> >>> unsigned long fault_status) >> >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu >> >> >>> *vcpu, phys_addr_t fault_ipa, >> >> >>> smp_rmb(); >> >> >>> >> >> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); >> >> >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { >> >> >>> + kvm_send_hwpoison_signal(hva, hugetlb); >> >> >> >> >> >> The way this is called means that we'll only notify userspace of a huge >> >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2 >> >> >> mapping may or may not have used transparent huge pages when the error >> >> >> was discovered. Is this the desired semantics? >> >> >> >> No, >> >> >> >> >> >> > I think so. 
>> >> > >> >> > AFAIUI, transparent hugepages are split before being poisoned while all >> >> > the underlying pages of a hugepage are poisoned together, i.e., no >> >> > splitting. >> >> >> >> In which case I need to look into this some more! >> >> >> >> My thinking was we should report the size that was knocked out of the >> >> stage2 to >> >> avoid the guest repeatedly faulting until it has touched every >> >> guest-page-size >> >> in the stage2 hole. >> > >> > By signaling something at the fault path, I think it's going to be very >> > hard to backtrack how the stage 2 page tables looked like when faults >> > started happening, because I think these are completely decoupled events >> > (the mmu notifier and the later fault). >> > >> >> >> >> Reading the code in that kvm/mmu.c it looked like the mapping sizes would >> >> always >> >> be the same as those used by userspace. >> > >> > I think the mapping sizes should be the same between userspace and KVM, >> > but the mapping size of a particular page (and associated pages) may >> > vary over time. >> >> Stage 1 and Stage 2 support different hugepage sizes. A larger size >> stage 1 page maps to multiple stage 2 page table entries. For stage 1, >> we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while >> only PMD_SIZE is supported for Stage 2. >> >> > >> >> >> >> If the page was split before KVM could have taken this fault I assumed it >> >> would >> >> fault on the
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
On Mon, Mar 27, 2017 at 02:31:44PM +0100, Punit Agrawal wrote: > Christoffer Dallwrites: > > > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: > >> Hi guys, > >> > >> On 27/03/17 12:20, Punit Agrawal wrote: > >> > Christoffer Dall writes: > >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: > >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications > >> >>> for > >> >>> broken memory can call memory_failure() in mm/memory-failure.c to > >> >>> deliver > >> >>> SIGBUS to any user space process using the page, and notify all the > >> >>> in-kernel users. > >> >>> > >> >>> If the page corresponded with guest memory, KVM will unmap this page > >> >>> from its stage2 page tables. The user space process that allocated > >> >>> this memory may have never touched this page in which case it may not > >> >>> be mapped meaning SIGBUS won't be delivered. > >> >>> > >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it > >> >>> comes to process the stage2 fault. > >> >>> > >> >>> Do as x86 does, and deliver the SIGBUS when we discover > >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb > >> >>> as this matches the user space mapping size. 
> >> > >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >> >>> index 962616fd4ddd..9d1aa294e88f 100644 > >> >>> --- a/arch/arm/kvm/mmu.c > >> >>> +++ b/arch/arm/kvm/mmu.c > >> >>> @@ -20,8 +20,10 @@ > >> >>> #include > >> >>> #include > >> >>> #include > >> >>> +#include > >> >>> #include > >> >>> #include > >> >>> +#include > >> >>> #include > >> >>> #include > >> >>> #include > >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct > >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, > >> >>>__coherent_cache_guest_page(vcpu, pfn, size); > >> >>> } > >> >>> > >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool > >> >>> hugetlb) > >> >>> +{ > >> >>> + siginfo_t info; > >> >>> + > >> >>> + info.si_signo = SIGBUS; > >> >>> + info.si_errno = 0; > >> >>> + info.si_code= BUS_MCEERR_AR; > >> >>> + info.si_addr= (void __user *)address; > >> >>> + > >> >>> + if (hugetlb) > >> >>> + info.si_addr_lsb = PMD_SHIFT; > >> >>> + else > >> >>> + info.si_addr_lsb = PAGE_SHIFT; > >> >>> + > >> >>> + send_sig_info(SIGBUS, , current); > >> >>> +} > >> >>> + > >> >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t > >> >>> fault_ipa, > >> >>> struct kvm_memory_slot *memslot, unsigned > >> >>> long hva, > >> >>> unsigned long fault_status) > >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu > >> >>> *vcpu, phys_addr_t fault_ipa, > >> >>>smp_rmb(); > >> >>> > >> >>>pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); > >> >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { > >> >>> + kvm_send_hwpoison_signal(hva, hugetlb); > >> >> > >> >> The way this is called means that we'll only notify userspace of a huge > >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2 > >> >> mapping may or may not have used transparent huge pages when the error > >> >> was discovered. Is this the desired semantics? > >> > >> No, > >> > >> > >> > I think so. 
> >> > > >> > AFAIUI, transparent hugepages are split before being poisoned while all > >> > the underlying pages of a hugepage are poisoned together, i.e., no > >> > splitting. > >> > >> In which case I need to look into this some more! > >> > >> My thinking was we should report the size that was knocked out of the > >> stage2 to > >> avoid the guest repeatedly faulting until it has touched every > >> guest-page-size > >> in the stage2 hole. > > > > By signaling something at the fault path, I think it's going to be very > > hard to backtrack how the stage 2 page tables looked like when faults > > started happening, because I think these are completely decoupled events > > (the mmu notifier and the later fault). > > > >> > >> Reading the code in that kvm/mmu.c it looked like the mapping sizes would > >> always > >> be the same as those used by userspace. > > > > I think the mapping sizes should be the same between userspace and KVM, > > but the mapping size of a particular page (and associated pages) may > > vary over time. > > Stage 1 and Stage 2 support different hugepage sizes. A larger size > stage 1 page maps to multiple stage 2 page table entries. For stage 1, > we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while > only PMD_SIZE is supported for Stage 2. > > > > >> > >> If the page was split before KVM could have taken this fault I assumed it > >> would > >> fault on the page-size mapping and hugetlb would be false. > > > > I think you could have a huge page, which gets unmapped as a result on > > it getting split (perhaps
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Marc Zyngierwrites: > On 27/03/17 14:31, Punit Agrawal wrote: >> Christoffer Dall writes: >> >>> On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: Hi guys, On 27/03/17 12:20, Punit Agrawal wrote: > Christoffer Dall writes: >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications >>> for >>> broken memory can call memory_failure() in mm/memory-failure.c to >>> deliver >>> SIGBUS to any user space process using the page, and notify all the >>> in-kernel users. >>> >>> If the page corresponded with guest memory, KVM will unmap this page >>> from its stage2 page tables. The user space process that allocated >>> this memory may have never touched this page in which case it may not >>> be mapped meaning SIGBUS won't be delivered. >>> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >>> comes to process the stage2 fault. >>> >>> Do as x86 does, and deliver the SIGBUS when we discover >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >>> as this matches the user space mapping size. 
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 962616fd4ddd..9d1aa294e88f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> @@ -20,8 +20,10 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> #include >>> +#include >>> #include >>> #include >>> #include >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >>> __coherent_cache_guest_page(vcpu, pfn, size); >>> } >>> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool >>> hugetlb) >>> +{ >>> + siginfo_t info; >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + info.si_code= BUS_MCEERR_AR; >>> + info.si_addr= (void __user *)address; >>> + >>> + if (hugetlb) >>> + info.si_addr_lsb = PMD_SHIFT; >>> + else >>> + info.si_addr_lsb = PAGE_SHIFT; >>> + >>> + send_sig_info(SIGBUS, , current); >>> +} >>> + >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >>> struct kvm_memory_slot *memslot, unsigned >>> long hva, >>> unsigned long fault_status) >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >>> phys_addr_t fault_ipa, >>> smp_rmb(); >>> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { >>> + kvm_send_hwpoison_signal(hva, hugetlb); >> >> The way this is called means that we'll only notify userspace of a huge >> mapping if userspace is mapping hugetlbfs, and not because the stage2 >> mapping may or may not have used transparent huge pages when the error >> was discovered. Is this the desired semantics? No, > I think so. > > AFAIUI, transparent hugepages are split before being poisoned while all > the underlying pages of a hugepage are poisoned together, i.e., no > splitting. In which case I need to look into this some more! 
My thinking was we should report the size that was knocked out of the stage2
to avoid the guest repeatedly faulting until it has touched every
guest-page-size in the stage2 hole.

>>> By signaling something at the fault path, I think it's going to be very
>>> hard to backtrack how the stage 2 page tables looked like when faults
>>> started happening, because I think these are completely decoupled events
>>> (the mmu notifier and the later fault).

Reading the code in that kvm/mmu.c it looked like the mapping sizes would
always be the same as those used by userspace.

>>> I think the mapping sizes should be the same between userspace and KVM,
>>> but the mapping size of a particular page (and associated pages) may
>>> vary over time.

>> Stage 1 and Stage 2 support different hugepage sizes. A larger size
>> stage 1 page maps to multiple stage 2 page table entries. For stage 1,
>> we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while
>> only PMD_SIZE is supported for Stage 2.

> What is stage-1 doing here? We have no idea about what stage-1 is doing
> (not under KVM's control). Or do you mean userspace instead?

I mean userspace here. Sorry for the confusion.

> Thanks,
>
> M.
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
On 27/03/17 14:31, Punit Agrawal wrote: > Christoffer Dallwrites: > >> On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: >>> Hi guys, >>> >>> On 27/03/17 12:20, Punit Agrawal wrote: Christoffer Dall writes: > On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications >> for >> broken memory can call memory_failure() in mm/memory-failure.c to deliver >> SIGBUS to any user space process using the page, and notify all the >> in-kernel users. >> >> If the page corresponded with guest memory, KVM will unmap this page >> from its stage2 page tables. The user space process that allocated >> this memory may have never touched this page in which case it may not >> be mapped meaning SIGBUS won't be delivered. >> >> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >> comes to process the stage2 fault. >> >> Do as x86 does, and deliver the SIGBUS when we discover >> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >> as this matches the user space mapping size. 
>>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 962616fd4ddd..9d1aa294e88f 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -20,8 +20,10 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct >> kvm_vcpu *vcpu, kvm_pfn_t pfn, >> __coherent_cache_guest_page(vcpu, pfn, size); >> } >> >> +static void kvm_send_hwpoison_signal(unsigned long address, bool >> hugetlb) >> +{ >> +siginfo_t info; >> + >> +info.si_signo = SIGBUS; >> +info.si_errno = 0; >> +info.si_code= BUS_MCEERR_AR; >> +info.si_addr= (void __user *)address; >> + >> +if (hugetlb) >> +info.si_addr_lsb = PMD_SHIFT; >> +else >> +info.si_addr_lsb = PAGE_SHIFT; >> + >> +send_sig_info(SIGBUS, , current); >> +} >> + >> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >>struct kvm_memory_slot *memslot, unsigned >> long hva, >>unsigned long fault_status) >> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> smp_rmb(); >> >> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); >> +if (pfn == KVM_PFN_ERR_HWPOISON) { >> +kvm_send_hwpoison_signal(hva, hugetlb); > > The way this is called means that we'll only notify userspace of a huge > mapping if userspace is mapping hugetlbfs, and not because the stage2 > mapping may or may not have used transparent huge pages when the error > was discovered. Is this the desired semantics? >>> >>> No, >>> >>> I think so. AFAIUI, transparent hugepages are split before being poisoned while all the underlying pages of a hugepage are poisoned together, i.e., no splitting. >>> >>> In which case I need to look into this some more! >>> >>> My thinking was we should report the size that was knocked out of the >>> stage2 to >>> avoid the guest repeatedly faulting until it has touched every >>> guest-page-size >>> in the stage2 hole. 
>> By signaling something at the fault path, I think it's going to be very
>> hard to backtrack how the stage 2 page tables looked like when faults
>> started happening, because I think these are completely decoupled events
>> (the mmu notifier and the later fault).
>>
>>> Reading the code in that kvm/mmu.c it looked like the mapping sizes would
>>> always be the same as those used by userspace.
>>
>> I think the mapping sizes should be the same between userspace and KVM,
>> but the mapping size of a particular page (and associated pages) may
>> vary over time.
>
> Stage 1 and Stage 2 support different hugepage sizes. A larger size
> stage 1 page maps to multiple stage 2 page table entries. For stage 1,
> we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while
> only PMD_SIZE is supported for Stage 2.

What is stage-1 doing here? We have no idea about what stage-1 is doing
(not under KVM's control). Or do you mean userspace instead?

Thanks,

	M.

-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Christoffer Dallwrites: > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: >> Hi guys, >> >> On 27/03/17 12:20, Punit Agrawal wrote: >> > Christoffer Dall writes: >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications >> >>> for >> >>> broken memory can call memory_failure() in mm/memory-failure.c to deliver >> >>> SIGBUS to any user space process using the page, and notify all the >> >>> in-kernel users. >> >>> >> >>> If the page corresponded with guest memory, KVM will unmap this page >> >>> from its stage2 page tables. The user space process that allocated >> >>> this memory may have never touched this page in which case it may not >> >>> be mapped meaning SIGBUS won't be delivered. >> >>> >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >> >>> comes to process the stage2 fault. >> >>> >> >>> Do as x86 does, and deliver the SIGBUS when we discover >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >> >>> as this matches the user space mapping size. 
>> >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> >>> index 962616fd4ddd..9d1aa294e88f 100644 >> >>> --- a/arch/arm/kvm/mmu.c >> >>> +++ b/arch/arm/kvm/mmu.c >> >>> @@ -20,8 +20,10 @@ >> >>> #include >> >>> #include >> >>> #include >> >>> +#include >> >>> #include >> >>> #include >> >>> +#include >> >>> #include >> >>> #include >> >>> #include >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >> >>> __coherent_cache_guest_page(vcpu, pfn, size); >> >>> } >> >>> >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool >> >>> hugetlb) >> >>> +{ >> >>> +siginfo_t info; >> >>> + >> >>> +info.si_signo = SIGBUS; >> >>> +info.si_errno = 0; >> >>> +info.si_code= BUS_MCEERR_AR; >> >>> +info.si_addr= (void __user *)address; >> >>> + >> >>> +if (hugetlb) >> >>> +info.si_addr_lsb = PMD_SHIFT; >> >>> +else >> >>> +info.si_addr_lsb = PAGE_SHIFT; >> >>> + >> >>> +send_sig_info(SIGBUS, , current); >> >>> +} >> >>> + >> >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >> >>>struct kvm_memory_slot *memslot, unsigned >> >>> long hva, >> >>>unsigned long fault_status) >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> >>> phys_addr_t fault_ipa, >> >>> smp_rmb(); >> >>> >> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, ); >> >>> +if (pfn == KVM_PFN_ERR_HWPOISON) { >> >>> +kvm_send_hwpoison_signal(hva, hugetlb); >> >> >> >> The way this is called means that we'll only notify userspace of a huge >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2 >> >> mapping may or may not have used transparent huge pages when the error >> >> was discovered. Is this the desired semantics? >> >> No, >> >> >> > I think so. >> > >> > AFAIUI, transparent hugepages are split before being poisoned while all >> > the underlying pages of a hugepage are poisoned together, i.e., no >> > splitting. 
>> >> In which case I need to look into this some more! >> >> My thinking was we should report the size that was knocked out of the stage2 >> to >> avoid the guest repeatedly faulting until it has touched every >> guest-page-size >> in the stage2 hole. > > By signaling something at the fault path, I think it's going to be very > hard to backtrack how the stage 2 page tables looked like when faults > started happening, because I think these are completely decoupled events > (the mmu notifier and the later fault). > >> >> Reading the code in that kvm/mmu.c it looked like the mapping sizes would >> always >> be the same as those used by userspace. > > I think the mapping sizes should be the same between userspace and KVM, > but the mapping size of a particular page (and associated pages) may > vary over time. Stage 1 and Stage 2 support different hugepage sizes. A larger size stage 1 page maps to multiple stage 2 page table entries. For stage 1, we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while only PMD_SIZE is supported for Stage 2. > >> >> If the page was split before KVM could have taken this fault I assumed it >> would >> fault on the page-size mapping and hugetlb would be false. > > I think you could have a huge page, which gets unmapped as a result on > it getting split (perhaps because there was a failure on one page) and > later as you fault, you can discover a range which can be a hugetlbfs or > transparent huge pages. > > The question that I don't know is how Linux behaves if a page is marked > with hwpoison, in that case, if Linux never supports THP and always >
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote: > Hi guys, > > On 27/03/17 12:20, Punit Agrawal wrote: > > Christoffer Dall writes: > >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: > >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for > >>> broken memory can call memory_failure() in mm/memory-failure.c to deliver > >>> SIGBUS to any user space process using the page, and notify all the > >>> in-kernel users. > >>> > >>> If the page corresponded with guest memory, KVM will unmap this page > >>> from its stage2 page tables. The user space process that allocated > >>> this memory may have never touched this page in which case it may not > >>> be mapped meaning SIGBUS won't be delivered. > >>> > >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it > >>> comes to process the stage2 fault. > >>> > >>> Do as x86 does, and deliver the SIGBUS when we discover > >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb > >>> as this matches the user space mapping size.
> > >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > >>> index 962616fd4ddd..9d1aa294e88f 100644 > >>> --- a/arch/arm/kvm/mmu.c > >>> +++ b/arch/arm/kvm/mmu.c > >>> @@ -20,8 +20,10 @@ > >>> #include > >>> #include > >>> #include > >>> +#include > >>> #include > >>> #include > >>> +#include > >>> #include > >>> #include > >>> #include > >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct > >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, > >>> __coherent_cache_guest_page(vcpu, pfn, size); > >>> } > >>> > >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) > >>> +{ > >>> + siginfo_t info; > >>> + > >>> + info.si_signo = SIGBUS; > >>> + info.si_errno = 0; > >>> + info.si_code = BUS_MCEERR_AR; > >>> + info.si_addr = (void __user *)address; > >>> + > >>> + if (hugetlb) > >>> + info.si_addr_lsb = PMD_SHIFT; > >>> + else > >>> + info.si_addr_lsb = PAGE_SHIFT; > >>> + > >>> + send_sig_info(SIGBUS, &info, current); > >>> +} > >>> + > >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > >>> struct kvm_memory_slot *memslot, unsigned long hva, > >>> unsigned long fault_status) > >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > >>> phys_addr_t fault_ipa, > >>> smp_rmb(); > >>> > >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); > >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { > >>> + kvm_send_hwpoison_signal(hva, hugetlb); > >> > >> The way this is called means that we'll only notify userspace of a huge > >> mapping if userspace is mapping hugetlbfs, and not because the stage2 > >> mapping may or may not have used transparent huge pages when the error > >> was discovered. Is this the desired semantics? > > No, > > > > I think so. > > > > AFAIUI, transparent hugepages are split before being poisoned while all > > the underlying pages of a hugepage are poisoned together, i.e., no > > splitting. > > In which case I need to look into this some more!
> > My thinking was we should report the size that was knocked out of the stage2 > to > avoid the guest repeatedly faulting until it has touched every guest-page-size > in the stage2 hole. By signaling something at the fault path, I think it's going to be very hard to backtrack what the stage 2 page tables looked like when faults started happening, because I think these are completely decoupled events (the mmu notifier and the later fault). > > Reading the code in kvm/mmu.c it looked like the mapping sizes would > always > be the same as those used by userspace. I think the mapping sizes should be the same between userspace and KVM, but the mapping size of a particular page (and associated pages) may vary over time. > > If the page was split before KVM could have taken this fault I assumed it > would > fault on the page-size mapping and hugetlb would be false. I think you could have a huge page, which gets unmapped as a result of it getting split (perhaps because there was a failure on one page) and later as you fault, you can discover a range which can be a hugetlbfs or transparent huge pages. The question that I don't know is how Linux behaves if a page is marked with hwpoison, in that case, if Linux never supports THP and always marks an entire huge page in a hugetlbfs with the poison, then I think we're mostly good here. If not, we should make sure we align with whatever the rest of the kernel does. > (which is already > wrong for another reason, looks like I grabbed the variable before > transparent_hugepage_adjust() has had a go at it.). > yes, which is why I asked if you only care about hugetlbfs. > > >> Also notice that the hva is not necessarily aligned to the beginning of > >> the huge page, so can we be giving userspace wrong information by > >> pointing in the middle of a huge page
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Hi guys, On 27/03/17 12:20, Punit Agrawal wrote: > Christoffer Dall writes: >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for >>> broken memory can call memory_failure() in mm/memory-failure.c to deliver >>> SIGBUS to any user space process using the page, and notify all the >>> in-kernel users. >>> >>> If the page corresponded with guest memory, KVM will unmap this page >>> from its stage2 page tables. The user space process that allocated >>> this memory may have never touched this page in which case it may not >>> be mapped meaning SIGBUS won't be delivered. >>> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >>> comes to process the stage2 fault. >>> >>> Do as x86 does, and deliver the SIGBUS when we discover >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >>> as this matches the user space mapping size. >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >>> index 962616fd4ddd..9d1aa294e88f 100644 >>> --- a/arch/arm/kvm/mmu.c >>> +++ b/arch/arm/kvm/mmu.c >>> @@ -20,8 +20,10 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> #include >>> +#include >>> #include >>> #include >>> #include >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct >>> kvm_vcpu *vcpu, kvm_pfn_t pfn, >>> __coherent_cache_guest_page(vcpu, pfn, size); >>> } >>> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) >>> +{ >>> + siginfo_t info; >>> + >>> + info.si_signo = SIGBUS; >>> + info.si_errno = 0; >>> + info.si_code = BUS_MCEERR_AR; >>> + info.si_addr = (void __user *)address; >>> + >>> + if (hugetlb) >>> + info.si_addr_lsb = PMD_SHIFT; >>> + else >>> + info.si_addr_lsb = PAGE_SHIFT; >>> + >>> + send_sig_info(SIGBUS, &info, current); >>> +} >>> + >>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >>> struct kvm_memory_slot *memslot, unsigned long hva, >>> unsigned
long fault_status) >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >>> phys_addr_t fault_ipa, >>> smp_rmb(); >>> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); >>> + if (pfn == KVM_PFN_ERR_HWPOISON) { >>> + kvm_send_hwpoison_signal(hva, hugetlb); >> >> The way this is called means that we'll only notify userspace of a huge >> mapping if userspace is mapping hugetlbfs, and not because the stage2 >> mapping may or may not have used transparent huge pages when the error >> was discovered. Is this the desired semantics? No, > I think so. > > AFAIUI, transparent hugepages are split before being poisoned while all > the underlying pages of a hugepage are poisoned together, i.e., no > splitting. In which case I need to look into this some more! My thinking was we should report the size that was knocked out of the stage2 to avoid the guest repeatedly faulting until it has touched every guest-page-size in the stage2 hole. Reading the code in kvm/mmu.c it looked like the mapping sizes would always be the same as those used by userspace. If the page was split before KVM could have taken this fault I assumed it would fault on the page-size mapping and hugetlb would be false. (which is already wrong for another reason, looks like I grabbed the variable before transparent_hugepage_adjust() has had a go at it.). >> Also notice that the hva is not necessarily aligned to the beginning of >> the huge page, so can we be giving userspace wrong information by >> pointing in the middle of a huge page and telling it there was an >> address error in the size of the PMD ? >> > > I could be reading it wrong but I think we are fine here - the address > (hva) is the location that faulted. And the lsb indicates the least > significant bit of the faulting address (See man sigaction(2)). The > receiver of the signal is expected to use the address and lsb to work out > the extent of corruption.
kill_proc() in mm/memory-failure.c does this too, but the address is set by page_address_in_vma() in add_to_kill() of the same file. (I'll chat with Punit off list.) > Though I missed a subtlety while reviewing the patch before. The > reported lsb should be for the userspace hugepage mapping (i.e., hva) > and not for the stage 2. I thought these were always supposed to be the same, and using hugetlb was a bug because I didn't look closely enough at what is_vm_hugetlb_page() does. > In light of this, I'd like to retract my Reviewed-by tag for this > version of the patch as I believe we'll need to change the lsb > reporting. Sure, let's work out what this should be doing. I'm beginning to suspect x86's 'always page size' was correct to begin with! Thanks, James ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Christoffer Dall writes: > On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: >> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for >> broken memory can call memory_failure() in mm/memory-failure.c to deliver >> SIGBUS to any user space process using the page, and notify all the >> in-kernel users. >> >> If the page corresponded with guest memory, KVM will unmap this page >> from its stage2 page tables. The user space process that allocated >> this memory may have never touched this page in which case it may not >> be mapped meaning SIGBUS won't be delivered. >> >> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it >> comes to process the stage2 fault. >> >> Do as x86 does, and deliver the SIGBUS when we discover >> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb >> as this matches the user space mapping size. >> >> Signed-off-by: James Morse >> CC: gengdongjiu >> --- >> Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns >> EFAULT. >> QEMU: error: kvm run failed Bad address >> LKVM: KVM_RUN failed: Bad address >> >> With this patch both kvmtool and Qemu receive SIGBUS ... and then exit. >> In the future Qemu can use this signal to notify the guest, for more details >> see hwpoison[1].
>> >> [0] https://www.spinics.net/lists/arm-kernel/msg560009.html >> [1] >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt >> >> >> arch/arm/kvm/mmu.c | 23 +++ >> 1 file changed, 23 insertions(+) >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 962616fd4ddd..9d1aa294e88f 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -20,8 +20,10 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu >> *vcpu, kvm_pfn_t pfn, >> __coherent_cache_guest_page(vcpu, pfn, size); >> } >> >> +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) >> +{ >> +siginfo_t info; >> + >> +info.si_signo = SIGBUS; >> +info.si_errno = 0; >> +info.si_code = BUS_MCEERR_AR; >> +info.si_addr = (void __user *)address; >> + >> +if (hugetlb) >> +info.si_addr_lsb = PMD_SHIFT; >> +else >> +info.si_addr_lsb = PAGE_SHIFT; >> + >> +send_sig_info(SIGBUS, &info, current); >> +} >> + >> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >>struct kvm_memory_slot *memslot, unsigned long hva, >>unsigned long fault_status) >> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> smp_rmb(); >> >> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); >> +if (pfn == KVM_PFN_ERR_HWPOISON) { >> +kvm_send_hwpoison_signal(hva, hugetlb); > > The way this is called means that we'll only notify userspace of a huge > mapping if userspace is mapping hugetlbfs, and not because the stage2 > mapping may or may not have used transparent huge pages when the error > was discovered. Is this the desired semantics? I think so. AFAIUI, transparent hugepages are split before being poisoned while all the underlying pages of a hugepage are poisoned together, i.e., no splitting.
> > Also notice that the hva is not necessarily aligned to the beginning of > the huge page, so can we be giving userspace wrong information by > pointing in the middle of a huge page and telling it there was an > address error in the size of the PMD ? > I could be reading it wrong but I think we are fine here - the address (hva) is the location that faulted. And the lsb indicates the least significant bit of the faulting address (See man sigaction(2)). The receiver of the signal is expected to use the address and lsb to work out the extent of corruption. Though I missed a subtlety while reviewing the patch before. The reported lsb should be for the userspace hugepage mapping (i.e., hva) and not for the stage 2. So in the case of hugepages the value of lsb should be - huge_page_shift(hstate_vma(vma)) as the kernel supports more than just PMD size hugepages. Does that make sense? In light of this, I'd like to retract my Reviewed-by tag for this version of the patch as I believe we'll need to change the lsb reporting. Thanks, Punit >> +return 0; >> +} >> if (is_error_noslot_pfn(pfn)) >> return -EFAULT; >> >> -- >> 2.10.1 >> > > Thanks, > -Christoffer
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote: > Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for > broken memory can call memory_failure() in mm/memory-failure.c to deliver > SIGBUS to any user space process using the page, and notify all the > in-kernel users. > > If the page corresponded with guest memory, KVM will unmap this page > from its stage2 page tables. The user space process that allocated > this memory may have never touched this page in which case it may not > be mapped meaning SIGBUS won't be delivered. > > When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it > comes to process the stage2 fault. > > Do as x86 does, and deliver the SIGBUS when we discover > KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb > as this matches the user space mapping size. > > Signed-off-by: James Morse > CC: gengdongjiu > --- > Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns > EFAULT. > QEMU: error: kvm run failed Bad address > LKVM: KVM_RUN failed: Bad address > > With this patch both kvmtool and Qemu receive SIGBUS ... and then exit. > In the future Qemu can use this signal to notify the guest, for more details > see hwpoison[1].
> > [0] https://www.spinics.net/lists/arm-kernel/msg560009.html > [1] > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt > > > arch/arm/kvm/mmu.c | 23 +++ > 1 file changed, 23 insertions(+) > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 962616fd4ddd..9d1aa294e88f 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -20,8 +20,10 @@ > #include > #include > #include > +#include > #include > #include > +#include > #include > #include > #include > @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu > *vcpu, kvm_pfn_t pfn, > __coherent_cache_guest_page(vcpu, pfn, size); > } > > +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) > +{ > + siginfo_t info; > + > + info.si_signo = SIGBUS; > + info.si_errno = 0; > + info.si_code = BUS_MCEERR_AR; > + info.si_addr = (void __user *)address; > + > + if (hugetlb) > + info.si_addr_lsb = PMD_SHIFT; > + else > + info.si_addr_lsb = PAGE_SHIFT; > + > + send_sig_info(SIGBUS, &info, current); > +} > + > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > struct kvm_memory_slot *memslot, unsigned long hva, > unsigned long fault_status) > @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > smp_rmb(); > > pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); > + if (pfn == KVM_PFN_ERR_HWPOISON) { > + kvm_send_hwpoison_signal(hva, hugetlb); The way this is called means that we'll only notify userspace of a huge mapping if userspace is mapping hugetlbfs, and not because the stage2 mapping may or may not have used transparent huge pages when the error was discovered. Is this the desired semantics? Also notice that the hva is not necessarily aligned to the beginning of the huge page, so can we be giving userspace wrong information by pointing in the middle of a huge page and telling it there was an address error in the size of the PMD ?
> + return 0; > + } > if (is_error_noslot_pfn(pfn)) > return -EFAULT; > > -- > 2.10.1 > Thanks, -Christoffer
Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Hi James, One comment at the end. James Morse writes: > Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for > broken memory can call memory_failure() in mm/memory-failure.c to deliver > SIGBUS to any user space process using the page, and notify all the > in-kernel users. > > If the page corresponded with guest memory, KVM will unmap this page > from its stage2 page tables. The user space process that allocated > this memory may have never touched this page in which case it may not > be mapped meaning SIGBUS won't be delivered. > > When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it > comes to process the stage2 fault. > > Do as x86 does, and deliver the SIGBUS when we discover > KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb > as this matches the user space mapping size. > > Signed-off-by: James Morse > CC: gengdongjiu > --- > Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns > EFAULT. > QEMU: error: kvm run failed Bad address > LKVM: KVM_RUN failed: Bad address > > With this patch both kvmtool and Qemu receive SIGBUS ... and then exit. > In the future Qemu can use this signal to notify the guest, for more details > see hwpoison[1].
> > [0] https://www.spinics.net/lists/arm-kernel/msg560009.html > [1] > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt > > > arch/arm/kvm/mmu.c | 23 +++ > 1 file changed, 23 insertions(+) > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 962616fd4ddd..9d1aa294e88f 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -20,8 +20,10 @@ > #include > #include > #include > +#include > #include > #include > +#include > #include > #include > #include > @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu > *vcpu, kvm_pfn_t pfn, > __coherent_cache_guest_page(vcpu, pfn, size); > } > > +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) > +{ > + siginfo_t info; > + > + info.si_signo = SIGBUS; > + info.si_errno = 0; > + info.si_code = BUS_MCEERR_AR; > + info.si_addr = (void __user *)address; > + > + if (hugetlb) > + info.si_addr_lsb = PMD_SHIFT; > + else > + info.si_addr_lsb = PAGE_SHIFT; > + > + send_sig_info(SIGBUS, &info, current); > +} > + > static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > struct kvm_memory_slot *memslot, unsigned long hva, > unsigned long fault_status) > @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > smp_rmb(); > > pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); > + if (pfn == KVM_PFN_ERR_HWPOISON) { > + kvm_send_hwpoison_signal(hva, hugetlb); > + return 0; > + } > if (is_error_noslot_pfn(pfn)) > return -EFAULT; The changes look good to me. Though in essence as mentioned in the commit log we are not doing anything different to x86 here. Worth moving kvm_send_hwpoison_signal to an architecture agnostic location and using it from there? In any case, FWIW, Reviewed-by: Punit Agrawal Thanks.
[PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for broken memory can call memory_failure() in mm/memory-failure.c to deliver SIGBUS to any user space process using the page, and notify all the in-kernel users. If the page corresponded with guest memory, KVM will unmap this page from its stage2 page tables. The user space process that allocated this memory may have never touched this page in which case it may not be mapped meaning SIGBUS won't be delivered. When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it comes to process the stage2 fault. Do as x86 does, and deliver the SIGBUS when we discover KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb as this matches the user space mapping size. Signed-off-by: James Morse CC: gengdongjiu --- Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns EFAULT. QEMU: error: kvm run failed Bad address LKVM: KVM_RUN failed: Bad address With this patch both kvmtool and Qemu receive SIGBUS ... and then exit. In the future Qemu can use this signal to notify the guest, for more details see hwpoison[1].
[0] https://www.spinics.net/lists/arm-kernel/msg560009.html [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt arch/arm/kvm/mmu.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 962616fd4ddd..9d1aa294e88f 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -20,8 +20,10 @@ #include #include #include +#include #include #include +#include #include #include #include @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn, __coherent_cache_guest_page(vcpu, pfn, size); } +static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb) +{ + siginfo_t info; + + info.si_signo = SIGBUS; + info.si_errno = 0; + info.si_code = BUS_MCEERR_AR; + info.si_addr = (void __user *)address; + + if (hugetlb) + info.si_addr_lsb = PMD_SHIFT; + else + info.si_addr_lsb = PAGE_SHIFT; + + send_sig_info(SIGBUS, &info, current); +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, smp_rmb(); pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable); + if (pfn == KVM_PFN_ERR_HWPOISON) { + kvm_send_hwpoison_signal(hva, hugetlb); + return 0; + } if (is_error_noslot_pfn(pfn)) return -EFAULT; -- 2.10.1