Re: [kvm-devel] [patch 0/3] QEMU balloon support
Marcelo Tosatti wrote: Hi Avi, Good that you're back. On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote: Marcelo Tosatti wrote: This patchset resends Anthony's QEMU balloon support plus: - Truncates the target size to ram size - Enables madvise() conditioned on KVM_ZAP_GFN ioctl Once mmu notifiers are in, KVM_ZAP_GFN isn't needed. So we have three possible situations: - zap needed, but not available: don't madvise() - zap needed and available: zap and madvise() - zap unneeded: madvise() Right. That is what the patch does. You just have to fill in have_mmu_notifiers here: +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn) +{ + unsigned long addr; + int have_mmu_notifiers = 0; + + down_read(kvm-slots_lock); + addr = gfn_to_hva(kvm, gfn); + + if (kvm_is_error_hva(addr)) { + up_read(kvm-slots_lock); + return -EINVAL; + } + + if (!have_mmu_notifiers) { + spin_lock(kvm-mmu_lock); + rmap_nuke(kvm, gfn); + spin_unlock(kvm-mmu_lock); + } + up_read(kvm-slots_lock); So as to avoid rmap_nuke, since that will be done through the madvise() path. Why not do it in userspace? I'll likely merge zap directly into the backwards compatibility support (it won't ever appear in mainline since mmu notifiers will be merged sooner). Did you find out what's causing the errors in the first place (if zap is not used)? It worries me greatly. Yes, the problem is that the rmap code does not handle the qemu process mappings from vanishing while there is a present rmap. If that happens, and there is a fault for a gfn whose qemu mapping has been removed, a different physical zero page will be allocated: rmap a - gfn 0 - physical host page 0 mapping for gfn 0 gets removed guest faults in gfn 0 through the same pte chain rmap a - gfn 0 - physical host page 1 When instantiating the shadow mapping for the second time, the is_rmap_pte check succeeds, so we release the reference grabbed by gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing to a physical page without having an additional reference on that page. The following makes the host not crash under such a condition, but the condition itself is invalid leading to inconsistent state on the guest. So IMHO it shouldnt be allowed to happen in the first place. The only way to prevent it is with mmu notifiers, which we may not have for 2.6.25. So I'd like to send this for 2.6.25-rc. Please add a signoff. diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f0cdfba..4c93b79 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t +gva) return page; } +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page) +{ + int ret = 0; + unsigned long host_pfn = (*spte PT64_BASE_ADDR_MASK) PAGE_SHIFT; hfn_t hfn + + if (is_rmap_pte(*spte)) { + if (host_pfn != page_to_pfn(page)) + rmap_remove(kvm, spte); + else + ret = 1; + } + + return ret; +} + static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, unsigned pt_access, unsigned pte_access, int user_fault, int write_fault, int dirty, @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 +*shadow_pte, struct page *page) { u64 spte; - int was_rmapped = is_rmap_pte(*shadow_pte); + int was_rmapped = was_spte_rmapped(vcpu-kvm, shadow_pte, page); int was_writeble = is_writeble_pte(*shadow_pte); Calling code with side effects in an initializer is a bit confusing. I suggest simply inlining the code into mmu_set_spte(). -- error compiling committee.c: too many arguments to function - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 0/3] QEMU balloon support
On Mon, Mar 17, 2008 at 01:11:11PM +0200, Avi Kivity wrote: + up_read(kvm-slots_lock); So as to avoid rmap_nuke, since that will be done through the madvise() path. Why not do it in userspace? I don't see any way of knowing whether you have or not mmu notifiers from userspace except adding an ioctl. I'll likely merge zap directly into the backwards compatibility support (it won't ever appear in mainline since mmu notifiers will be merged sooner). Even with mmu notifiers QEMU needs to ask the host kernel do you have mmu notifiers? before zapping any page, otherwise the guest will crash. So some method of finding if the host kernel has mmu notifiers needs to be in place even in mainline. And what about allocation of ioctl numbers? Will you reserve the number used for ZAP_GFN on mainline? Did you find out what's causing the errors in the first place (if zap is not used)? It worries me greatly. Yes, the problem is that the rmap code does not handle the qemu process mappings from vanishing while there is a present rmap. If that happens, and there is a fault for a gfn whose qemu mapping has been removed, a different physical zero page will be allocated: rmap a - gfn 0 - physical host page 0 mapping for gfn 0 gets removed guest faults in gfn 0 through the same pte chain rmap a - gfn 0 - physical host page 1 When instantiating the shadow mapping for the second time, the is_rmap_pte check succeeds, so we release the reference grabbed by gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing to a physical page without having an additional reference on that page. The following makes the host not crash under such a condition, but the condition itself is invalid leading to inconsistent state on the guest. So IMHO it shouldnt be allowed to happen in the first place. The only way to prevent it is with mmu notifiers, which we may not have for 2.6.25. So I'd like to send this for 2.6.25-rc. Well, you can prevent it by never allowing a page from being removed while there are shadow mappings for it. So while this fixes the host crash, it won't fix the guest crash. KVM: MMU: handle page removal with shadow mapping Do not assume that a shadow mapping will always point to the same host frame number. Fixes crash with madvise(MADV_DONTNEED). Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED] diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f0cdfba..4caa1a0 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1016,8 +1016,16 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, struct page *page) { u64 spte; - int was_rmapped = is_rmap_pte(*shadow_pte); + int was_rmapped = 0; int was_writeble = is_writeble_pte(*shadow_pte); + hfn_t host_pfn = (*shadow_pte PT64_BASE_ADDR_MASK) PAGE_SHIFT; + + if (is_rmap_pte(*shadow_pte)) { + if (host_pfn != page_to_pfn(page)) + rmap_remove(vcpu-kvm, shadow_pte); + else + was_rmapped = 1; + } /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 0/3] QEMU balloon support
Marcelo Tosatti wrote: This patchset resends Anthony's QEMU balloon support plus: - Truncates the target size to ram size - Enables madvise() conditioned on KVM_ZAP_GFN ioctl Once mmu notifiers are in, KVM_ZAP_GFN isn't needed. So we have three possible situations: - zap needed, but not available: don't madvise() - zap needed and available: zap and madvise() - zap unneeded: madvise() Did you find out what's causing the errors in the first place (if zap is not used)? It worries me greatly. -- error compiling committee.c: too many arguments to function - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [patch 0/3] QEMU balloon support
Hi Avi, Good that you're back. On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote: Marcelo Tosatti wrote: This patchset resends Anthony's QEMU balloon support plus: - Truncates the target size to ram size - Enables madvise() conditioned on KVM_ZAP_GFN ioctl Once mmu notifiers are in, KVM_ZAP_GFN isn't needed. So we have three possible situations: - zap needed, but not available: don't madvise() - zap needed and available: zap and madvise() - zap unneeded: madvise() Right. That is what the patch does. You just have to fill in have_mmu_notifiers here: +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn) +{ + unsigned long addr; + int have_mmu_notifiers = 0; + + down_read(kvm-slots_lock); + addr = gfn_to_hva(kvm, gfn); + + if (kvm_is_error_hva(addr)) { + up_read(kvm-slots_lock); + return -EINVAL; + } + + if (!have_mmu_notifiers) { + spin_lock(kvm-mmu_lock); + rmap_nuke(kvm, gfn); + spin_unlock(kvm-mmu_lock); + } + up_read(kvm-slots_lock); So as to avoid rmap_nuke, since that will be done through the madvise() path. Did you find out what's causing the errors in the first place (if zap is not used)? It worries me greatly. Yes, the problem is that the rmap code does not handle the qemu process mappings from vanishing while there is a present rmap. If that happens, and there is a fault for a gfn whose qemu mapping has been removed, a different physical zero page will be allocated: rmap a - gfn 0 - physical host page 0 mapping for gfn 0 gets removed guest faults in gfn 0 through the same pte chain rmap a - gfn 0 - physical host page 1 When instantiating the shadow mapping for the second time, the is_rmap_pte check succeeds, so we release the reference grabbed by gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing to a physical page without having an additional reference on that page. The following makes the host not crash under such a condition, but the condition itself is invalid leading to inconsistent state on the guest. So IMHO it shouldnt be allowed to happen in the first place. diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f0cdfba..4c93b79 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t +gva) return page; } +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page) +{ + int ret = 0; + unsigned long host_pfn = (*spte PT64_BASE_ADDR_MASK) PAGE_SHIFT; + + if (is_rmap_pte(*spte)) { + if (host_pfn != page_to_pfn(page)) + rmap_remove(kvm, spte); + else + ret = 1; + } + + return ret; +} + static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, unsigned pt_access, unsigned pte_access, int user_fault, int write_fault, int dirty, @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 +*shadow_pte, struct page *page) { u64 spte; - int was_rmapped = is_rmap_pte(*shadow_pte); + int was_rmapped = was_spte_rmapped(vcpu-kvm, shadow_pte, page); int was_writeble = is_writeble_pte(*shadow_pte); /* - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] [patch 0/3] QEMU balloon support
This patchset resends Anthony's QEMU balloon support plus: - Truncates the target size to ram size - Enables madvise() conditioned on KVM_ZAP_GFN ioctl -- - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel