Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
 Hi Avi,

 Good that you're back.

 On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
   
 Marcelo Tosatti wrote:
 
 This patchset resends Anthony's QEMU balloon support plus:

 - Truncates the target size to ram size
 - Enables madvise() conditioned on KVM_ZAP_GFN ioctl

  
   
 Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
 possible situations:

 - zap needed, but not available: don't madvise()
 - zap needed and available: zap and madvise()
 - zap unneeded: madvise()
 

 Right. That is what the patch does. You just have to fill in
 have_mmu_notifiers here:

 +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
 +{
 +   unsigned long addr;
 +   int have_mmu_notifiers = 0;
 +
 +   down_read(kvm-slots_lock);
 +   addr = gfn_to_hva(kvm, gfn);
 +
 +   if (kvm_is_error_hva(addr)) {
 +   up_read(kvm-slots_lock);
 +   return -EINVAL;
 +   }
 +
 +   if (!have_mmu_notifiers) {
 +   spin_lock(kvm-mmu_lock);
 +   rmap_nuke(kvm, gfn);
 +   spin_unlock(kvm-mmu_lock);
 +   }
 +   up_read(kvm-slots_lock);

 So as to avoid rmap_nuke, since that will be done through the madvise()
 path.

   

Why not do it in userspace?

I'll likely merge zap directly into the backwards compatibility support 
(it won't ever appear in mainline since mmu notifiers will be merged 
sooner).

 Did you find out what's causing the errors in the first place (if zap is 
 not used)?  It worries me greatly.
 

 Yes, the problem is that the rmap code does not handle the qemu process
 mappings from vanishing while there is a present rmap. If that happens,
 and there is a fault for a gfn whose qemu mapping has been removed, a
 different physical zero page will be allocated:

   rmap a - gfn 0 - physical host page 0
   mapping for gfn 0 gets removed
   guest faults in gfn 0 through the same pte chain
   rmap a - gfn 0 - physical host page 1

 When instantiating the shadow mapping for the second time, the
 is_rmap_pte check succeeds, so we release the reference grabbed by
 gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
 to a physical page without having an additional reference on that page.

 The following makes the host not crash under such a condition, but the
 condition itself is invalid leading to inconsistent state on the guest.
 So IMHO it shouldnt be allowed to happen in the first place.

   

The only way to prevent it is with mmu notifiers, which we may not have 
for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Please add a signoff.

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index f0cdfba..4c93b79 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t
 +gva)
 return page;
  }

 +static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
 +{
 +   int ret = 0;
 +   unsigned long host_pfn = (*spte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
   

hfn_t hfn

 +
 +   if (is_rmap_pte(*spte)) {
 +   if (host_pfn != page_to_pfn(page))
 +   rmap_remove(kvm, spte);
 +   else
 +   ret = 1;
 +   }
 +
 +   return ret;
 +}
 +
  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
  unsigned pt_access, unsigned pte_access,
  int user_fault, int write_fault, int dirty,
 @@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64
 +*shadow_pte,
  struct page *page)
  {
 u64 spte;
 -   int was_rmapped = is_rmap_pte(*shadow_pte);
 +   int was_rmapped = was_spte_rmapped(vcpu-kvm, shadow_pte, page);
 int was_writeble = is_writeble_pte(*shadow_pte);
   

Calling code with side effects in an initializer is a bit confusing.  I 
suggest simply inlining the code into mmu_set_spte().

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-17 Thread Marcelo Tosatti
On Mon, Mar 17, 2008 at 01:11:11PM +0200, Avi Kivity wrote:

 +   up_read(kvm-slots_lock);
 
 So as to avoid rmap_nuke, since that will be done through the madvise()
 path.
 
   
 
 Why not do it in userspace?

I don't see any way of knowing whether you have or not mmu notifiers from 
userspace except adding an ioctl.

 I'll likely merge zap directly into the backwards compatibility support 
 (it won't ever appear in mainline since mmu notifiers will be merged 
 sooner).

Even with mmu notifiers QEMU needs to ask the host kernel do you have
mmu notifiers? before zapping any page, otherwise the guest will crash.

So some method of finding if the host kernel has mmu notifiers needs to
be in place even in mainline.

And what about allocation of ioctl numbers? Will you reserve the number
used for ZAP_GFN on mainline?

 Did you find out what's causing the errors in the first place (if zap is 
 not used)?  It worries me greatly.
 
 
 Yes, the problem is that the rmap code does not handle the qemu process
 mappings from vanishing while there is a present rmap. If that happens,
 and there is a fault for a gfn whose qemu mapping has been removed, a
 different physical zero page will be allocated:
 
  rmap a - gfn 0 - physical host page 0
  mapping for gfn 0 gets removed
  guest faults in gfn 0 through the same pte chain
  rmap a - gfn 0 - physical host page 1
 
 When instantiating the shadow mapping for the second time, the
 is_rmap_pte check succeeds, so we release the reference grabbed by
 gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
 to a physical page without having an additional reference on that page.
 
 The following makes the host not crash under such a condition, but the
 condition itself is invalid leading to inconsistent state on the guest.
 So IMHO it shouldnt be allowed to happen in the first place.
 
   
 
 The only way to prevent it is with mmu notifiers, which we may not have 
 for 2.6.25.  So I'd like to send this for 2.6.25-rc.

Well, you can prevent it by never allowing a page from being removed while 
there are shadow mappings for it. So while this fixes the host crash, it 
won't fix the guest crash.

KVM: MMU: handle page removal with shadow mapping

Do not assume that a shadow mapping will always point to the same host
frame number. Fixes crash with madvise(MADV_DONTNEED).

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f0cdfba..4caa1a0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1016,8 +1016,16 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 struct page *page)
 {
u64 spte;
-   int was_rmapped = is_rmap_pte(*shadow_pte);
+   int was_rmapped = 0;
int was_writeble = is_writeble_pte(*shadow_pte);
+   hfn_t host_pfn = (*shadow_pte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
+
+   if (is_rmap_pte(*shadow_pte)) {
+   if (host_pfn != page_to_pfn(page))
+   rmap_remove(vcpu-kvm, shadow_pte);
+   else
+   was_rmapped = 1;
+   }
 
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-16 Thread Avi Kivity
Marcelo Tosatti wrote:
 This patchset resends Anthony's QEMU balloon support plus:

 - Truncates the target size to ram size
 - Enables madvise() conditioned on KVM_ZAP_GFN ioctl

   

Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
possible situations:

- zap needed, but not available: don't madvise()
- zap needed and available: zap and madvise()
- zap unneeded: madvise()

Did you find out what's causing the errors in the first place (if zap is 
not used)?  It worries me greatly.

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-16 Thread Marcelo Tosatti
Hi Avi,

Good that you're back.

On Sun, Mar 16, 2008 at 04:00:06PM +0200, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 This patchset resends Anthony's QEMU balloon support plus:
 
 - Truncates the target size to ram size
 - Enables madvise() conditioned on KVM_ZAP_GFN ioctl
 
   
 
 Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
 possible situations:
 
 - zap needed, but not available: don't madvise()
 - zap needed and available: zap and madvise()
 - zap unneeded: madvise()

Right. That is what the patch does. You just have to fill in
have_mmu_notifiers here:

+int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
+{
+   unsigned long addr;
+   int have_mmu_notifiers = 0;
+
+   down_read(kvm-slots_lock);
+   addr = gfn_to_hva(kvm, gfn);
+
+   if (kvm_is_error_hva(addr)) {
+   up_read(kvm-slots_lock);
+   return -EINVAL;
+   }
+
+   if (!have_mmu_notifiers) {
+   spin_lock(kvm-mmu_lock);
+   rmap_nuke(kvm, gfn);
+   spin_unlock(kvm-mmu_lock);
+   }
+   up_read(kvm-slots_lock);

So as to avoid rmap_nuke, since that will be done through the madvise()
path.

 Did you find out what's causing the errors in the first place (if zap is 
 not used)?  It worries me greatly.

Yes, the problem is that the rmap code does not handle the qemu process
mappings from vanishing while there is a present rmap. If that happens,
and there is a fault for a gfn whose qemu mapping has been removed, a
different physical zero page will be allocated:

rmap a - gfn 0 - physical host page 0
mapping for gfn 0 gets removed
guest faults in gfn 0 through the same pte chain
rmap a - gfn 0 - physical host page 1

When instantiating the shadow mapping for the second time, the
is_rmap_pte check succeeds, so we release the reference grabbed by
gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
to a physical page without having an additional reference on that page.

The following makes the host not crash under such a condition, but the
condition itself is invalid leading to inconsistent state on the guest.
So IMHO it shouldnt be allowed to happen in the first place.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f0cdfba..4c93b79 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1009,6 +1009,21 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t
+gva)
return page;
 }

+static int was_spte_rmapped(struct kvm *kvm, u64 *spte, struct page *page)
+{
+   int ret = 0;
+   unsigned long host_pfn = (*spte  PT64_BASE_ADDR_MASK)  PAGE_SHIFT;
+
+   if (is_rmap_pte(*spte)) {
+   if (host_pfn != page_to_pfn(page))
+   rmap_remove(kvm, spte);
+   else
+   ret = 1;
+   }
+
+   return ret;
+}
+
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 unsigned pt_access, unsigned pte_access,
 int user_fault, int write_fault, int dirty,
@@ -1016,7 +1031,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64
+*shadow_pte,
 struct page *page)
 {
u64 spte;
-   int was_rmapped = is_rmap_pte(*shadow_pte);
+   int was_rmapped = was_spte_rmapped(vcpu-kvm, shadow_pte, page);
int was_writeble = is_writeble_pte(*shadow_pte);

/*






-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [patch 0/3] QEMU balloon support

2008-03-12 Thread Marcelo Tosatti
This patchset resends Anthony's QEMU balloon support plus:

- Truncates the target size to ram size
- Enables madvise() conditioned on KVM_ZAP_GFN ioctl

-- 


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel