Re: [kvm-devel] [patch] kvm: make cr3 loading more robust

2007-01-04 Thread Avi Kivity
Ingo Molnar wrote:
> another small detail is that currently KVM_SET_MEMORY_REGION appears to
> be an add-only interface - it is not possible to 'unregister' RAM from a
> VM.

Well, the _interface_ supports removing, the implementation does not :)

Everything was written with memory hotplug in mind.

> That keeps things easy for now, but if it's ever implemented then the
> current cr3 of all vcpus of the VM needs to be validated against the
> reduced memory slot map. (besides migrating all existing mappings from
> the removed memory slot to other memory slots and redirecting all
> in-flight DMA transactions, etc., etc. Which all needs heavy guest-OS
> awareness as well.)

Actually I think it's quite easy:

- halt all vcpus
- wait for dma to complete
- mmu_free_roots()
- zap all page tables
- actually unplug memory
- mmu_alloc_roots()

The guest needs to cooperate, but it can do so using the native memory
hotplug mechanisms (whatever they are).  No guest modifications are needed.
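
A rough sketch of that sequence as host-side pseudocode (mmu_free_roots()
and mmu_alloc_roots() are the kvm helpers named above; the halt/DMA/unplug
helpers are hypothetical placeholders, not actual kvm functions):

	/* Hypothetical unplug path; the guest has already evacuated
	 * the region through its own hotplug mechanism. */
	static int kvm_unplug_memory_slot(struct kvm *kvm, int slot)
	{
		int i;

		halt_all_vcpus(kvm);                       /* hypothetical */
		wait_for_inflight_dma(kvm, slot);          /* hypothetical */
		for (i = 0; i < KVM_MAX_VCPUS; ++i) {
			mmu_free_roots(&kvm->vcpus[i]);
			zap_shadow_page_tables(&kvm->vcpus[i]); /* hypothetical */
		}
		remove_memory_slot(kvm, slot);             /* hypothetical */
		for (i = 0; i < KVM_MAX_VCPUS; ++i)
			mmu_alloc_roots(&kvm->vcpus[i]);
		resume_all_vcpus(kvm);                     /* hypothetical */
		return 0;
	}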

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [patch] kvm: make cr3 loading more robust

2007-01-04 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

> The guest needs to cooperate, but it can do so using the native memory
> hotplug mechanisms (whatever they are). [...]

as far as a Linux guest goes, there's no such thing at the moment, at least
in the mainline kernel. Most of the difficulties with RAM-unplug are on
the guest OS side - I agree with you that doing it on the host side is
easy. (because the host side does not really 'use' any of the guest's
RAM.)

Ingo



Re: [kvm-devel] [patch] KVM: simplify mmu_alloc_roots()

2007-01-04 Thread Avi Kivity
Ingo Molnar wrote:
> Subject: [patch] KVM: simplify mmu_alloc_roots()
> From: Ingo Molnar [EMAIL PROTECTED]
>
> small optimization/cleanup:
>
> page == page_header(page->page_hpa)

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] Solaris 10 U2 installation failure

2007-01-04 Thread Avi Kivity
Parag Warudkar wrote:
> Avi Kivity [EMAIL PROTECTED] writes:
>
>> 32-bit kvm userspace can run a 64-bit guest, if you're using a 64-bit os
>> kernel, hence the 64-bit registers. Just ignore the 64-bit parts.
>
> Didn't understand. Allow me to clarify a bit -
>
> I am running a 32-bit Host OS (Linux i386) on a purely 32-bit CPU (Core Duo).
> Solaris installation will first check if the processor is 64-bit capable and
> only then install a 64-bit kernel. In that case, when Solaris asks KVM if the
> CPU is 64-bit, is KVM lying to it even though the host CPU on which it is
> running is NOT 64-bit capable, and then emulating all AMD64 instructions
> without any help from the host CPU? That sounds more confusing!
>
> Even if it does something like that, is there a way to tell KVM not to
> pretend like it is running a 64-bit CPU? I don't want to run Solaris in
> 64-bit mode as I am running on a 32-bit host with just 512MB of total memory.

No, kvm doesn't pretend to be a 64-bit cpu when it isn't.

There are three cases wrt host bitness. Two are straightforward:

32-bit host: kvm pretends to be a 32-bit cpu whether the cpu supports 
long mode or not.
64-bit host: the cpu supports long mode, and we pass that on to the guest.

There is a third case: 64-bit host kernel but 32-bit qemu.  In that 
case, we also support 64-bit guests.

Because of that last case, 32-bit qemu is compiled with support for 
64-bit guests.  That means that even in a pure 32-bit environment, qemu 
has 64-bit registers (even though it can't make use of them).

If you're running a 32-bit environment, simply ignore r8-r15 and the 
high 32 bits of other registers.  That's what kvm.ko does :)
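
Concretely, a 32-bit userspace reading vcpu state would just truncate; a
minimal sketch, assuming the current single-fd ioctl interface and that
kvm_regs carries a vcpu index plus 64-bit rax..r15/rip fields:

	struct kvm_regs regs;

	memset(&regs, 0, sizeof regs);
	regs.vcpu = 0;                        /* assumed field, see above */
	if (ioctl(kvm_fd, KVM_GET_REGS, &regs) == -1)
		perror("KVM_GET_REGS");
	/* 32-bit guest: only the low halves are meaningful */
	printf("eax=%08x eip=%08x\n",
	       (unsigned)regs.rax, (unsigned)regs.rip);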

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] Compile error with openSuse 10.2

2007-01-04 Thread Peter Smith
When compiling KVM I get the following error:-

In file included from /home/peter/applications-home/kvm-9/qemu/usb-linux.c:29:
/usr/include/linux/usbdevice_fs.h:49: error: variable or field `__user' 
declared void
/usr/include/linux/usbdevice_fs.h:49: error: syntax error before '*' token

My environment is
openSuse 10.2
gcc 3.4.6
kernel 2.6.18.2 (32 bit)

I would be grateful for some pointers,
Peter



[kvm-devel] [PATCH] KVM: Prevent stale bits in cr0 and cr4

2007-01-04 Thread Avi Kivity
Hardware virtualization implementations allow the guests to freely change
some of the bits in cr0 and cr4, but trap when changing the other bits.  This
is useful to avoid excessive exits due to changing, for example, the ts flag.

It also means kvm's copy of cr0 and cr4 may be stale with respect to these
bits.  Most of the time this doesn't matter, as these bits are not very
interesting.  Other times, however (for example when returning cr0 to
userspace), they are, so get the fresh contents of these bits from the guest
by means of a new arch operation.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -390,6 +390,7 @@ EXPORT_SYMBOL_GPL(set_cr0);
 
 void lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
+   kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
	set_cr0(vcpu, (vcpu->cr0 & ~0x0ful) | (msw & 0x0f));
 }
 EXPORT_SYMBOL_GPL(lmsw);
@@ -917,9 +918,10 @@ int emulate_invlpg(struct kvm_vcpu *vcpu
 
 int emulate_clts(struct kvm_vcpu *vcpu)
 {
-   unsigned long cr0 = vcpu->cr0;
+   unsigned long cr0;
 
-   cr0 &= ~CR0_TS_MASK;
+   kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
+   cr0 = vcpu->cr0 & ~CR0_TS_MASK;
	kvm_arch_ops->set_cr0(vcpu, cr0);
return X86EMUL_CONTINUE;
 }
@@ -1072,6 +1074,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 {
+   kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
	switch (cr) {
	case 0:
		return vcpu->cr0;
@@ -1406,6 +1409,7 @@ static int kvm_dev_ioctl_get_sregs(struc
	sregs->gdt.limit = dt.limit;
	sregs->gdt.base = dt.base;
 
+   kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
	sregs->cr0 = vcpu->cr0;
	sregs->cr2 = vcpu->cr2;
	sregs->cr3 = vcpu->cr3;
@@ -1470,6 +1474,8 @@ static int kvm_dev_ioctl_set_sregs(struc
 #endif
	vcpu->apic_base = sregs->apic_base;
 
+   kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
+
	mmu_reset_needed |= vcpu->cr0 != sregs->cr0;
	kvm_arch_ops->set_cr0_no_modeswitch(vcpu, sregs->cr0);
 
Index: linux-2.6/drivers/kvm/kvm.h
===
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -283,6 +283,7 @@ struct kvm_arch_ops {
void (*set_segment)(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
+   void (*decache_cr0_cr4_guest_bits)(struct kvm_vcpu *vcpu);
void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
void (*set_cr0_no_modeswitch)(struct kvm_vcpu *vcpu,
  unsigned long cr0);
Index: linux-2.6/drivers/kvm/svm.c
===
--- linux-2.6.orig/drivers/kvm/svm.c
+++ linux-2.6/drivers/kvm/svm.c
@@ -702,6 +702,10 @@ static void svm_set_gdt(struct kvm_vcpu 
	vcpu->svm->vmcb->save.gdtr.base = dt->base ;
 }
 
+static void svm_decache_cr0_cr4_guest_bits(struct kvm_vcpu *vcpu)
+{
+}
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 #ifdef CONFIG_X86_64
@@ -1645,6 +1649,7 @@ static struct kvm_arch_ops svm_arch_ops 
.get_segment = svm_get_segment,
.set_segment = svm_set_segment,
.get_cs_db_l_bits = svm_get_cs_db_l_bits,
+   .decache_cr0_cr4_guest_bits = svm_decache_cr0_cr4_guest_bits,
.set_cr0 = svm_set_cr0,
.set_cr0_no_modeswitch = svm_set_cr0,
.set_cr3 = svm_set_cr3,
Index: linux-2.6/drivers/kvm/vmx.c
===
--- linux-2.6.orig/drivers/kvm/vmx.c
+++ linux-2.6/drivers/kvm/vmx.c
@@ -737,6 +737,15 @@ static void exit_lmode(struct kvm_vcpu *
 
 #endif
 
+static void vmx_decache_cr0_cr4_guest_bits(struct kvm_vcpu *vcpu)
+{
+   vcpu->cr0 &= KVM_GUEST_CR0_MASK;
+   vcpu->cr0 |= vmcs_readl(GUEST_CR0) & ~KVM_GUEST_CR0_MASK;
+
+   vcpu->cr4 &= KVM_GUEST_CR4_MASK;
+   vcpu->cr4 |= vmcs_readl(GUEST_CR4) & ~KVM_GUEST_CR4_MASK;
+}
+
 static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
	if (vcpu->rmode.active && (cr0 & CR0_PE_MASK))
@@ -2002,6 +2011,7 @@ static struct kvm_arch_ops vmx_arch_ops 
.get_segment = vmx_get_segment,
.set_segment = vmx_set_segment,
.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
+   .decache_cr0_cr4_guest_bits = vmx_decache_cr0_cr4_guest_bits,
.set_cr0 = vmx_set_cr0,
.set_cr0_no_modeswitch = vmx_set_cr0_no_modeswitch,
.set_cr3 = vmx_set_cr3,


[kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables

2007-01-04 Thread Avi Kivity
The current kvm shadow page table implementation does not cache shadow 
page tables (except for global translations, used for kernel addresses) 
across context switches.  This means that after a context switch, every 
memory access will trap into the host.  After a while, the shadow page 
tables will be rebuilt, and the guest can proceed at native speed until 
the next context switch.

The natural solution, then, is to cache shadow page tables across 
context switches.  Unfortunately, this introduces a bucketload of problems:

- the guest does not notify the processor (and hence kvm) that it 
modifies a page table entry if it has reason to believe that the 
modification will be followed by a tlb flush.  It becomes necessary to 
write-protect guest page tables so that we can use the page fault when 
the access occurs as a notification.
- write protecting the guest page tables means we need to keep track of 
which ptes map those guest page tables. We need to add reverse mapping 
for all mapped writable guest pages.
- when the guest does access the write-protected page, we need to allow 
it to perform the write in some way.  We do that either by emulating the 
write, or removing all shadow page tables for that page and allowing the 
write to proceed, depending on circumstances.

This patchset implements the ideas above.  While a lot of tuning remains 
to be done (for example, a sane page replacement algorithm), a guest 
running with this patchset applied is much faster and more responsive 
than with 2.6.20-rc3.  Some preliminary benchmarks are available in 
http://article.gmane.org/gmane.comp.emulators.kvm.devel/661.

The patchset is bisectable compile-wise.
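
In rough pseudocode, the resulting handling of a guest write to a
write-protected page table looks like this (illustrative only; the helper
names here are made up, the real ones are spread across the individual
patches):

	/* write fault on a gpa whose gfn is shadowed as a page table */
	if (emulate_one_instruction(vcpu) == EMULATE_DONE)  /* hypothetical */
		return;                  /* pte write emulated, shadow updated */
	unprotect_page(vcpu, gfn);       /* hypothetical: zap shadows of gfn */
	/* re-enter the guest: the write now proceeds natively */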

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [PATCH 3/33] KVM: MMU: Load the pae pdptrs on cr3 change like the processor does

2007-01-04 Thread Avi Kivity
In pae mode, a load of cr3 loads the four third-level page table entries
in addition to cr3 itself.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -298,14 +298,17 @@ static void inject_gp(struct kvm_vcpu *v
kvm_arch_ops-inject_gp(vcpu, 0);
 }
 
-static int pdptrs_have_reserved_bits_set(struct kvm_vcpu *vcpu,
-unsigned long cr3)
+/*
+ * Load the pae pdptrs.  Return true if they are all valid.
+ */
+static int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
	gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
-   unsigned offset = (cr3 & (PAGE_SIZE-1)) >> 5;
+   unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
int i;
u64 pdpte;
u64 *pdpt;
+   int ret;
struct kvm_memory_slot *memslot;
 
	spin_lock(&vcpu->kvm->lock);
@@ -313,16 +316,23 @@ static int pdptrs_have_reserved_bits_set
/* FIXME: !memslot - emulate? 0xff? */
pdpt = kmap_atomic(gfn_to_page(memslot, pdpt_gfn), KM_USER0);
 
+   ret = 1;
	for (i = 0; i < 4; ++i) {
		pdpte = pdpt[offset + i];
-		if ((pdpte & 1) && (pdpte & 0xfff001e6ull))
-			break;
+		if ((pdpte & 1) && (pdpte & 0xfff001e6ull)) {
+			ret = 0;
+			goto out;
+		}
}
 
+   for (i = 0; i < 4; ++i)
+           vcpu->pdptrs[i] = pdpt[offset + i];
+
+out:
kunmap_atomic(pdpt, KM_USER0);
	spin_unlock(&vcpu->kvm->lock);
 
-   return i != 4;
+   return ret;
 }
 
 void set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
@@ -368,8 +378,7 @@ void set_cr0(struct kvm_vcpu *vcpu, unsi
}
} else
 #endif
-   if (is_pae(vcpu) &&
-       pdptrs_have_reserved_bits_set(vcpu, vcpu->cr3)) {
+   if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->cr3)) {
		printk(KERN_DEBUG "set_cr0: #GP, pdptrs "
		       "reserved bits\n");
inject_gp(vcpu);
@@ -411,7 +420,7 @@ void set_cr4(struct kvm_vcpu *vcpu, unsi
return;
}
	} else if (is_paging(vcpu) && !is_pae(vcpu) && (cr4 & CR4_PAE_MASK) &&
-          pdptrs_have_reserved_bits_set(vcpu, vcpu->cr3)) {
+          !load_pdptrs(vcpu, vcpu->cr3)) {
		printk(KERN_DEBUG "set_cr4: #GP, pdptrs reserved bits\n");
inject_gp(vcpu);
}
@@ -443,7 +452,7 @@ void set_cr3(struct kvm_vcpu *vcpu, unsi
return;
}
	if (is_paging(vcpu) && is_pae(vcpu) &&
-       pdptrs_have_reserved_bits_set(vcpu, cr3)) {
+       !load_pdptrs(vcpu, cr3)) {
		printk(KERN_DEBUG "set_cr3: #GP, pdptrs "
		       "reserved bits\n");
inject_gp(vcpu);
Index: linux-2.6/drivers/kvm/kvm.h
===
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -185,6 +185,7 @@ struct kvm_vcpu {
unsigned long cr3;
unsigned long cr4;
unsigned long cr8;
+   u64 pdptrs[4]; /* pae */
u64 shadow_efer;
u64 apic_base;
int nmsrs;



[kvm-devel] [PATCH 4/33] KVM: MMU: Fold fetch_guest() into init_walker()

2007-01-04 Thread Avi Kivity
It is never necessary to fetch a guest entry from an intermediate page table
level (except for large pages), so avoid some confusion by always descending
into the lowest possible level.

Rename init_walker() to walk_addr() as it is no longer restricted to
initialization.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/paging_tmpl.h
===
--- linux-2.6.orig/drivers/kvm/paging_tmpl.h
+++ linux-2.6/drivers/kvm/paging_tmpl.h
@@ -54,14 +54,19 @@ struct guest_walker {
int level;
gfn_t table_gfn;
pt_element_t *table;
+   pt_element_t *ptep;
pt_element_t inherited_ar;
 };
 
-static void FNAME(init_walker)(struct guest_walker *walker,
-  struct kvm_vcpu *vcpu)
+/*
+ * Fetch a guest pte for a guest virtual address
+ */
+static void FNAME(walk_addr)(struct guest_walker *walker,
+struct kvm_vcpu *vcpu, gva_t addr)
 {
hpa_t hpa;
struct kvm_memory_slot *slot;
+   pt_element_t *ptep;
 
	walker->level = vcpu->mmu.root_level;
	walker->table_gfn = (vcpu->cr3 & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -75,6 +80,38 @@ static void FNAME(init_walker)(struct gu
	walker->table = (pt_element_t *)( (unsigned long)walker->table |
		(unsigned long)(vcpu->cr3 & ~(PAGE_MASK | CR3_FLAGS_MASK)) );
	walker->inherited_ar = PT_USER_MASK | PT_WRITABLE_MASK;
+
+   for (;;) {
+           int index = PT_INDEX(addr, walker->level);
+           hpa_t paddr;
+
+           ptep = &walker->table[index];
+           ASSERT(((unsigned long)walker->table & PAGE_MASK) ==
+                  ((unsigned long)ptep & PAGE_MASK));
+
+           /* Don't set accessed bit on PAE PDPTRs */
+           if (vcpu->mmu.root_level != 3 || walker->level != 3)
+                   if ((*ptep & (PT_PRESENT_MASK | PT_ACCESSED_MASK))
+                       == PT_PRESENT_MASK)
+                           *ptep |= PT_ACCESSED_MASK;
+
+           if (!is_present_pte(*ptep) ||
+               walker->level == PT_PAGE_TABLE_LEVEL ||
+               (walker->level == PT_DIRECTORY_LEVEL &&
+                (*ptep & PT_PAGE_SIZE_MASK) &&
+                (PTTYPE == 64 || is_pse(vcpu))))
+                   break;
+
+           if (walker->level != 3 || is_long_mode(vcpu))
+                   walker->inherited_ar &= walker->table[index];
+           walker->table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
+           paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
+           kunmap_atomic(walker->table, KM_USER0);
+           walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
+                                       KM_USER0);
+           --walker->level;
+   }
+   walker->ptep = ptep;
 }
 
 static void FNAME(release_walker)(struct guest_walker *walker)
@@ -110,41 +147,6 @@ static void FNAME(set_pde)(struct kvm_vc
 }
 
 /*
- * Fetch a guest pte from a specific level in the paging hierarchy.
- */
-static pt_element_t *FNAME(fetch_guest)(struct kvm_vcpu *vcpu,
-   struct guest_walker *walker,
-   int level,
-   gva_t addr)
-{
-
-   ASSERT(level > 0 && level <= walker->level);
-
-   for (;;) {
-   int index = PT_INDEX(addr, walker-level);
-   hpa_t paddr;
-
-           ASSERT(((unsigned long)walker->table & PAGE_MASK) ==
-                  ((unsigned long)&walker->table[index] & PAGE_MASK));
-           if (level == walker->level ||
-               !is_present_pte(walker->table[index]) ||
-               (walker->level == PT_DIRECTORY_LEVEL &&
-                (walker->table[index] & PT_PAGE_SIZE_MASK) &&
-                (PTTYPE == 64 || is_pse(vcpu))))
-                   return &walker->table[index];
-           if (walker->level != 3 || is_long_mode(vcpu))
-                   walker->inherited_ar &= walker->table[index];
-           walker->table_gfn = (walker->table[index] & PT_BASE_ADDR_MASK)
-                   >> PAGE_SHIFT;
-           paddr = safe_gpa_to_hpa(vcpu, walker->table[index] &
-                   PT_BASE_ADDR_MASK);
-           kunmap_atomic(walker->table, KM_USER0);
-           walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
-                                       KM_USER0);
-           --walker->level;
-   }
-}
-
-/*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
@@ -153,6 +155,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
hpa_t shadow_addr;
int level;
u64 *prev_shadow_ent = NULL;
+   pt_element_t *guest_ent = walker->ptep;
+
+   if (!is_present_pte(*guest_ent))
+           return NULL;
 
	shadow_addr = vcpu->mmu.root_hpa;
level = 

[kvm-devel] [PATCH 5/33] KVM: MMU: Special treatment for shadow pae root pages

2007-01-04 Thread Avi Kivity
Since we're not going to cache the pae-mode shadow root pages, allocate
a single pae shadow that will hold the four lower-level pages, which
will act as roots.
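
The resulting layout, roughly (the pae shadow is a plain page rather than a
cached shadow page; the four entries below it point at the pages that really
act as roots):

	vcpu->mmu.root_hpa = __pa(vcpu->mmu.pae_root)
		pae_root[0] -> lower-level shadow page for guest pdptr 0
		pae_root[1] -> lower-level shadow page for guest pdptr 1
		pae_root[2] -> lower-level shadow page for guest pdptr 2
		pae_root[3] -> lower-level shadow page for guest pdptr 3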

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -420,19 +420,63 @@ static int nonpaging_map(struct kvm_vcpu
}
 }
 
+static void mmu_free_roots(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+#ifdef CONFIG_X86_64
+   if (vcpu->mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+           hpa_t root = vcpu->mmu.root_hpa;
+
+           ASSERT(VALID_PAGE(root));
+           release_pt_page_64(vcpu, root, PT64_ROOT_LEVEL);
+           vcpu->mmu.root_hpa = INVALID_PAGE;
+           return;
+   }
+#endif
+   for (i = 0; i < 4; ++i) {
+           hpa_t root = vcpu->mmu.pae_root[i];
+
+           ASSERT(VALID_PAGE(root));
+           root &= PT64_BASE_ADDR_MASK;
+           release_pt_page_64(vcpu, root, PT32E_ROOT_LEVEL - 1);
+           vcpu->mmu.pae_root[i] = INVALID_PAGE;
+   }
+   vcpu->mmu.root_hpa = INVALID_PAGE;
+}
+
+static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+#ifdef CONFIG_X86_64
+   if (vcpu->mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+           hpa_t root = vcpu->mmu.root_hpa;
+
+           ASSERT(!VALID_PAGE(root));
+           root = kvm_mmu_alloc_page(vcpu, NULL);
+           vcpu->mmu.root_hpa = root;
+           return;
+   }
+#endif
+   for (i = 0; i < 4; ++i) {
+           hpa_t root = vcpu->mmu.pae_root[i];
+
+           ASSERT(!VALID_PAGE(root));
+           root = kvm_mmu_alloc_page(vcpu, NULL);
+           vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
+   }
+   vcpu->mmu.root_hpa = __pa(vcpu->mmu.pae_root);
+}
+
 static void nonpaging_flush(struct kvm_vcpu *vcpu)
 {
	hpa_t root = vcpu->mmu.root_hpa;
 
++kvm_stat.tlb_flush;
	pgprintk("nonpaging_flush\n");
-   ASSERT(VALID_PAGE(root));
-   release_pt_page_64(vcpu, root, vcpu->mmu.shadow_root_level);
-   root = kvm_mmu_alloc_page(vcpu, NULL);
-   ASSERT(VALID_PAGE(root));
-   vcpu->mmu.root_hpa = root;
-   if (is_paging(vcpu))
-           root |= (vcpu->cr3 & (CR3_PCD_MASK | CR3_WPT_MASK));
+   mmu_free_roots(vcpu);
+   mmu_alloc_roots(vcpu);
	kvm_arch_ops->set_cr3(vcpu, root);
	kvm_arch_ops->tlb_flush(vcpu);
 }
@@ -475,13 +519,7 @@ static void nonpaging_inval_page(struct 
 
 static void nonpaging_free(struct kvm_vcpu *vcpu)
 {
-   hpa_t root;
-
-   ASSERT(vcpu);
-   root = vcpu->mmu.root_hpa;
-   if (VALID_PAGE(root))
-           release_pt_page_64(vcpu, root, vcpu->mmu.shadow_root_level);
-   vcpu->mmu.root_hpa = INVALID_PAGE;
+   mmu_free_roots(vcpu);
 }
 
 static int nonpaging_init_context(struct kvm_vcpu *vcpu)
@@ -495,7 +533,7 @@ static int nonpaging_init_context(struct
	context->free = nonpaging_free;
	context->root_level = PT32E_ROOT_LEVEL;
	context->shadow_root_level = PT32E_ROOT_LEVEL;
-   context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+   mmu_alloc_roots(vcpu);
	ASSERT(VALID_PAGE(context->root_hpa));
	kvm_arch_ops->set_cr3(vcpu, context->root_hpa);
return 0;
@@ -647,7 +685,7 @@ static void paging_free(struct kvm_vcpu 
 #include paging_tmpl.h
 #undef PTTYPE
 
-static int paging64_init_context(struct kvm_vcpu *vcpu)
+static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 {
	struct kvm_mmu *context = &vcpu->mmu;
 
@@ -657,15 +695,20 @@ static int paging64_init_context(struct 
	context->inval_page = paging_inval_page;
	context->gva_to_gpa = paging64_gva_to_gpa;
	context->free = paging_free;
-   context->root_level = PT64_ROOT_LEVEL;
-   context->shadow_root_level = PT64_ROOT_LEVEL;
-   context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+   context->root_level = level;
+   context->shadow_root_level = level;
+   mmu_alloc_roots(vcpu);
	ASSERT(VALID_PAGE(context->root_hpa));
	kvm_arch_ops->set_cr3(vcpu, context->root_hpa |
		(vcpu->cr3 & (CR3_PCD_MASK | CR3_WPT_MASK)));
return 0;
 }
 
+static int paging64_init_context(struct kvm_vcpu *vcpu)
+{
+   return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
+}
+
 static int paging32_init_context(struct kvm_vcpu *vcpu)
 {
	struct kvm_mmu *context = &vcpu->mmu;
@@ -677,7 +720,7 @@ static int paging32_init_context(struct 
	context->free = paging_free;
	context->root_level = PT32_ROOT_LEVEL;
	context->shadow_root_level = PT32E_ROOT_LEVEL;
-   context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+   mmu_alloc_roots(vcpu);
	ASSERT(VALID_PAGE(context->root_hpa));
	kvm_arch_ops->set_cr3(vcpu, context->root_hpa |
		(vcpu->cr3 & (CR3_PCD_MASK | 

[kvm-devel] [PATCH 14/33] KVM: MMU: If emulating an instruction fails, try unprotecting the page

2007-01-04 Thread Avi Kivity
A page table may have been recycled into a regular page, and so any
instruction can be executed on it.  Unprotect the page and let the cpu
do its thing.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -478,11 +478,62 @@ static struct kvm_mmu_page *kvm_mmu_get_
return page;
 }
 
+static void kvm_mmu_page_unlink_children(struct kvm_vcpu *vcpu,
+struct kvm_mmu_page *page)
+{
+   BUG();
+}
+
 static void kvm_mmu_put_page(struct kvm_vcpu *vcpu,
 struct kvm_mmu_page *page,
 u64 *parent_pte)
 {
	mmu_page_remove_parent_pte(page, parent_pte);
+   if (page->role.level > PT_PAGE_TABLE_LEVEL)
+           kvm_mmu_page_unlink_children(vcpu, page);
+   hlist_del(&page->hash_link);
+   list_del(&page->link);
+   list_add(&page->link, &vcpu->free_pages);
+}
+
+static void kvm_mmu_zap_page(struct kvm_vcpu *vcpu,
+struct kvm_mmu_page *page)
+{
+   u64 *parent_pte;
+
+   while (page->multimapped || page->parent_pte) {
+           if (!page->multimapped)
+                   parent_pte = page->parent_pte;
+           else {
+                   struct kvm_pte_chain *chain;
+
+                   chain = container_of(page->parent_ptes.first,
+                                        struct kvm_pte_chain, link);
+                   parent_pte = chain->parent_ptes[0];
+   }
+   kvm_mmu_put_page(vcpu, page, parent_pte);
+   *parent_pte = 0;
+   }
+}
+
+static int kvm_mmu_unprotect_page(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   unsigned index;
+   struct hlist_head *bucket;
+   struct kvm_mmu_page *page;
+   struct hlist_node *node, *n;
+   int r;
+
+   pgprintk("%s: looking for gfn %lx\n", __FUNCTION__, gfn);
+   r = 0;
+   index = kvm_page_table_hashfn(gfn) % KVM_NUM_MMU_PAGES;
+   bucket = &vcpu->kvm->mmu_page_hash[index];
+   hlist_for_each_entry_safe(page, node, n, bucket, hash_link)
+           if (page->gfn == gfn && !page->role.metaphysical) {
+                   kvm_mmu_zap_page(vcpu, page);
+                   r = 1;
+           }
+   return r;
+}
 
 static void page_header_update_slot(struct kvm *kvm, void *pte, gpa_t gpa)
@@ -1001,6 +1052,13 @@ void kvm_mmu_post_write(struct kvm_vcpu 
 {
 }
 
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+{
+   gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);
+
+   return kvm_mmu_unprotect_page(vcpu, gpa >> PAGE_SHIFT);
+}
+
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
	while (!list_empty(&vcpu->free_pages)) {
Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -1063,6 +1063,8 @@ int emulate_instruction(struct kvm_vcpu 
}
 
if (r) {
+           if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
+                   return EMULATE_DONE;
		if (!vcpu->mmio_needed) {
			report_emulation_failure(&emulate_ctxt);
return EMULATE_FAIL;
Index: linux-2.6/drivers/kvm/kvm.h
===
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -450,6 +450,7 @@ unsigned long segment_base(u16 selector)
 
 void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes);
 void kvm_mmu_post_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes);
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 
 static inline struct page *_gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {



[kvm-devel] [PATCH 20/33] KVM: MMU: Handle misaligned accesses to write protected guest page tables

2007-01-04 Thread Avi Kivity
A misaligned access affects two shadow ptes instead of just one.

Since a misaligned access is unlikely to occur on a real page table, just
zap the page out of existence, avoiding further trouble.
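
The test used below, (offset ^ (offset + bytes - 1)) & ~(pte_size - 1), is
nonzero exactly when the first and last byte of the access land in different
pte-sized slots.  For example, with pte_size = 8:

	(4 ^ (4 + 8 - 1)) & ~7  ==  (4 ^ 11) & ~7  ==  15 & ~7  ==  8   /* spans two ptes */
	(8 ^ (8 + 8 - 1)) & ~7  ==  (8 ^ 15) & ~7  ==   7 & ~7  ==  0   /* within one pte */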

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -954,21 +954,36 @@ void kvm_mmu_pre_write(struct kvm_vcpu *
	gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm_mmu_page *page;
struct kvm_mmu_page *child;
-   struct hlist_node *node;
+   struct hlist_node *node, *n;
struct hlist_head *bucket;
unsigned index;
u64 *spte;
u64 pte;
unsigned offset = offset_in_page(gpa);
+   unsigned pte_size;
unsigned page_offset;
+   unsigned misaligned;
int level;
 
	pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes);
	index = kvm_page_table_hashfn(gfn) % KVM_NUM_MMU_PAGES;
	bucket = &vcpu->kvm->mmu_page_hash[index];
-   hlist_for_each_entry(page, node, bucket, hash_link) {
+   hlist_for_each_entry_safe(page, node, n, bucket, hash_link) {
		if (page->gfn != gfn || page->role.metaphysical)
continue;
+           pte_size = page->role.glevels == PT32_ROOT_LEVEL ? 4 : 8;
+           misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
+           if (misaligned) {
+                   /*
+                    * Misaligned accesses are too much trouble to fix
+                    * up; also, they usually indicate a page is not used
+                    * as a page table.
+                    */
+                   pgprintk("misaligned: gpa %llx bytes %d role %x\n",
+                            gpa, bytes, page->role.word);
+                   kvm_mmu_zap_page(vcpu, page);
+                   continue;
+           }
page_offset = offset;
		level = page->role.level;
		if (page->role.glevels == PT32_ROOT_LEVEL) {



[kvm-devel] [PATCH 21/33] KVM: MMU: Move is_empty_shadow_page() above kvm_mmu_free_page()

2007-01-04 Thread Avi Kivity
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -303,16 +303,6 @@ static void rmap_write_protect(struct kv
}
 }
 
-static void kvm_mmu_free_page(struct kvm_vcpu *vcpu, hpa_t page_hpa)
-{
-   struct kvm_mmu_page *page_head = page_header(page_hpa);
-
-   list_del(&page_head->link);
-   page_head->page_hpa = page_hpa;
-   list_add(&page_head->link, &vcpu->free_pages);
-   ++vcpu->kvm->n_free_mmu_pages;
-}
-
 static int is_empty_shadow_page(hpa_t page_hpa)
 {
u32 *pos;
@@ -324,6 +314,16 @@ static int is_empty_shadow_page(hpa_t pa
return 1;
 }
 
+static void kvm_mmu_free_page(struct kvm_vcpu *vcpu, hpa_t page_hpa)
+{
+   struct kvm_mmu_page *page_head = page_header(page_hpa);
+
+   list_del(&page_head->link);
+   page_head->page_hpa = page_hpa;
+   list_add(&page_head->link, &vcpu->free_pages);
+   ++vcpu->kvm->n_free_mmu_pages;
+}
+
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
 {
return gfn;



[kvm-devel] [PATCH 26/33] KVM: MMU: Fix cmpxchg8b emulation

2007-01-04 Thread Avi Kivity
cmpxchg8b uses edx:eax as the compare operand, not edi:eax.

cmpxchg8b is used by 32-bit pae guests to set page table entries atomically,
and these writes are emulated when they touch shadowed guest page tables.

Also, implement it for 32-bit hosts.
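
For reference, the architectural semantics of cmpxchg8b on a 64-bit memory
operand m64, expressed as C pseudocode (this is the ia32 definition, not kvm
code):

	if (m64 == ((u64)edx << 32 | eax)) {
		m64 = (u64)ecx << 32 | ebx;     /* equal: store ECX:EBX */
		eflags |= ZF;
	} else {
		edx = (u32)(m64 >> 32);         /* not equal: load old value */
		eax = (u32)m64;
		eflags &= ~ZF;
	}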

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/x86_emulate.c
===
--- linux-2.6.orig/drivers/kvm/x86_emulate.c
+++ linux-2.6/drivers/kvm/x86_emulate.c
@@ -1323,7 +1323,7 @@ twobyte_special_insn:
 ctxt)) != 0))
goto done;
if ((old_lo != _regs[VCPU_REGS_RAX])
-   || (old_hi != _regs[VCPU_REGS_RDI])) {
+   || (old_hi != _regs[VCPU_REGS_RDX])) {
_regs[VCPU_REGS_RAX] = old_lo;
_regs[VCPU_REGS_RDX] = old_hi;
			_eflags &= ~EFLG_ZF;
Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -936,6 +936,30 @@ static int emulator_cmpxchg_emulated(uns
return emulator_write_emulated(addr, new, bytes, ctxt);
 }
 
+#ifdef CONFIG_X86_32
+
+static int emulator_cmpxchg8b_emulated(unsigned long addr,
+  unsigned long old_lo,
+  unsigned long old_hi,
+  unsigned long new_lo,
+  unsigned long new_hi,
+  struct x86_emulate_ctxt *ctxt)
+{
+   static int reported;
+   int r;
+
+   if (!reported) {
+   reported = 1;
+           printk(KERN_WARNING "kvm: emulating exchange8b as write\n");
+   }
+   r = emulator_write_emulated(addr, new_lo, 4, ctxt);
+   if (r != X86EMUL_CONTINUE)
+   return r;
+   return emulator_write_emulated(addr+4, new_hi, 4, ctxt);
+}
+
+#endif
+
 static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
 {
return kvm_arch_ops-get_segment_base(vcpu, seg);
@@ -1010,6 +1034,9 @@ struct x86_emulate_ops emulate_ops = {
.read_emulated   = emulator_read_emulated,
.write_emulated  = emulator_write_emulated,
.cmpxchg_emulated= emulator_cmpxchg_emulated,
+#ifdef CONFIG_X86_32
+   .cmpxchg8b_emulated  = emulator_cmpxchg8b_emulated,
+#endif
 };
 
 int emulate_instruction(struct kvm_vcpu *vcpu,



[kvm-devel] [PATCH 28/33] KVM: MMU: Free pages on kvm destruction

2007-01-04 Thread Avi Kivity
Because mmu pages have attached rmap and parent pte chain structures, we need
to zap them before freeing so the attached structures are freed.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -1065,9 +1065,14 @@ EXPORT_SYMBOL_GPL(kvm_mmu_free_some_page
 
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
-   while (!list_empty(&vcpu->free_pages)) {
-           struct kvm_mmu_page *page;
+   struct kvm_mmu_page *page;
 
+   while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
+           page = container_of(vcpu->kvm->active_mmu_pages.next,
+                               struct kvm_mmu_page, link);
+           kvm_mmu_zap_page(vcpu, page);
+   }
+   while (!list_empty(&vcpu->free_pages)) {
		page = list_entry(vcpu->free_pages.next,
				  struct kvm_mmu_page, link);
		list_del(&page->link);



[kvm-devel] [PATCH 29/33] KVM: MMU: Replace atomic allocations by preallocated objects

2007-01-04 Thread Avi Kivity
The mmu sometimes needs memory for reverse mapping and parent pte chains.
However, we can't allocate from within the mmu because of the atomic context.

So, move the allocations to a central place that can be executed before
the main mmu machinery, where we can bail out on failure before any damage is
done.

(Error handling is deferred for now, but the basic structure is there.)

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -166,6 +166,84 @@ static int is_rmap_pte(u64 pte)
== (PT_WRITABLE_MASK | PT_PRESENT_MASK);
 }
 
+static void mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+  size_t objsize, int min)
+{
+   void *obj;
+
+   if (cache->nobjs >= min)
+           return;
+   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
+           obj = kzalloc(objsize, GFP_NOWAIT);
+           if (!obj)
+                   BUG();
+           cache->objects[cache->nobjs++] = obj;
+   }
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+   while (mc->nobjs)
+           kfree(mc->objects[--mc->nobjs]);
+}
+
+static void mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+{
+   mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
+                          sizeof(struct kvm_pte_chain), 4);
+   mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
+                          sizeof(struct kvm_rmap_desc), 1);
+}
+
+static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+   mmu_free_memory_cache(&vcpu->mmu_pte_chain_cache);
+   mmu_free_memory_cache(&vcpu->mmu_rmap_desc_cache);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
+   size_t size)
+{
+   void *p;
+
+   BUG_ON(!mc->nobjs);
+   p = mc->objects[--mc->nobjs];
+   memset(p, 0, size);
+   return p;
+}
+
+static void mmu_memory_cache_free(struct kvm_mmu_memory_cache *mc, void *obj)
+{
+   if (mc->nobjs < KVM_NR_MEM_OBJS)
+           mc->objects[mc->nobjs++] = obj;
+   else
+   kfree(obj);
+}
+
+static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
+{
+   return mmu_memory_cache_alloc(&vcpu->mmu_pte_chain_cache,
+ sizeof(struct kvm_pte_chain));
+}
+
+static void mmu_free_pte_chain(struct kvm_vcpu *vcpu,
+  struct kvm_pte_chain *pc)
+{
+   mmu_memory_cache_free(&vcpu->mmu_pte_chain_cache, pc);
+}
+
+static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
+{
+   return mmu_memory_cache_alloc(&vcpu->mmu_rmap_desc_cache,
+ sizeof(struct kvm_rmap_desc));
+}
+
+static void mmu_free_rmap_desc(struct kvm_vcpu *vcpu,
+  struct kvm_rmap_desc *rd)
+{
+   mmu_memory_cache_free(&vcpu->mmu_rmap_desc_cache, rd);
+}
+
 /*
  * Reverse mapping data structures:
  *
@@ -175,7 +253,7 @@ static int is_rmap_pte(u64 pte)
  * If page->private bit zero is one, (then page->private & ~1) points
  * to a struct kvm_rmap_desc containing more mappings.
  */
-static void rmap_add(struct kvm *kvm, u64 *spte)
+static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte)
 {
struct page *page;
struct kvm_rmap_desc *desc;
@@ -189,9 +267,7 @@ static void rmap_add(struct kvm *kvm, u6
		page->private = (unsigned long)spte;
	} else if (!(page->private & 1)) {
		rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
-   desc = kzalloc(sizeof *desc, GFP_NOWAIT);
-   if (!desc)
-   BUG(); /* FIXME: return error */
+   desc = mmu_alloc_rmap_desc(vcpu);
		desc->shadow_ptes[0] = (u64 *)page->private;
		desc->shadow_ptes[1] = spte;
		page->private = (unsigned long)desc | 1;
@@ -201,9 +277,7 @@ static void rmap_add(struct kvm *kvm, u6
		while (desc->shadow_ptes[RMAP_EXT-1] && desc->more)
			desc = desc->more;
		if (desc->shadow_ptes[RMAP_EXT-1]) {
-           desc->more = kzalloc(sizeof *desc->more, GFP_NOWAIT);
-           if (!desc->more)
-                   BUG(); /* FIXME: return error */
+           desc->more = mmu_alloc_rmap_desc(vcpu);
			desc = desc->more;
		}
		for (i = 0; desc->shadow_ptes[i]; ++i)
@@ -212,7 +286,8 @@ static void rmap_add(struct kvm *kvm, u6
}
 }
 
-static void rmap_desc_remove_entry(struct page *page,
+static void rmap_desc_remove_entry(struct kvm_vcpu *vcpu,
+  struct page *page,
   struct kvm_rmap_desc *desc,
   int i,
   struct 

[kvm-devel] [PATCH 30/33] KVM: MMU: Detect oom conditions and propagate error to userspace

2007-01-04 Thread Avi Kivity
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -166,19 +166,20 @@ static int is_rmap_pte(u64 pte)
== (PT_WRITABLE_MASK | PT_PRESENT_MASK);
 }
 
-static void mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
-  size_t objsize, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+ size_t objsize, int min)
 {
void *obj;
 
	if (cache->nobjs >= min)
-           return;
+           return 0;
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
		obj = kzalloc(objsize, GFP_NOWAIT);
		if (!obj)
-                   BUG();
+                   return -ENOMEM;
		cache->objects[cache->nobjs++] = obj;
}
+   return 0;
 }
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
@@ -187,12 +188,18 @@ static void mmu_free_memory_cache(struct
kfree(mc-objects[--mc-nobjs]);
 }
 
-static void mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
-                          sizeof(struct kvm_pte_chain), 4);
-   mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
-                          sizeof(struct kvm_rmap_desc), 1);
+   int r;
+
+   r = mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
+                              sizeof(struct kvm_pte_chain), 4);
+   if (r)
+           goto out;
+   r = mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
+                              sizeof(struct kvm_rmap_desc), 1);
+out:
+   return r;
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
@@ -824,8 +831,11 @@ static int nonpaging_page_fault(struct k
 {
gpa_t addr = gva;
hpa_t paddr;
+   int r;
 
-   mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
 
ASSERT(vcpu);
	ASSERT(VALID_PAGE(vcpu->mmu.root_hpa));
@@ -1052,7 +1062,7 @@ int kvm_mmu_reset_context(struct kvm_vcp
r = init_kvm_mmu(vcpu);
	if (r < 0)
goto out;
-   mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu);
 out:
return r;
 }
Index: linux-2.6/drivers/kvm/svm.c
===
--- linux-2.6.orig/drivers/kvm/svm.c
+++ linux-2.6/drivers/kvm/svm.c
@@ -852,6 +852,7 @@ static int pf_interception(struct kvm_vc
u64 fault_address;
u32 error_code;
enum emulation_result er;
+   int r;
 
if (is_external_interrupt(exit_int_info))
		push_irq(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
@@ -860,7 +861,12 @@ static int pf_interception(struct kvm_vc
 
	fault_address  = vcpu->svm->vmcb->control.exit_info_2;
	error_code = vcpu->svm->vmcb->control.exit_info_1;
-   if (!kvm_mmu_page_fault(vcpu, fault_address, error_code)) {
+   r = kvm_mmu_page_fault(vcpu, fault_address, error_code);
+   if (r < 0) {
+           spin_unlock(&vcpu->kvm->lock);
+           return r;
+   }
+   if (!r) {
		spin_unlock(&vcpu->kvm->lock);
return 1;
}
@@ -1398,6 +1404,7 @@ static int svm_vcpu_run(struct kvm_vcpu 
u16 fs_selector;
u16 gs_selector;
u16 ldt_selector;
+   int r;
 
 again:
do_interrupt_requests(vcpu, kvm_run);
@@ -1565,7 +1572,8 @@ again:
return 0;
}
 
-   if (handle_exit(vcpu, kvm_run)) {
+   r = handle_exit(vcpu, kvm_run);
+   if (r > 0) {
if (signal_pending(current)) {
++kvm_stat.signal_exits;
post_kvm_run_save(vcpu, kvm_run);
@@ -1581,7 +1589,7 @@ again:
goto again;
}
post_kvm_run_save(vcpu, kvm_run);
-   return 0;
+   return r;
 }
 
 static void svm_flush_tlb(struct kvm_vcpu *vcpu)
Index: linux-2.6/drivers/kvm/paging_tmpl.h
===
--- linux-2.6.orig/drivers/kvm/paging_tmpl.h
+++ linux-2.6/drivers/kvm/paging_tmpl.h
@@ -339,7 +339,8 @@ static int FNAME(fix_write_pf)(struct kv
  *   - normal guest page fault due to the guest pte marked not present, not
  * writable, or not executable
  *
- *  Returns: 1 if we need to emulate the instruction, 0 otherwise
+ *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
+ *   a negative value on error.
  */
 static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
   u32 error_code)
@@ -351,10 +352,13 @@ static int FNAME(page_fault)(struct kvm_
u64 *shadow_pte;
int fixed;

[kvm-devel] [PATCH 33/33] KVM: MMU: add audit code to check mappings, etc are correct

2007-01-04 Thread Avi Kivity
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -26,8 +26,31 @@
 #include "vmx.h"
 #include "kvm.h"
 
-#define pgprintk(x...) do { printk(x); } while (0)
-#define rmap_printk(x...) do { printk(x); } while (0)
+#undef MMU_DEBUG
+
+#undef AUDIT
+
+#ifdef AUDIT
+static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg);
+#else
+static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {}
+#endif
+
+#ifdef MMU_DEBUG
+
+#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
+#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
+
+#else
+
+#define pgprintk(x...) do { } while (0)
+#define rmap_printk(x...) do { } while (0)
+
+#endif
+
+#if defined(MMU_DEBUG) || defined(AUDIT)
+static int dbg = 1;
+#endif
 
 #define ASSERT(x)  \
if (!(x)) { \
@@ -1271,3 +1294,163 @@ void kvm_mmu_slot_remove_write_access(st
}
}
 }
+
+#ifdef AUDIT
+
+static const char *audit_msg;
+
+static gva_t canonicalize(gva_t gva)
+{
+#ifdef CONFIG_X86_64
+   gva = (long long)(gva << 16) >> 16;
+#endif
+   return gva;
+}
+
+static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
+   gva_t va, int level)
+{
+   u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK);
+   int i;
+   gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1));
+
+   for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) {
+           u64 ent = pt[i];
+
+           if (!(ent & PT_PRESENT_MASK))
+                   continue;
+
+           va = canonicalize(va);
+           if (level > 1)
+                   audit_mappings_page(vcpu, ent, va, level - 1);
+           else {
+                   gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, va);
+                   hpa_t hpa = gpa_to_hpa(vcpu, gpa);
+
+                   if ((ent & PT_PRESENT_MASK)
+                       && (ent & PT64_BASE_ADDR_MASK) != hpa)
+                           printk(KERN_ERR "audit error: (%s) levels %d"
+                                  " gva %lx gpa %llx hpa %llx ent %llx\n",
+                                  audit_msg, vcpu->mmu.root_level,
+                                  va, gpa, hpa, ent);
+   }
+   }
+}
+
+static void audit_mappings(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   if (vcpu->mmu.root_level == 4)
+           audit_mappings_page(vcpu, vcpu->mmu.root_hpa, 0, 4);
+   else
+           for (i = 0; i < 4; ++i)
+                   if (vcpu->mmu.pae_root[i] & PT_PRESENT_MASK)
+                           audit_mappings_page(vcpu,
+                                               vcpu->mmu.pae_root[i],
+                                               i << 30,
+   2);
+}
+
+static int count_rmaps(struct kvm_vcpu *vcpu)
+{
+   int nmaps = 0;
+   int i, j, k;
+
+   for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
+           struct kvm_memory_slot *m = &vcpu->kvm->memslots[i];
+           struct kvm_rmap_desc *d;
+
+           for (j = 0; j < m->npages; ++j) {
+                   struct page *page = m->phys_mem[j];
+
+                   if (!page->private)
+                           continue;
+                   if (!(page->private & 1)) {
+                           ++nmaps;
+                           continue;
+                   }
+                   d = (struct kvm_rmap_desc *)(page->private & ~1ul);
+                   while (d) {
+                           for (k = 0; k < RMAP_EXT; ++k)
+                                   if (d->shadow_ptes[k])
+                                           ++nmaps;
+                                   else
+                                           break;
+                           d = d->more;
+   }
+   }
+   }
+   return nmaps;
+}
+
+static int count_writable_mappings(struct kvm_vcpu *vcpu)
+{
+   int nmaps = 0;
+   struct kvm_mmu_page *page;
+   int i;
+
+   list_for_each_entry(page, &vcpu->kvm->active_mmu_pages, link) {
+           u64 *pt = __va(page->page_hpa);
+
+           if (page->role.level != PT_PAGE_TABLE_LEVEL)
+                   continue;
+
+           for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+                   u64 ent = pt[i];
+
+                   if (!(ent & PT_PRESENT_MASK))
+                           continue;
+                   if (!(ent & PT_WRITABLE_MASK))
+   continue;
+   ++nmaps;
+   }
+   }
+   return nmaps;
+}
+
+static void audit_rmap(struct kvm_vcpu *vcpu)
+{
+   int n_rmap = count_rmaps(vcpu);

Re: [kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables

2007-01-04 Thread Andrew Morton
On Thu, 04 Jan 2007 17:48:45 +0200
Avi Kivity [EMAIL PROTECTED] wrote:

> The current kvm shadow page table implementation does not cache shadow
> page tables (except for global translations, used for kernel addresses)
> across context switches.  This means that after a context switch, every
> memory access will trap into the host.  After a while, the shadow page
> tables will be rebuilt, and the guest can proceed at native speed until
> the next context switch.
>
> The natural solution, then, is to cache shadow page tables across
> context switches.  Unfortunately, this introduces a bucketload of problems:
>
> - the guest does not notify the processor (and hence kvm) that it
> modifies a page table entry if it has reason to believe that the
> modification will be followed by a tlb flush.  It becomes necessary to
> write-protect guest page tables so that we can use the page fault when
> the access occurs as a notification.
> - write protecting the guest page tables means we need to keep track of
> which ptes map those guest page tables. We need to add reverse mapping
> for all mapped writable guest pages.
> - when the guest does access the write-protected page, we need to allow
> it to perform the write in some way.  We do that either by emulating the
> write, or removing all shadow page tables for that page and allowing the
> write to proceed, depending on circumstances.
>
> This patchset implements the ideas above.  While a lot of tuning remains
> to be done (for example, a sane page replacement algorithm), a guest
> running with this patchset applied is much faster and more responsive
> than with 2.6.20-rc3.  Some preliminary benchmarks are available in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/661.
>
> The patchset is bisectable compile-wise.

Is this intended for 2.6.20, or would you prefer that we release what we
have now and hold this off for 2.6.21?



Re: [kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables

2007-01-04 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
>> Is this intended for 2.6.20, or would you prefer that we release what we
>> have now and hold this off for 2.6.21?
>
> Even though these patches are potentially destabilizing, I'd like them
> (and a few other patches) to go into 2.6.20:
>
> - kvm did not exist in 2.6.19, hence we cannot regress from that
> - this patchset is the difference between a working proof of concept and
>   a generally usable system
> - from my testing, it's quite stable

seconded - I have tested the new MMU changes quite extensively and they
are converging nicely. It brings down context-switch costs by a factor
of 10 and more, even for microbenchmarks: instead of throwing away the
full shadow pagetable hierarchy we have worked so hard to construct, this
patchset allows the intelligent caching of shadow pagetables. The effect
is human-visible as well - the system got visibly snappier.

(I'd increase the shadow cache pool from the current 256 pages to at 
least 1024 pages, but that's a detail.)

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo



[kvm-devel] [PATCH 2/9] KVM: Initialize vcpu-kvm a little earlier

2007-01-04 Thread Avi Kivity
Fixes oops on early close of /dev/kvm.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -230,6 +230,7 @@ static int kvm_dev_open(struct inode *in
	struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
	mutex_init(&vcpu->mutex);
+   vcpu->kvm = kvm;
	vcpu->mmu.root_hpa = INVALID_PAGE;
	INIT_LIST_HEAD(&vcpu->free_pages);
}
@@ -530,7 +531,6 @@ static int kvm_dev_ioctl_create_vcpu(str
	vcpu->guest_fx_image = vcpu->host_fx_image + FX_IMAGE_SIZE;
 
	vcpu->cpu = -1;  /* First load will set up TR */
-   vcpu->kvm = kvm;
	r = kvm_arch_ops->vcpu_create(vcpu);
	if (r < 0)
goto out_free_vcpus;



[kvm-devel] [PATCH 4/9] KVM: Add missing 'break'

2007-01-04 Thread Avi Kivity
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -1922,6 +1922,7 @@ static long kvm_dev_ioctl(struct file *f
 num_msrs_to_save * sizeof(u32)))
goto out;
r = 0;
+   break;
}
default:
;



[kvm-devel] [PATCH 8/9] KVM: Simplify mmu_alloc_roots()

2007-01-04 Thread Avi Kivity
From: Ingo Molnar [EMAIL PROTECTED]

Small optimization/cleanup:

page == page_header(page->page_hpa)

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -820,9 +820,9 @@ static void mmu_alloc_roots(struct kvm_v
	hpa_t root = vcpu->mmu.root_hpa;
 
ASSERT(!VALID_PAGE(root));
-   root = kvm_mmu_get_page(vcpu, root_gfn, 0,
-                           PT64_ROOT_LEVEL, 0, NULL)->page_hpa;
-   page = page_header(root);
+   page = kvm_mmu_get_page(vcpu, root_gfn, 0,
+                           PT64_ROOT_LEVEL, 0, NULL);
+   root = page->page_hpa;
	++page->root_count;
	vcpu->mmu.root_hpa = root;
return;
@@ -836,10 +836,10 @@ static void mmu_alloc_roots(struct kvm_v
		root_gfn = vcpu->pdptrs[i] >> PAGE_SHIFT;
	else if (vcpu->mmu.root_level == 0)
		root_gfn = 0;
-   root = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
+   page = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
			    PT32_ROOT_LEVEL, !is_paging(vcpu),
-                           NULL)->page_hpa;
-   page = page_header(root);
+                           NULL);
+   root = page->page_hpa;
	++page->root_count;
	vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
}
