Re: [kvm-devel] [patch] kvm: make cr3 loading more robust
Ingo Molnar wrote:
> another small detail is that currently KVM_SET_MEMORY_REGION appears to be an add-only interface - it is not possible to 'unregister' RAM from a VM.

Well, the _interface_ supports removing; the implementation does not :)  Everything was written with memory hotplug in mind.

> That keeps things easy for now, but if it's ever implemented then the current cr3 of all vcpus of the VM needs to be validated against the reduced memory slot map. (besides migrating all existing mappings from the removed memory slot to other memory slots and redirecting all in-flight DMA transactions, etc., etc. Which all needs heavy guest-OS awareness as well.)

Actually I think it's quite easy:

- halt all vcpus
- wait for dma to complete
- mmu_free_roots()
- zap all page tables
- actually unplug memory
- mmu_alloc_roots()

The guest needs to cooperate, but it can do so using the native memory hotplug mechanisms (whatever they are).  No guest modifications are needed.

-- 
error compiling committee.c: too many arguments to function

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
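The unplug sequence sketched above can be modeled as a toy state machine (all names and the struct are illustrative, not the real KVM API; `mmu_free_roots()`/`mmu_alloc_roots()` stand in for the functions named in the mail):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hot-unplug sequence from the mail; not real KVM code. */
struct toy_vm {
    bool vcpus_halted;
    bool dma_quiesced;
    bool roots_valid;   /* vcpu cr3 roots point at valid shadow pages */
    int  nr_slots;
};

static void halt_vcpus(struct toy_vm *vm)      { vm->vcpus_halted = true; }
static void wait_for_dma(struct toy_vm *vm)    { vm->dma_quiesced = true; }
static void mmu_free_roots(struct toy_vm *vm)  { vm->roots_valid = false; }
static void mmu_alloc_roots(struct toy_vm *vm) { vm->roots_valid = true; }

/* Unplug one memory slot following the steps listed in the mail. */
static void unplug_slot(struct toy_vm *vm)
{
    halt_vcpus(vm);
    wait_for_dma(vm);
    mmu_free_roots(vm);   /* no cr3 can reference the slot any more */
    vm->nr_slots--;       /* "actually unplug memory" */
    mmu_alloc_roots(vm);  /* rebuild roots against the reduced slot map */
    vm->vcpus_halted = false;
}
```

The point of the ordering is that the shadow roots are torn down *before* the slot disappears, so no stale cr3 can ever reference removed memory.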
Re: [kvm-devel] [patch] kvm: make cr3 loading more robust
* Avi Kivity [EMAIL PROTECTED] wrote:

> The guest needs to cooperate, but it can do so using the native memory hotplug mechanisms (whatever they are). [...]

as far as a Linux guest goes, there's no such thing at the moment, at least in the mainline kernel. Most of the difficulties with RAM-unplug are on the guest OS side - i agree with you that doing it on the host side is easy. (because the host side does not really 'use' any of the guest's RAM.)

	Ingo
Re: [kvm-devel] [patch] KVM: simplify mmu_alloc_roots()
Ingo Molnar wrote:
> Subject: [patch] KVM: simplify mmu_alloc_roots()
> From: Ingo Molnar [EMAIL PROTECTED]
>
> small optimization/cleanup:
>
>   page == page_header(page->page_hpa)

Applied, thanks.
Re: [kvm-devel] Solaris 10 U2 installation failure
Parag Warudkar wrote:
> Avi Kivity [EMAIL PROTECTED] writes:
>> 32-bit kvm userspace can run a 64-bit guest, if you're using a 64-bit OS kernel, hence the 64-bit registers. Just ignore the 64-bit parts.
>
> Didn't understand. Allow me to clarify a bit - I am running a 32-bit Host OS (Linux i386) on a purely 32-bit CPU (Core Duo). Solaris installation will first check if the processor is 64-bit capable and only then install a 64-bit kernel. In that case, when Solaris asks KVM if the CPU is 64-bit, is KVM lying to it even though the host CPU on which it is running is NOT 64-bit capable, and then emulating all AMD64 instructions without any help from the host CPU? That sounds more confusing! Even if it does something like that, is there a way to tell KVM not to pretend like it is running a 64-bit CPU? I don't want to run Solaris in 64-bit mode as I am running on a 32-bit host with just 512MB of total memory.

No, kvm doesn't pretend to be a 64-bit cpu when it isn't. There are three cases wrt host bitness. Two are straightforward:

- 32-bit host: kvm pretends to be a 32-bit cpu, whether the cpu supports long mode or not.
- 64-bit host: the cpu supports long mode, and we pass that on to the guest.

There is a third case: 64-bit host kernel but 32-bit qemu. In that case, we also support 64-bit guests. Because of that last case, 32-bit qemu is compiled with support for 64-bit guests. That means that even in a pure 32-bit environment, qemu has 64-bit registers (even though it can't make use of them). If you're running a 32-bit environment, simply ignore r8-r15 and the high 32 bits of other registers. That's what kvm.ko does :)
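The "ignore the 64-bit parts" advice amounts to truncating each register to its low 32 bits and dropping r8-r15 entirely. A hypothetical sketch (this is not the real `struct kvm_regs` layout, just an illustration of the truncation):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative register file; the real struct kvm_regs has more fields. */
struct regs64 {
    uint64_t rax, rbx, rcx, rdx;
    uint64_t r8;   /* r8-r15 exist only in long mode */
};

/* 32-bit view: keep the low 32 bits, drop r8-r15 entirely. */
struct regs32 {
    uint32_t eax, ebx, ecx, edx;
};

static struct regs32 view32(const struct regs64 *r)
{
    struct regs32 v = {
        .eax = (uint32_t)r->rax,
        .ebx = (uint32_t)r->rbx,
        .ecx = (uint32_t)r->rcx,
        .edx = (uint32_t)r->rdx,
    };
    return v;   /* r8 is simply ignored, as kvm.ko does on 32-bit */
}
```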
[kvm-devel] Compile error with openSuse 10.2
When compiling KVM I get the following error:

In file included from /home/peter/applications-home/kvm-9/qemu/usb-linux.c:29:
/usr/include/linux/usbdevice_fs.h:49: error: variable or field `__user' declared void
/usr/include/linux/usbdevice_fs.h:49: error: syntax error before '*' token

My environment is openSuse 10.2, gcc 3.4.6, kernel 2.6.18.2 (32 bit). I would be grateful for some pointers,

Peter
[kvm-devel] [PATCH] KVM: Prevent stale bits in cr0 and cr4
Hardware virtualization implementations allow the guests to freely change some of the bits in cr0 and cr4, but trap when changing the other bits. This is useful to avoid excessive exits due to changing, for example, the ts flag. It also means kvm's copy of cr0 and cr4 may be stale with respect to these bits. Most of the time this doesn't matter, as these bits are not very interesting. Other times, however (for example when returning cr0 to userspace), they are, so get the fresh contents of these bits from the guest by means of a new arch operation.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -390,6 +390,7 @@ EXPORT_SYMBOL_GPL(set_cr0);
 void lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
+	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
 	set_cr0(vcpu, (vcpu->cr0 & ~0x0ful) | (msw & 0x0f));
 }
 EXPORT_SYMBOL_GPL(lmsw);
@@ -917,9 +918,10 @@ int emulate_invlpg(struct kvm_vcpu *vcpu
 int emulate_clts(struct kvm_vcpu *vcpu)
 {
-	unsigned long cr0 = vcpu->cr0;
+	unsigned long cr0;
 
-	cr0 &= ~CR0_TS_MASK;
+	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
+	cr0 = vcpu->cr0 & ~CR0_TS_MASK;
 	kvm_arch_ops->set_cr0(vcpu, cr0);
 	return X86EMUL_CONTINUE;
 }
@@ -1072,6 +1074,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 {
+	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
 	switch (cr) {
 	case 0:
 		return vcpu->cr0;
@@ -1406,6 +1409,7 @@ static int kvm_dev_ioctl_get_sregs(struc
 	sregs->gdt.limit = dt.limit;
 	sregs->gdt.base = dt.base;
 
+	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
 	sregs->cr0 = vcpu->cr0;
 	sregs->cr2 = vcpu->cr2;
 	sregs->cr3 = vcpu->cr3;
@@ -1470,6 +1474,8 @@ static int kvm_dev_ioctl_set_sregs(struc
 #endif
 	vcpu->apic_base = sregs->apic_base;
 
+	kvm_arch_ops->decache_cr0_cr4_guest_bits(vcpu);
+
 	mmu_reset_needed |= vcpu->cr0 != sregs->cr0;
 	kvm_arch_ops->set_cr0_no_modeswitch(vcpu, sregs->cr0);

Index: linux-2.6/drivers/kvm/kvm.h
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -283,6 +283,7 @@ struct kvm_arch_ops {
 	void (*set_segment)(struct kvm_vcpu *vcpu,
 			    struct kvm_segment *var, int seg);
 	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
+	void (*decache_cr0_cr4_guest_bits)(struct kvm_vcpu *vcpu);
 	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
 	void (*set_cr0_no_modeswitch)(struct kvm_vcpu *vcpu,
 				      unsigned long cr0);

Index: linux-2.6/drivers/kvm/svm.c
===================================================================
--- linux-2.6.orig/drivers/kvm/svm.c
+++ linux-2.6/drivers/kvm/svm.c
@@ -702,6 +702,10 @@ static void svm_set_gdt(struct kvm_vcpu
 	vcpu->svm->vmcb->save.gdtr.base = dt->base;
 }
 
+static void svm_decache_cr0_cr4_guest_bits(struct kvm_vcpu *vcpu)
+{
+}
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 #ifdef CONFIG_X86_64
@@ -1645,6 +1649,7 @@ static struct kvm_arch_ops svm_arch_ops
 	.get_segment = svm_get_segment,
 	.set_segment = svm_set_segment,
 	.get_cs_db_l_bits = svm_get_cs_db_l_bits,
+	.decache_cr0_cr4_guest_bits = svm_decache_cr0_cr4_guest_bits,
 	.set_cr0 = svm_set_cr0,
 	.set_cr0_no_modeswitch = svm_set_cr0,
 	.set_cr3 = svm_set_cr3,

Index: linux-2.6/drivers/kvm/vmx.c
===================================================================
--- linux-2.6.orig/drivers/kvm/vmx.c
+++ linux-2.6/drivers/kvm/vmx.c
@@ -737,6 +737,15 @@ static void exit_lmode(struct kvm_vcpu *
 #endif
 
+static void vmx_decache_cr0_cr4_guest_bits(struct kvm_vcpu *vcpu)
+{
+	vcpu->cr0 &= KVM_GUEST_CR0_MASK;
+	vcpu->cr0 |= vmcs_readl(GUEST_CR0) & ~KVM_GUEST_CR0_MASK;
+
+	vcpu->cr4 &= KVM_GUEST_CR4_MASK;
+	vcpu->cr4 |= vmcs_readl(GUEST_CR4) & ~KVM_GUEST_CR4_MASK;
+}
+
 static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	if (vcpu->rmode.active && (cr0 & CR0_PE_MASK))
@@ -2002,6 +2011,7 @@ static struct kvm_arch_ops vmx_arch_ops
 	.get_segment = vmx_get_segment,
 	.set_segment = vmx_set_segment,
 	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
+	.decache_cr0_cr4_guest_bits = vmx_decache_cr0_cr4_guest_bits,
 	.set_cr0 = vmx_set_cr0,
 	.set_cr0_no_modeswitch = vmx_set_cr0_no_modeswitch,
 	.set_cr3 = vmx_set_cr3,
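The decache operation merges the guest-owned bits from the hardware copy into the cached value, keeping the trapped bits from the cache. The mask arithmetic can be exercised in isolation (the mask value below is illustrative, standing in for KVM_GUEST_CR0_MASK):

```c
#include <assert.h>
#include <stdint.h>

/* Bits the host traps on, so the cached copy is authoritative for them;
 * the complement is guest-owned and may be stale in the cache.
 * Illustrative value, not the real KVM_GUEST_CR0_MASK. */
#define GUEST_CR0_MASK 0xfffffff0ull

/* Refresh the guest-owned bits of a cached register from hardware,
 * mirroring the structure of vmx_decache_cr0_cr4_guest_bits(). */
static uint64_t decache(uint64_t cached, uint64_t hw)
{
    cached &= GUEST_CR0_MASK;         /* keep trapped bits from the cache */
    cached |= hw & ~GUEST_CR0_MASK;   /* take guest-owned bits from hw */
    return cached;
}
```

Note the SVM implementation of the hook is empty: on SVM, kvm intercepts all cr0/cr4 writes, so the cached copy is never stale and there is nothing to merge.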
[kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables
The current kvm shadow page table implementation does not cache shadow page tables (except for global translations, used for kernel addresses) across context switches. This means that after a context switch, every memory access will trap into the host. After a while, the shadow page tables will be rebuilt, and the guest can proceed at native speed until the next context switch.

The natural solution, then, is to cache shadow page tables across context switches. Unfortunately, this introduces a bucketload of problems:

- the guest does not notify the processor (and hence kvm) that it modifies a page table entry if it has reason to believe that the modification will be followed by a tlb flush. It becomes necessary to write-protect guest page tables so that we can use the page fault when the access occurs as a notification.
- write-protecting the guest page tables means we need to keep track of which ptes map those guest page tables. We need to add reverse mapping for all mapped writable guest pages.
- when the guest does access the write-protected page, we need to allow it to perform the write in some way. We do that either by emulating the write, or removing all shadow page tables for that page and allowing the write to proceed, depending on circumstances.

This patchset implements the ideas above. While a lot of tuning remains to be done (for example, a sane page replacement algorithm), a guest running with this patchset applied is much faster and more responsive than with 2.6.20-rc3. Some preliminary benchmarks are available in http://article.gmane.org/gmane.comp.emulators.kvm.devel/661.

The patchset is bisectable compile-wise.
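The reverse map described above can be modeled minimally: per guest frame, record which shadow ptes map it, so the frame can be write-protected by clearing the writable bit in each of them (names, the pte layout, and the fixed-size array are all toy simplifications, not kvm's actual rmap):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define MAX_RMAP 8

/* Reverse map for one guest frame: the shadow ptes that map it. */
struct rmap {
    uint64_t *sptes[MAX_RMAP];
    int n;
};

static void rmap_add(struct rmap *r, uint64_t *spte)
{
    assert(r->n < MAX_RMAP);
    r->sptes[r->n++] = spte;
}

/* Write-protect a guest page table: clear the writable bit in every
 * shadow pte that maps it, so the next guest write faults into the
 * host - that fault is the "notification" the cover letter mentions. */
static void rmap_write_protect(const struct rmap *r)
{
    for (int i = 0; i < r->n; i++)
        *r->sptes[i] &= ~PTE_WRITABLE;
}
```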
[kvm-devel] [PATCH 3/33] KVM: MMU: Load the pae pdptrs on cr3 change like the processor does
In pae mode, a load of cr3 loads the four third-level page table entries in addition to cr3 itself.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -298,14 +298,17 @@ static void inject_gp(struct kvm_vcpu *v
 	kvm_arch_ops->inject_gp(vcpu, 0);
 }
 
-static int pdptrs_have_reserved_bits_set(struct kvm_vcpu *vcpu,
-					 unsigned long cr3)
+/*
+ * Load the pae pdptrs.  Return true if they are all valid.
+ */
+static int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
-	unsigned offset = (cr3 & (PAGE_SIZE-1)) >> 5;
+	unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
 	int i;
 	u64 pdpte;
 	u64 *pdpt;
+	int ret;
 	struct kvm_memory_slot *memslot;
 
 	spin_lock(&vcpu->kvm->lock);
@@ -313,16 +316,23 @@ static int pdptrs_have_reserved_bits_set
 	/* FIXME: !memslot - emulate? 0xff? */
 	pdpt = kmap_atomic(gfn_to_page(memslot, pdpt_gfn), KM_USER0);
+	ret = 1;
 	for (i = 0; i < 4; ++i) {
 		pdpte = pdpt[offset + i];
-		if ((pdpte & 1) && (pdpte & 0xfff001e6ull))
-			break;
+		if ((pdpte & 1) && (pdpte & 0xfff001e6ull)) {
+			ret = 0;
+			goto out;
+		}
 	}
 
+	for (i = 0; i < 4; ++i)
+		vcpu->pdptrs[i] = pdpt[offset + i];
+
+out:
 	kunmap_atomic(pdpt, KM_USER0);
 	spin_unlock(&vcpu->kvm->lock);
 
-	return i != 4;
+	return ret;
 }
 
 void set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
@@ -368,8 +378,7 @@ void set_cr0(struct kvm_vcpu *vcpu, unsi
 		}
 	} else
 #endif
-	if (is_pae(vcpu) &&
-	    pdptrs_have_reserved_bits_set(vcpu, vcpu->cr3)) {
+	if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->cr3)) {
 		printk(KERN_DEBUG "set_cr0: #GP, pdptrs reserved bits\n");
 		inject_gp(vcpu);
@@ -411,7 +420,7 @@ void set_cr4(struct kvm_vcpu *vcpu, unsi
 			return;
 		}
 	} else if (is_paging(vcpu) && !is_pae(vcpu) && (cr4 & CR4_PAE_MASK) &&
-		   pdptrs_have_reserved_bits_set(vcpu, vcpu->cr3)) {
+		   !load_pdptrs(vcpu, vcpu->cr3)) {
 		printk(KERN_DEBUG "set_cr4: #GP, pdptrs reserved bits\n");
 		inject_gp(vcpu);
 	}
@@ -443,7 +452,7 @@ void set_cr3(struct kvm_vcpu *vcpu, unsi
 		return;
 	}
 	if (is_paging(vcpu) && is_pae(vcpu) &&
-	    pdptrs_have_reserved_bits_set(vcpu, cr3)) {
+	    !load_pdptrs(vcpu, cr3)) {
 		printk(KERN_DEBUG "set_cr3: #GP, pdptrs reserved bits\n");
 		inject_gp(vcpu);

Index: linux-2.6/drivers/kvm/kvm.h
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -185,6 +185,7 @@ struct kvm_vcpu {
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr8;
+	u64 pdptrs[4]; /* pae */
 	u64 shadow_efer;
 	u64 apic_base;
 	int nmsrs;
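The validity test inside load_pdptrs() can be exercised standalone: a pdpte is rejected only if it is present (bit 0 set) and has any bit from the reserved mask set. The mask value is taken from the patch; the toy below assumes nothing else about the entry format:

```c
#include <assert.h>
#include <stdint.h>

#define PDPTE_RESERVED 0xfff001e6ull  /* reserved-bit mask from the patch */

/* Return 1 if all four pdptes are valid, mirroring load_pdptrs's loop. */
static int pdptrs_valid(const uint64_t pdpte[4])
{
    for (int i = 0; i < 4; i++)
        if ((pdpte[i] & 1) && (pdpte[i] & PDPTE_RESERVED))
            return 0;   /* present with reserved bits set: inject #GP */
    return 1;
}
```

A not-present entry is never rejected, whatever its other bits, which is exactly what the `(pdpte & 1) &&` guard buys.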
[kvm-devel] [PATCH 4/33] KVM: MMU: Fold fetch_guest() into init_walker()
It is never necessary to fetch a guest entry from an intermediate page table level (except for large pages), so avoid some confusion by always descending into the lowest possible level. Rename init_walker() to walk_addr() as it is no longer restricted to initialization.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/paging_tmpl.h
===================================================================
--- linux-2.6.orig/drivers/kvm/paging_tmpl.h
+++ linux-2.6/drivers/kvm/paging_tmpl.h
@@ -54,14 +54,19 @@ struct guest_walker {
 	int level;
 	gfn_t table_gfn;
 	pt_element_t *table;
+	pt_element_t *ptep;
 	pt_element_t inherited_ar;
 };
 
-static void FNAME(init_walker)(struct guest_walker *walker,
-			       struct kvm_vcpu *vcpu)
+/*
+ * Fetch a guest pte for a guest virtual address
+ */
+static void FNAME(walk_addr)(struct guest_walker *walker,
+			     struct kvm_vcpu *vcpu, gva_t addr)
 {
 	hpa_t hpa;
 	struct kvm_memory_slot *slot;
+	pt_element_t *ptep;
 
 	walker->level = vcpu->mmu.root_level;
 	walker->table_gfn = (vcpu->cr3 & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -75,6 +80,38 @@ static void FNAME(init_walker)(struct gu
 	walker->table = (pt_element_t *)((unsigned long)walker->table |
 		(unsigned long)(vcpu->cr3 & ~(PAGE_MASK | CR3_FLAGS_MASK)));
 	walker->inherited_ar = PT_USER_MASK | PT_WRITABLE_MASK;
+
+	for (;;) {
+		int index = PT_INDEX(addr, walker->level);
+		hpa_t paddr;
+
+		ptep = &walker->table[index];
+		ASSERT(((unsigned long)walker->table & PAGE_MASK) ==
+		       ((unsigned long)ptep & PAGE_MASK));
+
+		/* Don't set accessed bit on PAE PDPTRs */
+		if (vcpu->mmu.root_level != 3 || walker->level != 3)
+			if ((*ptep & (PT_PRESENT_MASK | PT_ACCESSED_MASK))
+			    == PT_PRESENT_MASK)
+				*ptep |= PT_ACCESSED_MASK;
+
+		if (!is_present_pte(*ptep) ||
+		    walker->level == PT_PAGE_TABLE_LEVEL ||
+		    (walker->level == PT_DIRECTORY_LEVEL &&
+		     (*ptep & PT_PAGE_SIZE_MASK) &&
+		     (PTTYPE == 64 || is_pse(vcpu))))
+			break;
+
+		if (walker->level != 3 || is_long_mode(vcpu))
+			walker->inherited_ar &= walker->table[index];
+		walker->table_gfn = (*ptep & PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
+		paddr = safe_gpa_to_hpa(vcpu, *ptep & PT_BASE_ADDR_MASK);
+		kunmap_atomic(walker->table, KM_USER0);
+		walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
+					    KM_USER0);
+		--walker->level;
+	}
+	walker->ptep = ptep;
 }
 
 static void FNAME(release_walker)(struct guest_walker *walker)
@@ -110,41 +147,6 @@ static void FNAME(set_pde)(struct kvm_vc
 }
 
 /*
- * Fetch a guest pte from a specific level in the paging hierarchy.
- */
-static pt_element_t *FNAME(fetch_guest)(struct kvm_vcpu *vcpu,
-					struct guest_walker *walker,
-					int level,
-					gva_t addr)
-{
-
-	ASSERT(level > 0 && level <= walker->level);
-
-	for (;;) {
-		int index = PT_INDEX(addr, walker->level);
-		hpa_t paddr;
-
-		ASSERT(((unsigned long)walker->table & PAGE_MASK) ==
-		       ((unsigned long)&walker->table[index] & PAGE_MASK));
-		if (level == walker->level ||
-		    !is_present_pte(walker->table[index]) ||
-		    (walker->level == PT_DIRECTORY_LEVEL &&
-		     (walker->table[index] & PT_PAGE_SIZE_MASK) &&
-		     (PTTYPE == 64 || is_pse(vcpu))))
-			return &walker->table[index];
-		if (walker->level != 3 || is_long_mode(vcpu))
-			walker->inherited_ar &= walker->table[index];
-		walker->table_gfn = (walker->table[index] & PT_BASE_ADDR_MASK)
-			>> PAGE_SHIFT;
-		paddr = safe_gpa_to_hpa(vcpu, walker->table[index] & PT_BASE_ADDR_MASK);
-		kunmap_atomic(walker->table, KM_USER0);
-		walker->table = kmap_atomic(pfn_to_page(paddr >> PAGE_SHIFT),
-					    KM_USER0);
-		--walker->level;
-	}
-}
-
-/*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
@@ -153,6 +155,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
 	hpa_t shadow_addr;
 	int level;
 	u64 *prev_shadow_ent = NULL;
+	pt_element_t *guest_ent = walker->ptep;
+
+	if (!is_present_pte(*guest_ent))
+		return NULL;
 
 	shadow_addr = vcpu->mmu.root_hpa;
 	level =
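The "always descend to the lowest possible level" walk can be modeled with a toy two-level table: keep descending until the entry is a leaf (lowest level) or a hole, and return a pointer to that final entry, as walk_addr() does. Everything here (4-bit indices, the present-bit encoding of child pointers) is a toy simplification:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define PRESENT 1ull
#define LEVELS  2
#define ENTRIES 16   /* 4 index bits per level (toy sizes) */

/* Walk a toy multi-level table, descending to the lowest level and
 * returning a pointer to the final entry, like walk_addr() does. */
static uint64_t *walk(uint64_t *root, uint64_t vaddr)
{
    uint64_t *table = root;
    for (int level = LEVELS; ; level--) {
        int index = (vaddr >> (4 * (level - 1))) & (ENTRIES - 1);
        uint64_t *ptep = &table[index];
        if (!(*ptep & PRESENT) || level == 1)
            return ptep;             /* hole or lowest level: stop here */
        table = (uint64_t *)(uintptr_t)(*ptep & ~PRESENT);
    }
}
```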
[kvm-devel] [PATCH 5/33] KVM: MMU: Special treatment for shadow pae root pages
Since we're not going to cache the pae-mode shadow root pages, allocate a single pae shadow page that will hold the four lower-level pages, which will act as roots.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -420,19 +420,63 @@ static int nonpaging_map(struct kvm_vcpu
 	}
 }
 
+static void mmu_free_roots(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+#ifdef CONFIG_X86_64
+	if (vcpu->mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		hpa_t root = vcpu->mmu.root_hpa;
+
+		ASSERT(VALID_PAGE(root));
+		release_pt_page_64(vcpu, root, PT64_ROOT_LEVEL);
+		vcpu->mmu.root_hpa = INVALID_PAGE;
+		return;
+	}
+#endif
+	for (i = 0; i < 4; ++i) {
+		hpa_t root = vcpu->mmu.pae_root[i];
+
+		ASSERT(VALID_PAGE(root));
+		root &= PT64_BASE_ADDR_MASK;
+		release_pt_page_64(vcpu, root, PT32E_ROOT_LEVEL - 1);
+		vcpu->mmu.pae_root[i] = INVALID_PAGE;
+	}
+	vcpu->mmu.root_hpa = INVALID_PAGE;
+}
+
+static void mmu_alloc_roots(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+#ifdef CONFIG_X86_64
+	if (vcpu->mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+		hpa_t root = vcpu->mmu.root_hpa;
+
+		ASSERT(!VALID_PAGE(root));
+		root = kvm_mmu_alloc_page(vcpu, NULL);
+		vcpu->mmu.root_hpa = root;
+		return;
+	}
+#endif
+	for (i = 0; i < 4; ++i) {
+		hpa_t root = vcpu->mmu.pae_root[i];
+
+		ASSERT(!VALID_PAGE(root));
+		root = kvm_mmu_alloc_page(vcpu, NULL);
+		vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
+	}
+	vcpu->mmu.root_hpa = __pa(vcpu->mmu.pae_root);
+}
+
 static void nonpaging_flush(struct kvm_vcpu *vcpu)
 {
 	hpa_t root = vcpu->mmu.root_hpa;
 
 	++kvm_stat.tlb_flush;
 	pgprintk("nonpaging_flush\n");
-	ASSERT(VALID_PAGE(root));
-	release_pt_page_64(vcpu, root, vcpu->mmu.shadow_root_level);
-	root = kvm_mmu_alloc_page(vcpu, NULL);
-	ASSERT(VALID_PAGE(root));
-	vcpu->mmu.root_hpa = root;
-	if (is_paging(vcpu))
-		root |= (vcpu->cr3 & (CR3_PCD_MASK | CR3_WPT_MASK));
+	mmu_free_roots(vcpu);
+	mmu_alloc_roots(vcpu);
 	kvm_arch_ops->set_cr3(vcpu, root);
 	kvm_arch_ops->tlb_flush(vcpu);
 }
@@ -475,13 +519,7 @@ static void nonpaging_inval_page(struct
 
 static void nonpaging_free(struct kvm_vcpu *vcpu)
 {
-	hpa_t root;
-
-	ASSERT(vcpu);
-	root = vcpu->mmu.root_hpa;
-	if (VALID_PAGE(root))
-		release_pt_page_64(vcpu, root, vcpu->mmu.shadow_root_level);
-	vcpu->mmu.root_hpa = INVALID_PAGE;
+	mmu_free_roots(vcpu);
 }
 
 static int nonpaging_init_context(struct kvm_vcpu *vcpu)
@@ -495,7 +533,7 @@ static int nonpaging_init_context(struct
 	context->free = nonpaging_free;
 	context->root_level = PT32E_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
-	context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+	mmu_alloc_roots(vcpu);
 	ASSERT(VALID_PAGE(context->root_hpa));
 	kvm_arch_ops->set_cr3(vcpu, context->root_hpa);
 	return 0;
@@ -647,7 +685,7 @@ static void paging_free(struct kvm_vcpu
 #include "paging_tmpl.h"
 #undef PTTYPE
 
-static int paging64_init_context(struct kvm_vcpu *vcpu)
+static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 {
 	struct kvm_mmu *context = &vcpu->mmu;
 
@@ -657,15 +695,20 @@ static int paging64_init_context(struct
 	context->inval_page = paging_inval_page;
 	context->gva_to_gpa = paging64_gva_to_gpa;
 	context->free = paging_free;
-	context->root_level = PT64_ROOT_LEVEL;
-	context->shadow_root_level = PT64_ROOT_LEVEL;
-	context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+	context->root_level = level;
+	context->shadow_root_level = level;
+	mmu_alloc_roots(vcpu);
 	ASSERT(VALID_PAGE(context->root_hpa));
 	kvm_arch_ops->set_cr3(vcpu, context->root_hpa |
 			      (vcpu->cr3 & (CR3_PCD_MASK | CR3_WPT_MASK)));
 	return 0;
 }
 
+static int paging64_init_context(struct kvm_vcpu *vcpu)
+{
+	return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
+}
+
 static int paging32_init_context(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = &vcpu->mmu;
@@ -677,7 +720,7 @@ static int paging32_init_context(struct
 	context->free = paging_free;
 	context->root_level = PT32_ROOT_LEVEL;
 	context->shadow_root_level = PT32E_ROOT_LEVEL;
-	context->root_hpa = kvm_mmu_alloc_page(vcpu, NULL);
+	mmu_alloc_roots(vcpu);
 	ASSERT(VALID_PAGE(context->root_hpa));
 	kvm_arch_ops->set_cr3(vcpu, context->root_hpa |
 			      (vcpu->cr3 & (CR3_PCD_MASK |
[kvm-devel] [PATCH 14/33] KVM: MMU: If emulating an instruction fails, try unprotecting the page
A page table may have been recycled into a regular page, and so any instruction can be executed on it. Unprotect the page and let the cpu do its thing.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -478,11 +478,62 @@ static struct kvm_mmu_page *kvm_mmu_get_
 	return page;
 }
 
+static void kvm_mmu_page_unlink_children(struct kvm_vcpu *vcpu,
+					 struct kvm_mmu_page *page)
+{
+	BUG();
+}
+
 static void kvm_mmu_put_page(struct kvm_vcpu *vcpu,
 			     struct kvm_mmu_page *page,
 			     u64 *parent_pte)
 {
 	mmu_page_remove_parent_pte(page, parent_pte);
+	if (page->role.level > PT_PAGE_TABLE_LEVEL)
+		kvm_mmu_page_unlink_children(vcpu, page);
+	hlist_del(&page->hash_link);
+	list_del(&page->link);
+	list_add(&page->link, &vcpu->free_pages);
+}
+
+static void kvm_mmu_zap_page(struct kvm_vcpu *vcpu,
+			     struct kvm_mmu_page *page)
+{
+	u64 *parent_pte;
+
+	while (page->multimapped || page->parent_pte) {
+		if (!page->multimapped)
+			parent_pte = page->parent_pte;
+		else {
+			struct kvm_pte_chain *chain;
+
+			chain = container_of(page->parent_ptes.first,
+					     struct kvm_pte_chain, link);
+			parent_pte = chain->parent_ptes[0];
+		}
+		kvm_mmu_put_page(vcpu, page, parent_pte);
+		*parent_pte = 0;
+	}
+}
+
+static int kvm_mmu_unprotect_page(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	unsigned index;
+	struct hlist_head *bucket;
+	struct kvm_mmu_page *page;
+	struct hlist_node *node, *n;
+	int r;
+
+	pgprintk("%s: looking for gfn %lx\n", __FUNCTION__, gfn);
+	r = 0;
+	index = kvm_page_table_hashfn(gfn) % KVM_NUM_MMU_PAGES;
+	bucket = &vcpu->kvm->mmu_page_hash[index];
+	hlist_for_each_entry_safe(page, node, n, bucket, hash_link)
+		if (page->gfn == gfn && !page->role.metaphysical) {
+			kvm_mmu_zap_page(vcpu, page);
+			r = 1;
+		}
+	return r;
 }
 
 static void page_header_update_slot(struct kvm *kvm, void *pte, gpa_t gpa)
@@ -1001,6 +1052,13 @@ void kvm_mmu_post_write(struct kvm_vcpu
 {
 }
 
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, gva);
+
+	return kvm_mmu_unprotect_page(vcpu, gpa >> PAGE_SHIFT);
+}
+
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
 	while (!list_empty(&vcpu->free_pages)) {

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -1063,6 +1063,8 @@ int emulate_instruction(struct kvm_vcpu
 	}
 
 	if (r) {
+		if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
+			return EMULATE_DONE;
 		if (!vcpu->mmio_needed) {
 			report_emulation_failure(&emulate_ctxt);
 			return EMULATE_FAIL;

Index: linux-2.6/drivers/kvm/kvm.h
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm.h
+++ linux-2.6/drivers/kvm/kvm.h
@@ -450,6 +450,7 @@ unsigned long segment_base(u16 selector)
 
 void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes);
 void kvm_mmu_post_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes);
+int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 
 static inline struct page *_gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
[kvm-devel] [PATCH 20/33] KVM: MMU: Handle misaligned accesses to write protected guest page tables
A misaligned access affects two shadow ptes instead of just one. Since a misaligned access is unlikely to occur on a real page table, just zap the page out of existence, avoiding further trouble.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -954,21 +954,36 @@ void kvm_mmu_pre_write(struct kvm_vcpu *
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *page;
 	struct kvm_mmu_page *child;
-	struct hlist_node *node;
+	struct hlist_node *node, *n;
 	struct hlist_head *bucket;
 	unsigned index;
 	u64 *spte;
 	u64 pte;
 	unsigned offset = offset_in_page(gpa);
+	unsigned pte_size;
 	unsigned page_offset;
+	unsigned misaligned;
 	int level;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes);
 	index = kvm_page_table_hashfn(gfn) % KVM_NUM_MMU_PAGES;
 	bucket = &vcpu->kvm->mmu_page_hash[index];
-	hlist_for_each_entry(page, node, bucket, hash_link) {
+	hlist_for_each_entry_safe(page, node, n, bucket, hash_link) {
 		if (page->gfn != gfn || page->role.metaphysical)
 			continue;
+		pte_size = page->role.glevels == PT32_ROOT_LEVEL ? 4 : 8;
+		misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
+		if (misaligned) {
+			/*
+			 * Misaligned accesses are too much trouble to fix
+			 * up; also, they usually indicate a page is not used
+			 * as a page table.
+			 */
+			pgprintk("misaligned: gpa %llx bytes %d role %x\n",
+				 gpa, bytes, page->role.word);
+			kvm_mmu_zap_page(vcpu, page);
+			continue;
+		}
 		page_offset = offset;
 		level = page->role.level;
 		if (page->role.glevels == PT32_ROOT_LEVEL) {
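The misalignment test XORs the offsets of the first and last byte of the write; any differing bit above the pte-size bits means the two bytes fall in different ptes. The predicate can be checked standalone:

```c
#include <assert.h>

/* Nonzero iff a write of 'bytes' at 'offset' straddles a pte boundary,
 * i.e. its first and last bytes land in different naturally aligned
 * ptes of size 'pte_size' (a power of two).  Same expression as in
 * kvm_mmu_pre_write(). */
static unsigned misaligned(unsigned offset, unsigned bytes, unsigned pte_size)
{
    return (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
}
```

Note that a partial write wholly inside one pte (e.g. 4 bytes into the low half of an 8-byte pte) is *not* flagged; only boundary-crossing writes are.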
[kvm-devel] [PATCH 21/33] KVM: MMU: Move is_empty_shadow_page() above kvm_mmu_free_page()
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -303,16 +303,6 @@ static void rmap_write_protect(struct kv
 	}
 }
 
-static void kvm_mmu_free_page(struct kvm_vcpu *vcpu, hpa_t page_hpa)
-{
-	struct kvm_mmu_page *page_head = page_header(page_hpa);
-
-	list_del(&page_head->link);
-	page_head->page_hpa = page_hpa;
-	list_add(&page_head->link, &vcpu->free_pages);
-	++vcpu->kvm->n_free_mmu_pages;
-}
-
 static int is_empty_shadow_page(hpa_t page_hpa)
 {
 	u32 *pos;
@@ -324,6 +314,16 @@ static int is_empty_shadow_page(hpa_t pa
 	return 1;
 }
 
+static void kvm_mmu_free_page(struct kvm_vcpu *vcpu, hpa_t page_hpa)
+{
+	struct kvm_mmu_page *page_head = page_header(page_hpa);
+
+	list_del(&page_head->link);
+	page_head->page_hpa = page_hpa;
+	list_add(&page_head->link, &vcpu->free_pages);
+	++vcpu->kvm->n_free_mmu_pages;
+}
+
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
 {
 	return gfn;
[kvm-devel] [PATCH 26/33] KVM: MMU: Fix cmpxchg8b emulation
cmpxchg8b uses edx:eax as the compare operand, not edi:eax. cmpxchg8b is used by 32-bit pae guests to set page table entries atomically, and this is emulated by touching shadowed guest page tables. Also, implement it for 32-bit hosts.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/x86_emulate.c
===================================================================
--- linux-2.6.orig/drivers/kvm/x86_emulate.c
+++ linux-2.6/drivers/kvm/x86_emulate.c
@@ -1323,7 +1323,7 @@ twobyte_special_insn:
 				  ctxt)) != 0))
 			goto done;
 		if ((old_lo != _regs[VCPU_REGS_RAX])
-		    || (old_hi != _regs[VCPU_REGS_RDI])) {
+		    || (old_hi != _regs[VCPU_REGS_RDX])) {
 			_regs[VCPU_REGS_RAX] = old_lo;
 			_regs[VCPU_REGS_RDX] = old_hi;
 			_eflags &= ~EFLG_ZF;

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -936,6 +936,30 @@ static int emulator_cmpxchg_emulated(uns
 	return emulator_write_emulated(addr, new, bytes, ctxt);
 }
 
+#ifdef CONFIG_X86_32
+
+static int emulator_cmpxchg8b_emulated(unsigned long addr,
+				       unsigned long old_lo,
+				       unsigned long old_hi,
+				       unsigned long new_lo,
+				       unsigned long new_hi,
+				       struct x86_emulate_ctxt *ctxt)
+{
+	static int reported;
+	int r;
+
+	if (!reported) {
+		reported = 1;
+		printk(KERN_WARNING "kvm: emulating exchange8b as write\n");
+	}
+	r = emulator_write_emulated(addr, new_lo, 4, ctxt);
+	if (r != X86EMUL_CONTINUE)
+		return r;
+	return emulator_write_emulated(addr+4, new_hi, 4, ctxt);
+}
+
+#endif
+
 static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
 {
 	return kvm_arch_ops->get_segment_base(vcpu, seg);
@@ -1010,6 +1034,9 @@ struct x86_emulate_ops emulate_ops = {
 	.read_emulated       = emulator_read_emulated,
 	.write_emulated      = emulator_write_emulated,
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
+#ifdef CONFIG_X86_32
+	.cmpxchg8b_emulated  = emulator_cmpxchg8b_emulated,
+#endif
 };
[kvm-devel] [PATCH 28/33] KVM: MMU: Free pages on kvm destruction
Because mmu pages have attached rmap and parent pte chain structures, we need to zap them before freeing so the attached structures are freed.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -1065,9 +1065,14 @@ EXPORT_SYMBOL_GPL(kvm_mmu_free_some_page
 
 static void free_mmu_pages(struct kvm_vcpu *vcpu)
 {
-	while (!list_empty(&vcpu->free_pages)) {
-		struct kvm_mmu_page *page;
+	struct kvm_mmu_page *page;
 
+	while (!list_empty(&vcpu->kvm->active_mmu_pages)) {
+		page = container_of(vcpu->kvm->active_mmu_pages.next,
+				    struct kvm_mmu_page, link);
+		kvm_mmu_zap_page(vcpu, page);
+	}
+	while (!list_empty(&vcpu->free_pages)) {
 		page = list_entry(vcpu->free_pages.next,
 				  struct kvm_mmu_page, link);
 		list_del(&page->link);
[kvm-devel] [PATCH 29/33] KVM: MMU: Replace atomic allocations by preallocated objects
The mmu sometimes needs memory for reverse mapping and parent pte chains. However, we can't allocate from within the mmu because of the atomic context. So, move the allocations to a central place that can be executed before the main mmu machinery, where we can bail out on failure before any damage is done. (Error handling is deferred for now, but the basic structure is there.)

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -166,6 +166,84 @@ static int is_rmap_pte(u64 pte)
 		== (PT_WRITABLE_MASK | PT_PRESENT_MASK);
 }
 
+static void mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				   size_t objsize, int min)
+{
+	void *obj;
+
+	if (cache->nobjs >= min)
+		return;
+	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
+		obj = kzalloc(objsize, GFP_NOWAIT);
+		if (!obj)
+			BUG();
+		cache->objects[cache->nobjs++] = obj;
+	}
+}
+
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	while (mc->nobjs)
+		kfree(mc->objects[--mc->nobjs]);
+}
+
+static void mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
+			       sizeof(struct kvm_pte_chain), 4);
+	mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
+			       sizeof(struct kvm_rmap_desc), 1);
+}
+
+static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
+{
+	mmu_free_memory_cache(&vcpu->mmu_pte_chain_cache);
+	mmu_free_memory_cache(&vcpu->mmu_rmap_desc_cache);
+}
+
+static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
+				    size_t size)
+{
+	void *p;
+
+	BUG_ON(!mc->nobjs);
+	p = mc->objects[--mc->nobjs];
+	memset(p, 0, size);
+	return p;
+}
+
+static void mmu_memory_cache_free(struct kvm_mmu_memory_cache *mc, void *obj)
+{
+	if (mc->nobjs < KVM_NR_MEM_OBJS)
+		mc->objects[mc->nobjs++] = obj;
+	else
+		kfree(obj);
+}
+
+static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
+{
+	return mmu_memory_cache_alloc(&vcpu->mmu_pte_chain_cache,
+				      sizeof(struct kvm_pte_chain));
+}
+
+static void mmu_free_pte_chain(struct kvm_vcpu *vcpu,
+			       struct kvm_pte_chain *pc)
+{
+	mmu_memory_cache_free(&vcpu->mmu_pte_chain_cache, pc);
+}
+
+static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
+{
+	return mmu_memory_cache_alloc(&vcpu->mmu_rmap_desc_cache,
+				      sizeof(struct kvm_rmap_desc));
+}
+
+static void mmu_free_rmap_desc(struct kvm_vcpu *vcpu,
+			       struct kvm_rmap_desc *rd)
+{
+	mmu_memory_cache_free(&vcpu->mmu_rmap_desc_cache, rd);
+}
+
 /*
  * Reverse mapping data structures:
  *
@@ -175,7 +253,7 @@ static int is_rmap_pte(u64 pte)
  * If page->private bit zero is one, (then page->private & ~1) points
  * to a struct kvm_rmap_desc containing more mappings.
  */
-static void rmap_add(struct kvm *kvm, u64 *spte)
+static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte)
 {
 	struct page *page;
 	struct kvm_rmap_desc *desc;
@@ -189,9 +267,7 @@ static void rmap_add(struct kvm *kvm, u6
 		page->private = (unsigned long)spte;
 	} else if (!(page->private & 1)) {
 		rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
-		desc = kzalloc(sizeof *desc, GFP_NOWAIT);
-		if (!desc)
-			BUG(); /* FIXME: return error */
+		desc = mmu_alloc_rmap_desc(vcpu);
 		desc->shadow_ptes[0] = (u64 *)page->private;
 		desc->shadow_ptes[1] = spte;
 		page->private = (unsigned long)desc | 1;
@@ -201,9 +277,7 @@ static void rmap_add(struct kvm *kvm, u6
 		while (desc->shadow_ptes[RMAP_EXT-1] && desc->more)
 			desc = desc->more;
 		if (desc->shadow_ptes[RMAP_EXT-1]) {
-			desc->more = kzalloc(sizeof *desc->more, GFP_NOWAIT);
-			if (!desc->more)
-				BUG(); /* FIXME: return error */
+			desc->more = mmu_alloc_rmap_desc(vcpu);
 			desc = desc->more;
 		}
 		for (i = 0; desc->shadow_ptes[i]; ++i)
@@ -212,7 +286,8 @@ static void rmap_add(struct kvm *kvm, u6
 	}
 }
 
-static void rmap_desc_remove_entry(struct page *page,
+static void rmap_desc_remove_entry(struct kvm_vcpu *vcpu,
+				   struct page *page,
 				   struct kvm_rmap_desc *desc,
 				   int i,
 				   struct
[kvm-devel] [PATCH 30/33] KVM: MMU: Detect oom conditions and propagate error to userspace
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -166,19 +166,20 @@ static int is_rmap_pte(u64 pte)
 		== (PT_WRITABLE_MASK | PT_PRESENT_MASK);
 }
 
-static void mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
-				   size_t objsize, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  size_t objsize, int min)
 {
 	void *obj;
 
 	if (cache->nobjs >= min)
-		return;
+		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
 		obj = kzalloc(objsize, GFP_NOWAIT);
 		if (!obj)
-			BUG();
+			return -ENOMEM;
 		cache->objects[cache->nobjs++] = obj;
 	}
+	return 0;
 }
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
@@ -187,12 +188,18 @@ static void mmu_free_memory_cache(struct
 	kfree(mc->objects[--mc->nobjs]);
 }
 
-static void mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 {
-	mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
-			       sizeof(struct kvm_pte_chain), 4);
-	mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
-			       sizeof(struct kvm_rmap_desc), 1);
+	int r;
+
+	r = mmu_topup_memory_cache(&vcpu->mmu_pte_chain_cache,
+				   sizeof(struct kvm_pte_chain), 4);
+	if (r)
+		goto out;
+	r = mmu_topup_memory_cache(&vcpu->mmu_rmap_desc_cache,
+				   sizeof(struct kvm_rmap_desc), 1);
+out:
+	return r;
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
@@ -824,8 +831,11 @@ static int nonpaging_page_fault(struct k
 {
 	gpa_t addr = gva;
 	hpa_t paddr;
+	int r;
 
-	mmu_topup_memory_caches(vcpu);
+	r = mmu_topup_memory_caches(vcpu);
+	if (r)
+		return r;
 
 	ASSERT(vcpu);
 	ASSERT(VALID_PAGE(vcpu->mmu.root_hpa));
@@ -1052,7 +1062,7 @@ int kvm_mmu_reset_context(struct kvm_vcp
 	r = init_kvm_mmu(vcpu);
 	if (r < 0)
 		goto out;
-	mmu_topup_memory_caches(vcpu);
+	r = mmu_topup_memory_caches(vcpu);
 out:
 	return r;
 }
Index: linux-2.6/drivers/kvm/svm.c
===================================================================
--- linux-2.6.orig/drivers/kvm/svm.c
+++ linux-2.6/drivers/kvm/svm.c
@@ -852,6 +852,7 @@ static int pf_interception(struct kvm_vc
 	u64 fault_address;
 	u32 error_code;
 	enum emulation_result er;
+	int r;
 
 	if (is_external_interrupt(exit_int_info))
 		push_irq(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
@@ -860,7 +861,12 @@ static int pf_interception(struct kvm_vc
 	fault_address = vcpu->svm->vmcb->control.exit_info_2;
 	error_code = vcpu->svm->vmcb->control.exit_info_1;
 
-	if (!kvm_mmu_page_fault(vcpu, fault_address, error_code)) {
+	r = kvm_mmu_page_fault(vcpu, fault_address, error_code);
+	if (r < 0) {
+		spin_unlock(&vcpu->kvm->lock);
+		return r;
+	}
+	if (!r) {
 		spin_unlock(&vcpu->kvm->lock);
 		return 1;
 	}
@@ -1398,6 +1404,7 @@ static int svm_vcpu_run(struct kvm_vcpu
 	u16 fs_selector;
 	u16 gs_selector;
 	u16 ldt_selector;
+	int r;
 
 again:
 	do_interrupt_requests(vcpu, kvm_run);
@@ -1565,7 +1572,8 @@ again:
 		return 0;
 	}
 
-	if (handle_exit(vcpu, kvm_run)) {
+	r = handle_exit(vcpu, kvm_run);
+	if (r > 0) {
 		if (signal_pending(current)) {
 			++kvm_stat.signal_exits;
 			post_kvm_run_save(vcpu, kvm_run);
@@ -1581,7 +1589,7 @@ again:
 		goto again;
 	}
 	post_kvm_run_save(vcpu, kvm_run);
-	return 0;
+	return r;
 }
 
 static void svm_flush_tlb(struct kvm_vcpu *vcpu)
Index: linux-2.6/drivers/kvm/paging_tmpl.h
===================================================================
--- linux-2.6.orig/drivers/kvm/paging_tmpl.h
+++ linux-2.6/drivers/kvm/paging_tmpl.h
@@ -339,7 +339,8 @@ static int FNAME(fix_write_pf)(struct kv
  *  - normal guest page fault due to the guest pte marked not present, not
  *    writable, or not executable
  *
- * Returns: 1 if we need to emulate the instruction, 0 otherwise
+ * Returns: 1 if we need to emulate the instruction, 0 otherwise, or
+ *          a negative value on error.
  */
 static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 			     u32 error_code)
@@ -351,10 +352,13 @@ static int FNAME(page_fault)(struct kvm_
 	u64 *shadow_pte;
 	int fixed;
[kvm-devel] [PATCH 33/33] KVM: MMU: add audit code to check mappings, etc are correct
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -26,8 +26,31 @@
 #include "vmx.h"
 #include "kvm.h"
 
-#define pgprintk(x...) do { printk(x); } while (0)
-#define rmap_printk(x...) do { printk(x); } while (0)
+#undef MMU_DEBUG
+
+#undef AUDIT
+
+#ifdef AUDIT
+static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg);
+#else
+static void kvm_mmu_audit(struct kvm_vcpu *vcpu, const char *msg) {}
+#endif
+
+#ifdef MMU_DEBUG
+
+#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
+#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)
+
+#else
+
+#define pgprintk(x...) do { } while (0)
+#define rmap_printk(x...) do { } while (0)
+
+#endif
+
+#if defined(MMU_DEBUG) || defined(AUDIT)
+static int dbg = 1;
+#endif
 
 #define ASSERT(x)							\
 	if (!(x)) {							\
@@ -1271,3 +1294,163 @@ void kvm_mmu_slot_remove_write_access(st
 		}
 	}
 }
+
+#ifdef AUDIT
+
+static const char *audit_msg;
+
+static gva_t canonicalize(gva_t gva)
+{
+#ifdef CONFIG_X86_64
+	gva = (long long)(gva << 16) >> 16;
+#endif
+	return gva;
+}
+
+static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
+				gva_t va, int level)
+{
+	u64 *pt = __va(page_pte & PT64_BASE_ADDR_MASK);
+	int i;
+	gva_t va_delta = 1ul << (PAGE_SHIFT + 9 * (level - 1));
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i, va += va_delta) {
+		u64 ent = pt[i];
+
+		if (!(ent & PT_PRESENT_MASK))
+			continue;
+
+		va = canonicalize(va);
+		if (level > 1)
+			audit_mappings_page(vcpu, ent, va, level - 1);
+		else {
+			gpa_t gpa = vcpu->mmu.gva_to_gpa(vcpu, va);
+			hpa_t hpa = gpa_to_hpa(vcpu, gpa);
+
+			if ((ent & PT_PRESENT_MASK)
+			    && (ent & PT64_BASE_ADDR_MASK) != hpa)
+				printk(KERN_ERR "audit error: (%s) levels %d"
+				       " gva %lx gpa %llx hpa %llx ent %llx\n",
+				       audit_msg, vcpu->mmu.root_level,
+				       va, gpa, hpa, ent);
+		}
+	}
+}
+
+static void audit_mappings(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	if (vcpu->mmu.root_level == 4)
+		audit_mappings_page(vcpu, vcpu->mmu.root_hpa, 0, 4);
+	else
+		for (i = 0; i < 4; ++i)
+			if (vcpu->mmu.pae_root[i] & PT_PRESENT_MASK)
+				audit_mappings_page(vcpu,
+						    vcpu->mmu.pae_root[i],
+						    i << 30,
+						    2);
+}
+
+static int count_rmaps(struct kvm_vcpu *vcpu)
+{
+	int nmaps = 0;
+	int i, j, k;
+
+	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
+		struct kvm_memory_slot *m = &vcpu->kvm->memslots[i];
+		struct kvm_rmap_desc *d;
+
+		for (j = 0; j < m->npages; ++j) {
+			struct page *page = m->phys_mem[j];
+
+			if (!page->private)
+				continue;
+			if (!(page->private & 1)) {
+				++nmaps;
+				continue;
+			}
+			d = (struct kvm_rmap_desc *)(page->private & ~1ul);
+			while (d) {
+				for (k = 0; k < RMAP_EXT; ++k)
+					if (d->shadow_ptes[k])
+						++nmaps;
+					else
+						break;
+				d = d->more;
+			}
+		}
+	}
+	return nmaps;
+}
+
+static int count_writable_mappings(struct kvm_vcpu *vcpu)
+{
+	int nmaps = 0;
+	struct kvm_mmu_page *page;
+	int i;
+
+	list_for_each_entry(page, &vcpu->kvm->active_mmu_pages, link) {
+		u64 *pt = __va(page->page_hpa);
+
+		if (page->role.level != PT_PAGE_TABLE_LEVEL)
+			continue;
+
+		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+			u64 ent = pt[i];
+
+			if (!(ent & PT_PRESENT_MASK))
+				continue;
+			if (!(ent & PT_WRITABLE_MASK))
+				continue;
+			++nmaps;
+		}
+	}
+	return nmaps;
+}
+
+static void audit_rmap(struct kvm_vcpu *vcpu)
+{
+	int n_rmap = count_rmaps(vcpu);
Re: [kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables
On Thu, 04 Jan 2007 17:48:45 +0200 Avi Kivity [EMAIL PROTECTED] wrote:

> The current kvm shadow page table implementation does not cache shadow
> page tables (except for global translations, used for kernel addresses)
> across context switches. This means that after a context switch, every
> memory access will trap into the host. After a while, the shadow page
> tables will be rebuilt, and the guest can proceed at native speed until
> the next context switch.
>
> The natural solution, then, is to cache shadow page tables across
> context switches. Unfortunately, this introduces a bucketload of
> problems:
>
> - the guest does not notify the processor (and hence kvm) that it
>   modifies a page table entry if it has reason to believe that the
>   modification will be followed by a tlb flush. It becomes necessary to
>   write-protect guest page tables so that we can use the page fault when
>   the access occurs as a notification.
> - write-protecting the guest page tables means we need to keep track of
>   which ptes map those guest page tables. We need to add reverse mapping
>   for all mapped writable guest pages.
> - when the guest does access the write-protected page, we need to allow
>   it to perform the write in some way. We do that either by emulating
>   the write, or removing all shadow page tables for that page and
>   allowing the write to proceed, depending on circumstances.
>
> This patchset implements the ideas above. While a lot of tuning remains
> to be done (for example, a sane page replacement algorithm), a guest
> running with this patchset applied is much faster and more responsive
> than with 2.6.20-rc3. Some preliminary benchmarks are available in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/661.
>
> The patchset is bisectable compile-wise.

Is this intended for 2.6.20, or would you prefer that we release what we have now and hold this off for 2.6.21?
Re: [kvm-devel] [PATCH 0/33] KVM: MMU: Cache shadow page tables
* Avi Kivity [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
> > Is this intended for 2.6.20, or would you prefer that we release what
> > we have now and hold this off for 2.6.21?
>
> Even though these patches are potentially destabilizing, I'd like them
> (and a few other patches) to go into 2.6.20:
>
> - kvm did not exist in 2.6.19, hence we cannot regress from that
> - this patchset is the difference between a working proof of concept
>   and a generally usable system
> - from my testing, it's quite stable

seconded - i have tested the new MMU changes quite extensively and they are converging nicely. It brings down context-switch costs by a factor of 10 and more, even for microbenchmarks: instead of throwing away the full shadow pagetable hierarchy we have worked so hard to construct, this patchset allows the intelligent caching of shadow pagetables. The effect is human-visible as well - the system got visibly snappier. (I'd increase the shadow cache pool from the current 256 pages to at least 1024 pages, but that's a detail.)

Acked-by: Ingo Molnar [EMAIL PROTECTED]

	Ingo
[kvm-devel] [PATCH 2/9] KVM: Initialize vcpu-kvm a little earlier
Fixes oops on early close of /dev/kvm.

Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -230,6 +230,7 @@ static int kvm_dev_open(struct inode *in
 		struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
 		mutex_init(&vcpu->mutex);
+		vcpu->kvm = kvm;
 		vcpu->mmu.root_hpa = INVALID_PAGE;
 		INIT_LIST_HEAD(&vcpu->free_pages);
 	}
@@ -530,7 +531,6 @@ static int kvm_dev_ioctl_create_vcpu(str
 	vcpu->guest_fx_image = vcpu->host_fx_image + FX_IMAGE_SIZE;
 
 	vcpu->cpu = -1;  /* First load will set up TR */
-	vcpu->kvm = kvm;
 	r = kvm_arch_ops->vcpu_create(vcpu);
 	if (r < 0)
 		goto out_free_vcpus;
[kvm-devel] [PATCH 4/9] KVM: Add missing 'break'
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -1922,6 +1922,7 @@ static long kvm_dev_ioctl(struct file *f
 				 num_msrs_to_save * sizeof(u32)))
 			goto out;
 		r = 0;
+		break;
 	}
 	default:
 		;
[kvm-devel] [PATCH 8/9] KVM: Simplify mmu_alloc_roots()
From: Ingo Molnar [EMAIL PROTECTED]

Small optimization/cleanup: page == page_header(page->page_hpa)

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

Index: linux-2.6/drivers/kvm/mmu.c
===================================================================
--- linux-2.6.orig/drivers/kvm/mmu.c
+++ linux-2.6/drivers/kvm/mmu.c
@@ -820,9 +820,9 @@ static void mmu_alloc_roots(struct kvm_v
 		hpa_t root = vcpu->mmu.root_hpa;
 
 		ASSERT(!VALID_PAGE(root));
-		root = kvm_mmu_get_page(vcpu, root_gfn, 0,
-					PT64_ROOT_LEVEL, 0, NULL)->page_hpa;
-		page = page_header(root);
+		page = kvm_mmu_get_page(vcpu, root_gfn, 0,
+					PT64_ROOT_LEVEL, 0, NULL);
+		root = page->page_hpa;
 		++page->root_count;
 		vcpu->mmu.root_hpa = root;
 		return;
@@ -836,10 +836,10 @@ static void mmu_alloc_roots(struct kvm_v
 			root_gfn = vcpu->pdptrs[i] >> PAGE_SHIFT;
 		else if (vcpu->mmu.root_level == 0)
 			root_gfn = 0;
-		root = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
+		page = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 					PT32_ROOT_LEVEL, !is_paging(vcpu),
-					NULL)->page_hpa;
-		page = page_header(root);
+					NULL);
+		root = page->page_hpa;
 		++page->root_count;
 		vcpu->mmu.pae_root[i] = root | PT_PRESENT_MASK;
 	}