Re: [patch 10/10] KVM: MMU: speed up mmu_unsync_walk
On Fri, Sep 19, 2008 at 06:26:56PM -0700, Avi Kivity wrote:
>> --- kvm.orig/include/asm-x86/kvm_host.h
>> +++ kvm/include/asm-x86/kvm_host.h
>> @@ -201,6 +201,7 @@ struct kvm_mmu_page {
>> 		u64 *parent_pte;	/* !multimapped */
>> 		struct hlist_head parent_ptes;	/* multimapped, kvm_pte_chain */
>> 	};
>> +	DECLARE_BITMAP(unsync_child_bitmap, 512);
>> };
>
> Later, we can throw this bitmap out to a separate object.

Yeah.

> Also, it may make sense to replace it with an array of u16s.

Why?
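For context, the point of the 512-bit map is that mmu_unsync_walk can skip
synced children instead of scanning all 512 shadow entries. A minimal sketch
of such a walk using the kernel's generic bitmap helpers from
<linux/bitops.h> (the walker function itself is hypothetical, not the
patch's code):

	/* Visit only the children marked unsync, skipping runs of
	 * synced entries via find_first_bit()/find_next_bit(). */
	static int walk_unsync_children(struct kvm_mmu_page *sp)
	{
		int i, visited = 0;

		for (i = find_first_bit(sp->unsync_child_bitmap, 512);
		     i < 512;
		     i = find_next_bit(sp->unsync_child_bitmap, 512, i + 1)) {
			u64 ent = sp->spt[i];
			/* ... recurse into page_header(ent & PT64_BASE_ADDR_MASK) ... */
			visited++;
		}
		return visited;
	}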
Re: [patch 09/10] KVM: MMU: out of sync shadow core v2
On Fri, Sep 19, 2008 at 06:22:52PM -0700, Avi Kivity wrote:
> Instead of private, have an object contain both callback and private
> data, and use container_of(). Reduces the chance of type errors.

OK.

>> +	while (parent->unsync_children) {
>> +		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>> +			u64 ent = sp->spt[i];
>> +
>> +			if (is_shadow_present_pte(ent)) {
>> +				struct kvm_mmu_page *child;
>> +				child = page_header(ent & PT64_BASE_ADDR_MASK);
>
> What does this do?

It walks all children of the given page, with no attempt at efficiency.
It is replaced later by the bitmap version.

>> +static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>> +{
>> +	if (sp->role.glevels != vcpu->arch.mmu.root_level) {
>> +		kvm_mmu_zap_page(vcpu->kvm, sp);
>> +		return 1;
>> +	}
>
> Suppose we switch to real mode, touch a pte, switch back. Is this
> handled?

The shadow page will go unsync on the pte touch and be resynced as soon
as it is visible again (after the return to paging). Or, while still in
real mode, it might be zapped by kvm_mmu_get_page->kvm_sync_page. Am I
missing something?

>> @@ -991,8 +1066,18 @@ static struct kvm_mmu_page *kvm_mmu_get_
>> 		 gfn, role.word);
>> 	index = kvm_page_table_hashfn(gfn);
>> 	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>> -	hlist_for_each_entry(sp, node, bucket, hash_link)
>> -		if (sp->gfn == gfn && sp->role.word == role.word) {
>> +	hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
>> +		if (sp->gfn == gfn) {
>> +			if (sp->unsync)
>> +				if (kvm_sync_page(vcpu, sp))
>> +					continue;
>> +
>> +			if (sp->role.word != role.word)
>> +				continue;
>> +
>> +			if (sp->unsync_children)
>> +				vcpu->arch.mmu.need_root_sync = 1;
>
> mmu_reload() maybe?

Hum, will think about it.

>> static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>> -	return 0;
>> +	return ret;
>> }
>
> Why does the caller care if zap also zapped some other random pages? To
> restart walking the list?

Yes. The next element that for_each_entry_safe saved could have been
zapped.

>> +	/* don't unsync if pagetable is shadowed with multiple roles */
>> +	hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
>> +		if (s->gfn != sp->gfn || s->role.metaphysical)
>> +			continue;
>> +		if (s->role.word != sp->role.word)
>> +			return 1;
>> +	}
>
> This will happen for nonpae paging. But why not allow it? Zap all
> unsynced pages on mode switch.
>
> Oh, if a page is both a page directory and page table, yes.

Yes.

> So to allow nonpae oos, check the level instead.

Windows 2008 64-bit does all sorts of sharing of a pagetable at multiple
levels too.
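For reference, the container_of() pattern suggested above looks roughly like
this (a minimal sketch; the embedding structure and callback names are
hypothetical, not the patch's actual ones):

	#include <linux/kernel.h>	/* container_of() */

	/* Caller-specific state embeds the generic walker instead of
	 * carrying a void *private pointer alongside it. */
	struct shadow_sync_walk {
		struct kvm_shadow_walk walker;
		int nr_synced;		/* example private data */
	};

	static int sync_walk_entry(struct kvm_shadow_walk *walk,
				   struct kvm_vcpu *vcpu,
				   u64 addr, u64 *sptep, int level)
	{
		/* Recover the enclosing object; types are checked at
		 * compile time, unlike a cast from void *. */
		struct shadow_sync_walk *sw =
			container_of(walk, struct shadow_sync_walk, walker);

		sw->nr_synced++;
		return 0;
	}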
Re: [patch 07/10] KVM: MMU: mmu_parent_walk
On Fri, Sep 19, 2008 at 05:56:46PM -0700, Avi Kivity wrote:
>> +	} while (level > start_level - 1);
>> +}
>
> Could be much simplified with recursion, no? As the depth is limited to
> 4, there's no stack overflow problem.

The early version was recursive, but since it's a generic helper I
preferred a non-recursive function.
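For comparison, a recursive variant along the lines suggested might look like
this (a minimal sketch of the single-parent case only; the function and
callback names are hypothetical, and the multimapped case would iterate the
kvm_pte_chain hlist the same way):

	typedef void (*mmu_parent_walk_fn)(struct kvm_vcpu *vcpu,
					   struct kvm_mmu_page *sp);

	/* Depth is bounded by the number of paging levels (<= 4),
	 * so kernel-stack usage stays small. */
	static void mmu_parent_walk(struct kvm_vcpu *vcpu,
				    struct kvm_mmu_page *sp,
				    mmu_parent_walk_fn fn)
	{
		if (!sp->multimapped && sp->parent_pte) {
			struct kvm_mmu_page *parent;

			parent = page_header(__pa(sp->parent_pte));
			mmu_parent_walk(vcpu, parent, fn);
		}
		fn(vcpu, sp);
	}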
Re: [patch 06/10] KVM: x86: trap invlpg
On Fri, Sep 19, 2008 at 05:53:22PM -0700, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> +static int FNAME(shadow_invlpg_entry)(struct kvm_shadow_walk *_sw,
>> +				      struct kvm_vcpu *vcpu, u64 addr,
>> +				      u64 *sptep, int level)
>> +{
>> +
>> +	if (level == PT_PAGE_TABLE_LEVEL) {
>> +		if (is_shadow_present_pte(*sptep))
>> +			rmap_remove(vcpu->kvm, sptep);
>> +		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
>
> Need to flush the real tlb as well.

The local TLB, you mean?

+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	spin_lock(&vcpu->kvm->mmu_lock);
+	vcpu->arch.mmu.invlpg(vcpu, gva);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	kvm_mmu_flush_tlb(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
Re: [patch 03/10] KVM: MMU: do not write-protect large mappings
On Fri, Sep 19, 2008 at 05:29:28PM -0700, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> There is not much point in write-protecting large mappings. This can
>> only happen when a page is shadowed during the window between
>> is_largepage_backed and mmu_lock acquisition. Zap the entry instead,
>> so the next pagefault will find a shadowed page via
>> is_largepage_backed and fall back to 4k translations.
>>
>> Simplifies out of sync shadow.
>>
>> Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
>>
>> Index: kvm/arch/x86/kvm/mmu.c
>> ===================================================================
>> --- kvm.orig/arch/x86/kvm/mmu.c
>> +++ kvm/arch/x86/kvm/mmu.c
>> @@ -1180,11 +1180,16 @@ static int set_spte(struct kvm_vcpu *vcp
>> 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
>> 		struct kvm_mmu_page *shadow;
>> +		if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
>> +			ret = 1;
>> +			spte = shadow_trap_nonpresent_pte;
>> +			goto set_pte;
>> +		}
>
> Don't we need to drop a reference to the page?

mmu_set_spte will do it if necessary:

	if (!was_rmapped) {
		rmap_add(vcpu, shadow_pte, gfn, largepage);
		if (!is_rmap_pte(*shadow_pte))
			kvm_release_pfn_clean(pfn);

But as noted, this part is wrong:

@@ -307,11 +307,10 @@ static int FNAME(shadow_walk_entry)(stru
 		return 1;
 	}

-	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+	if (is_shadow_present_pte(*sptep))
 		return 0;

-	if (is_large_pte(*sptep))
-		rmap_remove(vcpu->kvm, sptep);
+	WARN_ON(is_large_pte(*sptep));

Since it might still be necessary to replace a read-only large mapping
(which rmap_write_protect will not nuke) with a normal pte pointer.
Re: [patch 02/10] KVM: MMU: move local TLB flush to mmu_set_spte
On Fri, Sep 19, 2008 at 05:21:09PM -0700, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Since the sync page path can collapse flushes.
>>
>> Also, only flush if the spte was writable before.
>>
>> Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
>>
>> @@ -1241,9 +1239,12 @@ static void mmu_set_spte(struct kvm_vcpu
>> 		}
>> 	}
>> 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
>> -		      dirty, largepage, gfn, pfn, speculative))
>> +		      dirty, largepage, gfn, pfn, speculative)) {
>> 		if (write_fault)
>> 			*ptwrite = 1;
>> +		if (was_writeble)
>> +			kvm_x86_ops->tlb_flush(vcpu);
>> +	}
>
> I think we had cases where the spte.pfn contents changed, for example
> when a large page was replaced by a normal page,

True. And the TLB is not flushed now for a large->normal replace, in
case the pte that is faulting is read-only. The local (and remote) TLBs
must be flushed on a large->normal replace. (BTW the largepage patch is
wrong; will reply to that soon.)

> and also:
>
>	} else if (pfn != spte_to_pfn(*shadow_pte)) {

That one is likely to crash the guest anyway, so I don't see the need
for a flush there:

>> Did you find out what's causing the errors in the first place (if
>> zap is not used)? It worries me greatly.
>
> Yes, the problem is that the rmap code does not handle the qemu process
> mappings vanishing while there is a present rmap. If that happens,
> and there is a fault for a gfn whose qemu mapping has been removed, a
> different physical zero page will be allocated:
>
> rmap a -> gfn 0 -> physical host page 0
> mapping for gfn 0 gets removed
> guest faults in gfn 0 through the same pte "chain"
> rmap a -> gfn 0 -> physical host page 1
>
> When instantiating the shadow mapping for the second time, the
> "is_rmap_pte" check succeeds, so we release the reference grabbed by
> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping pointing
> to a physical page without having an additional reference on that page.
>
> The following makes the host not crash under such a condition, but the
> condition itself is invalid, leading to inconsistent state on the
> guest. So IMHO it shouldn't be allowed to happen in the first place.
Re: Graceful shutdown for OpenBSD
[EMAIL PROTECTED] wrote:
> Maybe "power button pressed" or something?

Yes. If OpenBSD supports power buttons, this should work.
Re: [PATCH 7/9] register mmio slots
Glauber Costa wrote:
> By analysing phys_offset, we know whether a region is an mmio region
> or not. If it is, register it as such. We don't reuse the slot
> infrastructure that already exists, because there is a relationship
> between the slot number for kvm the kernel module and the index in the
> slots vector for libkvm. However, we can do better in the future and
> use only a single data structure for both.

Why is kvm interested in emulated mmio regions at all?

> @@ -1032,11 +1042,9 @@ void kvm_mutex_lock(void)
>
>  int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr,
>                                       unsigned int size)
>  {
> -    return kvm_register_coalesced_mmio(kvm_context, addr, size);
>  }
>
>  int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
>                                         unsigned int size)
>  {
> -    return kvm_unregister_coalesced_mmio(kvm_context, addr, size);
>  }

Why?
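For readers outside the thread: in QEMU of this era, the low bits of a page's
phys_offset encode the region type, which is presumably what "analysing
phys_offset" refers to. A minimal sketch of such a check, assuming the
classic IO_MEM_* flag convention (the helper name is hypothetical):

	/* RAM/ROM/unassigned are encoded directly in the low bits;
	 * anything else is an io-handler index, i.e. emulated mmio. */
	static int region_is_mmio(unsigned long phys_offset)
	{
		unsigned long flags = phys_offset & ~TARGET_PAGE_MASK;

		return flags != IO_MEM_RAM && flags != IO_MEM_ROM &&
		       flags != IO_MEM_UNASSIGNED;
	}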
Re: [PATCH 8/9] coalesce mmio regions with an explicit call
Glauber Costa wrote:
> Remove explicit calls to mmio coalescing. Rather, include it in the
> registration functions.

> index 5ae3960..2d97b34 100644
> --- a/qemu/hw/e1000.c
> +++ b/qemu/hw/e1000.c
> @@ -942,18 +942,6 @@ e1000_mmio_map(PCIDevice *pci_dev, int region_num,
>      d->mmio_base = addr;
>
>      cpu_register_physical_memory(addr, PNPMMIO_SIZE, d->mmio_index);
> -
> -    if (kvm_enabled()) {
> -        int i;
> -        uint32_t excluded_regs[] = {
> -            E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> -            E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> -        };
> -        qemu_kvm_register_coalesced_mmio(addr, excluded_regs[0]);
> -        for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> -            qemu_kvm_register_coalesced_mmio(addr + excluded_regs[i] + 4,
> -                    excluded_regs[i + 1] - excluded_regs[i] - 4);
> -    }
>  }

Where did all of this go?
Re: [PATCH 5/9] add debuging facilities to memory registration at libkvm
Glauber Costa wrote:
> +#ifdef DEBUG_MEMREG
> +	fprintf(stderr, "%s, memory: gpa: %llx, size: %llx, uaddr: %llx, "
> +		"slot: %x, flags: %lx\n",
> +		__func__, memory.guest_phys_addr, memory.memory_size,
> +		memory.userspace_addr, memory.slot, memory.flags);
> +#endif

Please wrap in a dprintf() macro or similar.
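A dprintf()-style wrapper of the kind requested might look like this (a
minimal sketch; the macro name follows the suggestion above, not existing
libkvm code):

	#ifdef DEBUG_MEMREG
	#define dprintf(fmt, args...) \
		fprintf(stderr, "%s: " fmt, __func__, ##args)
	#else
	#define dprintf(fmt, args...) do { } while (0)
	#endif

Each call site then shrinks to a single unconditional line:

	dprintf("memory: gpa: %llx, size: %llx, uaddr: %llx, slot: %x, flags: %lx\n",
		memory.guest_phys_addr, memory.memory_size,
		memory.userspace_addr, memory.slot, memory.flags);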
Re: [PATCH 4/9] substitute is_allocated_mem with more general is_containing_region
Glauber Costa wrote:
> is_allocated_mem is a function that checks that every relevant aspect
> of the memory slot matches (start and size). Replace it with a more
> generic function that checks if a memory region is totally contained
> in another. The former case is also covered.

I think enabling dirty page tracking requires the slot to match exactly.
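For reference, the containment test described reduces to two comparisons over
the slot bounds (a minimal sketch; the function signature is hypothetical,
with fields as used by libkvm's slot array):

	/* Is [start, start + len) fully inside slot? */
	static int is_containing_region(struct slot_info *slot,
					unsigned long start,
					unsigned long len)
	{
		return slot->len &&
		       start >= slot->phys_addr &&
		       start + len <= slot->phys_addr + slot->len;
	}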
Re: [PATCH 3/9] allow intersecting region to be on the boundary.
Glauber Costa wrote:
> Signed-off-by: Glauber Costa <[EMAIL PROTECTED]>
> ---
>  libkvm/libkvm.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
> index e768e44..fa65c30 100644
> --- a/libkvm/libkvm.c
> +++ b/libkvm/libkvm.c
> @@ -130,8 +130,8 @@ int get_intersecting_slot(unsigned long phys_addr)
>  	int i;
>
>  	for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS ; ++i)
> -		if (slots[i].len && slots[i].phys_addr < phys_addr &&
> -		    (slots[i].phys_addr + slots[i].len) > phys_addr)
> +		if (slots[i].len && slots[i].phys_addr <= phys_addr &&
> +		    (slots[i].phys_addr + slots[i].len) >= phys_addr)
>  			return i;
>  	return -1;

Consider:

	slots[i].phys_addr = 0
	slots[i].len = 1
	phys_addr = 1

With the new calculation, i (well, not me personally) will be considered
an intersecting slot. Not that I (me this time) can understand how you
can calculate interval intersection without the entire interval.
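For reference, with half-open intervals [start, start + len) the usual
overlap test takes both endpoints of both intervals, which is exactly the
information get_intersecting_slot() lacks (a minimal sketch; the function
signature is hypothetical):

	/* Two half-open intervals overlap iff each starts before the
	 * other ends. */
	static int slot_intersects(struct slot_info *slot,
				   unsigned long start, unsigned long len)
	{
		return slot->len && len &&
		       start < slot->phys_addr + slot->len &&
		       slot->phys_addr < start + len;
	}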