Hi Avi, Andrew, please see the comments on the need for an atomic get_user_pages() below.
On Sat, Dec 22, 2007 at 09:42:27PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> > In preparation for a mmu spinlock, avoid schedules in mmu_set_spte()
> > by using follow_page() instead of get_user_pages().
> >
> > The fault handling work by get_user_pages() now happens outside the lock.
> >
> > Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
> >
> > Index: kvm.quilt/arch/x86/kvm/mmu.c
> > @@ -983,8 +985,11 @@ static int nonpaging_map(struct kvm_vcpu
> >                  table = __va(table_addr);
> >
> >                  if (level == 1) {
> > +                        struct page *page;
> > +                        page = gfn_to_page(vcpu->kvm, gfn);
> >                          mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
> >                                       0, write, 1, &pt_write, gfn);
> > +                        kvm_release_page_clean(page);
> >                          return pt_write || is_io_pte(table[index]);
> >                  }
>
> Is this hunk necessary?  It will swap in the page in case it's swapped
> out, but not sure we care about performance in that case.

The issue is that the mmap() call done by QEMU won't instantiate the ptes
of the guest memory area. So KVM needs to either run the
get_user_pages()->handle_mm_fault() sequence outside mmu_set_spte(), or
mmap() has to pass MAP_POPULATE, prefaulting all guest pages (which is
highly suboptimal).

> > n.c
> > @@ -453,6 +453,25 @@ static unsigned long gfn_to_hva(struct k
> >          return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
> >  }
> >
> > +struct page *kvm_follow_page(struct kvm *kvm, gfn_t gfn)
> > +{
> > +        unsigned long addr;
> > +        struct vm_area_struct *vma;
> > +
> > +        addr = gfn_to_hva(kvm, gfn);
> > +        /* MMIO access */
> > +        if (kvm_is_error_hva(addr)) {
> > +                get_page(bad_page);
> > +                return bad_page;
> > +        }
> > +
> > +        vma = find_vma(current->mm, addr);
> > +        if (!vma)
> > +                return NULL;
> > +
> > +        return follow_page(vma, addr, FOLL_GET|FOLL_TOUCH);
> > +}
>
> It may be better to create get_user_page_inatomic() as the last four
> lines rather than exporting follow_page().  Maybe an mm maintainer can
> comment.

Andrew, KVM needs to find the struct page which maps to a certain address
in the QEMU process without sleeping, since that lookup is performed with
a spinlock held. So it needs to either export follow_page() or use a
get_user_pages() variant which doesn't call cond_resched() or
handle_mm_fault(), simply failing if the page is not mapped in the page
tables. What do you think?
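To make that concrete, the variant I have in mind is roughly the sketch
below. It is illustrative only: get_user_page_inatomic() is just the name
Avi suggested, not an existing interface, and it would live in mm/memory.c
next to follow_page() so that follow_page() itself stays unexported.

struct page *get_user_page_inatomic(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;

        /* caller must hold mm->mmap_sem for read, as find_vma() requires */
        vma = find_vma(mm, addr);
        if (!vma || vma->vm_start > addr)
                return NULL;

        /*
         * follow_page() only walks the page tables: no handle_mm_fault(),
         * no cond_resched().  FOLL_GET takes a reference which the caller
         * drops with put_page(); NULL simply means "not currently mapped".
         */
        return follow_page(vma, addr, FOLL_GET | FOLL_TOUCH);
}

kvm_follow_page() above would then call this helper instead of doing the
find_vma()/follow_page() pair by hand.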
> > --- kvm.quilt.orig/arch/x86/kvm/paging_tmpl.h
> > +++ kvm.quilt/arch/x86/kvm/paging_tmpl.h
> > @@ -67,6 +67,7 @@ struct guest_walker {
> >          gfn_t table_gfn[PT_MAX_FULL_LEVELS];
> >          pt_element_t ptes[PT_MAX_FULL_LEVELS];
> >          gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
> > +        struct page *page;
> >          unsigned pt_access;
> >          unsigned pte_access;
> >          gfn_t gfn;
> > @@ -203,14 +204,18 @@ walk:
> >                  --walker->level;
> >          }
> >
> > +        walker->page = gfn_to_page(vcpu->kvm, walker->gfn);
> > +
> >          if (write_fault && !is_dirty_pte(pte)) {
> >                  bool ret;
> >
> >                  mark_page_dirty(vcpu->kvm, table_gfn);
> >                  ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> >                            pte|PT_DIRTY_MASK);
> > -                if (ret)
> > +                if (ret) {
> > +                        kvm_release_page_clean(walker->page);
> >                          goto walk;
> > +                }
> >                  pte |= PT_DIRTY_MASK;
> >                  mutex_lock(&vcpu->kvm->lock);
> >                  kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
> > @@ -323,8 +328,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
> >                          r = kvm_read_guest_atomic(vcpu->kvm,
> >                                                    walker->pte_gpa[level - 2],
> >                                                    &curr_pte, sizeof(curr_pte));
> > -                        if (r || curr_pte != walker->ptes[level - 2])
> > -                                return NULL;
> > +                        if (r || curr_pte != walker->ptes[level - 2]) {
> > +                                shadow_ent = NULL;
> > +                                goto out;
> > +                        }
> >                  }
> >                  shadow_addr = __pa(shadow_page->spt);
> >                  shadow_pte = shadow_addr | PT_PRESENT_MASK | PT_ACCESSED_MASK
> > @@ -336,7 +343,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
> >                               user_fault, write_fault,
> >                               walker->ptes[walker->level-1] & PT_DIRTY_MASK,
> >                               ptwrite, walker->gfn);
> > -
> > +out:
> > +        kvm_release_page_clean(walker->page);
> >          return shadow_ent;
> >  }
> >
>
> walker->page is never used, so I think it can be completely dropped.
> mmu_set_spte() already does the right thing without its help.

The page being mapped by the guest needs to be faulted in by someone...
So while walker->page is never read, the gfn_to_page() call that fills it
guarantees that the fault handler runs and the page is mapped, so we know
which physical address the page resides at; i.e., it is the place which
guarantees that the page gets mapped, whether it is already in memory or
has to be swapped in first.

Thanks.

> --
> Do not meddle in the internals of kernels, for they are subtle and
> quick to panic.
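PS: to make the ordering explicit, what the hunks above are aiming at is
roughly the sketch below. It is illustrative only -- example_map_gfn() is
not a real function, and the mmu spinlock does not exist yet at this point
in the series -- but it shows why the page reference (and the
possibly-sleeping fault-in) has to happen before the lock-protected part:

static int example_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, u64 *sptep,
                           int write, int *ptwrite)
{
        struct page *page;

        /*
         * May sleep: get_user_pages() inside gfn_to_page() faults the
         * guest page into the QEMU address space if it is not mapped yet.
         */
        page = gfn_to_page(vcpu->kvm, gfn);

        /* ... take the planned mmu spinlock here ... */

        /*
         * Under the lock only non-sleeping lookups are allowed
         * (follow_page() via kvm_follow_page()), never get_user_pages().
         */
        mmu_set_spte(vcpu, sptep, ACC_ALL, ACC_ALL, 0, write, 1, ptwrite, gfn);

        /* ... drop the spinlock ... */

        kvm_release_page_clean(page);   /* drop the extra reference */
        return *ptwrite;
}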