Re: KVM: MMU: optimize set_spte for page sync
Marcelo Tosatti wrote:
> Do you have objections to submitting this patch for 2.6.28? The hash
> lookup kills performance of pagetable write + context switch intensive
> workloads.

Can you quantify? 2.6.28 is out of the question; 2.6.28.stable is
possible, depending on the actual performance difference. I'd hate to
introduce a correctness problem just to improve performance.

--
error compiling committee.c: too many arguments to function
Re: KVM: MMU: optimize set_spte for page sync
On Wed, Nov 26, 2008 at 01:12:20PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Here it goes.
>>
>> KVM: MMU: optimize set_spte for page sync
>>
>> The write protect verification in set_spte is unnecessary for page
>> sync. It's guaranteed that, if the unsync spte was writable, the
>> target page does not have a write-protected shadow (if it had, the
>> spte would have been write-protected under mmu_lock by
>> rmap_write_protect before).
>>
>> The same reasoning applies to mark_page_dirty: the gfn has been
>> marked as dirty via the pagefault path.
>>
>> The cost of the hash table and memslot lookups is quite significant
>> if the workload is pagetable-write intensive, resulting in increased
>> mmu_lock contention.
>
> Applied, thanks.

Avi,

Do you have objections to submitting this patch for 2.6.28? The hash
lookup kills performance of pagetable write + context switch intensive
workloads.
[PATCH 44/44] KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti [EMAIL PROTECTED]

The write protect verification in set_spte is unnecessary for page
sync. It's guaranteed that, if the unsync spte was writable, the target
page does not have a write-protected shadow (if it had, the spte would
have been write-protected under mmu_lock by rmap_write_protect before).

The same reasoning applies to mark_page_dirty: the gfn has been marked
as dirty via the pagefault path.

The cost of the hash table and memslot lookups is quite significant if
the workload is pagetable-write intensive, resulting in increased
mmu_lock contention.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fa3486d..dd20b19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1593,6 +1593,15 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		spte |= PT_WRITABLE_MASK;
 
+		/*
+		 * Optimization: for pte sync, if spte was writable the hash
+		 * lookup is unnecessary (and expensive). Write protection
+		 * is responsibility of mmu_get_page / kvm_sync_page.
+		 * Same reasoning can be applied to dirty page accounting.
+		 */
+		if (!can_unsync && is_writeble_pte(*shadow_pte))
+			goto set_pte;
+
 		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
--
1.6.0.3
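To make the fast path concrete outside the kernel, here is a minimal
standalone sketch of the decision logic. Every name, type, and bit
layout in it is a simplified stand-in for illustration only, not KVM's
real code; only the shape of the check matches the patch:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define PT_WRITABLE_MASK (1ULL << 1)

	/* Stand-in for mmu_need_write_protect(): the hash table and
	 * memslot lookups the patch is avoiding. */
	static bool expensive_write_protect_check(uint64_t gfn)
	{
		printf("expensive lookup for gfn %llx\n",
		       (unsigned long long)gfn);
		return false;	/* assume no write-protected shadow */
	}

	static void set_spte_model(uint64_t *spte, uint64_t gfn,
				   bool can_unsync)
	{
		/*
		 * The patch's fast path: on the sync path
		 * (can_unsync == false), if the old spte was already
		 * writable, the changelog's invariant says no
		 * write-protected shadow can exist, so the lookup is
		 * skipped entirely.
		 */
		if (!can_unsync && (*spte & PT_WRITABLE_MASK))
			goto set_pte;

		if (expensive_write_protect_check(gfn)) {
			*spte &= ~PT_WRITABLE_MASK;
			return;
		}
	set_pte:
		*spte |= PT_WRITABLE_MASK;
	}

	int main(void)
	{
		uint64_t spte = PT_WRITABLE_MASK;

		set_spte_model(&spte, 0x1234, false); /* sync: no lookup */
		set_spte_model(&spte, 0x1234, true);  /* fault: lookup runs */
		return 0;
	}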
Re: KVM: MMU: optimize set_spte for page sync
Marcelo Tosatti wrote:
> Here it goes.
>
> KVM: MMU: optimize set_spte for page sync
>
> The write protect verification in set_spte is unnecessary for page
> sync. It's guaranteed that, if the unsync spte was writable, the
> target page does not have a write-protected shadow (if it had, the
> spte would have been write-protected under mmu_lock by
> rmap_write_protect before).
>
> The same reasoning applies to mark_page_dirty: the gfn has been
> marked as dirty via the pagefault path.
>
> The cost of the hash table and memslot lookups is quite significant
> if the workload is pagetable-write intensive, resulting in increased
> mmu_lock contention.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: KVM: MMU: optimize set_spte for page sync
Marcelo Tosatti wrote:
> *shadow_pte can point to a different page if the guest updates the
> pagetable and there is a fault before resync; the fault updates the
> spte with the new gfn (and pfn) via mmu_set_spte. In that case the
> gfn cache is updated, since:
>
> 	} else if (pfn != spte_to_pfn(*shadow_pte)) {
> 		printk("hfn old %lx new %lx\n",
> 		       spte_to_pfn(*shadow_pte), pfn);
> 		rmap_remove(vcpu->kvm, shadow_pte);

Okay. Please resend, but without the reversal of can_unsync; it will
make a more readable patch. If you like, send a follow-on that only
does the reversal.

--
error compiling committee.c: too many arguments to function
Re: KVM: MMU: optimize set_spte for page sync
On Tue, Nov 25, 2008 at 04:38:13PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> *shadow_pte can point to a different page if the guest updates the
>> pagetable and there is a fault before resync; the fault updates the
>> spte with the new gfn (and pfn) via mmu_set_spte. In that case the
>> gfn cache is updated, since:
>>
>> 	} else if (pfn != spte_to_pfn(*shadow_pte)) {
>> 		printk("hfn old %lx new %lx\n",
>> 		       spte_to_pfn(*shadow_pte), pfn);
>> 		rmap_remove(vcpu->kvm, shadow_pte);
>
> Okay. Please resend, but without the reversal of can_unsync; it will
> make a more readable patch. If you like, send a follow-on that only
> does the reversal.

Here it goes.

KVM: MMU: optimize set_spte for page sync

The write protect verification in set_spte is unnecessary for page
sync. It's guaranteed that, if the unsync spte was writable, the target
page does not have a write-protected shadow (if it had, the spte would
have been write-protected under mmu_lock by rmap_write_protect before).

The same reasoning applies to mark_page_dirty: the gfn has been marked
as dirty via the pagefault path.

The cost of the hash table and memslot lookups is quite significant if
the workload is pagetable-write intensive, resulting in increased
mmu_lock contention.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1593,6 +1593,15 @@ static int set_spte(struct kvm_vcp
 		spte |= PT_WRITABLE_MASK;
 
+		/*
+		 * Optimization: for pte sync, if spte was writable the hash
+		 * lookup is unnecessary (and expensive). Write protection
+		 * is responsibility of mmu_get_page / kvm_sync_page.
+		 * Same reasoning can be applied to dirty page accounting.
+		 */
+		if (!can_unsync && is_writeble_pte(*shadow_pte))
+			goto set_pte;
+
 		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
Re: KVM: MMU: optimize set_spte for page sync
On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The cost of the hash table and memslot lookups is quite significant
>> if the workload is pagetable-write intensive, resulting in increased
>> mmu_lock contention.
>>
>> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcp
>>  		spte |= PT_WRITABLE_MASK;
>>
>> -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>> +		/*
>> +		 * Optimization: for pte sync, if spte was writable the hash
>> +		 * lookup is unnecessary (and expensive). Write protection
>> +		 * is responsibility of mmu_get_page / kvm_sync_page.
>> +		 * Same reasoning can be applied to dirty page accounting.
>> +		 */
>> +		if (sync_page && is_writeble_pte(*shadow_pte))
>> +			goto set_pte;
>
> What if *shadow_pte points at a different page? Is that possible?

To a different gfn? Then sync_page will have nuked the spte:

	if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
	    !(gpte & PT_ACCESSED_MASK)) {
		u64 nonpresent;
		...
		set_shadow_pte(&sp->spt[i], nonpresent);
	}

Otherwise:

	/*
	 * Using the cached information from sp->gfns is safe because:
	 * - The spte has a reference to the struct page, so the pfn for
	 *   a given gfn can't change unless all sptes pointing to it
	 *   are nuked first.
	 */
Re: KVM: MMU: optimize set_spte for page sync
On Mon, Nov 24, 2008 at 01:04:23PM +0100, Marcelo Tosatti wrote:
> On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> The cost of the hash table and memslot lookups is quite significant
>>> if the workload is pagetable-write intensive, resulting in
>>> increased mmu_lock contention.
>>>
>>> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcp
>>>  		spte |= PT_WRITABLE_MASK;
>>>
>>> -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>>> +		/*
>>> +		 * Optimization: for pte sync, if spte was writable the hash
>>> +		 * lookup is unnecessary (and expensive). Write protection
>>> +		 * is responsibility of mmu_get_page / kvm_sync_page.
>>> +		 * Same reasoning can be applied to dirty page accounting.
>>> +		 */
>>> +		if (sync_page && is_writeble_pte(*shadow_pte))
>>> +			goto set_pte;
>>
>> What if *shadow_pte points at a different page? Is that possible?
>
> To a different gfn? Then sync_page will have nuked the spte:
>
> 	if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
> 	    !(gpte & PT_ACCESSED_MASK)) {
> 		u64 nonpresent;
> 		...
> 		set_shadow_pte(&sp->spt[i], nonpresent);
> 	}
>
> Otherwise:
>
> 	/*
> 	 * Using the cached information from sp->gfns is safe because:
> 	 * - The spte has a reference to the struct page, so the pfn for
> 	 *   a given gfn can't change unless all sptes pointing to it
> 	 *   are nuked first.
> 	 */

*shadow_pte can point to a different page if the guest updates the
pagetable and there is a fault before resync; the fault updates the
spte with the new gfn (and pfn) via mmu_set_spte. In that case the gfn
cache is updated, since:

	} else if (pfn != spte_to_pfn(*shadow_pte)) {
		printk("hfn old %lx new %lx\n",
		       spte_to_pfn(*shadow_pte), pfn);
		rmap_remove(vcpu->kvm, shadow_pte);
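The staleness check quoted above is easy to model in isolation. The
following sketch uses simplified, hypothetical bit layouts (not the
kernel's pte format) purely to illustrate the three conditions under
which sync_page drops a cached spte:

	#include <stdbool.h>
	#include <stdint.h>

	#define PT_PRESENT_MASK  (1ULL << 0)
	#define PT_ACCESSED_MASK (1ULL << 5)

	/* Simplified: treat the page-frame field as the gfn. */
	static uint64_t gpte_to_gfn(uint64_t gpte)
	{
		return gpte >> 12;
	}

	/*
	 * True when the cached shadow entry must be nuked before resync
	 * can trust it: the guest pte now points at a different gfn, is
	 * no longer present, or was never accessed.
	 */
	static bool spte_is_stale(uint64_t gpte, uint64_t cached_gfn)
	{
		return gpte_to_gfn(gpte) != cached_gfn ||
		       !(gpte & PT_PRESENT_MASK) ||
		       !(gpte & PT_ACCESSED_MASK);
	}

	int main(void)
	{
		uint64_t gpte = (0x1234ULL << 12) | PT_PRESENT_MASK |
				PT_ACCESSED_MASK;

		return spte_is_stale(gpte, 0x1234) ? 1 : 0; /* 0: valid */
	}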
KVM: MMU: optimize set_spte for page sync
The write protect verification in set_spte is unnecessary for page
sync. It's guaranteed that, if the unsync spte was writable, the target
page does not have a write-protected shadow (if it had, the spte would
have been write-protected under mmu_lock by rmap_write_protect before).

The same reasoning applies to mark_page_dirty: the gfn has been marked
as dirty via the pagefault path.

The cost of the hash table and memslot lookups is quite significant if
the workload is pagetable-write intensive, resulting in increased
mmu_lock contention.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1529,7 +1529,7 @@ static int kvm_unsync_page(struct kvm_vc
 }
 
 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-				  bool can_unsync)
+				  bool sync_page)
 {
 	struct kvm_mmu_page *shadow;
 
@@ -1539,7 +1539,7 @@ static int mmu_need_write_protect(struct
 			return 1;
 		if (shadow->unsync)
 			return 0;
-		if (can_unsync && oos_shadow)
+		if (!sync_page && oos_shadow)
 			return kvm_unsync_page(vcpu, shadow);
 		return 1;
 	}
@@ -1550,7 +1550,7 @@ static int set_spte(struct kvm_vcp
 		    unsigned pte_access, int user_fault,
 		    int write_fault, int dirty, int largepage,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
-		    bool can_unsync)
+		    bool sync_page)
 {
 	u64 spte;
 	int ret = 0;
@@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcp
 		spte |= PT_WRITABLE_MASK;
 
-		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
+		/*
+		 * Optimization: for pte sync, if spte was writable the hash
+		 * lookup is unnecessary (and expensive). Write protection
+		 * is responsibility of mmu_get_page / kvm_sync_page.
+		 * Same reasoning can be applied to dirty page accounting.
+		 */
+		if (sync_page && is_writeble_pte(*shadow_pte))
+			goto set_pte;
+
+		if (mmu_need_write_protect(vcpu, gfn, sync_page)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;
@@ -1648,7 +1657,7 @@ static void mmu_set_spte(struct kvm_vcpu
 		}
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
-		     dirty, largepage, gfn, pfn, speculative, true)) {
+		     dirty, largepage, gfn, pfn, speculative, false)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -580,7 +580,7 @@ static int FNAME(sync_page)(struct kvm_v
 		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
 			 is_dirty_pte(gpte), 0, gfn,
-			 spte_to_pfn(sp->spt[i]), true, false);
+			 spte_to_pfn(sp->spt[i]), true, true);
 	}
 
 	return !nr_present;
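Since this first version and the applied version differ only in the
polarity of the flag (this one renames can_unsync to sync_page and
flips its sense, which Avi asked to be dropped for readability), a tiny
equivalence check may help; this is hypothetical stand-in code, not the
kernel's:

	#include <assert.h>
	#include <stdbool.h>

	/* This patch: flag means "called from kvm_sync_page". */
	static bool fast_path_v1(bool sync_page, bool spte_writable)
	{
		return sync_page && spte_writable;
	}

	/* Applied version: flag kept as can_unsync, test inverted. */
	static bool fast_path_applied(bool can_unsync, bool spte_writable)
	{
		return !can_unsync && spte_writable;
	}

	int main(void)
	{
		/* sync_page == !can_unsync, so both tests agree. */
		for (int s = 0; s <= 1; s++)
			for (int w = 0; w <= 1; w++)
				assert(fast_path_v1(s, w) ==
				       fast_path_applied(!s, w));
		return 0;
	}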