[PATCH] kvm: rework remove-write-access for a slot

2010-06-02 Thread Lai Jiangshan
Current code uses slot_bitmap to find the ptes that map pages
from a memory slot, but it is not precise: some ptes in a shadow page
do not map any page from the memory slot.

This patch uses rmap to find the ptes precisely, and removes
the now-unused slot_bitmap.

Signed-off-by: Lai Jiangshan 
---
diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 1e7ecdd..d749399 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -183,13 +183,6 @@ Shadow pages contain the following information:
 perform a reverse map from a pte to a gfn. When role.direct is set, any
 element of this array can be calculated from the gfn field when used, in
 this case, the array of gfns is not allocated. See role.direct and gfn.
-  slot_bitmap:
-A bitmap containing one bit per memory slot.  If the page contains a pte
-mapping a page from memory slot n, then bit n of slot_bitmap will be set
-(if a page is aliased among several slots, then it is not guaranteed that
-all slots will be marked).
-Used during dirty logging to avoid scanning a shadow page if none if its
-pages need tracking.
   root_count:
 A counter keeping track of how many hardware registers (guest cr3 or
 pdptrs) are now pointing at the page.  While this counter is nonzero, the
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cd0f29..bf4f198 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -197,11 +197,6 @@ struct kvm_mmu_page {
u64 *spt;
/* hold the gfn of each spte inside spt */
gfn_t *gfns;
-   /*
-* One bit set per slot which has memory
-* in this shadow page.
-*/
-   DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
bool multimapped; /* More than one parent_pte? */
bool unsync;
int root_count;  /* Currently serving as active root */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c16c4ca..e097e81 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -941,7 +941,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
  PAGE_SIZE);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-   bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
sp->multimapped = 0;
sp->parent_pte = parent_pte;
--vcpu->kvm->arch.n_free_mmu_pages;
@@ -1660,14 +1659,6 @@ restart:
}
 }
 
-static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
-{
-   int slot = memslot_id(kvm, gfn);
-   struct kvm_mmu_page *sp = page_header(__pa(pte));
-
-   __set_bit(slot, sp->slot_bitmap);
-}
-
 static void mmu_convert_notrap(struct kvm_mmu_page *sp)
 {
int i;
@@ -1979,7 +1970,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (!was_rmapped && is_large_pte(*sptep))
++vcpu->kvm->stat.lpages;
 
-   page_header_update_slot(vcpu->kvm, sptep, gfn);
if (!was_rmapped) {
rmap_count = rmap_add(vcpu, sptep, gfn);
kvm_release_pfn_clean(pfn);
@@ -2975,22 +2965,38 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
mmu_free_memory_caches(vcpu);
 }
 
+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+   while (spte) {
+   /* avoid RMW */
+   if (is_writable_pte(*spte))
+   *spte &= ~PT_WRITABLE_MASK;
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+}
+
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 {
-   struct kvm_mmu_page *sp;
+   int i;
+   unsigned long gfn_offset;
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   struct kvm_memory_slot *memslot = &slots->memslots[slot];
 
-   list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
-   int i;
-   u64 *pt;
+   for (gfn_offset = 0; gfn_offset < memslot->npages; gfn_offset++) {
+   rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
 
-   if (!test_bit(slot, sp->slot_bitmap))
-   continue;
+   for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
+   unsigned long gfn = memslot->base_gfn + gfn_offset;
+   unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
+   int idx = gfn / huge - memslot->base_gfn / huge;
 
-   pt = sp->spt;
-   for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-   /* avoid RMW */
-   if (is_writable_pte(pt[i]))
-   pt[i] &= ~PT_WRITABLE_MASK;
+   if (!(gfn_offset || (gfn % huge)))
+   bre
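The index arithmetic in the loop above can be checked in isolation. The sketch below is a userspace model, not kvm code; it assumes the x86-64 value of 512 small pages per 2M large page for the example values:

```c
/* Userspace check of the large-page index arithmetic used above:
 * gfn/huge - base_gfn/huge gives the slot-relative index of the
 * large page containing gfn, and stays correct even when base_gfn
 * is not aligned to a large-page boundary. */
static unsigned long lpage_idx(unsigned long gfn, unsigned long base_gfn,
                               unsigned long pages_per_hpage)
{
        return gfn / pages_per_hpage - base_gfn / pages_per_hpage;
}
```

For example, with base_gfn = 1000 and 512 pages per huge page, gfn 1023 still falls in large page 0 of the slot, while gfn 1513 falls in large page 1.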

Re: [PATCH] kvm: rework remove-write-access for a slot

2010-06-02 Thread Avi Kivity

On 06/02/2010 11:53 AM, Lai Jiangshan wrote:

Current code uses slot_bitmap to find the ptes that map pages
from a memory slot, but it is not precise: some ptes in a shadow page
do not map any page from the memory slot.

This patch uses rmap to find the ptes precisely, and removes
the now-unused slot_bitmap.


Patch looks good; a couple of comments:

- We might see a slowdown with !tdp, since we no longer have locality.  
Each page will map to an spte in a different page.  However, it's still 
worth it in my opinion.
- I thought of a different approach to write protection: write protect 
the L4 sptes, on write fault add write permission to the L4 spte and 
write protect the L3 sptes that it points to, etc.  This method can use 
the slot bitmap to reduce the number of write faults.  However we can 
reintroduce the slot bitmap if/when we use the method, this shouldn't 
block the patch.
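The lazy, top-down scheme described above can be modeled with a toy tree in userspace (this is a hypothetical sketch of the idea, not kvm code; the names and the fanout are invented for illustration):

```c
#include <stddef.h>

/*
 * Toy model of lazy write protection: only the top-level entries are
 * write-protected up front; a write fault at one level re-enables
 * write there and pushes the protection one level down, so the cost
 * is paid on demand instead of for every leaf at once.
 */
#define TOY_FANOUT 4

struct toy_spte {
        int write_protected;
        struct toy_spte *child[TOY_FANOUT];
};

static void toy_protect(struct toy_spte *e)
{
        if (e)
                e->write_protected = 1;
}

/* Handle a write fault on entry e: grant write at this level and
 * write-protect the entries it points to. */
static void toy_write_fault(struct toy_spte *e)
{
        int i;

        e->write_protected = 0;
        for (i = 0; i < TOY_FANOUT; i++)
                toy_protect(e->child[i]);
}
```

A slot with N pages then takes one protect operation initially, and faults fan the protection out only over the subtrees the guest actually writes.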




+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+while (spte) {
+/* avoid RMW */
+if (is_writable_pte(*spte))
+*spte &= ~PT_WRITABLE_MASK;


Must use an atomic operation here to avoid losing dirty or accessed bit.


+spte = rmap_next(kvm, rmapp, spte);
+}
+}
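The race Avi points out can be demonstrated in a userspace sketch (not the kvm code; the TOY_ masks below mirror the x86 pte bit layout, writable = bit 1, accessed = bit 5, dirty = bit 6, and the gcc/clang builtin stands in for the kernel's atomic bitops):

```c
#include <stdint.h>

/* A plain "*spte &= ~PT_WRITABLE_MASK" is a non-atomic
 * read-modify-write, so an accessed/dirty bit that the CPU sets
 * between the load and the store is silently lost.  An atomic AND
 * preserves any concurrently-set bit. */
#define TOY_PT_WRITABLE_MASK (1ULL << 1)
#define TOY_PT_ACCESSED_MASK (1ULL << 5)
#define TOY_PT_DIRTY_MASK    (1ULL << 6)

static void spte_clear_writable_atomic(uint64_t *spte)
{
        /* the kernel itself would use clear_bit() or a cmpxchg loop */
        __sync_fetch_and_and(spte, ~TOY_PT_WRITABLE_MASK);
}
```

After the call, the writable bit is clear while accessed and dirty bits set by other agents survive, which is exactly what the non-atomic form cannot guarantee.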



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: rework remove-write-access for a slot

2010-06-04 Thread Lai Jiangshan
Avi Kivity wrote:
> On 06/02/2010 11:53 AM, Lai Jiangshan wrote:
>> Current code uses slot_bitmap to find the ptes that map pages
>> from a memory slot, but it is not precise: some ptes in a shadow page
>> do not map any page from the memory slot.
>>
>> This patch uses rmap to find the ptes precisely, and removes
>> the now-unused slot_bitmap.
>>
>>
> 
> Patch looks good; a couple of comments:
> 
> - We might see a slowdown with !tdp, since we no longer have locality. 
> Each page will map to an spte in a different page.  However, it's still
> worth it in my opinion.

Yes, this patch hurts the cache since we no longer have locality.
And if most pages of the slot are not mapped (rmap_next(kvm, rmapp, NULL) == NULL),
this patch will perform worse than the old method, I think.

This patch does things straightforwardly and precisely.

> - I thought of a different approach to write protection: write protect
> the L4 sptes, on write fault add write permission to the L4 spte and
> write protect the L3 sptes that it points to, etc.  This method can use
> the slot bitmap to reduce the number of write faults.  However we can
> reintroduce the slot bitmap if/when we use the method, this shouldn't
> block the patch.

It is a very good approach, and it is blazing fast.

I have no time to implement it currently;
could you add it to the TODO list?

> 
>>
>> +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
>> +{
>> +u64 *spte = rmap_next(kvm, rmapp, NULL);
>> +
>> +while (spte) {
>> +/* avoid RMW */
>> +if (is_writable_pte(*spte))
>> +*spte &= ~PT_WRITABLE_MASK;
> 
> Must use an atomic operation here to avoid losing dirty or accessed bit.
> 

An atomic operation is too expensive, so I retained the comment "/* avoid RMW */"
and will wait for someone to find a good approach for it.




Re: [PATCH] kvm: rework remove-write-access for a slot

2010-06-04 Thread Marcelo Tosatti
On Fri, Jun 04, 2010 at 04:14:08PM +0800, Lai Jiangshan wrote:
> Avi Kivity wrote:
> > On 06/02/2010 11:53 AM, Lai Jiangshan wrote:
> >> Current code uses slot_bitmap to find the ptes that map pages
> >> from a memory slot, but it is not precise: some ptes in a shadow page
> >> do not map any page from the memory slot.
> >>
> >> This patch uses rmap to find the ptes precisely, and removes
> >> the now-unused slot_bitmap.

Note that the current code is precise: memslot_id does unalias_gfn.

> > Patch looks good; a couple of comments:
> > 
> > - We might see a slowdown with !tdp, since we no longer have locality. 
> > Each page will map to an spte in a different page.  However, it's still
> > worth it in my opinion.
> 
> Yes, this patch hurts the cache since we no longer have locality.
> And if most pages of the slot are not mapped (rmap_next(kvm, rmapp, NULL) == NULL),
> this patch will perform worse than the old method, I think.

Can you get some numbers before/after patch, with/without lots of shadow
pages instantiated? Better with large amount of memory for the guest.

Because shrinking kvm_mmu_page is good.



Re: [PATCH] kvm: rework remove-write-access for a slot

2010-06-06 Thread Avi Kivity

On 06/04/2010 11:14 AM, Lai Jiangshan wrote:

- I thought of a different approach to write protection: write protect
the L4 sptes, on write fault add write permission to the L4 spte and
write protect the L3 sptes that it points to, etc.  This method can use
the slot bitmap to reduce the number of write faults.  However we can
reintroduce the slot bitmap if/when we use the method, this shouldn't
block the patch.
 

It is a very good approach, and it is blazing fast.

I have no time to implement it currently;
could you add it to the TODO list?
   


Done.


+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+while (spte) {
+/* avoid RMW */
+if (is_writable_pte(*spte))
+*spte &= ~PT_WRITABLE_MASK;

Must use an atomic operation here to avoid losing dirty or accessed bit.

 

An atomic operation is too expensive, so I retained the comment "/* avoid RMW */"
and will wait for someone to find a good approach for it.
   


You are right, it is an existing problem.  I just posted a patchset
which fixes it; when it's merged, please rebase on top.

