Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-05-07 Thread Lai Jiangshan
Marcelo Tosatti wrote:
 On Thu, May 06, 2010 at 03:03:48PM +0800, Lai Jiangshan wrote:
 Marcelo Tosatti wrote:
 On Thu, Apr 29, 2010 at 09:43:40PM +0300, Avi Kivity wrote:
 On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:
 You missed quadrant on 4mb large page emulation with shadow (see updated
 patch below).
 Good catch.

 Also, for some reason I can't understand, the assumption
 does not hold for large sptes with TDP, so I reverted it for now.
 It's unrelated to TDP, same issue with shadow.  I think the
 calculation is correct.  For example the 4th spte for a level=2 page
 will yield gfn=4*512.
 Under testing I see an sp at level 2 with sp->gfn == 4096, and mmu_set_spte
 setting index 8 to gfn 4096 (whereas kvm_mmu_page_get_gfn returns 4096 +
 8*512).

 Lai, can you please take a look at it? You should see the
 kvm_mmu_page_set_gfn BUG_ON by using -mem-path on hugetlbfs.

 Could you tell me how you tested it? It would be better if I could follow
 your test steps.
 
 mount -t hugetlbfs none /mnt/
 echo xyz > /proc/sys/vm/nr_hugepages
 qemu-kvm parameters -mem-path /mnt/
 
 I also hit the kvm_mmu_page_set_gfn BUG_ON. It is because
 FNAME(fetch)() sets sp->gfn incorrectly. The patch
 [PATCH] kvm: calculate correct gfn for small host pages which emulates large
 guest pages
 fixes it.

 I can no longer hit the kvm_mmu_page_set_gfn BUG_ON once this patch is
 also applied.

 So could you tell me your test steps:
 The host: ept/npt enabled? 64bit? What test code runs on the host?
 
 Intel EPT enabled.
 
 The guest: OS? PAE? 32bit? 64bit? What test code runs in the guest?
 
 FC12 guest.
 


I forgot to check whether the calculation of the base gfn is correct
in __direct_map().

Subject: [PATCH] kvm, tdp: calculate correct base gfn for non-DIR level

The base gfn calculation in __direct_map() is incorrect:
it does not compute the right base gfn when level=3 or 4.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
Reported-by: Marcelo Tosatti <mtosa...@redhat.com>
---
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a5c6719..6986a6f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1955,7 +1956,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
}
 
if (*iterator.sptep == shadow_trap_nonpresent_pte) {
-   pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
+   u64 base_addr = iterator.addr;
+
+   base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
+   pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);
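
For illustration, here is a standalone user-space sketch (not kernel code) of
why masking with the fixed DIR-level mask breaks for higher levels; it assumes
4KB pages (PAGE_SHIFT = 12) and 9 address bits per paging level, as on x86-64:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define LEVEL_BITS 9	/* address bits translated per paging level */

/* Mask keeping only the address bits above one entry's span at `level`. */
static uint64_t lvl_addr_mask(int level)
{
	return ~((1ULL << (PAGE_SHIFT + (level - 1) * LEVEL_BITS)) - 1);
}

int main(void)
{
	uint64_t addr = 0x40201000ULL;	/* an arbitrary guest physical address */
	int level;

	for (level = 2; level <= 4; level++)
		printf("level %d: base gfn = 0x%llx\n", level,
		       (unsigned long long)((addr & lvl_addr_mask(level)) >> PAGE_SHIFT));
	/*
	 * The old code always applied the level-2 (DIR) mask, so for
	 * iterator.level 3 or 4 the pseudo gfn kept low address bits
	 * that should have been cleared.
	 */
	return 0;
}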





Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-05-07 Thread Marcelo Tosatti
On Fri, May 07, 2010 at 04:25:36PM +0800, Lai Jiangshan wrote:
 Subject: [PATCH] kvm, tdp: calculate correct base gfn for non-DIR level
 
 The base gfn calculation in __direct_map() is incorrect:
 it does not compute the right base gfn when level=3 or 4.
 
 Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
 Reported-by: Marcelo Tosatti <mtosa...@redhat.com>
 ---
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index a5c6719..6986a6f 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1955,7 +1956,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
   }
  
   if (*iterator.sptep == shadow_trap_nonpresent_pte) {
 - pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
 + u64 base_addr = iterator.addr;
 +
 + base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
 + pseudo_gfn = base_addr >> PAGE_SHIFT;
   sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
 iterator.level - 1,
 1, ACC_ALL, iterator.sptep);

Looks good, please resend the whole patch.


Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-05-06 Thread Lai Jiangshan
Marcelo Tosatti wrote:
 On Thu, Apr 29, 2010 at 09:43:40PM +0300, Avi Kivity wrote:
 On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:
 You missed quadrant on 4mb large page emulation with shadow (see updated
 patch below).
 Good catch.

 Also, for some reason I can't understand, the assumption
 does not hold for large sptes with TDP, so I reverted it for now.
 It's unrelated to TDP, same issue with shadow.  I think the
 calculation is correct.  For example the 4th spte for a level=2 page
 will yield gfn=4*512.
 
 Under testing I see an sp at level 2 with sp->gfn == 4096, and mmu_set_spte
 setting index 8 to gfn 4096 (whereas kvm_mmu_page_get_gfn returns 4096 +
 8*512).
 
 Lai, can you please take a look at it? You should see the
 kvm_mmu_page_set_gfn BUG_ON by using -mem-path on hugetlbfs.
 

Could you tell me how you tested it? It would be better if I could follow
your test steps.

I also hit the kvm_mmu_page_set_gfn BUG_ON. It is because
FNAME(fetch)() sets sp->gfn incorrectly. The patch
[PATCH] kvm: calculate correct gfn for small host pages which emulates large
guest pages
fixes it.

I can no longer hit the kvm_mmu_page_set_gfn BUG_ON once this patch is
also applied.

So could you tell me your test steps:
The host: ept/npt enabled? 64bit? What test code runs on the host?
The guest: OS? PAE? 32bit? 64bit? What test code runs in the guest?

Lai


Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-30 Thread Avi Kivity

On 04/30/2010 05:25 AM, Lai Jiangshan wrote:



It's unrelated to TDP, same issue with shadow.  I think the calculation
is correct.  For example the 4th spte for a level=2 page will yield
gfn=4*512.
 

Avi, Marcelo
Thank you very much.

The calculation I used is correct.
   


Yes.  btw, can you update the patch to also correct mmu.txt?


The fault comes from FNAME(fetch) when using shadow paging.

I have fixed it in the patch
[RFC PATCH] kvm: calculate correct gfn for small host pages which emulates large
guest pages

(It seems that the mailing list is unreachable; did you receive it?)
   


The lists are down at the moment.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-30 Thread Avi Kivity

On 04/30/2010 11:54 AM, Lai Jiangshan wrote:

Avi Kivity wrote:
   

On 04/30/2010 05:25 AM, Lai Jiangshan wrote:
 
   

It's unrelated to TDP, same issue with shadow.  I think the calculation
is correct.  For example the 4th spte for a level=2 page will yield
gfn=4*512.

 

Avi, Marcelo
Thank you very much.

The calculation I used is correct.

   

Yes.  btw, can you update the patch to also correct mmu.txt?
 

The corresponding content in mmu.txt is:
   role.direct:
 If set, leaf sptes reachable from this page are for a linear range.
 Examples include real mode translation, large guest pages backed by small
  host pages, and gpa->hpa translations when NPT or EPT is active.
  The linear range starts at (gfn << PAGE_SHIFT) and its size is determined
 by role.level (2MB for first level, 1GB for second level, 0.5TB for third
 level, 256TB for fourth level)
 If clear, this page corresponds to a guest page table denoted by the gfn
 field.

   gfn:
 Either the guest page table containing the translations shadowed by this
 page, or the base page frame for linear translations.  See role.direct.
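
As a quick cross-check of the sizes quoted above, a small user-space sketch
(assuming 4KB pages and 9 bits per level; illustrative, not kernel code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int level;

	/* A direct sp at role.level N spans 512^N pages of 4KB each. */
	for (level = 1; level <= 4; level++) {
		uint64_t bytes = 1ULL << (12 + 9 * level);	/* PAGE_SHIFT + 9*level */
		printf("role.level %d: linear range = %llu MB\n", level,
		       (unsigned long long)(bytes >> 20));
	}
	/* Prints 2 MB, 1024 MB (1GB), 524288 MB (0.5TB), 268435456 MB (256TB). */
	return 0;
}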

These are correct. My patch is fully based on this document;
I think it does not need to be fixed.

Did I miss something?

   


sp->gfns can now be NULL, so the documentation of this field needs to be 
updated.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-30 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 07:57:01PM +0800, Lai Jiangshan wrote:
 
 I think users will enable tdp when their hardware supports ept or npt.
 This patch can reduce kvm mmu memory usage by about 50% for them.
 
 This simple patch uses the fact that:
 
 When sp->role.direct is set, sp->gfns does not contain any essential
 information; the leaf sptes reachable from this sp cover a contiguous
 guest physical memory range (a linear range).
 So sp->gfns[i] (if it was set) equals sp->gfn + i (at PT_PAGE_TABLE_LEVEL).
 Obviously, it is not essential information; we can calculate it when needed.
 
 It means we don't need sp->gfns when sp->role.direct=1,
 thus we can save one page for every kvm_mmu_page.
 
 Note:
 Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
 or kvm_mmu_page_set_gfn().
 It is only exposed in FNAME(sync_page).
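
A user-space sketch of the identity the description relies on (illustrative
names, not the kernel's; it also reproduces the 4096 + 8*512 value discussed
elsewhere in the thread):

#include <assert.h>
#include <stdint.h>

#define LEVEL_BITS 9	/* PT64_LEVEL_BITS */

/* gfn mapped by the index-th spte of a direct sp (quadrant ignored here). */
static uint64_t direct_gfn(uint64_t sp_gfn, int level, int index)
{
	return sp_gfn + ((uint64_t)index << (level - 1) * LEVEL_BITS);
}

int main(void)
{
	/* At PT_PAGE_TABLE_LEVEL (level 1): sp->gfns[i] == sp->gfn + i. */
	assert(direct_gfn(4096, 1, 7) == 4096 + 7);
	/* At level 2 each spte covers 512 pages, hence 4096 + 8*512. */
	assert(direct_gfn(4096, 2, 8) == 4096 + 8 * 512);
	return 0;
}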

Lai,

You missed quadrant on 4mb large page emulation with shadow (see updated
patch below). Also, for some reason I can't understand, the assumption
does not hold for large sptes with TDP, so I reverted it for now.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3266d73..a9edfdb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -393,6 +393,27 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
kfree(rd);
 }
 
+static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+{
+   gfn_t gfn;
+
+   if (!sp->role.direct)
+       return sp->gfns[index];
+
+   gfn = sp->gfn + index * (1 << (sp->role.level - 1) * PT64_LEVEL_BITS);
+   gfn += sp->role.quadrant << PT64_LEVEL_BITS;
+
+   return gfn;
+}
+
+static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
+{
+   if (sp->role.direct)
+       BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
+   else
+       sp->gfns[index] = gfn;
+}
+
 /*
  * Return the pointer to the largepage write count for a given
  * gfn, handling slots that are not large page aligned.
@@ -543,7 +564,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
return count;
gfn = unalias_gfn(vcpu->kvm, gfn);
sp = page_header(__pa(spte));
-   sp->gfns[spte - sp->spt] = gfn;
+   kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
if (!*rmapp) {
rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
@@ -601,6 +622,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
struct kvm_rmap_desc *prev_desc;
struct kvm_mmu_page *sp;
pfn_t pfn;
+   gfn_t gfn;
unsigned long *rmapp;
int i;
 
@@ -612,7 +634,8 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
kvm_set_pfn_accessed(pfn);
if (is_writable_pte(*spte))
kvm_set_pfn_dirty(pfn);
-   rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
+   gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
+   rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
if (!*rmapp) {
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
BUG();
@@ -896,7 +919,8 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
ASSERT(is_empty_shadow_page(sp->spt));
list_del(&sp->link);
__free_page(virt_to_page(sp->spt));
-   __free_page(virt_to_page(sp->gfns));
+   if (!sp->role.direct)
+       __free_page(virt_to_page(sp->gfns));
kfree(sp);
++kvm->arch.n_free_mmu_pages;
 }
@@ -907,13 +931,15 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 }
 
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-  u64 *parent_pte)
+  u64 *parent_pte, int direct)
 {
struct kvm_mmu_page *sp;
 
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
-   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
+   if (!direct)
+       sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
+                                         PAGE_SIZE);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
@@ -1352,7 +1378,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (role.direct)
role.cr4_pae = 0;
role.access = access;
-   if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
+   if (vcpu->arch.mmu.root_level == PT32_ROOT_LEVEL) {
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
role.quadrant = quadrant;
@@ -1379,7 +1405,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 

Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-30 Thread Avi Kivity

On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:


You missed quadrant on 4mb large page emulation with shadow (see updated
patch below).


Good catch.


Also, for some reason I can't understand, the assumption
does not hold for large sptes with TDP, so I reverted it for now.
   


It's unrelated to TDP, same issue with shadow.  I think the calculation 
is correct.  For example the 4th spte for a level=2 page will yield 
gfn=4*512.



@@ -393,6 +393,27 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
kfree(rd);
  }

+static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
+{
+   gfn_t gfn;
+
+   if (!sp->role.direct)
+       return sp->gfns[index];
+
+   gfn = sp->gfn + index * (1 << (sp->role.level - 1) * PT64_LEVEL_BITS);
+   gfn += sp->role.quadrant << PT64_LEVEL_BITS;
   


PT64_LEVEL_BITS * level


+
+   return gfn;
+}
+
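
A hedged user-space sketch of the helper with that correction applied,
assuming, per the comment above, that the quadrant offset scales with the
level; the names are illustrative rather than the kernel's:

#include <assert.h>
#include <stdint.h>

#define LEVEL_BITS 9	/* PT64_LEVEL_BITS */

static uint64_t direct_sp_gfn(uint64_t sp_gfn, int level, int quadrant,
			      int index)
{
	uint64_t gfn = sp_gfn + (uint64_t)index * (1ULL << (level - 1) * LEVEL_BITS);

	/* Suggested fix: shift by LEVEL_BITS * level, not a fixed LEVEL_BITS. */
	gfn += (uint64_t)quadrant << LEVEL_BITS * level;
	return gfn;
}

int main(void)
{
	/* A level-1 direct sp spans 512 pages; quadrant 1 starts 512 pages in. */
	assert(direct_sp_gfn(0, 1, 1, 0) == 512);
	assert(direct_sp_gfn(0, 1, 0, 7) == 7);
	return 0;
}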

   



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-30 Thread Marcelo Tosatti
On Thu, Apr 29, 2010 at 09:43:40PM +0300, Avi Kivity wrote:
 On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:
 
 You missed quadrant on 4mb large page emulation with shadow (see updated
 patch below).
 
 Good catch.
 
 Also, for some reason I can't understand, the assumption
 does not hold for large sptes with TDP, so I reverted it for now.
 
 It's unrelated to TDP, same issue with shadow.  I think the
 calculation is correct.  For example the 4th spte for a level=2 page
 will yield gfn=4*512.

Under testing I see an sp at level 2 with sp->gfn == 4096, and mmu_set_spte
setting index 8 to gfn 4096 (whereas kvm_mmu_page_get_gfn returns 4096 +
8*512).

Lai, can you please take a look at it? You should see the
kvm_mmu_page_set_gfn BUG_ON by using -mem-path on hugetlbfs.

 @@ -393,6 +393,27 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
  kfree(rd);
   }
 
 +static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 +{
 +    gfn_t gfn;
 +
 +    if (!sp->role.direct)
 +        return sp->gfns[index];
 +
 +    gfn = sp->gfn + index * (1 << (sp->role.level - 1) * PT64_LEVEL_BITS);
 +    gfn += sp->role.quadrant << PT64_LEVEL_BITS;
 
 PT64_LEVEL_BITS * level
 
 +
 +    return gfn;
 +}
 +
 
 
 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.


Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-28 Thread Avi Kivity

On 04/28/2010 02:57 PM, Lai Jiangshan wrote:

I think users will enable tdp when their hardware supports ept or npt.
This patch can reduce kvm mmu memory usage by about 50% for them.

   


Good one!

--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-04-28 Thread Marcelo Tosatti
On Wed, Apr 28, 2010 at 07:57:01PM +0800, Lai Jiangshan wrote:
 
 I think users will enable tdp when their hardware supports ept or npt.
 This patch can reduce kvm mmu memory usage by about 50% for them.
 
 This simple patch uses the fact that:
 
 When sp->role.direct is set, sp->gfns does not contain any essential
 information; the leaf sptes reachable from this sp cover a contiguous
 guest physical memory range (a linear range).
 So sp->gfns[i] (if it was set) equals sp->gfn + i (at PT_PAGE_TABLE_LEVEL).
 Obviously, it is not essential information; we can calculate it when needed.
 
 It means we don't need sp->gfns when sp->role.direct=1,
 thus we can save one page for every kvm_mmu_page.
 
 Note:
 Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
 or kvm_mmu_page_set_gfn().
 It is only exposed in FNAME(sync_page).
 
 Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>

Applied, thanks.
