Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-28 Thread Marcelo Tosatti
On Wed, Nov 28, 2012 at 01:27:38PM +0800, Xiao Guangrong wrote:
 On 11/18/2012 11:00 AM, Marcelo Tosatti wrote:
 map gfn 4?  See corrected step 7 above.
 
  Ah, this is a real bug, and unfortunately, it exists in current
  code. I will make a separate patchset to fix it. Thank you, Marcelo!
  
  Is it? Hum..
  
  Anyway, it would be great if you can write a testcase (should be similar
  in size to rmap_chain).
 
 Marcelo, is this patch acceptable?

Yes. Can we get the reexecute_instruction fix first? (It should then be
able to handle the case where a large read-only spte is created.)

I'll merge the testcase later today.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-15 Thread Marcelo Tosatti
On Thu, Nov 15, 2012 at 07:17:15AM +0800, Xiao Guangrong wrote:
 On 11/14/2012 10:37 PM, Marcelo Tosatti wrote:
  On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
  Hi Marcelo,
 
  On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
  On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
  Do not drop a large spte until it can be replaced by small pages, so that
  the guest can happily read memory through it
 
  The idea is from Avi:
  | As I mentioned before, write-protecting a large spte is a good idea,
  | since it moves some work from protect-time to fault-time, so it reduces
  | jitter.  This removes the need for the return value.
 
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/kvm/mmu.c |   34 +-
   1 files changed, 9 insertions(+), 25 deletions(-)
 
  It's likely that other 4k pages are mapped read-write in the 2mb range 
  covered by a read-only 2mb map. Therefore it's not entirely useful to
  map read-only. 
 
 
  It needs a page fault to install a pte even for a read access.
  After the change, the page fault can be avoided.
 
  Can you measure an improvement with this change?
 
  I have a test case to measure the read time, which has been attached.
  It maps 4k pages at first (dirty-logged), then switches to large sptes
  (stop dirty-logging); at the end, it measures the read access time after
  write-protecting the sptes.
 
  Before: 23314111 ns    After: 11404197 ns
  
  Ok, i'm concerned about cases similar to e49146dce8c3dc6f44 (with shadow),
  that is:
  
  - large page must be destroyed when write protecting due to 
  shadowed page.
  - with shadow, it does not make sense to write protect 
  large sptes as mentioned earlier.
  
 
 This case is removed now, the code when e49146dce8c3dc6f44 was applied is:
 |
 |pt = sp->spt;
 |for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
 |        /* avoid RMW */
 |        if (is_writable_pte(pt[i]))
 |                update_spte(&pt[i], pt[i] & ~PT_WRITABLE_MASK);
 |}
 
 The real problem in this code is that it would write-protect the spte even
 if it was not a last spte, which caused middle-level shadow page tables to
 be write-protected as well. So e49146dce8c3dc6f44 added this code:
 |if (sp->role.level != PT_PAGE_TABLE_LEVEL)
 |        continue;
 |
 which was good to fix this problem.
 
 Now, the current code is:
 |	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
 |		if (!is_shadow_present_pte(pt[i]) ||
 |		    !is_last_spte(pt[i], sp->role.level))
 |			continue;
 |
 |		spte_write_protect(kvm, &pt[i], &flush, false);
 |	}
 It only write-protects the last spte, so it allows large sptes to exist.
 (the large spte can be broken by drop_large_spte() on the page-fault path.)
 
  So i wonder why is this part from your patch
  
 -   if (level > PT_PAGE_TABLE_LEVEL &&
 -       has_wrprotected_page(vcpu->kvm, gfn, level)) {
 -           ret = 1;
 -           drop_spte(vcpu->kvm, sptep);
 -           goto done;
 -   }
  
  necessary (assuming EPT is in use).
 
 This is safe; we change this code to:
 
 - if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 + if ((level > PT_PAGE_TABLE_LEVEL &&
 +      has_wrprotected_page(vcpu->kvm, gfn, level)) ||
 +     mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 	  pgprintk("%s: found shadow page for %llx, marking ro\n",
 		   __func__, gfn);
 	  ret = 1;
 
 The spte becomes read-only, which ensures the shadowed gfn cannot be changed.
 
 Btw, the original code allows creating a read-only spte in this case if 
 !(pte_access & WRITABLE)

Regarding shadow: it should be fine as long as fault path always deletes
large mappings, when shadowed pages are present in the region.

Ah, unshadowing from reexecute_instruction does not handle
large pages. I suppose that is what simplification refers 
to.



Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-15 Thread Xiao Guangrong
On 11/16/2012 11:02 AM, Marcelo Tosatti wrote:
 On Thu, Nov 15, 2012 at 07:17:15AM +0800, Xiao Guangrong wrote:
 On 11/14/2012 10:37 PM, Marcelo Tosatti wrote:
 On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
 Hi Marcelo,

 On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
  Do not drop a large spte until it can be replaced by small pages, so that
  the guest can happily read memory through it

 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)

  It's likely that other 4k pages are mapped read-write in the 2mb range 
  covered by a read-only 2mb map. Therefore it's not entirely useful to
  map read-only. 


  It needs a page fault to install a pte even for a read access.
  After the change, the page fault can be avoided.

 Can you measure an improvement with this change?

  I have a test case to measure the read time, which has been attached.
  It maps 4k pages at first (dirty-logged), then switches to large sptes
  (stop dirty-logging); at the end, it measures the read access time after
  write-protecting the sptes.

  Before: 23314111 ns    After: 11404197 ns

 Ok, i'm concerned about cases similar to e49146dce8c3dc6f44 (with shadow),
 that is:

 - large page must be destroyed when write protecting due to 
 shadowed page.
 - with shadow, it does not make sense to write protect 
 large sptes as mentioned earlier.


 This case is removed now, the code when e49146dce8c3dc6f44 was applied is:
 |
 |pt = sp->spt;
 |for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
 |        /* avoid RMW */
 |        if (is_writable_pte(pt[i]))
 |                update_spte(&pt[i], pt[i] & ~PT_WRITABLE_MASK);
 |}

 The real problem in this code is that it would write-protect the spte even
 if it was not a last spte, which caused middle-level shadow page tables to
 be write-protected as well. So e49146dce8c3dc6f44 added this code:
 |if (sp->role.level != PT_PAGE_TABLE_LEVEL)
 |        continue;
 |
 which was good to fix this problem.

 Now, the current code is:
 |	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
 |		if (!is_shadow_present_pte(pt[i]) ||
 |		    !is_last_spte(pt[i], sp->role.level))
 |			continue;
 |
 |		spte_write_protect(kvm, &pt[i], &flush, false);
 |	}
 It only write-protects the last spte, so it allows large sptes to exist.
 (the large spte can be broken by drop_large_spte() on the page-fault path.)

 So i wonder why is this part from your patch

 -   if (level > PT_PAGE_TABLE_LEVEL &&
 -       has_wrprotected_page(vcpu->kvm, gfn, level)) {
 -           ret = 1;
 -           drop_spte(vcpu->kvm, sptep);
 -           goto done;
 -   }

 necessary (assuming EPT is in use).

 This is safe; we change this code to:
 
 - if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 + if ((level > PT_PAGE_TABLE_LEVEL &&
 +      has_wrprotected_page(vcpu->kvm, gfn, level)) ||
 +     mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 	  pgprintk("%s: found shadow page for %llx, marking ro\n",
 		   __func__, gfn);
 	  ret = 1;

 The spte becomes read-only, which ensures the shadowed gfn cannot be changed.

 Btw, the original code allows creating a read-only spte in this case if 
 !(pte_access & WRITABLE)
 
 Regarding shadow: it should be fine as long as fault path always deletes
 large mappings, when shadowed pages are present in the region.

The hard mmu is also safe; in this patch I added this code:

@@ -2635,6 +2617,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
		break;
	}

+	drop_large_spte(vcpu, iterator.sptep);
+

It can delete large mappings like soft mmu does.

Anything i missed?

 
 Ah, unshadowing from reexecute_instruction does not handle
 large pages. I suppose that is what simplification refers 
 to.

reexecute_instruction does not directly handle the last spte; it just
removes all shadow pages and lets the cpu retry the instruction. The
page can become writable when a #PF is encountered again, so a large
spte is fine in this case.

(Out of this thread: I notice reexecute_instruction allows retrying the
 instruction only if tdp_enabled == 0, but with nested npt it also has
 pages write-protected by shadow pages. Maybe we need to improve this
 restriction.)


Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-15 Thread Xiao Guangrong
On 11/16/2012 11:56 AM, Marcelo Tosatti wrote:
 On Fri, Nov 16, 2012 at 11:39:12AM +0800, Xiao Guangrong wrote:
 On 11/16/2012 11:02 AM, Marcelo Tosatti wrote:
 On Thu, Nov 15, 2012 at 07:17:15AM +0800, Xiao Guangrong wrote:
 On 11/14/2012 10:37 PM, Marcelo Tosatti wrote:
 On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
 Hi Marcelo,

 On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
  Do not drop a large spte until it can be replaced by small pages, so that
  the guest can happily read memory through it

 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it 
 reduces
 | jitter.  This removes the need for the return value.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)

  It's likely that other 4k pages are mapped read-write in the 2mb range 
  covered by a read-only 2mb map. Therefore it's not entirely useful to
  map read-only. 


  It needs a page fault to install a pte even for a read access.
  After the change, the page fault can be avoided.

 Can you measure an improvement with this change?

  I have a test case to measure the read time, which has been attached.
  It maps 4k pages at first (dirty-logged), then switches to large sptes
  (stop dirty-logging); at the end, it measures the read access time after
  write-protecting the sptes.

 Before: 23314111 ns  After: 11404197 ns

 Ok, i'm concerned about cases similar to e49146dce8c3dc6f44 (with shadow),
 that is:

 - large page must be destroyed when write protecting due to 
 shadowed page.
 - with shadow, it does not make sense to write protect 
 large sptes as mentioned earlier.


 This case is removed now, the code when e49146dce8c3dc6f44 was applied is:
 |
 |pt = sp->spt;
 |for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
 |        /* avoid RMW */
 |        if (is_writable_pte(pt[i]))
 |                update_spte(&pt[i], pt[i] & ~PT_WRITABLE_MASK);
 |}

 The real problem in this code is that it would write-protect the spte even
 if it was not a last spte, which caused middle-level shadow page tables to
 be write-protected as well. So e49146dce8c3dc6f44 added this code:
 |if (sp->role.level != PT_PAGE_TABLE_LEVEL)
 |        continue;
 |
 which was good to fix this problem.

 Now, the current code is:
 |	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
 |		if (!is_shadow_present_pte(pt[i]) ||
 |		    !is_last_spte(pt[i], sp->role.level))
 |			continue;
 |
 |		spte_write_protect(kvm, &pt[i], &flush, false);
 |	}
 It only write-protects the last spte, so it allows large sptes to exist.
 (the large spte can be broken by drop_large_spte() on the page-fault path.)

 So i wonder why is this part from your patch

 -   if (level > PT_PAGE_TABLE_LEVEL &&
 -       has_wrprotected_page(vcpu->kvm, gfn, level)) {
 -           ret = 1;
 -           drop_spte(vcpu->kvm, sptep);
 -           goto done;
 -   }

 necessary (assuming EPT is in use).

 This is safe; we change this code to:
 
 - if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 + if ((level > PT_PAGE_TABLE_LEVEL &&
 +      has_wrprotected_page(vcpu->kvm, gfn, level)) ||
 +     mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 	  pgprintk("%s: found shadow page for %llx, marking ro\n",
 		   __func__, gfn);
 	  ret = 1;

 The spte becomes read-only, which ensures the shadowed gfn cannot be
 changed.

 Btw, the original code allows creating a read-only spte in this case if 
 !(pte_access & WRITABLE)

 Regarding shadow: it should be fine as long as fault path always deletes
 large mappings, when shadowed pages are present in the region.

 The hard mmu is also safe; in this patch I added this code:
 
 @@ -2635,6 +2617,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 		break;
 	}
 
 +	drop_large_spte(vcpu, iterator.sptep);
 +

 It can delete large mappings like soft mmu does.

 Anything i missed?


 Ah, unshadowing from reexecute_instruction does not handle
 large pages. I suppose that is what simplification refers 
 to.

 reexecute_instruction does not directly handle the last spte; it just
 removes all shadow pages and lets the cpu retry the instruction. The
 page can become writable when a #PF is encountered again, so a large
 spte is fine in this case.
 
 While searching for a given gpa, you don't find large gfn which is
 mapping it, right? (that is, searching for gfn 4 fails to find large
 read-only gfn 0). Unshadowing gfn 4 will 

Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-14 Thread Marcelo Tosatti
On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
 Hi Marcelo,
 
 On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
  On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
  Do not drop large spte until it can be replaced by small pages so that
  the guest can happily read memory through it
 
  The idea is from Avi:
  | As I mentioned before, write-protecting a large spte is a good idea,
  | since it moves some work from protect-time to fault-time, so it reduces
  | jitter.  This removes the need for the return value.
 
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/kvm/mmu.c |   34 +-
   1 files changed, 9 insertions(+), 25 deletions(-)
  
  It's likely that other 4k pages are mapped read-write in the 2mb range
  covered by a read-only 2mb map. Therefore it's not entirely useful to
  map read-only.
  
 
 It needs a page fault to install a pte even for a read access.
 After the change, that page fault can be avoided.
 
  Can you measure an improvement with this change?
 
 I have a test case to measure the read time; it is attached.
 It maps 4k pages at first (dirty-logged), then switches to large sptes
 (stops dirty-logging), and at the end measures the read access time after
 write-protecting the sptes.
 
 Before: 23314111 ns   After: 11404197 ns

Ok, I'm concerned about cases similar to e49146dce8c3dc6f44 (with shadow),
that is:

- large page must be destroyed when write protecting due to 
shadowed page.
- with shadow, it does not make sense to write protect 
large sptes as mentioned earlier.

So i wonder why is this part from your patch

-   if (level > PT_PAGE_TABLE_LEVEL &&
-   has_wrprotected_page(vcpu->kvm, gfn, level)) {
-   ret = 1;
-   drop_spte(vcpu->kvm, sptep);
-   goto done;
-   }

necessary (assuming EPT is in use).

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-14 Thread Marcelo Tosatti
On Wed, Nov 14, 2012 at 12:33:50AM +0900, Takuya Yoshikawa wrote:
 Ccing live migration developers who should be interested in this work,
 
 On Mon, 12 Nov 2012 21:10:32 -0200
 Marcelo Tosatti mtosa...@redhat.com wrote:
 
  On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
   Do not drop large spte until it can be replaced by small pages so that
   the guest can happily read memory through it
   
   The idea is from Avi:
   | As I mentioned before, write-protecting a large spte is a good idea,
   | since it moves some work from protect-time to fault-time, so it reduces
   | jitter.  This removes the need for the return value.
   
   Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
   ---
arch/x86/kvm/mmu.c |   34 +-
1 files changed, 9 insertions(+), 25 deletions(-)
  
  It's likely that other 4k pages are mapped read-write in the 2mb range
  covered by a read-only 2mb map. Therefore it's not entirely useful to
  map read-only.
  
  Can you measure an improvement with this change?
 
 What we discussed at KVM Forum last week was about the jitter we could
 measure right after starting live migration: both Isaku and Chegu reported
 such jitter.
 
 So if this patch reduces such jitter for some real workloads, by lazily
 dropping largepage mappings and saving read faults until that point, that
 would be very nice!
 
 But sadly, what they measured included interactions with the outside of the
 guest, and the main cause was due to the big QEMU lock problem, they guessed.
 The order is so different that an improvement by a kernel side effort may not
 be seen easily.
 
 FWIW: I am now changing the initial write protection by
 kvm_mmu_slot_remove_write_access() to rmap based as I proposed at KVM Forum.
 ftrace said that 1ms was improved to 250-350us by the change for 10GB guest.
 My code still drops largepage mappings, so the initial write protection time
 itself may not be such a big issue here, I think.
 
 Again, if we can eliminate read faults to such an extent that guests can see
 measurable improvement, that should be very nice!
 
 Any thoughts?
 
 Thanks,
   Takuya

OK, makes sense. I'm worried about shadow / oos interactions 
with large read-only mappings (trying to remember what was the 
case exactly, it might be non-existent now).



Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-14 Thread Xiao Guangrong
On 11/14/2012 10:37 PM, Marcelo Tosatti wrote:
 On Tue, Nov 13, 2012 at 04:26:16PM +0800, Xiao Guangrong wrote:
 Hi Marcelo,

 On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
 Do not drop large spte until it can be replaced by small pages so that
 the guest can happily read memory through it

 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)

 It's likely that other 4k pages are mapped read-write in the 2mb range
 covered by a read-only 2mb map. Therefore it's not entirely useful to
 map read-only.


 It needs a page fault to install a pte even for a read access.
 After the change, that page fault can be avoided.

 Can you measure an improvement with this change?

 I have a test case to measure the read time; it is attached.
 It maps 4k pages at first (dirty-logged), then switches to large sptes
 (stops dirty-logging), and at the end measures the read access time after
 write-protecting the sptes.

 Before: 23314111 ns  After: 11404197 ns
 
 Ok, I'm concerned about cases similar to e49146dce8c3dc6f44 (with shadow),
 that is:
 
 - large page must be destroyed when write protecting due to 
 shadowed page.
 - with shadow, it does not make sense to write protect 
 large sptes as mentioned earlier.
 

This case is gone now; the code at the time e49146dce8c3dc6f44 was applied was:
|
|pt = sp->spt;
|for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
|/* avoid RMW */
|if (is_writable_pte(pt[i]))
|update_spte(pt[i], pt[i] & ~PT_WRITABLE_MASK);
|}

The real problem in this code was that it would write-protect the spte even
when it was not a last spte, so middle-level shadow page table entries got
write-protected as well. So e49146dce8c3dc6f44 added this code:
|if (sp->role.level != PT_PAGE_TABLE_LEVEL)
|continue;
|
which fixed the problem.

Now, the current code is:
|   for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
|   if (!is_shadow_present_pte(pt[i]) ||
| !is_last_spte(pt[i], sp->role.level))
|   continue;
|
|   spte_write_protect(kvm, pt[i], flush, false);
|   }
It only write-protects last sptes, so large sptes are allowed to exist.
(A large spte can still be broken by drop_large_spte() on the page-fault path.)

 So i wonder why is this part from your patch
 
 -   if (level > PT_PAGE_TABLE_LEVEL &&
 -   has_wrprotected_page(vcpu->kvm, gfn, level)) {
 -   ret = 1;
 -   drop_spte(vcpu->kvm, sptep);
 -   goto done;
 -   }
 
 necessary (assuming EPT is in use).

This is safe; we changed the code to:

-   if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
+   if ((level > PT_PAGE_TABLE_LEVEL &&
+  has_wrprotected_page(vcpu->kvm, gfn, level)) ||
+ mmu_need_write_protect(vcpu, gfn, can_unsync)) {
	pgprintk("%s: found shadow page for %llx, marking ro\n",
		 __func__, gfn);
	ret = 1;

The spte becomes read-only, which ensures the shadowed gfn cannot be changed.

Btw, the original code also allows creating a readonly spte in this case if
!(pte_access & WRITABLE)



Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-14 Thread Xiao Guangrong
On 11/14/2012 10:44 PM, Marcelo Tosatti wrote:
 On Wed, Nov 14, 2012 at 12:33:50AM +0900, Takuya Yoshikawa wrote:
 Ccing live migration developers who should be interested in this work,

 On Mon, 12 Nov 2012 21:10:32 -0200
 Marcelo Tosatti mtosa...@redhat.com wrote:

 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
 Do not drop large spte until it can be replaced by small pages so that
 the guest can happily read memory through it

 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)

 It's likely that other 4k pages are mapped read-write in the 2mb range
 covered by a read-only 2mb map. Therefore it's not entirely useful to
 map read-only.

 Can you measure an improvement with this change?

 What we discussed at KVM Forum last week was about the jitter we could
 measure right after starting live migration: both Isaku and Chegu reported
 such jitter.

 So if this patch reduces such jitter for some real workloads, by lazily
 dropping largepage mappings and saving read faults until that point, that
 would be very nice!

 But sadly, what they measured included interactions with the outside of the
 guest, and the main cause was due to the big QEMU lock problem, they guessed.
 The order is so different that an improvement by a kernel side effort may not
 be seen easily.

 FWIW: I am now changing the initial write protection by
 kvm_mmu_slot_remove_write_access() to rmap based as I proposed at KVM Forum.
 ftrace said that 1ms was improved to 250-350us by the change for 10GB guest.
 My code still drops largepage mappings, so the initial write protection time
 itself may not be such a big issue here, I think.

 Again, if we can eliminate read faults to such an extent that guests can see
 measurable improvement, that should be very nice!

 Any thoughts?

 Thanks,
  Takuya
 
 OK, makes sense. I'm worried about shadow / oos interactions 
 with large read-only mappings (trying to remember what was the 
 case exactly, it might be non-existent now).

Marcelo, I guess commit 38187c830cab84daecb41169948467f1f19317e3 is what you
mentioned, but I do not see how it "Simplifies out of sync shadow".  :(



Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-13 Thread Xiao Guangrong
Hi Marcelo,

On 11/13/2012 07:10 AM, Marcelo Tosatti wrote:
 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
 Do not drop large spte until it can be replaced by small pages so that
 the guest can happily read memory through it

 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)
 
 It's likely that other 4k pages are mapped read-write in the 2mb range
 covered by a read-only 2mb map. Therefore it's not entirely useful to
 map read-only.
 

It needs a page fault to install a pte even for a read access.
After the change, that page fault can be avoided.

 Can you measure an improvement with this change?

I have a test case to measure the read time; it is attached.
It maps 4k pages at first (dirty-logged), then switches to large sptes
(stops dirty-logging), and at the end measures the read access time after
write-protecting the sptes.

Before: 23314111 ns After: 11404197 ns


testcase.tar.bz2
Description: application/bzip


Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-13 Thread Takuya Yoshikawa
Ccing live migration developers who should be interested in this work,

On Mon, 12 Nov 2012 21:10:32 -0200
Marcelo Tosatti mtosa...@redhat.com wrote:

 On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
  Do not drop large spte until it can be replaced by small pages so that
  the guest can happily read memory through it
  
  The idea is from Avi:
  | As I mentioned before, write-protecting a large spte is a good idea,
  | since it moves some work from protect-time to fault-time, so it reduces
  | jitter.  This removes the need for the return value.
  
  Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
  ---
   arch/x86/kvm/mmu.c |   34 +-
   1 files changed, 9 insertions(+), 25 deletions(-)
 
 It's likely that other 4k pages are mapped read-write in the 2mb range
 covered by a read-only 2mb map. Therefore it's not entirely useful to
 map read-only.
 
 Can you measure an improvement with this change?

What we discussed at KVM Forum last week was about the jitter we could
measure right after starting live migration: both Isaku and Chegu reported
such jitter.

So if this patch reduces such jitter for some real workloads, by lazily
dropping largepage mappings and saving read faults until that point, that
would be very nice!

But sadly, what they measured included interactions with the outside of the
guest, and the main cause was due to the big QEMU lock problem, they guessed.
The order is so different that an improvement by a kernel side effort may not
be seen easily.

FWIW: I am now changing the initial write protection by
kvm_mmu_slot_remove_write_access() to rmap based as I proposed at KVM Forum.
ftrace said that 1ms was improved to 250-350us by the change for 10GB guest.
My code still drops largepage mappings, so the initial write protection time
itself may not be such a big issue here, I think.

Again, if we can eliminate read faults to such an extent that guests can see
measurable improvement, that should be very nice!

Any thoughts?

Thanks,
Takuya


Re: [PATCH] KVM: MMU: lazily drop large spte

2012-11-12 Thread Marcelo Tosatti
On Mon, Nov 05, 2012 at 05:59:26PM +0800, Xiao Guangrong wrote:
 Do not drop large spte until it can be replaced by small pages so that
 the guest can happily read memory through it
 
 The idea is from Avi:
 | As I mentioned before, write-protecting a large spte is a good idea,
 | since it moves some work from protect-time to fault-time, so it reduces
 | jitter.  This removes the need for the return value.
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/mmu.c |   34 +-
  1 files changed, 9 insertions(+), 25 deletions(-)

It's likely that other 4k pages are mapped read-write in the 2mb range
covered by a read-only 2mb map. Therefore it's not entirely useful to
map read-only.

Can you measure an improvement with this change?




[PATCH] KVM: MMU: lazily drop large spte

2012-11-05 Thread Xiao Guangrong
Do not drop large spte until it can be replaced by small pages so that
the guest can happily read memory through it

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/mmu.c |   34 +-
 1 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b875a9e..1d8869c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1105,7 +1105,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)

 /*
  * Write-protect on the specified @sptep, @pt_protect indicates whether
- * spte writ-protection is caused by protecting shadow page table.
+ * spte write-protection is caused by protecting shadow page table.
  * @flush indicates whether tlb need be flushed.
  *
  * Note: write protection is difference between drity logging and spte
@@ -1114,31 +1114,23 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
  *   its dirty bitmap is properly set.
  * - for spte protection, the spte can be writable only after unsync-ing
  *   shadow page.
- *
- * Return true if the spte is dropped.
  */
-static bool
+static void
 spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
 {
u64 spte = *sptep;

	if (!is_writable_pte(spte) &&
	      !(pt_protect && spte_is_locklessly_modifiable(spte)))
-   return false;
+   return;

	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);

-   if (__drop_large_spte(kvm, sptep)) {
-   *flush |= true;
-   return true;
-   }
-
if (pt_protect)
		spte &= ~SPTE_MMU_WRITEABLE;
	spte = spte & ~PT_WRITABLE_MASK;

*flush |= mmu_spte_update(sptep, spte);
-   return false;
 }

 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
@@ -1150,11 +1142,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,

for (sptep = rmap_get_first(*rmapp, iter); sptep;) {
		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-   if (spte_write_protect(kvm, sptep, flush, pt_protect)) {
-   sptep = rmap_get_first(*rmapp, iter);
-   continue;
-   }

+   spte_write_protect(kvm, sptep, flush, pt_protect);
sptep = rmap_get_next(iter);
}

@@ -2381,14 +2370,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
	if ((pte_access & ACC_WRITE_MASK)
	    || (!vcpu->arch.mmu.direct_map && write_fault
		&& !is_write_protection(vcpu) && !user_fault)) {
-
-	if (level > PT_PAGE_TABLE_LEVEL &&
-	    has_wrprotected_page(vcpu->kvm, gfn, level)) {
-   ret = 1;
-		drop_spte(vcpu->kvm, sptep);
-   goto done;
-   }
-
spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

		if (!vcpu->arch.mmu.direct_map
@@ -2413,7 +2394,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
		if (!can_unsync && is_writable_pte(*sptep))
goto set_pte;

-   if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
+	if ((level > PT_PAGE_TABLE_LEVEL &&
+	     has_wrprotected_page(vcpu->kvm, gfn, level)) ||
+	    mmu_need_write_protect(vcpu, gfn, can_unsync)) {
		pgprintk("%s: found shadow page for %llx, marking ro\n",
 __func__, gfn);
ret = 1;
@@ -2428,7 +2411,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 set_pte:
if (mmu_spte_update(sptep, spte))
kvm_flush_remote_tlbs(vcpu-kvm);
-done:
return ret;
 }

@@ -2635,6 +2617,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
break;
}

+   drop_large_spte(vcpu, iterator.sptep);
+
if (!is_shadow_present_pte(*iterator.sptep)) {
u64 base_addr = iterator.addr;

-- 
1.7.7.6
