Re: [PATCH] emulate accessed bit for EPT

2010-02-08 Avi Kivity

On 02/03/2010 11:11 PM, Rik van Riel wrote:

> Currently KVM pretends that pages with EPT mappings never got
> accessed.  This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.

Applied, thanks.

> -   /* always return old for EPT */
> +   /*
> +    * Emulate the accessed bit for EPT, by checking if this page has
> +    * an EPT mapping, and clearing it if it does. On the next access,
> +    * a new EPT mapping will be established.
> +    * This has some overhead, but not as much as the cost of swapping
> +    * out actively used pages or breaking up actively used hugepages.
> +    */
>     if (!shadow_accessed_mask)
> -       return 0;
> +       return kvm_unmap_rmapp(kvm, rmapp, data);

This could be optimized by using a software-available bit for 'present' 
and the rwx bits for young, that is:


  (present, rwx)  - the page is present and recently accessed; will not
                    cause an EPT violation
  (present, !rwx) - the page is present but old; will cause an EPT
                    violation, but not the rmap games and
                    get_user_pages_fast()

However, that's best done later, if ever.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] emulate accessed bit for EPT

2010-02-07 Marcelo Tosatti
On Fri, Feb 05, 2010 at 07:14:13PM +0100, Andrea Arcangeli wrote:
> On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
>> But perhaps a module parameter to turn accessed bit emulation off might
>> be handy in the future?
>
> Maybe, but somebody should show that this can overall become a
> downside, which I doubt... I think if it does, the VM is to blame for
> calling page_referenced when there is no point in doing so just yet.

Agreed. ACK.



Re: [PATCH] emulate accessed bit for EPT

2010-02-05 Marcelo Tosatti
On Thu, Feb 04, 2010 at 06:47:15PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
>> I suspect it won't be very many. I have been monitoring
>> /proc/meminfo on my system while testing this patch, and
>> it is quite typical that the size of the inactive anon
>> list does not change for minutes at a time.
>>
>> In other words, no pages are moved onto or off of the
>> inactive anon list for several minutes. That corresponds
>> to a very small number of minor faults introduced by my
>> patch.
>
> When there's light VM pressure, ideally there should be zero overhead
> caused by the patch. When there is VM pressure this will avoid some
> unnecessary I/O which should outweigh the minor faults. It should be
> a good default behavior.

Agree.

But perhaps a module parameter to turn accessed bit emulation off might
be handy in the future?



Re: [PATCH] emulate accessed bit for EPT

2010-02-05 Andrea Arcangeli
On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> But perhaps a module parameter to turn accessed bit emulation off might
> be handy in the future?

Maybe, but somebody should show that this can overall become a
downside, which I doubt... I think if it does, the VM is to blame for
calling page_referenced when there is no point in doing so just yet.


Re: [PATCH] emulate accessed bit for EPT

2010-02-04 Rik van Riel

On 02/03/2010 11:12 PM, Balbir Singh wrote:

> * Rik van Riel r...@redhat.com [2010-02-03 16:11:03]:
>
>> Currently KVM pretends that pages with EPT mappings never got
>> accessed.  This has some side effects in the VM, like swapping
>> out actively used guest pages and needlessly breaking up actively
>> used hugepages.
>>
>> We can avoid those very costly side effects by emulating the
>> accessed bit for EPT PTEs, which should only be slightly costly
>> because pages pass through page_referenced infrequently.
>
> Quite a clever implementation. One side effect is that one would see a
> larger number of minor faults with EPT enabled and an increase in
> allocations/frees of rmap entries, but that can be easily explained.

I suspect it won't be very many. I have been monitoring
/proc/meminfo on my system while testing this patch, and
it is quite typical that the size of the inactive anon
list does not change for minutes at a time.

In other words, no pages are moved onto or off of the
inactive anon list for several minutes. That corresponds
to a very small number of minor faults introduced by my
patch.

Of course, when the system is swapping, we will have more
minor faults.  However, minor faults should be less of a
performance issue than major faults :)

--
All rights reversed.


Re: [PATCH] emulate accessed bit for EPT

2010-02-04 Balbir Singh
* Rik van Riel r...@redhat.com [2010-02-04 08:40:43]:

> On 02/03/2010 11:12 PM, Balbir Singh wrote:
>> * Rik van Riel r...@redhat.com [2010-02-03 16:11:03]:
>>
>>> Currently KVM pretends that pages with EPT mappings never got
>>> accessed.  This has some side effects in the VM, like swapping
>>> out actively used guest pages and needlessly breaking up actively
>>> used hugepages.
>>>
>>> We can avoid those very costly side effects by emulating the
>>> accessed bit for EPT PTEs, which should only be slightly costly
>>> because pages pass through page_referenced infrequently.
>>
>> Quite a clever implementation. One side effect is that one would see a
>> larger number of minor faults with EPT enabled and an increase in
>> allocations/frees of rmap entries, but that can be easily explained.
>
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.
>
> Of course, when the system is swapping, we will have more
> minor faults.  However, minor faults should be less of a
> performance issue than major faults :)

I do agree with you. 

-- 
Balbir


Re: [PATCH] emulate accessed bit for EPT

2010-02-04 Rik van Riel

Balbir Singh wrote:

> * Rik van Riel r...@redhat.com [2010-02-04 08:40:43]:
>
>> On 02/03/2010 11:12 PM, Balbir Singh wrote:
>>> * Rik van Riel r...@redhat.com [2010-02-03 16:11:03]:
>>>
>>>> Currently KVM pretends that pages with EPT mappings never got
>>>> accessed.  This has some side effects in the VM, like swapping
>>>> out actively used guest pages and needlessly breaking up actively
>>>> used hugepages.
>>>>
>>>> We can avoid those very costly side effects by emulating the
>>>> accessed bit for EPT PTEs, which should only be slightly costly
>>>> because pages pass through page_referenced infrequently.
>>>
>>> Quite a clever implementation. One side effect is that one would see a
>>> larger number of minor faults with EPT enabled and an increase in
>>> allocations/frees of rmap entries, but that can be easily explained.
>>
>> I suspect it won't be very many. I have been monitoring
>> /proc/meminfo on my system while testing this patch, and
>> it is quite typical that the size of the inactive anon
>> list does not change for minutes at a time.
>>
>> In other words, no pages are moved onto or off of the
>> inactive anon list for several minutes. That corresponds
>> to a very small number of minor faults introduced by my
>> patch.
>>
>> Of course, when the system is swapping, we will have more
>> minor faults.  However, minor faults should be less of a
>> performance issue than major faults :)
>
> I do agree with you.

After 20 hours of uptime, it appears that this patch has
resolved the "KVM guests get swapped out while buffer and page
cache stay in memory" problem my home system was experiencing.


Re: [PATCH] emulate accessed bit for EPT

2010-02-04 Jeff Dike
On Wed, Feb 03, 2010 at 04:11:03PM -0500, Rik van Riel wrote:
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?

Hum, let me dust off that workload and give it a shot...

Jeff


Re: [PATCH] emulate accessed bit for EPT

2010-02-04 Andrea Arcangeli
On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.

When there's light VM pressure, ideally there should be zero overhead
caused by the patch. When there is VM pressure this will avoid some
unnecessary I/O which should outweigh the minor faults. It should be
a good default behavior.


Re: [PATCH] emulate accessed bit for EPT

2010-02-03 Balbir Singh
* Rik van Riel r...@redhat.com [2010-02-03 16:11:03]:

> Currently KVM pretends that pages with EPT mappings never got
> accessed.  This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
>
> Signed-off-by: Rik van Riel r...@redhat.com
> ---
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?
>
>  arch/x86/kvm/mmu.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 89a49fb..6101615 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>     u64 *spte;
>     int young = 0;
>
> -   /* always return old for EPT */
> +   /*
> +    * Emulate the accessed bit for EPT, by checking if this page has
> +    * an EPT mapping, and clearing it if it does. On the next access,
> +    * a new EPT mapping will be established.
> +    * This has some overhead, but not as much as the cost of swapping
> +    * out actively used pages or breaking up actively used hugepages.
> +    */
>     if (!shadow_accessed_mask)
> -       return 0;
> +       return kvm_unmap_rmapp(kvm, rmapp, data);

Quite a clever implementation. One side effect is that one would see a
larger number of minor faults with EPT enabled and an increase in
allocations/frees of rmap entries, but that can be easily explained.

-- 
Balbir