Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On 12/19/2011 12:21 PM, Takuya Yoshikawa wrote:
> (2011/12/19 19:03), Avi Kivity wrote:
>>> IMO, the goal should be restricted to emergencies.
>>>
>>> So a possible solution may be:
>>>   - we set the tuning parameters as conservatively as possible
>>>   - pick up a guest with a relatively high ratio
>>>     (I have to think more about how to achieve this)
>>>   - move the vm_list head for fairness
>>>
>>> In an emergency, we should not mind the performance penalty so much.
>>
>> But is the shrinker really only called in emergencies?
>
> No, sadly.
>
> That is the problem.

It's not a problem - otherwise the icache and dcache would grow
indefinitely.  kvm's caches have another limit - perhaps we should
remove it and let the shrinker manage the caches itself (but that
requires a better selection algorithm).

>> Also, with things like cgroups, we may have an emergency in one
>> container, but not in others - if the shrinker is not cgroup aware,
>> it soon will be.
>
> That seems to be a common problem for everyone, not KVM only.

It's really a requirement for dcache/icache, otherwise one container
could force out icache/dcache for another.  It doesn't concern us
directly, but we have to ensure that one guest cannot force out
another's mmu pages.

>>> But there is no perfect value, because how often mmu_shrink() can be
>>> called will change if the admin changes the sysctl_vfs_cache_pressure
>>> tuning parameter for dcache and icache, IIUC.
>>>
>>> And tdp and shadow paging differ much.
>>
>> We should aim for the following:
>>   - normal operation causes very few shrinks (some are okay)
>>   - high pressure mostly due to kvm results in kvm being shrunk (this
>>     is a pathological case caused by starting a guest with a huge
>>     amount of memory, mapping it all to /dev/zero (or ksm), and
>>     getting the guest to create shadow mappings for all of it)
>>   - general high pressure is shared among other caches like dcache
>>     and icache
>>
>> The cost of reestablishing an mmu page can be as high as half a
>> millisecond of cpu time, which is the reason I want to be
>> conservative.
>
> I agree with you.
>
> I feel that I should add lkml in CC next time to hear from mm
> specialists.  The shrinker has many heuristics added from a lot of
> experience; my lack of such experience means I need help.

Copying one expert.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
(2011/12/19 19:03), Avi Kivity wrote:
>>> IMO, the goal should be restricted to emergencies.
>>>
>>> So a possible solution may be:
>>>   - we set the tuning parameters as conservatively as possible
>>>   - pick up a guest with a relatively high ratio
>>>     (I have to think more about how to achieve this)
>>>   - move the vm_list head for fairness
>>>
>>> In an emergency, we should not mind the performance penalty so much.
>>
>> But is the shrinker really only called in emergencies?

No, sadly.

That is the problem.

>> Also, with things like cgroups, we may have an emergency in one
>> container, but not in others - if the shrinker is not cgroup aware,
>> it soon will be.

That seems to be a common problem for everyone, not KVM only.

>>> But there is no perfect value, because how often mmu_shrink() can be
>>> called will change if the admin changes the sysctl_vfs_cache_pressure
>>> tuning parameter for dcache and icache, IIUC.
>>>
>>> And tdp and shadow paging differ much.
>>
>> We should aim for the following:
>>   - normal operation causes very few shrinks (some are okay)
>>   - high pressure mostly due to kvm results in kvm being shrunk (this
>>     is a pathological case caused by starting a guest with a huge
>>     amount of memory, mapping it all to /dev/zero (or ksm), and
>>     getting the guest to create shadow mappings for all of it)
>>   - general high pressure is shared among other caches like dcache
>>     and icache
>>
>> The cost of reestablishing an mmu page can be as high as half a
>> millisecond of cpu time, which is the reason I want to be
>> conservative.

I agree with you.

I feel that I should add lkml in CC next time to hear from mm
specialists.  The shrinker has many heuristics added from a lot of
experience; my lack of such experience means I need help.

	Takuya
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On 12/19/2011 11:56 AM, Takuya Yoshikawa wrote:
> (2011/12/19 18:26), Avi Kivity wrote:
>> On 12/19/2011 11:22 AM, Takuya Yoshikawa wrote:
>>>> Yes, it's very conservative.  But on the other hand the shrinker is
>>>> tuned for dcache and icache, where there are usually tons of useless
>>>> objects.  If we have to free something, I'd rather free them instead
>>>> of mmu pages, which tend to get recreated soon.
>>>
>>> OK, to satisfy the requirements, I will do:
>>>
>>>   1. find the guest with the highest (shadow pages / memory) ratio
>>
>> How do you propose to do that efficiently?  We may have hundreds of
>> guests, or even more, on one host.  Each guest access will involve
>> bouncing a few cache lines.
>
> IMO, the goal should be restricted to emergencies.
>
> So a possible solution may be:
>   - we set the tuning parameters as conservatively as possible
>   - pick up a guest with a relatively high ratio
>     (I have to think more about how to achieve this)
>   - move the vm_list head for fairness
>
> In an emergency, we should not mind the performance penalty so much.

But is the shrinker really only called in emergencies?

Also, with things like cgroups, we may have an emergency in one
container, but not in others - if the shrinker is not cgroup aware, it
soon will be.

>>>   2. just zap one page from that guest, keeping the current
>>>      conservative rate
>>>
>>> I will update the patch.
>>
>> I think the current rate is too conservative.  No idea what a good
>> one is; I don't have a feel for the relation between shrinker
>> callbacks and memory pressure.
>
> When I tried to see what the current code is doing, frankly speaking,
> I thought mmu_shrink() was not tested enough from the beginning.

It wasn't, and in fact the original code was even worse; the code we
have now is after some fixes.

> I read the shrinker code as far as possible and realized that the
> combination of (seeks=10*default, batch=128) is not reasonable; the
> high seeks means the shrinker rarely calculates a value higher than
> 128, so mmu_shrink() cannot be called in normal life.
>
> How about setting the batch a bit lower, keeping seeks as is?

Ok.

> But there is no perfect value, because how often mmu_shrink() can be
> called will change if the admin changes the sysctl_vfs_cache_pressure
> tuning parameter for dcache and icache, IIUC.
>
> And tdp and shadow paging differ much.

We should aim for the following:
  - normal operation causes very few shrinks (some are okay)
  - high pressure mostly due to kvm results in kvm being shrunk (this
    is a pathological case caused by starting a guest with a huge
    amount of memory, mapping it all to /dev/zero (or ksm), and getting
    the guest to create shadow mappings for all of it)
  - general high pressure is shared among other caches like dcache and
    icache

The cost of reestablishing an mmu page can be as high as half a
millisecond of cpu time, which is the reason I want to be conservative.
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
(2011/12/19 18:26), Avi Kivity wrote:
> On 12/19/2011 11:22 AM, Takuya Yoshikawa wrote:
>>> Yes, it's very conservative.  But on the other hand the shrinker is
>>> tuned for dcache and icache, where there are usually tons of useless
>>> objects.  If we have to free something, I'd rather free them instead
>>> of mmu pages, which tend to get recreated soon.
>>
>> OK, to satisfy the requirements, I will do:
>>
>>   1. find the guest with the highest (shadow pages / memory) ratio
>
> How do you propose to do that efficiently?  We may have hundreds of
> guests, or even more, on one host.  Each guest access will involve
> bouncing a few cache lines.

IMO, the goal should be restricted to emergencies.

So a possible solution may be:
  - we set the tuning parameters as conservatively as possible
  - pick up a guest with a relatively high ratio
    (I have to think more about how to achieve this)
  - move the vm_list head for fairness

In an emergency, we should not mind the performance penalty so much.

>>   2. just zap one page from that guest, keeping the current
>>      conservative rate
>>
>> I will update the patch.
>
> I think the current rate is too conservative.  No idea what a good
> one is; I don't have a feel for the relation between shrinker
> callbacks and memory pressure.

When I tried to see what the current code is doing, frankly speaking,
I thought mmu_shrink() was not tested enough from the beginning.

I read the shrinker code as far as possible and realized that the
combination of (seeks=10*default, batch=128) is not reasonable; the
high seeks means the shrinker rarely calculates a value higher than
128, so mmu_shrink() cannot be called in normal life.

How about setting the batch a bit lower, keeping seeks as is?

But there is no perfect value, because how often mmu_shrink() can be
called will change if the admin changes the sysctl_vfs_cache_pressure
tuning parameter for dcache and icache, IIUC.

And tdp and shadow paging differ much.

	Takuya
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On 12/19/2011 11:22 AM, Takuya Yoshikawa wrote:
>> Yes, it's very conservative.  But on the other hand the shrinker is
>> tuned for dcache and icache, where there are usually tons of useless
>> objects.  If we have to free something, I'd rather free them instead
>> of mmu pages, which tend to get recreated soon.
>
> OK, to satisfy the requirements, I will do:
>
>   1. find the guest with the highest (shadow pages / memory) ratio

How do you propose to do that efficiently?  We may have hundreds of
guests, or even more, on one host.  Each guest access will involve
bouncing a few cache lines.

>   2. just zap one page from that guest, keeping the current
>      conservative rate
>
> I will update the patch.

I think the current rate is too conservative.  No idea what a good one
is; I don't have a feel for the relation between shrinker callbacks and
memory pressure.
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
(2011/12/19 17:43), Avi Kivity wrote:
> Well, if one guest is twice as large as the other guests, then it
> will want twice as many shadow pages.  So our goal should be to zap
> pages from the guest with the highest (shadow pages / memory) ratio.
>
>>> Can you measure whether there is a significant difference in a
>>> synthetic workload, and what that change is?  Perhaps apply
>>> {moderate, high} memory pressure load with {2, 4, 8, 16} VMs or
>>> something like that.
>>
>> I was running 4 VMs on my machine with high enough memory pressure.
>> The problem was that mmu_shrink() was not tuned to be called under
>> usual memory pressure: what I did was change the seeks and batch
>> parameters and make ept=0.
>>
>> At least, I have checked that if I make one VM do many meaningless
>> copies, letting the others keep silent, the shrinker frees shadow
>> pages intensively from that one.
>>
>> Anyway, I don't think making the shrinker call mmu_shrink() with the
>> default batch size, nr_to_scan=128, and just trying to free one
>> shadow page is good behaviour.
>
> Yes, it's very conservative.  But on the other hand the shrinker is
> tuned for dcache and icache, where there are usually tons of useless
> objects.  If we have to free something, I'd rather free them instead
> of mmu pages, which tend to get recreated soon.

OK, to satisfy the requirements, I will do:

  1. find the guest with the highest (shadow pages / memory) ratio
  2. just zap one page from that guest, keeping the current
     conservative rate

I will update the patch.

	Takuya
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On 12/16/2011 04:58 PM, Takuya Yoshikawa wrote:
>> I'm not sure this is a meaningful test to verify this change is
>> worthwhile, because while the shrinker tries to free a shadow page
>> from one vm, the vm's position in the kvm list is changed, so over
>> time the shrinker will cycle over all VMs.
>
> The test was for checking whether mmu_shrink() would work as
> intended.  Maybe not good as a changelog, sorry.
>
> I admit that I could not find any strong reason except for protecting
> the host from intentionally induced shadowing.
>
> But for that, don't you think that freeing the same amount of shadow
> pages from good and bad VMs equally is a bad thing?
>
> My method will try to free many shadow pages from VMs with many
> shadow pages; e.g. if there is a pathological increase in shadow
> pages for one VM, that one will be intensively treated.
>
> If you can agree on this reasoning, I will update the description and
> resend.

Well, if one guest is twice as large as the other guests, then it will
want twice as many shadow pages.  So our goal should be to zap pages
from the guest with the highest (shadow pages / memory) ratio.

>> Can you measure whether there is a significant difference in a
>> synthetic workload, and what that change is?  Perhaps apply
>> {moderate, high} memory pressure load with {2, 4, 8, 16} VMs or
>> something like that.
>
> I was running 4 VMs on my machine with high enough memory pressure.
> The problem was that mmu_shrink() was not tuned to be called under
> usual memory pressure: what I did was change the seeks and batch
> parameters and make ept=0.
>
> At least, I have checked that if I make one VM do many meaningless
> copies, letting the others keep silent, the shrinker frees shadow
> pages intensively from that one.
>
> Anyway, I don't think making the shrinker call mmu_shrink() with the
> default batch size, nr_to_scan=128, and just trying to free one
> shadow page is good behaviour.

Yes, it's very conservative.  But on the other hand the shrinker is
tuned for dcache and icache, where there are usually tons of useless
objects.  If we have to free something, I'd rather free them instead of
mmu pages, which tend to get recreated soon.
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On Fri, 16 Dec 2011 09:06:11 -0200
Marcelo Tosatti wrote:

> On Mon, Dec 12, 2011 at 07:26:47AM +0900, Takuya Yoshikawa wrote:
>> From: Takuya Yoshikawa
>>
>> Currently, mmu_shrink() tries to free a shadow page from one kvm and
>> does not use nr_to_scan correctly.
>>
>> This patch fixes this by making it try to free some shadow pages
>> from each kvm.  The number of shadow pages each kvm frees becomes
>> proportional to the number of shadow pages it is using.
>>
>> Note: an easy way to see how this code works is to do
>>   echo 3 > /proc/sys/vm/drop_caches
>> while some virtual machines are running.  Shadow pages will be
>> zapped as expected by this.
>
> I'm not sure this is a meaningful test to verify this change is
> worthwhile, because while the shrinker tries to free a shadow page
> from one vm, the vm's position in the kvm list is changed, so over
> time the shrinker will cycle over all VMs.

The test was for checking whether mmu_shrink() would work as intended.
Maybe not good as a changelog, sorry.

I admit that I could not find any strong reason except for protecting
the host from intentionally induced shadowing.

But for that, don't you think that freeing the same amount of shadow
pages from good and bad VMs equally is a bad thing?

My method will try to free many shadow pages from VMs with many shadow
pages; e.g. if there is a pathological increase in shadow pages for one
VM, that one will be intensively treated.

If you can agree on this reasoning, I will update the description and
resend.

> Can you measure whether there is a significant difference in a
> synthetic workload, and what that change is?  Perhaps apply
> {moderate, high} memory pressure load with {2, 4, 8, 16} VMs or
> something like that.

I was running 4 VMs on my machine with high enough memory pressure.
The problem was that mmu_shrink() was not tuned to be called under
usual memory pressure: what I did was change the seeks and batch
parameters and make ept=0.

At least, I have checked that if I make one VM do many meaningless
copies, letting the others keep silent, the shrinker frees shadow
pages intensively from that one.

Anyway, I don't think making the shrinker call mmu_shrink() with the
default batch size, nr_to_scan=128, and just trying to free one shadow
page is good behaviour.

	Takuya
Re: [PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
On Mon, Dec 12, 2011 at 07:26:47AM +0900, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa
>
> Currently, mmu_shrink() tries to free a shadow page from one kvm and
> does not use nr_to_scan correctly.
>
> This patch fixes this by making it try to free some shadow pages from
> each kvm.  The number of shadow pages each kvm frees becomes
> proportional to the number of shadow pages it is using.
>
> Note: an easy way to see how this code works is to do
>   echo 3 > /proc/sys/vm/drop_caches
> while some virtual machines are running.  Shadow pages will be zapped
> as expected by this.

I'm not sure this is a meaningful test to verify this change is
worthwhile, because while the shrinker tries to free a shadow page from
one vm, the vm's position in the kvm list is changed, so over time the
shrinker will cycle over all VMs.

Can you measure whether there is a significant difference in a
synthetic workload, and what that change is?  Perhaps apply {moderate,
high} memory pressure load with {2, 4, 8, 16} VMs or something like
that.
[PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages
From: Takuya Yoshikawa

Currently, mmu_shrink() tries to free a shadow page from one kvm and
does not use nr_to_scan correctly.

This patch fixes this by making it try to free some shadow pages from
each kvm.  The number of shadow pages each kvm frees becomes
proportional to the number of shadow pages it is using.

Note: an easy way to see how this code works is to do
  echo 3 > /proc/sys/vm/drop_caches
while some virtual machines are running.  Shadow pages will be zapped
as expected by this.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c |   23 ++++++++++++++---------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fcd0dd1..c6c61dd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3921,7 +3921,7 @@ restart:
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
-	struct kvm *kvm_freed = NULL;
+	int nr_to_zap, nr_total;
 	int nr_to_scan = sc->nr_to_scan;
 
 	if (nr_to_scan == 0)
@@ -3929,25 +3929,30 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 	raw_spin_lock(&kvm_lock);
 
+	nr_total = percpu_counter_read_positive(&kvm_total_used_mmu_pages);
+
 	list_for_each_entry(kvm, &kvm_list, list) {
 		int idx;
 		LIST_HEAD(invalid_list);
 
+		if (nr_to_scan <= 0) {
+			/* next time from this kvm */
+			list_move_tail(&kvm_list, &kvm->list);
+			break;
+		}
+
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
 
-		if (!kvm_freed && nr_to_scan > 0 &&
-		    kvm->arch.n_used_mmu_pages > 0) {
-			pre_zap_one_sp(kvm, &invalid_list);
-			kvm_freed = kvm;
-		}
-		nr_to_scan--;
+		/* proportional to how many shadow pages this kvm is using */
+		nr_to_zap = sc->nr_to_scan * kvm->arch.n_used_mmu_pages;
+		nr_to_zap /= nr_total;
+		nr_to_scan -= pre_zap_some_sp(kvm, &invalid_list, nr_to_zap);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 	}
 
-	if (kvm_freed)
-		list_move_tail(&kvm_freed->list, &kvm_list);
 	raw_spin_unlock(&kvm_lock);
-- 
1.7.5.4