Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 24.02.2012 08:23, Stefan Hajnoczi wrote:
> On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
>>>> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>>>>> However, in a virtual machine I have not observed the above slowdown to
>>>>>> that extent, while the benefit of zero-after-free in a virtualisation
>>>>>> environment is obvious:
>>>>>>
>>>>>> 1) zero pages can easily be merged by KSM or other techniques.
>>>>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>>>>
>>>>> The other approach is a memory page discard mechanism - which
>>>>> obviously requires more code changes than zeroing freed pages.
>>>>>
>>>>> The advantage is that we don't take the brute-force and CPU intensive
>>>>> approach of zeroing pages.  It would be like a fine-grained ballooning
>>>>> feature.
>>>>
>>>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>>>> just at allocation time, so it shouldn't make a big difference in terms
>>>> of CPU power.
>>>
>>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>>> Imagine a process that uses all of physical memory.  Once it
>>> terminates the system is going to run processes that only use a small
>>> set of pages.  It's pointless zeroing all those pages if we're not
>>> going to use them anymore.
>>
>> Perhaps the middle path is to zero pages but do it after a grace
>> timeout.  I wonder if this helps eliminate the 2-3% slowdown you
>> noticed when compiling.
>
> Gah, it's too early in the morning.  I don't think this timer actually
> makes sense.

Do you think it would make sense, then, to put together a patch set/proposal
that notifies a guest kernel of the presence of KSM on the host and makes it
switch to zero-after-free?

peter

> Stefan






Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 28.02.2012 13:05, Stefan Hajnoczi wrote:
> On Tue, Feb 28, 2012 at 11:46 AM, Peter Lieven <p...@dlh.net> wrote:
>> On 24.02.2012 08:23, Stefan Hajnoczi wrote:
>>> On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>> On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
>>>>>> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>>>>>>> However, in a virtual machine I have not observed the above slowdown to
>>>>>>>> that extent, while the benefit of zero-after-free in a virtualisation
>>>>>>>> environment is obvious:
>>>>>>>>
>>>>>>>> 1) zero pages can easily be merged by KSM or other techniques.
>>>>>>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>>>>>>
>>>>>>> The other approach is a memory page discard mechanism - which
>>>>>>> obviously requires more code changes than zeroing freed pages.
>>>>>>>
>>>>>>> The advantage is that we don't take the brute-force and CPU intensive
>>>>>>> approach of zeroing pages.  It would be like a fine-grained ballooning
>>>>>>> feature.
>>>>>>
>>>>>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>>>>>> just at allocation time, so it shouldn't make a big difference in terms
>>>>>> of CPU power.
>>>>>
>>>>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>>>>> Imagine a process that uses all of physical memory.  Once it
>>>>> terminates the system is going to run processes that only use a small
>>>>> set of pages.  It's pointless zeroing all those pages if we're not
>>>>> going to use them anymore.
>>>>
>>>> Perhaps the middle path is to zero pages but do it after a grace
>>>> timeout.  I wonder if this helps eliminate the 2-3% slowdown you
>>>> noticed when compiling.
>>>
>>> Gah, it's too early in the morning.  I don't think this timer actually
>>> makes sense.
>>
>> Do you think it would make sense, then, to put together a patch set/proposal
>> that notifies a guest kernel of the presence of KSM on the host and makes it
>> switch to zero-after-free?
>
> I think your idea is interesting - whether or not people are happy
> with it will depend on the performance impact.  It seems reasonable to
> me.

Could you support/help me in implementing and publishing this approach?

Peter



Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Avi Kivity
On 02/23/2012 06:42 PM, Stefan Hajnoczi wrote:
> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>> However, in a virtual machine I have not observed the above slowdown to
>> that extent, while the benefit of zero-after-free in a virtualisation
>> environment is obvious:
>>
>> 1) zero pages can easily be merged by KSM or other techniques.
>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>
> The other approach is a memory page discard mechanism - which
> obviously requires more code changes than zeroing freed pages.
>
> The advantage is that we don't take the brute-force and CPU intensive
> approach of zeroing pages.  It would be like a fine-grained ballooning
> feature.
>
> I hope someone will follow up saying this has already been done or
> prototyped :).

It already exists - that's the balloon code.  Right now it's host
driven, but maybe we can modify it to allow the guest to initiate
balloon inflations.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Avi Kivity
On 02/24/2012 08:41 AM, Stefan Hajnoczi wrote:
>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>> just at allocation time, so it shouldn't make a big difference in terms
>> of CPU power.
>
> It's easy to find a scenario where eagerly zeroing pages is wasteful.
> Imagine a process that uses all of physical memory.  Once it
> terminates the system is going to run processes that only use a small
> set of pages.  It's pointless zeroing all those pages if we're not
> going to use them anymore.

In the long term, we will use them, except if the guest is completely idle.

The scenario in which zeroing is expensive is when the page is refilled
through DMA.  In that case the zeroing was wasted.  This is a pretty
common scenario in pagecache-intensive workloads.





Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Peter Lieven

On 28.02.2012 14:16, Avi Kivity wrote:
> On 02/24/2012 08:41 AM, Stefan Hajnoczi wrote:
>>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>>> just at allocation time, so it shouldn't make a big difference in terms
>>> of CPU power.
>>
>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>> Imagine a process that uses all of physical memory.  Once it
>> terminates the system is going to run processes that only use a small
>> set of pages.  It's pointless zeroing all those pages if we're not
>> going to use them anymore.
>
> In the long term, we will use them, except if the guest is completely idle.
>
> The scenario in which zeroing is expensive is when the page is refilled
> through DMA.  In that case the zeroing was wasted.  This is a pretty
> common scenario in pagecache-intensive workloads.

Avi, what do you think of the proposal to give the guest VM a hint
that the host is running KSM? In that case the administrator
has already decided that saving physical memory is more important
to him than performance.

Peter



Re: [Qemu-devel] linux guests and ksm performance

2012-02-28 Thread Avi Kivity
On 02/28/2012 03:20 PM, Peter Lieven wrote:
> On 28.02.2012 14:16, Avi Kivity wrote:
>> On 02/24/2012 08:41 AM, Stefan Hajnoczi wrote:
>>>> I don't think that it is CPU intensive. All user pages are zeroed
>>>> anyway, just at allocation time, so it shouldn't make a big difference
>>>> in terms of CPU power.
>>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>>> Imagine a process that uses all of physical memory.  Once it
>>> terminates the system is going to run processes that only use a small
>>> set of pages.  It's pointless zeroing all those pages if we're not
>>> going to use them anymore.
>> In the long term, we will use them, except if the guest is completely
>> idle.
>>
>> The scenario in which zeroing is expensive is when the page is refilled
>> through DMA.  In that case the zeroing was wasted.  This is a pretty
>> common scenario in pagecache-intensive workloads.
>
> Avi, what do you think of the proposal to give the guest VM a hint
> that the host is running KSM? In that case the administrator
> has already decided that saving physical memory is more important
> to him than performance.

It makes some sense.  Perhaps through the balloon device, a flag that
indicates that voluntary ballooning will be gratefully accepted.





[Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Peter Lieven
Hi,

I have recently been playing with an old idea (originally from grsecurity,
for security reasons) of changing the policy in the Linux page allocator from
zero-on-allocate to zero-after-free. My concern is that Linux leaves a lot of
leftover data in free physical memory, unlike Windows, which by default zeroes
pages after they are freed.

I have run some tests and can confirm older results that a bare-metal Linux
machine is approximately 2-3% slower with zero-after-free on big compilation
jobs. This might be due either to the fact that, with zero-on-allocate, pages
are only zeroed if GFP_ZERO is set, or to caching benefits (a page zeroed just
before use is likely still hot in the CPU cache).
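To make the two policies concrete, here is a toy model in plain Python (not kernel code) that only counts how often pages get zeroed for a simple allocate/use/free trace. `ToyAllocator`, `run_trace` and the trace format are hypothetical; the only kernel concept borrowed is the GFP_ZERO request flag, and cache effects are not modeled at all.

```python
PAGE_SIZE = 4096


class ToyAllocator:
    """Toy page allocator that records how many pages it zeroes.

    policy="on_alloc":   zero a page only if the caller asks (GFP_ZERO).
    policy="after_free": zero every page when it is freed, so free pages
                         are always zero and GFP_ZERO needs no extra work.
    """

    def __init__(self, npages, policy):
        if policy not in ("on_alloc", "after_free"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]
        self.free_list = list(range(npages))
        self.zeroings = 0

    def _zero(self, pfn):
        self.pages[pfn][:] = bytes(PAGE_SIZE)
        self.zeroings += 1

    def alloc(self, gfp_zero=False):
        pfn = self.free_list.pop()
        if self.policy == "on_alloc" and gfp_zero:
            self._zero(pfn)
        return pfn

    def free(self, pfn):
        if self.policy == "after_free":
            self._zero(pfn)
        self.free_list.append(pfn)

    def free_pages_are_zero(self):
        return all(not any(self.pages[pfn]) for pfn in self.free_list)


def run_trace(policy, gfp_zero_flags):
    """Allocate one page per flag, write to it, then free everything."""
    a = ToyAllocator(len(gfp_zero_flags), policy)
    pfns = [a.alloc(gfp_zero=flag) for flag in gfp_zero_flags]
    for pfn in pfns:
        a.pages[pfn][0] = 0xAA          # the page is actually used
    for pfn in pfns:
        a.free(pfn)
    return a.zeroings, a.free_pages_are_zero()


trace = [True, True, False, False]      # e.g. 2 user pages, 2 other pages
print(run_trace("on_alloc", trace))     # (2, False)
print(run_trace("after_free", trace))   # (4, True)
```

With half of the allocations requesting GFP_ZERO, zero-after-free does twice the zeroing work in this model but leaves every free page zero; the extra work comes entirely from pages allocated without GFP_ZERO.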

However, in a virtual machine I have not observed the above slowdown to
that extent, while the benefit of zero-after-free in a virtualisation
environment is obvious:

1) zero pages can easily be merged by KSM or other techniques.
2) zero (dup) pages are a lot faster to transfer in case of migration.
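Both points can be illustrated with a toy model of a guest that has one page in use and three freed pages. KSM is reduced to "identical pages share one host page" (real KSM scans and merges pages incrementally), and the migration stream sends a short marker for an all-zero page, in the spirit of the "(dup)" page handling in point 2. The 8-byte header, the 1-byte marker and all function names are arbitrary assumptions for illustration.

```python
PAGE_SIZE = 4096
HEADER = 8        # assumed per-page record header in this toy stream
ZERO_MARKER = 1   # assumed single byte meaning "page is all zeros"


def is_zero_page(page: bytes) -> bool:
    return not any(page)


def host_pages_after_merge(pages) -> int:
    """Toy KSM: guest pages with identical contents share one host page."""
    return len(set(pages))


def migration_bytes(pages) -> int:
    """Toy migration stream: zero pages are sent as a marker, others in full."""
    return sum(HEADER + (ZERO_MARKER if is_zero_page(p) else PAGE_SIZE)
               for p in pages)


in_use = [b"\x7f" * PAGE_SIZE]                                # one page in use
stale = in_use + [bytes([i]) * PAGE_SIZE for i in (1, 2, 3)]  # freed, old data left
zeroed = in_use + [bytes(PAGE_SIZE)] * 3                      # freed, zeroed after free

print(host_pages_after_merge(stale), host_pages_after_merge(zeroed))  # 4 2
print(migration_bytes(stale), migration_bytes(zeroed))                # 16416 4131
```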

Therefore I would like to hear your thoughts on whether it would be a good
idea to change the strategy in the Linux kernel from zero-on-allocate to
zero-after-free automatically if the 'hypervisor' CPU feature is set. Or is
there another technique to tell a Linux guest that KSM is running on the host?
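On x86, the 'hypervisor' CPU feature is bit 31 of ECX in CPUID leaf 1, and Linux lists it as `hypervisor` in the flags of /proc/cpuinfo. The sketch below checks for that flag from a guest's userspace; the function names are hypothetical. It also illustrates why a separate hint would be needed: the flag only says the machine is virtualised, while whether KSM is running is recorded in the host's own /sys/kernel/mm/ksm/run, which a guest cannot see.

```python
def has_hypervisor_flag(cpuinfo_text: str) -> bool:
    """True if a 'flags' line of /proc/cpuinfo lists 'hypervisor' (x86 guests)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def host_ksm_running(ksm_run_text: str) -> bool:
    """Host side only: /sys/kernel/mm/ksm/run is 0 (stopped), 1 (running) or 2 (unmerge)."""
    return ksm_run_text.strip() == "1"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("running under a hypervisor:", has_hypervisor_flag(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```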

If this is not feasible, can someone think of a kernel module or userspace
program that zeroes out unused pages periodically?
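A crude userspace approximation of such a program, offered as a hedged sketch rather than a recommendation: map anonymous memory, write to each page so that the kernel must hand over a real page (zeroed on allocation), then unmap it, so the page goes back to the allocator still containing zeros. `zero_fill_free_memory` and its size parameter are hypothetical. It only reaches pages the kernel is willing to give this process, and sizing it close to the amount of free memory would evict page cache or push the guest into swap.

```python
import mmap


def zero_fill_free_memory(nbytes: int) -> int:
    """Fault in nbytes of anonymous memory page by page, then release it.

    Each write fault makes the kernel allocate a fresh, zeroed page; writing
    a 0 keeps it all-zero, and unmapping returns it to the allocator in that
    state. Returns the number of pages touched.
    """
    page = mmap.PAGESIZE
    npages = nbytes // page
    if npages == 0:
        return 0
    region = mmap.mmap(-1, npages * page, flags=mmap.MAP_PRIVATE)
    try:
        for off in range(0, npages * page, page):
            region[off] = 0   # write fault: a real page gets allocated
    finally:
        region.close()        # munmap: the pages go back to the allocator
    return npages
```

Running it periodically would approximate "zero out unused pages periodically"; whether the extra KSM merging outweighs the cost is exactly what would need measuring.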

Peter





Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Stefan Hajnoczi
On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
> However, in a virtual machine I have not observed the above slowdown to
> that extent, while the benefit of zero-after-free in a virtualisation
> environment is obvious:
>
> 1) zero pages can easily be merged by KSM or other techniques.
> 2) zero (dup) pages are a lot faster to transfer in case of migration.

The other approach is a memory page discard mechanism - which
obviously requires more code changes than zeroing freed pages.

The advantage is that we don't take the brute-force and CPU intensive
approach of zeroing pages.  It would be like a fine-grained ballooning
feature.
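The discard idea can be illustrated from userspace on Linux: madvise(MADV_DONTNEED) on a private anonymous mapping drops the backing pages, and the next access sees fresh zero-filled pages, so nothing had to be zeroed by hand before the memory was given back. This is only an analogy for the semantics a guest-to-host discard would need, not a guest/host interface; `discard_demo` is a hypothetical name, and `mmap.madvise` needs Python 3.8+ on Linux.

```python
import mmap


def discard_demo(npages: int = 4) -> bool:
    """Dirty a private anonymous mapping, discard it, check it now reads as zeros."""
    size = npages * mmap.PAGESIZE
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        region[:] = b"\xaa" * size           # pages now hold "stale" data
        region.madvise(mmap.MADV_DONTNEED)   # drop the backing pages
        return region[:] == bytes(size)      # faulted back in zero-filled
    finally:
        region.close()


print(discard_demo())   # True on Linux
```

MAP_PRIVATE matters here: for a shared anonymous mapping the data lives in a shared memory object and survives MADV_DONTNEED.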

I hope someone will follow up saying this has already been done or
prototyped :).

Stefan



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Javier Guerra Giraldez
On Thu, Feb 23, 2012 at 11:42 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> The other approach is a memory page discard mechanism - which
> obviously requires more code changes than zeroing freed pages.
>
> The advantage is that we don't take the brute-force and CPU intensive
> approach of zeroing pages.  It would be like a fine-grained ballooning
> feature.

(Disclaimer: I don't know the code, I'm just guessing.)

Does KVM emulate the MMU? If so, is there any 'unmap page' primitive?

-- 
Javier



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread peter.lie...@gmail.com




Stefan Hajnoczi <stefa...@gmail.com> wrote:

> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>> However, in a virtual machine I have not observed the above slowdown to
>> that extent, while the benefit of zero-after-free in a virtualisation
>> environment is obvious:
>>
>> 1) zero pages can easily be merged by KSM or other techniques.
>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>
> The other approach is a memory page discard mechanism - which
> obviously requires more code changes than zeroing freed pages.
>
> The advantage is that we don't take the brute-force and CPU intensive
> approach of zeroing pages.  It would be like a fine-grained ballooning
> feature.

I don't think that it is CPU intensive. All user pages are zeroed anyway,
just at allocation time, so it shouldn't make a big difference in terms of
CPU power.

> I hope someone will follow up saying this has already been done or
> prototyped :).
>
> Stefan

-- 
This message was sent from my Android mobile phone with K-9 Mail.



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Stefan Hajnoczi
On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>> However, in a virtual machine I have not observed the above slowdown to
>>> that extent, while the benefit of zero-after-free in a virtualisation
>>> environment is obvious:
>>>
>>> 1) zero pages can easily be merged by KSM or other techniques.
>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>
>> The other approach is a memory page discard mechanism - which
>> obviously requires more code changes than zeroing freed pages.
>>
>> The advantage is that we don't take the brute-force and CPU intensive
>> approach of zeroing pages.  It would be like a fine-grained ballooning
>> feature.
>
> I don't think that it is CPU intensive. All user pages are zeroed anyway,
> just at allocation time, so it shouldn't make a big difference in terms of
> CPU power.

It's easy to find a scenario where eagerly zeroing pages is wasteful.
Imagine a process that uses all of physical memory.  Once it
terminates the system is going to run processes that only use a small
set of pages.  It's pointless zeroing all those pages if we're not
going to use them anymore.

Stefan



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Stefan Hajnoczi
On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
>> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>>> However, in a virtual machine I have not observed the above slowdown to
>>>> that extent, while the benefit of zero-after-free in a virtualisation
>>>> environment is obvious:
>>>>
>>>> 1) zero pages can easily be merged by KSM or other techniques.
>>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>>
>>> The other approach is a memory page discard mechanism - which
>>> obviously requires more code changes than zeroing freed pages.
>>>
>>> The advantage is that we don't take the brute-force and CPU intensive
>>> approach of zeroing pages.  It would be like a fine-grained ballooning
>>> feature.
>>
>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>> just at allocation time, so it shouldn't make a big difference in terms
>> of CPU power.
>
> It's easy to find a scenario where eagerly zeroing pages is wasteful.
> Imagine a process that uses all of physical memory.  Once it
> terminates the system is going to run processes that only use a small
> set of pages.  It's pointless zeroing all those pages if we're not
> going to use them anymore.

Perhaps the middle path is to zero pages but do it after a grace
timeout.  I wonder if this helps eliminate the 2-3% slowdown you
noticed when compiling.

This requires no special host-guest interfaces for discarding pages.
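A toy model of the grace-timeout idea (Stefan retracts it in a follow-up, but the mechanism is easy to state): freed pages are timestamped, a periodic pass zeroes only pages that have stayed free for at least the grace period, and a page reused before then is never zeroed by that pass. Times are abstract ticks; `GraceZeroer` and its methods are hypothetical, and the model only marks pages as zeroed instead of writing bytes.

```python
class GraceZeroer:
    """Toy model: zero a free page only once it has stayed free for `grace` ticks."""

    def __init__(self, grace: int):
        self.grace = grace
        self.freed_at = {}   # pfn -> tick at which the page was freed
        self.zeroed = set()  # free pfns already zeroed by the background pass

    def free(self, pfn: int, now: int) -> None:
        self.freed_at[pfn] = now
        self.zeroed.discard(pfn)

    def reuse(self, pfn: int) -> bool:
        """Hand a free page out again; True if the background pass had zeroed it."""
        self.freed_at.pop(pfn, None)
        was_zeroed = pfn in self.zeroed
        self.zeroed.discard(pfn)
        return was_zeroed

    def background_pass(self, now: int) -> list:
        """Zero (here: just mark) every page free for at least `grace` ticks."""
        done = []
        for pfn, freed in self.freed_at.items():
            if pfn not in self.zeroed and now - freed >= self.grace:
                self.zeroed.add(pfn)
                done.append(pfn)
        return done


z = GraceZeroer(grace=5)
z.free(1, now=0)
z.free(2, now=0)
print(z.reuse(1))              # False: reused quickly, no zeroing was wasted
print(z.background_pass(3))    # []: page 2 is still inside the grace period
print(z.background_pass(6))    # [2]
print(z.reuse(2))              # True
```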

Stefan



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Gleb Natapov
On Thu, Feb 23, 2012 at 04:42:54PM +, Stefan Hajnoczi wrote:
> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>> However, in a virtual machine I have not observed the above slowdown to
>> that extent, while the benefit of zero-after-free in a virtualisation
>> environment is obvious:
>>
>> 1) zero pages can easily be merged by KSM or other techniques.
>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>
> The other approach is a memory page discard mechanism - which
> obviously requires more code changes than zeroing freed pages.
>
> The advantage is that we don't take the brute-force and CPU intensive
> approach of zeroing pages.  It would be like a fine-grained ballooning
> feature.
>
> I hope someone will follow up saying this has already been done or
> prototyped :).
>
That was attempted. It is called page hinting, but AFAIK the attempt was
abandoned due to complex locking issues.

--
Gleb.



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Stefan Hajnoczi
On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
>>> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>>>> However, in a virtual machine I have not observed the above slowdown to
>>>>> that extent, while the benefit of zero-after-free in a virtualisation
>>>>> environment is obvious:
>>>>>
>>>>> 1) zero pages can easily be merged by KSM or other techniques.
>>>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>>>
>>>> The other approach is a memory page discard mechanism - which
>>>> obviously requires more code changes than zeroing freed pages.
>>>>
>>>> The advantage is that we don't take the brute-force and CPU intensive
>>>> approach of zeroing pages.  It would be like a fine-grained ballooning
>>>> feature.
>>>
>>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>>> just at allocation time, so it shouldn't make a big difference in terms
>>> of CPU power.
>>
>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>> Imagine a process that uses all of physical memory.  Once it
>> terminates the system is going to run processes that only use a small
>> set of pages.  It's pointless zeroing all those pages if we're not
>> going to use them anymore.
>
> Perhaps the middle path is to zero pages but do it after a grace
> timeout.  I wonder if this helps eliminate the 2-3% slowdown you
> noticed when compiling.

Gah, it's too early in the morning.  I don't think this timer actually
makes sense.

Stefan



Re: [Qemu-devel] linux guests and ksm performance

2012-02-23 Thread Peter Lieven

On 24.02.2012, at 08:23, Stefan Hajnoczi wrote:

> On Fri, Feb 24, 2012 at 6:53 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Fri, Feb 24, 2012 at 6:41 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>> On Thu, Feb 23, 2012 at 7:08 PM, peter.lie...@gmail.com <p...@dlh.net> wrote:
>>>> Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>> On Thu, Feb 23, 2012 at 3:40 PM, Peter Lieven <p...@dlh.net> wrote:
>>>>>> However, in a virtual machine I have not observed the above slowdown to
>>>>>> that extent, while the benefit of zero-after-free in a virtualisation
>>>>>> environment is obvious:
>>>>>>
>>>>>> 1) zero pages can easily be merged by KSM or other techniques.
>>>>>> 2) zero (dup) pages are a lot faster to transfer in case of migration.
>>>>>
>>>>> The other approach is a memory page discard mechanism - which
>>>>> obviously requires more code changes than zeroing freed pages.
>>>>>
>>>>> The advantage is that we don't take the brute-force and CPU intensive
>>>>> approach of zeroing pages.  It would be like a fine-grained ballooning
>>>>> feature.
>>>>
>>>> I don't think that it is CPU intensive. All user pages are zeroed anyway,
>>>> just at allocation time, so it shouldn't make a big difference in terms
>>>> of CPU power.
>>>
>>> It's easy to find a scenario where eagerly zeroing pages is wasteful.
>>> Imagine a process that uses all of physical memory.  Once it
>>> terminates the system is going to run processes that only use a small
>>> set of pages.  It's pointless zeroing all those pages if we're not
>>> going to use them anymore.
>>
>> Perhaps the middle path is to zero pages but do it after a grace
>> timeout.  I wonder if this helps eliminate the 2-3% slowdown you
>> noticed when compiling.
>
> Gah, it's too early in the morning.  I don't think this timer actually
> makes sense.

OK, that would be the idea of asynchronous page zeroing in the guest. I also
think this is too complicated.

Maybe the other idea is too simple: is it possible to give the guest a hint
that KSM is enabled on the host (let's say in a way like it's done with
kvmclock)? If KSM is enabled on the host, the administrator has already
decided that performance is not so important and that he/she is eager to save
physical memory. What if, and only if, this flag is set, the guest switches
from zero-on-allocate to zero-after-free? I think the whole thing is less than
10-20 lines of code, and it is code that has been proven to work well in
grsecurity for ages.

This might introduce a little (2-3%) overhead, but only if a lot of memory is
allocated without GFP_ZERO, and it is definitely faster than swapping.
Of course, it has to be guaranteed that this code does not slow down normal
systems due to the additional branches (would it be enough to mark the if
statements as unlikely?).
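A sketch of the proposed switch in toy form: a flag set once at boot from a host hint selects zero-after-free, and the allocation path then skips GFP_ZERO work because free pages are already zero. The feature bit `HYPOTHETICAL_F_HOST_KSM` is made up (the thread only suggests a mechanism "like kvmclock" or a balloon-device flag), and `alloc_page`/`free_page` here are toy stand-ins, not the kernel functions of the same name. In the kernel the test would presumably be written as `if (unlikely(zero_after_free))`, where `unlikely(x)` expands to `__builtin_expect(!!(x), 0)`; Python has no equivalent, so only the control flow is shown.

```python
PAGE_SIZE = 4096
HYPOTHETICAL_F_HOST_KSM = 1 << 0   # made-up feature bit, not a real KVM/virtio bit

zero_after_free = False            # decided once at boot, read on every free


def init_zero_policy(host_features: int) -> None:
    """Switch to zero-after-free only if the (hypothetical) host hint is set."""
    global zero_after_free
    zero_after_free = bool(host_features & HYPOTHETICAL_F_HOST_KSM)


def free_page(page: bytearray, free_list: list) -> None:
    if zero_after_free:                    # kernel: if (unlikely(zero_after_free))
        page[:] = bytes(len(page))
    free_list.append(page)


def alloc_page(free_list: list, gfp_zero: bool = False) -> bytearray:
    page = free_list.pop()
    if gfp_zero and not zero_after_free:   # otherwise the page is already zero
        page[:] = bytes(len(page))
    return page


for features in (0, HYPOTHETICAL_F_HOST_KSM):
    init_zero_policy(features)
    free_list = []
    free_page(bytearray(b"\x55" * PAGE_SIZE), free_list)   # guest data left behind
    print(features, any(free_list[0]))   # 0 True, then 1 False
```

Because the flag never changes after boot, the branch is perfectly predictable on systems without the hint, which is the property the `unlikely()` annotation is meant to reinforce.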

peter







 
> Stefan