Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-20 Thread Nick Piggin
On Friday 17 April 2009 17:08:07 Jared Hulbert wrote:
> > As everyone knows, my favourite thing is to say nasty things about any
> > new feature that adds complexity to common code. I feel like crying to
> > hear about how many more instances of MS Office we can all run, if only
> > we apply this patch. And the poorly written HPC app just sounds like
> > scrapings from the bottom of justification barrel.
> >
> > I'm sorry, maybe I'm way off with my understanding of how important
> > this is. There isn't too much help in the changelog. A discussion of
> > where the memory savings comes from, and how far does things like
> > sharing of fs image, or ballooning goes and how much extra savings we
> > get from this... with people from other hypervisors involved as well.
> > Have I missed this kind of discussion?
> 
> Nick,
> 
> I don't know about other hypervisors, fs and balloonings, but I have
> tried this out.  It works.  It works on apps I don't consider, "poorly
> written".  I'm very excited about this.  I got >10% saving in a
> roughly off the shelf embedded system.  No user noticeable performance
> impact.

OK well that's what I want to hear. Thanks, that means a lot to me.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-17 Thread Jared Hulbert
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code. I feel like crying to
> hear about how many more instances of MS Office we can all run, if only
> we apply this patch. And the poorly written HPC app just sounds like
> scrapings from the bottom of justification barrel.
>
> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings comes from, and how far does things like
> sharing of fs image, or ballooning goes and how much extra savings we
> get from this... with people from other hypervisors involved as well.
> Have I missed this kind of discussion?

Nick,

I don't know about other hypervisors, fs and balloonings, but I have
tried this out.  It works.  It works on apps I don't consider, "poorly
written".  I'm very excited about this.  I got >10% saving in a
roughly off the shelf embedded system.  No user noticeable performance
impact.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-16 Thread Izik Eidus

Nick Piggin wrote:
> On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> > On Thu,  9 Apr 2009 06:58:37 +0300
> > Izik Eidus  wrote:
> >
> > > KSM is a linux driver that allows dynamicly sharing identical memory
> > > pages between one or more processes.
> >
> > Generally looks OK to me.  But that doesn't mean much.  We should rub
> > bottles with words like "hugh" and "nick" on them to be sure.
>
> I haven't looked too closely at it yet sorry. Hugh has a great eye for
> these details, though, hint hint :)
>
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code.

The whole idea, and the way I wrote it, is that it won't touch common code; I
didn't change the Linux mm logic anywhere.

The worst thing that we have added is helper functions.

> I feel like crying to
> hear about how many more instances of MS Office we can all run, if only
> we apply this patch.

And more instances of Linux guests...

> And the poorly written HPC app just sounds like
> scrapings from the bottom of justification barrel.

So if you have a big rendering application that loads gigabytes of
geometrical data that is handled by many threads,
and each thread sometimes changes this geometrical data and you don't
want the other threads to notice it,
how would you share it in the traditional way? After the shared data
gets COWed once, how will you collect it again when it becomes identical?

KSM does it for applications transparently.

The motivation for writing KSM was indeed KVM, where it is highly needed.
You may check what VMware says about the fact that they have much better
overcommit than Hyper-V / Xen:

http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html

It is important to understand that in virtualization environments there
are cases where memory is much more critical than any other resource for
higher density.

Together with KSM, KVM will have the same memory overcommit abilities
that VMware has.

> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings comes from,

The memory savings come from identical libraries, identical kernels, zeroed
pages -> that is for virtualization.
The library code will always be identical among similar guests, so why
have this code at multiple places in host memory?

> and how far does things like
> sharing of fs image, or ballooning goes and how much extra savings we
> get from this...

Ballooning is much worse when it comes to performance, because what it
does is shrink the guest memory; with KSM we find identical pages and
merge them into one page, so we don't get a guest performance loss.

> with people from other hypervisors involved as well.
> Have I missed this kind of discussion?
>
> Careful what you wish for, ay? :)



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-16 Thread Nick Piggin
On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> On Thu,  9 Apr 2009 06:58:37 +0300
> Izik Eidus  wrote:
> 
> > KSM is a linux driver that allows dynamicly sharing identical memory
> > pages between one or more processes.
> 
> Generally looks OK to me.  But that doesn't mean much.  We should rub
> bottles with words like "hugh" and "nick" on them to be sure.

I haven't looked too closely at it yet sorry. Hugh has a great eye for
these details, though, hint hint :)

As everyone knows, my favourite thing is to say nasty things about any
new feature that adds complexity to common code. I feel like crying to
hear about how many more instances of MS Office we can all run, if only
we apply this patch. And the poorly written HPC app just sounds like
scrapings from the bottom of justification barrel.

I'm sorry, maybe I'm way off with my understanding of how important
this is. There isn't too much help in the changelog. A discussion of
where the memory savings comes from, and how far does things like
sharing of fs image, or ballooning goes and how much extra savings we
get from this... with people from other hypervisors involved as well.
Have I missed this kind of discussion?

Careful what you wish for, ay? :)


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3

2009-04-14 Thread Andrew Morton
On Thu,  9 Apr 2009 06:58:37 +0300
Izik Eidus  wrote:

> KSM is a linux driver that allows dynamicly sharing identical memory
> pages between one or more processes.

Generally looks OK to me.  But that doesn't mean much.  We should rub
bottles with words like "hugh" and "nick" on them to be sure.


>
> ...
>
>  include/linux/ksm.h          |   48 ++
>  include/linux/miscdevice.h   |    1 +
>  include/linux/mm.h           |    5 +
>  include/linux/mmu_notifier.h |   34 +
>  include/linux/rmap.h         |   11 +
>  mm/Kconfig                   |    6 +
>  mm/Makefile                  |    1 +
>  mm/ksm.c                     | 1674 ++
>  mm/memory.c                  |   90 +++-
>  mm/mmu_notifier.c            |   20 +
>  mm/rmap.c                    |  139
And it's pretty unobtrusive for what it is.  I expect we can get this
into 2.6.31 unless there are some pratfalls which I missed.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-07 Thread Andrea Arcangeli
On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
> 
> 1) Fixed security issue found by Chris Wright:
> Ksm was checking if a page is a shared page by running !PageAnon.
> Because Ksm scans only anonymous memory, all !PageAnon pages
> inside the ksm data structures are shared pages; however, there might
> be a case in do_wp_page() when VM_SHARED is used where
> do_wp_page(), instead of copying the page into a new anonymous
> page, would reuse the page. It was fixed by adding a check for the
> dirty_bit of the virtual addresses pointing into the shared page.
> I did not find any VM code that would clear the dirty bit from
> this virtual address (due to the fact that we allocate the page
> using page_alloc() - kernel allocated pages), ~but i still want
> confirmation about this from the vm guys - thanks.~

As far as I can tell this wasn't a bug and this change is
unnecessary. I already checked this bit but I may have missed
something, so I ask here to be sure.

As far as I can tell when VM_SHARED is set, no anonymous page can ever
be allocated in that vma range, hence no KSM page can ever be
generated in that vma either. MAP_SHARED|MAP_ANONYMOUS is only a
different API for /dev/shm, IPCSHM backing, no anonymous pages can
live there. It surely worked like that in older 2.6, reading latest
code it seems to still work like that, but if something has changed
Hugh will surely correct me in a jiffy ;).

I still see this in the file=null path.
  
	} else if (vm_flags & VM_SHARED) {
		error = shmem_zero_setup(vma);
		if (error)
			goto free_vma;
	}


So you can revert your change for now.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Andrea Arcangeli
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
> Presumably they will probably want to control it to interleave it over
> all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads
to lots of equal data too, in which case ksm is the only way to share
it all.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Izik Eidus

Nikola Ciprich wrote:
> Hi Izik,
> Is there some user documentation available? (apart from RTFS?:))
> I've compiled kernel with v2 of Your patches, loaded ksm module,
> did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
> anything, at least no pages were collected..
> Could You advise me a bit?
> thanks a lot in advance...
> I can't wait to try it on our hosts runing 50-60 KVMs :)
> BR
> nik

You need the userspace / kvm patches that I posted together with V1 about
1-2 weeks ago...

What you should do is this:
Patch Linus' kernel git with the ksm patches (V2), as you just did.
These patches can be found at:
http://lkml.org/lkml/2009/4/4/77

Then patch Avi's kernel git with the kvm patches that were sent together
with V1.
These patches can be found at:
http://lkml.org/lkml/2009/3/30/534

and then patch Avi's userspace git with these patches:
http://lkml.org/lkml/2009/3/30/538

Now, after you finish patching the kernel, load the kvm modules from Avi's
git, and then, using the patched userspace, you can start using ksm.

Set up the speed (these are just numbers; you can change them to make ksm
take less or more cpu):

echo 400 > /sys/kernel/mm/ksm/pages_to_scan
echo 1 > /sys/kernel/mm/ksm/sleep

echo 1 > /sys/kernel/mm/ksm/run

Don't raise all the VMs at once, because then KSM won't be able to catch up
with the memory allocation...
Raise a few VMs, see that their memory gets shared and your host's free
memory grows, then raise more VMs and so on...


Enjoy.
(You can check pages_shared for the number of pages that have been
shared; you can run top as well)
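
The same tuning can also be done from a program instead of the shell. What
follows is only a minimal C sketch, assuming the v2 sysfs file names given in
this thread (pages_to_scan, sleep, run, pages_shared); other versions of the
patch set may name them differently. The helpers ksm_write_knob() and
ksm_read_knob() are hypothetical names invented for this illustration, and
the directory is a parameter so that the code can be tried against a scratch
directory first.

```c
#include <assert.h>
#include <stdio.h>

/* Directory used by the v2 patches in this thread (assumption for other versions). */
#define KSM_SYSFS_DIR "/sys/kernel/mm/ksm"

/* Hypothetical helper: write an unsigned value to <dir>/<name>. Returns 0 on success. */
static int ksm_write_knob(const char *dir, const char *name, unsigned long value)
{
    char path[512];
    FILE *f;
    int n, ok;

    assert(dir != NULL && name != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path does not fit in the buffer */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;                      /* e.g. ksm not loaded, or not root */
    ok = fprintf(f, "%lu\n", value) > 0;
    if (fclose(f) != 0)                 /* write errors may only show up on close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read an unsigned value from <dir>/<name>. Returns 0 on success. */
static int ksm_read_knob(const char *dir, const char *name, unsigned long *out)
{
    char path[512];
    FILE *f;
    int n, ok;

    assert(dir != NULL && name != NULL && out != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling ksm_write_knob(KSM_SYSFS_DIR, "pages_to_scan", 400), then the same
for "sleep" and "run" with the value 1, mirrors the echo commands above, and
ksm_read_knob(KSM_SYSFS_DIR, "pages_shared", &n) reads the counter back.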



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Izik Eidus

Nick Piggin wrote:
> On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:
>
> > This driver is very useful for KVM as in cases of runing multiple guests
> > operation system of the same type.
> > (For desktop work loads we have achived more than x2 memory overcommit
> > (more like x3))
>
> Interesting that it is a desirable workload to have multiple guests each
> running MS office.

These numbers were taken from such a workload; it is some kind of weird
script that keeps opening Word / Excel and writing there like a user...

I think in addition it opens Internet Explorer and visits random sites...
I can search for the script if wanted...

> I wonder, can windows enter a paravirtualised guest mode for KVM? And can
> you detect page allocation/freeing events?

I don't know.

> > This driver have found users other than KVM, for example CERN,
> > Fons Rademakers:
> > "on many-core machines we run one large detector simulation program per core.
> > These simulation programs are identical but run each in their own process and
> > need about 2 - 2.5 GB RAM.
> > We typically buy machines with 2GB RAM per core and so have a problem to run
> > one of these programs per core.
> > Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
> > maps, detector geometry, etc.
> > Currently people have been trying to start one program, initialize the geometry
> > and field maps and then fork it N times, to have the data shared.
> > With KSM this would be done automatically by the system so it sounded extremely
> > attractive when Andrea presented it."
>
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
> Presumably they will probably want to control it to interleave it over
> all numa nodes and use hugepages for it. It would be very little work.

I agree about that. I don't know their application too well; I know they had
problems doing it.

> > I am sending another seires of patchs for kvm kernel and kvm-userspace
> > that would allow users of kvm to test ksm with it.
> > The kvm patchs would apply to Avi git tree.




Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Nikola Ciprich
Hi Izik,
Is there some user documentation available? (apart from RTFS?:))
I've compiled kernel with v2 of Your patches, loaded ksm module,
did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
anything, at least no pages were collected..
Could You advise me a bit?
thanks a lot in advance...
I can't wait to try it on our hosts running 50-60 KVMs :)
BR
nik


On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
> 
> 1) Fixed security issue found by Chris Wright:
> Ksm was checking if a page is a shared page by running !PageAnon.
> Because Ksm scans only anonymous memory, all !PageAnon pages
> inside the ksm data structures are shared pages; however, there might
> be a case in do_wp_page() when VM_SHARED is used where
> do_wp_page(), instead of copying the page into a new anonymous
> page, would reuse the page. It was fixed by adding a check for the
> dirty_bit of the virtual addresses pointing into the shared page.
> I did not find any VM code that would clear the dirty bit from
> this virtual address (due to the fact that we allocate the page
> using page_alloc() - kernel allocated pages), ~but i still want
> confirmation about this from the vm guys - thanks.~
> 
> 2) Moved to sysfs to control ksm:
> It was requested as a better way to control the ksm scanning
> thread than ioctls.
> the sysfs api:
> dir: /sys/kernel/mm/ksm/
> 
> kernel_pages_allocated - information about how many kernel pages
> ksm has allocated; these pages are not swappable, and each such
> page is used by ksm to share pages with identical content
> 
> pages_shared - how many pages were shared by ksm
> 
> run - set to 1 when you want ksm to run, 0 when not
> 
> max_kernel_pages - set the maximum number of kernel pages
> to be allocated by ksm, set 0 for unlimited.
> 
> pages_to_scan - how many pages to scan before ksm will sleep
> 
> sleep - how many usecs ksm will sleep.
> 
> 3) Add sysfs parameter to control the maximum kernel pages to be
> allocated by ksm.
> 
> 4) Add statistics about how many pages are really shared.
> 
> 
> One issue still to be discussed:
> There was a suggestion to use madvise(SHAREABLE) instead of using
> ioctls to register memory that needs to be scanned by ksm.
> Such a change is outside the area of ksm.c and would require adding
> a new madvise api, and changing some parts of the vm and the kernel
> code, so the first thing to do is to work out whether we really want this.
> 
> I dont know any other open issues.
> 
> Thanks.
> 
> This is from the first post:
> (The kvm part, together with the kvm-userspace part, was posted with V1
> about a week ago; whoever wants to test ksm may download the
> patch from the lkml archive)
> 
> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
> 
> Unlike traditional page sharing that is made at the allocation of the
> memory, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the process that uses this memory.
> (the shared pages are marked as readonly, and in case of write
> do_wp_page() takes care to create a new copy of the page)
> 
> To find identical pages ksm uses an algorithm that is split into three
> primary levels:
> 
> 1) Ksm will start scanning the memory and will calculate a checksum for each
>    page that is registered to be scanned.
>    (In the first round of the scanning, ksm would only calculate
>    this checksum for all the pages)
> 
> 2) Ksm will go again over the whole memory and will recalculate the
>    checksum of the pages; pages that are found to have the same
>    checksum value would be considered "pages that most likely
>    won't change".
>    Ksm will insert these pages into an RB-tree sorted by page content that
>    is called the "unstable tree"; the reason that this tree is called
>    unstable is due to the fact that the page contents might change
>    while they are still inside the tree, and therefore the tree would
>    become corrupted.
>    Due to this problem ksm takes two more steps in addition to the
>    checksum calculation:
>    a) Ksm will throw away and recreate the entire unstable tree each round
>       of memory scanning - so if we have corruption, it will be fixed
>       when we rebuild the tree.
>    b) Ksm is using an RB-tree, whose balancing is made by the node color
>       and not by the content, so even if the page gets corrupted, it still
>       would take the same amount of time to search in it.
> 
> 3) In addition to the unstable tree, ksm holds another tree that is called
>    the "stable tree" - this tree is an RB-tree that is sorted by the pages'
>    content and all its pages are write protected, and therefore it can't get
>    corrupted.
>    Each time ksm finds two identical pages using the unstable tree,
>    it will create a new write-protected shared page, and this p

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Avi Kivity

Nick Piggin wrote:
> On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:
>
> > This driver is very useful for KVM as in cases of runing multiple guests
> > operation system of the same type.
> > (For desktop work loads we have achived more than x2 memory overcommit
> > (more like x3))
>
> Interesting that it is a desirable workload to have multiple guests each
> running MS office.
>
> I wonder, can windows enter a paravirtualised guest mode for KVM?

Windows has some support for paravirtualization, for example it can use
hypercalls instead of tlb flush IPIs.

> And can
> you detect page allocation/freeing events?

Not that I know of.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Nick Piggin
On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:

> This driver is very useful for KVM as in cases of runing multiple guests
> operation system of the same type.
> (For desktop work loads we have achived more than x2 memory overcommit
> (more like x3))

Interesting that it is a desirable workload to have multiple guests each
running MS office.

I wonder, can windows enter a paravirtualised guest mode for KVM? And can
you detect page allocation/freeing events?

 
> This driver have found users other than KVM, for example CERN,
> Fons Rademakers:
> "on many-core machines we run one large detector simulation program per core.
> These simulation programs are identical but run each in their own process and
> need about 2 - 2.5 GB RAM.
> We typically buy machines with 2GB RAM per core and so have a problem to run
> one of these programs per core.
> Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
> maps, detector geometry, etc.
> Currently people have been trying to start one program, initialize the geometry
> and field maps and then fork it N times, to have the data shared.
> With KSM this would be done automatically by the system so it sounded extremely
> attractive when Andrea presented it."

They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
Presumably they will probably want to control it to interleave it over
all numa nodes and use hugepages for it. It would be very little work.
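
For illustration, a minimal sketch of the explicit sharing suggested here: a
MAP_ANONYMOUS|MAP_SHARED mapping created before fork() stays shared between
parent and child, so a write by one is seen by the other. The function name
child_write_visible_to_parent() is hypothetical, and the sketch deliberately
leaves out the NUMA interleaving and hugepage setup mentioned above.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: map `len` bytes of shared anonymous memory,
 * let a forked child copy `msg` into it, and check from the parent
 * whether the child's write is visible.
 * Returns 1 if visible, 0 if not, -1 on error.
 */
static int child_write_visible_to_parent(size_t len, const char *msg)
{
    size_t n = strlen(msg) + 1;
    char *buf;
    pid_t pid;
    int status;
    int seen = -1;

    assert(n <= len);
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: write into the mapping inherited from the parent. */
        memcpy(buf, msg, n);
        _exit(0);
    }

    /* Parent: wait for the child, then look at the same mapping. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = (memcmp(buf, msg, n) == 0);

    munmap(buf, len);
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED the child's write would go to its own
copy and the parent would still see zeroes. Sharing of this kind has to be
set up before the processes diverge, which is the case Andrea's reply
elsewhere in the thread contrasts with data that only becomes identical later.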

 
> I am sending another seires of patchs for kvm kernel and kvm-userspace
> that would allow users of kvm to test ksm with it.
> The kvm patchs would apply to Avi git tree.


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Jesper Juhl
On Thu, 2 Apr 2009, Chris Wright wrote:

> * Jesper Juhl (j...@chaosbits.net) wrote:
> > Do you rely only on the checksum or do you actually compare pages to check 
> > they are 100% identical before sharing?
> 
> Checksum has absolutely nothing to do w/ finding if two pages match.
> It's only used as a heuristic to suggest whether a single page has
> changed.  If that page is changing we won't bother trying to find a
> match for it.  Here's an example of the life of a page w.r.t checksum.
> 
> 1. checksum = uninitialized
> 2. first time page is found, checksum it (checksum = A).
>    if checksum has changed (uninitialized != A) don't go any further w/ that
>    page
> 3. next time page is found, checksum it (checksum = B).
>    if checksum has changed (A != B) don't go any further w/ that page
> 4. next time page is found, checksum it (checksum = B).
>    if checksum has changed (B == B)...it hasn't, continue processing the
>    page
> 
> later if a match is found in the tree (which is sorted by _contents_,
> i.e. memcmp) we'll attempt to merge the pages which at its very core
> does:
> 
> 	if (pages_identical(oldpage, newpage))
> 		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
> 
> pages_identical?  you guessed it...just does:
> 
> 	r = memcmp(addr1, addr2, PAGE_SIZE)
> 

Thank you for that explanation, it set my mind at ease :-)


-- 
Jesper Juhl  http://www.chaosbits.net/
Plain text mails only, please  http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Chris Wright
* Jesper Juhl (j...@chaosbits.net) wrote:
> Do you rely only on the checksum or do you actually compare pages to check 
> they are 100% identical before sharing?

Checksum has absolutely nothing to do w/ finding if two pages match.
It's only used as a heuristic to suggest whether a single page has
changed.  If that page is changing we won't bother trying to find a
match for it.  Here's an example of the life of a page w.r.t checksum.

1. checksum = uninitialized
2. first time page is found, checksum it (checksum = A).
   if checksum has changed (uninitialized != A) don't go any further w/ that page
3. next time page is found, checksum it (checksum = B).
   if checksum has changed (A != B) don't go any further w/ that page
4. next time page is found, checksum it (checksum = B).
   if checksum has changed (B == B)...it hasn't, continue processing the
   page

later if a match is found in the tree (which is sorted by _contents_,
i.e. memcmp) we'll attempt to merge the pages which at its very core
does:

	if (pages_identical(oldpage, newpage))
		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);

pages_identical?  you guessed it...just does:

	r = memcmp(addr1, addr2, PAGE_SIZE)

thanks,
-chris
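
The lifecycle above can be sketched in userspace C. This is a sketch under
stated assumptions, not the code from ksm.c: struct page_state,
page_checksum() and page_looks_stable() are hypothetical names, the FNV-1a
checksum is only a stand-in for whatever checksum the patch uses, and
SKETCH_PAGE_SIZE is an assumed 4096. Only pages_identical() follows the
memcmp shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this userspace sketch */

/* Hypothetical per-page tracking state; not the structure used in ksm.c. */
struct page_state {
    bool     have_checksum;     /* false until the page has been scanned once */
    uint32_t last_checksum;     /* checksum recorded by the previous scan */
};

/* Stand-in checksum (FNV-1a); the patch's real checksum may differ. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Heuristic only: returns true when the checksum equals the one seen on
 * the previous scan, i.e. the page looks quiet enough to be worth a
 * search for a match. It never decides whether two pages are equal.
 */
static bool page_looks_stable(struct page_state *st, const unsigned char *page)
{
    uint32_t sum;
    bool stable;

    assert(st != NULL && page != NULL);
    sum = page_checksum(page);
    stable = st->have_checksum && st->last_checksum == sum;
    st->have_checksum = true;
    st->last_checksum = sum;
    return stable;
}

/* The merge decision: a full byte-for-byte comparison of both pages. */
static bool pages_identical(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, SKETCH_PAGE_SIZE) == 0;
}
```

A checksum collision can at worst make page_looks_stable() report a changing
page as stable; the merge itself still depends on pages_identical(), so a
collision cannot cause two different pages to be shared.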


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Izik Eidus

Jesper Juhl wrote:
> Hi,
>
> On Tue, 31 Mar 2009, Izik Eidus wrote:
>
> > KSM is a linux driver that allows dynamicly sharing identical memory
> > pages between one or more processes.
> >
> > Unlike tradtional page sharing that is made at the allocation of the
> > memory, ksm do it dynamicly after the memory was created.
> > Memory is periodically scanned; identical pages are identified and
> > merged.
> > The sharing is unnoticeable by the process that use this memory.
> > (the shared pages are marked as readonly, and in case of write
> > do_wp_page() take care to create new copy of the page)
> >
> > To find identical pages ksm use algorithm that is split into three
> > primery levels:
> >
> > 1) Ksm will start scan the memory and will calculate checksum for each
> >    page that is registred to be scanned.
> >    (In the first round of the scanning, ksm would only calculate
> >    this checksum for all the pages)
>
> One question;
>
> Calcolating a checksum is a fine way to find pages that are "likely to be
> identical"

I don't use the checksum the way a hash table would; the checksum isn't used
to find identical pages by the fact that they have similar data...
The checksum is used to let me know that the page has not changed for a
while and that it is worth checking for pages identical to it...
In the future we will want to use the page table dirty bit for this, as
taking a checksum is somewhat expensive.

> , but there is no guarantee that two pages with the same
> checksum really are identical - there *will* be checksum collisions
> eventually. So, I really hope that your implementation actually checks
> that two pages that it find that have identical checksums really are 100%
> identical by comparing them bit by bit before throwing one away.

We do that :-)

> If you rely only on a checksum then eventually a user will get bitten by a
> checksum collision and, in the best case, something will crash, and in the
> worst case, data will silently be corrupted.
>
> Do you rely only on the checksum or do you actually compare pages to check
> they are 100% identical before sharing?

I do a 100% compare of the pages before I share them.

> I must admit that I have not read through the patch to find the answer, I
> just read your description and became concerned.

Don't worry, me neither :-)



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-04-02 Thread Jesper Juhl
Hi,

On Tue, 31 Mar 2009, Izik Eidus wrote:

> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing that is made at the allocation of the
> memory, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the processes that use this memory.
> (the shared pages are marked as readonly, and in case of a write
> do_wp_page() takes care of creating a new copy of the page)
>
> To find identical pages ksm uses an algorithm that is split into three
> primary levels:
>
> 1) Ksm will start scanning the memory and will calculate a checksum for each
>    page that is registered to be scanned.
>    (In the first round of the scanning, ksm would only calculate
>    this checksum for all the pages)
> 

One question;

Calculating a checksum is a fine way to find pages that are "likely to be 
identical", but there is no guarantee that two pages with the same 
checksum really are identical - there *will* be checksum collisions 
eventually. So, I really hope that your implementation actually checks 
that two pages that it finds that have identical checksums really are 100% 
identical by comparing them bit by bit before throwing one away.
If you rely only on a checksum then eventually a user will get bitten by a 
checksum collision and, in the best case, something will crash, and in the 
worst case, data will silently be corrupted.

Do you rely only on the checksum or do you actually compare pages to check 
they are 100% identical before sharing?

I must admit that I have not read through the patch to find the answer, I 
just read your description and became concerned.

-- 
Jesper Juhl  http://www.chaosbits.net/
Plain text mails only, please  http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-31 Thread Izik Eidus

Anthony Liguori wrote:
> Izik Eidus wrote:
> > I am sending another series of patches for kvm kernel and kvm-userspace
> > that would allow users of kvm to test ksm with it.
> > The kvm patches would apply to Avi git tree.
>
> Any reason to not take these through upstream QEMU instead of
> kvm-userspace?  In principle, I don't see anything that would prevent
> normal QEMU from almost making use of this functionality.  That would
> make it one less thing to eventually have to merge...

The changes for the kvm-userspace were just provided for testing it...
After we have ksm inside the kernel we will send another patch to
qemu-devel that will add support for it.

> Regards,
>
> Anthony Liguori




Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-30 Thread Anthony Liguori

Izik Eidus wrote:
> I am sending another series of patches for kvm kernel and kvm-userspace
> that would allow users of kvm to test ksm with it.
> The kvm patches would apply to Avi git tree.
  
Any reason to not take these through upstream QEMU instead of 
kvm-userspace?  In principle, I don't see anything that would prevent 
normal QEMU from almost making use of this functionality.  That would 
make it one less thing to eventually have to merge...


Regards,

Anthony Liguori


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Alan Cox
> You have implemented second one, but seems it already was patented
> http://www.google.com/patents?vid=USPAT6789156
> I'm not a lawyer but IMHO we have direct conflict here.
> From another point of view they have patented the WHEEL, but at least we
> have to know about this.

It's an old idea and appeared for Linux in March 1998: a little project from
Philipp Reisner called "mergemem".

http://groups.google.com/group/muc.lists.linux-kernel/browse_thread/thread/387af278089c7066?ie=utf-8&oe=utf-8&q=share+identical+pages#b3d4f68fb5dd4f88

so if there is a patent which is relevant (and that's a question for
lawyers and legal patent search people) perhaps the Linux Foundation and
some of the patent busters could take a look at mergemem and
re-examination.

Alan


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-28 Thread Dmitri Monakhov
Izik Eidus <[EMAIL PROTECTED]> writes:

> (From v1 to v2 the main change is much more documentation)
>
> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing that is made at the allocation of the
> memory, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the processes that use this memory.
> (the shared pages are marked as readonly, and in case of a write
> do_wp_page() takes care of creating a new copy of the page)
>
> This driver is very useful for KVM, as in cases of running multiple guests
> with operating systems of the same type.

Hi Izik, the approach that was used in the driver is commonly known as
content based search. There are several variants of it;
the most common are:
1: with guest VM support
2: w/o guest VM support.
You have implemented the second one, but it seems it was already patented:
http://www.google.com/patents?vid=USPAT6789156
I'm not a lawyer but IMHO we have a direct conflict here.
From another point of view they have patented the WHEEL, but at least we
have to know about this.

> (For desktop workloads we have achieved more than x2 memory overcommit
> (more like x3))
>
> This driver has found users other than KVM, for example CERN,
> Fons Rademakers:
> "on many-core machines we run one large detector simulation program per core.
> These simulation programs are identical but run each in their own process and
> need about 2 - 2.5 GB RAM.
> We typically buy machines with 2GB RAM per core and so have a problem to run
> one of these programs per core.
> Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
> maps, detector geometry, etc.
> Currently people have been trying to start one program, initialize the geometry
> and field maps and then fork it N times, to have the data shared.
> With KSM this would be done automatically by the system so it sounded extremely
> attractive when Andrea presented it."
>
> (We have already started to test KSM on their systems...)
>
> KSM can run as kernel thread or as userspace application or both
>
> example for how to control the kernel thread:
>
> /* Header names were stripped from the archived mail; the set below is
>  * what this program needs to build. */
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/ioctl.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include "ksm.h"
>
> int main(int argc, char *argv[])
> {
>         int fd;
>         int used = 0;
>         int fd_start;
>         struct ksm_kthread_info info;
>
>         if (argc < 2) {
>                 fprintf(stderr,
>                         "usage: %s {start npages_to_scan max_pages_to_merge sleep | stop | info}\n",
>                         argv[0]);
>                 exit(1);
>         }
>
>         fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
>         if (fd == -1) {
>                 fprintf(stderr, "could not open /dev/ksm\n");
>                 exit(1);
>         }
>
>         if (!strncmp(argv[1], "start", strlen(argv[1]))) {
>                 used = 1;
>                 /* "start" reads argv[2], argv[3] and argv[4] */
>                 if (argc < 5) {
>                         fprintf(stderr,
>                                 "usage: %s start npages_to_scan max_pages_to_merge sleep\n",
>                                 argv[0]);
>                         exit(1);
>                 }
>                 info.pages_to_scan = atoi(argv[2]);
>                 info.max_pages_to_merge = atoi(argv[3]);
>                 info.sleep = atoi(argv[4]);
>                 info.flags = ksm_control_flags_run;
>
>                 fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
>                 if (fd_start == -1) {
>                         fprintf(stderr, "KSM_START_KTHREAD failed\n");
>                         exit(1);
>                 }
>                 printf("created scanner\n");
>         }
>
>         if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
>                 used = 1;
>                 /* same ioctl as "start", but without ksm_control_flags_run */
>                 info.flags = 0;
>                 fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
>                 printf("stopped scanner\n");
>         }
>
>         if (!strncmp(argv[1], "info", strlen(argv[1]))) {
>                 used = 1;
>                 ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
>                 printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
>                        info.flags, info.pages_to_scan, info.max_pages_to_merge, info.sleep);
>         }
>
>         if (!used)
>                 fprintf(stderr, "unknown command %s\n", argv[1]);
>
>         return 0;
> }
>
> example of how to register qemu to ksm (or any userspace application)
>
> diff --git a/qemu/vl.c b/qemu/vl.c
> index 4721fdd..7785bf9 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -21,6 +21,7 @@
>   * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>   * DEALINGS IN
>   * THE SOFTWARE.
>   */
> +#include "ksm.h"
>  #include "hw/hw.h"
>  #include "hw/boards.h"
>  #include "hw/usb.h"
> @@ -5799,6 +5800,37 @@ static void termsig_setup(void)
>  
>  #endif
>  
> +int ksm_register_memory(void)
> +{
> +int fd;
> +int ksm_fd;
> +int r = 1;
> +struct ksm_memory_region ksm_region;
> +
> +

Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Ryota OZAKI
2008/11/20 Izik Eidus <[EMAIL PROTECTED]>:
> Quoting Izik Eidus:
>>
>> Quoting Ryota OZAKI:
>>>
>>> Hi Izik,
>>>
>>> I've tried your patch set, but ksm doesn't work in my machine.
>>>
>>> I compiled linux patched with the four patches and configured with KSM
>>> and KVM enabled. After boot with the linux, I run two VMs running linux
>>> using QEMU with a patch in your mail and started KSM scanner with your
>>> script, then the host linux caused panic with the following oops.
>>>
>>
>> Yes you are right, we are missing pte_unmap(pte); in get_pte()!
>> that will affect just 32bits with highmem so this is why you see it
>> thanks for the report, i will fix it for v3
>>
>> the patch below should fix it (i can't test it now, will test it for v3)
>>
>> can you report if it fixes your problem? thanks
>>
> Thinking about what i just did, it is wrong,
> this patch is the right one (still wasn't tested), but if you are going to
> apply something then use this one.

Great! Applied the 2nd patch, ksm works with both HIGHMEM enabled and disabled.

Thanks for your quick response,
  ozaki-r

>
> thanks
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 707be52..c842c29 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -569,14 +569,16 @@ out:
>  static int is_present_pte(struct mm_struct *mm, unsigned long addr)
>  {
>pte_t *ptep;
> +   int r;
>
>ptep = get_pte(mm, addr);
>if (!ptep)
>return 0;
>
> -   if (pte_present(*ptep))
> -   return 1;
> -   return 0;
> +   r = pte_present(*ptep);
> +   pte_unmap(ptep);
> +
> +   return r;
>  }
>
>  #define PAGEHASH_LEN 128
> @@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
>if (!orig_ptep)
>goto out_unlock;
>orig_pte = *orig_ptep;
> +   pte_unmap(orig_ptep);
>if (!pte_present(orig_pte))
>goto out_unlock;
>if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
>
>


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Izik Eidus:
> Quoting Ryota OZAKI:
> > Hi Izik,
> >
> > I've tried your patch set, but ksm doesn't work in my machine.
> >
> > I compiled linux patched with the four patches and configured with KSM
> > and KVM enabled. After boot with the linux, I run two VMs running linux
> > using QEMU with a patch in your mail and started KSM scanner with your
> > script, then the host linux caused panic with the following oops.
>
> Yes you are right, we are missing pte_unmap(pte); in get_pte()!
> that will affect just 32bits with highmem so this is why you see it
> thanks for the report, i will fix it for v3
>
> the patch below should fix it (i can't test it now, will test it for v3)
>
> can you report if it fixes your problem? thanks

Thinking about what i just did, it is wrong,
this patch is the right one (still wasn't tested), but if you are going
to apply something then use this one.

thanks
diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
pte_t *ptep;
+   int r;
 
ptep = get_pte(mm, addr);
if (!ptep)
return 0;
 
-   if (pte_present(*ptep))
-   return 1;
-   return 0;
+   r = pte_present(*ptep);
+   pte_unmap(ptep);
+
+   return r;
 }
 
 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
if (!orig_ptep)
goto out_unlock;
orig_pte = *orig_ptep;
+   pte_unmap(orig_ptep);
if (!pte_present(orig_pte))
goto out_unlock;
if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
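
The oops reported in this thread fires in kmap_atomic_prot(), reached through
get_pte(), and the patch above pairs every mapping with a pte_unmap() once the
pte value has been read. The user-space model below is only a sketch of that
pairing rule, under the assumption that a temporary mapping occupies a single
slot which must be released before it can be taken again. mock_map(),
mock_unmap(), mock_is_present() and the slot variables are hypothetical names
and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-slot model of a temporary (atomic) mapping. */
static bool slot_busy;
static unsigned long slot_value;  /* the value visible through the mapping */

/* Take the slot; asserts if a previous mapping was never released,
 * which is the situation the reported BUG corresponds to in this model. */
static unsigned long *mock_map(unsigned long backing)
{
    assert(!slot_busy);
    slot_busy = true;
    slot_value = backing;
    return &slot_value;
}

/* Release the slot; the pointer must not be used afterwards. */
static void mock_unmap(unsigned long *p)
{
    (void)p;
    slot_busy = false;
}

/* Pattern of the patch above: read the value, then unmap, then return
 * the copied result rather than the pointer. */
static bool mock_is_present(unsigned long backing)
{
    unsigned long *ptep = mock_map(backing);
    bool r = (*ptep & 1ul) != 0;  /* bit 0 stands in for "present" */

    mock_unmap(ptep);
    return r;
}
```

In this model, unmapping inside the helper that returns the pointer (as the
first proposed patch did) would hand callers a pointer to a released slot,
which is presumably why it was withdrawn in favour of unmapping at each call
site.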


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2008-11-20 Thread Izik Eidus

Quoting Ryota OZAKI:
> Hi Izik,
>
> I've tried your patch set, but ksm doesn't work in my machine.
>
> I compiled linux patched with the four patches and configured with KSM
> and KVM enabled. After boot with the linux, I run two VMs running linux
> using QEMU with a patch in your mail and started KSM scanner with your
> script, then the host linux caused panic with the following oops.

Yes you are right, we are missing pte_unmap(pte); in get_pte()!
that will affect just 32bits with highmem so this is why you see it
thanks for the report, i will fix it for v3

the patch below should fix it (i can't test it now, will test it for v3)

can you report if it fixes your problem? thanks


== BEGINNING of OOPS
kernel BUG at arch/x86/mm/highmem_32.c:87!
invalid opcode:  [#1] SMP
last sysfs file: /sys/class/net/vnet-ssh2/address
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in: netconsole autofs4 nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables
x_tables loop kvm_intel kvm iTCO_wdt iTCO_vendor_support igb
netxen_nic button ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore
[last unloaded: microcode]

Pid: 343, comm: kksmd Not tainted
(2.6.28-rc5-linus-head-20081119-sparsemem #1) X7DWA
EIP: 0060:[] EFLAGS: 00010206 CPU: 6
EIP is at kmap_atomic_prot+0x7d/0xeb
EAX: c0008d94 EBX: c1ff6240 ECX: 0163 EDX: 7e00
ESI: 0154 EDI: 0055 EBP: f5cdbf10 ESP: f5cdbef8
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process kksmd (pid: 343, ti=f5cda000 task=f617b140 task.ti=f5cda000)
Stack:
 7fa12163 f000 c204efbc f50479e8 9eb7e000 c08a34d0 f5cdbf18 c041f07a
 f5cdbf28 c048339c  f5c271e0 f5cdbf30 c04833bc f5cdbfb0 c0483b0d
 f5cdbf50 c0425845  0064 0009 c08a34d0 f5cdbfb0 c06384c1
Call Trace:
 [] ? kmap_atomic+0x13/0x15
 [] ? get_pte+0x50/0x63
 [] ? is_present_pte+0xd/0x1f
 [] ? ksm_scan_start+0x9a/0x7ac
 [] ? finish_task_switch+0x29/0xa4
 [] ? schedule+0x6bf/0x719
 [] ? default_spin_lock_flags+0x8/0xc
 [] ? finish_wait+0x49/0x4e
 [] ? kthread_ksm_scan_thread+0x0/0xdc
 [] ? kthread_ksm_scan_thread+0x3a/0xdc
 [] ? autoremove_wake_function+0x0/0x38
 [] ? kthread+0x40/0x66
 [] ? kthread+0x0/0x66
 [] ? kernel_thread_helper+0x7/0x10
Code: 86 00 00 00 64 a1 04 a0 82 c0 6b c0 0d 8d 3c 30 a1 78 b0 77 c0
8d 34 bd 00 00 00 00 89 45 ec a1 0c d0 84 c0 29 f0 83 38 00 74 04 <0f>
0b eb fe c1 ea 1a 8b 04 d5 80 32 8a c0 83 e0 fc 29 c3 c1 fb
EIP: [] kmap_atomic_prot+0x7d/0xeb SS:ESP 0068:f5cdbef8
Kernel panic - not syncing: Fatal exception
== END of OOPS
  


diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..e14448a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -562,6 +562,7 @@ static pte_t *get_pte(struct mm_struct *mm, unsigned long 
addr)
goto out;
 
ptep = pte_offset_map(pmd, addr);
+   pte_unmap(ptep);
 out:
return ptep;
 }


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Izik Eidus wrote:
> Andrew Morton wrote:
> > On Tue, 11 Nov 2008 21:18:23 +0200
> > Izik Eidus <[EMAIL PROTECTED]> wrote:
> >
> > > > hm.
> > > >
> > > > There has been the occasional discussion about identifying all-zeroes
> > > > pages and scavenging them, repointing them at the zero page.  Could
> > > > this infrastructure be used for that?  (And how much would we gain from
> > > > it?)
> > > >
> > > > [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> > > > thing here ;) ]
> >
> > ^^ this?
> >
> > > KSM is separate driver , it doesn't change anything in the VM but adding
> > > two helper functions.
> >
> > What, you mean I should actually read the code?   Oh well, OK.
>
> Andrea i think what is happening here is my fault

Sorry, meant to write here Andrew :-)

> i will try to give here much more information about KSM:
> first the bad things:
> KSM shared pages are right now unswappable (we have a patch that can change
> this but we want to wait with it).
> this means that the entire memory of the guest is swappable but the pages
> that are shared are not.
> (when the pages are split back by COW they become anonymous again with
> the help of do_wp_page())
> the reason that the pages are not swappable is because of the way the Linux
> rmap works: it does not allow us to create nonlinear anonymous pages
> (we don't want to use a nonlinear vma for kvm, as it would make swapping for
> kvm very slow)
> the reason that ksm pages need to have nonlinear reverse mapping is that
> an identical page can be found at a wholly different offset in one guest
> than in another guest
> (this is from the userspace VM point of view)
>
> the rest is quite simple:
> it is walking over the entire guest memory (or only some of it) and scans
> for identical pages using a hash table
> it merges the pages into one single write protected page
>
> numbers for ksm are something that i have just for desktops, and just the
> numbers i gave you
>
> what i do know is:
> big overcommit like 300% is possible only when you take into account
> that some of the guest memory will be free
> we are sharing mostly the DLLs / KERNEL / ZERO pages; the DLL and
> KERNEL pages will likely never break
> but ZERO pages will break when windows allocates them and will
> come back when windows frees the memory.
> (i wouldn't suggest 300% overcommit for server workloads, because you can
> end up swapping in that case,
> but for desktops, after running in production and passing some serious QA
> stress tests, it seems like 300% is a real number that can be used)
>
> i just ran a test on two fedora 8 guests and got these results (using GNOME
> in both of them):
> 9959 root  15   0  730m 537m 281m S    8  3.4   0:44.28 kvm
> 9956 root  15   0  730m 537m 246m S    4  3.4   0:41.43 kvm
> as you can see the physical sharing was 281mb and 246mb (kernel pages
> are counted as shared)
> there is a small lie in these numbers because a page that was shared across
> two guests and was split by a write from guest number 1 will still
> have 1 reference count to it
> and will still be a kernel page (until the other guest (num 2) writes
> to it as well)
>
> anyway i am willing to do much better testing or anything else that is
> needed for these patches to be merged.
> (just tell me what and i will do it)
>
> besides that you should know that patch 4 is not a must, it is just a nice
> optimization...
>
> thanks.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrea Arcangeli
Hi Andrew,

thanks for looking into this.

On Tue, Nov 11, 2008 at 11:11:10AM -0800, Andrew Morton wrote:
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?

The whole idea is to do something that works transparently and isn't
tailored for kvm. The mmu notifier change_pte method can be dropped as
well if you want (I recommended not to have it in the first submission
but Izik preferred to keep it because it will optimize away a kvm
shadow pte minor fault the first time kvm accesses the page after
sharing it). The page_wrprotect and replace_page can also be embedded
in ksm.

So the idea is that while we could do something specific to ksm that
keeps most of the code in userland, it'd be more tricky as it'd
require some communication with the core VM anyway (we can't just do
it in userland with mprotect, memcpy, mmap(MAP_PRIVATE) as it wouldn't
be atomic and second it'd be inefficient in terms of vma-buildup for
the same reason nonlinear-vmas exist), but most important: it wouldn't
work for all other regular processes. With KSM we can share anonymous
memory for the whole system, KVM is just a random user.

This implementation is on the simple side because it can't
swap. Swapping and perhaps the limitation of sharing anonymous memory
is the only trouble here but those will be addressed in the
future. ksm is a new device driver so it's like /dev/mem, so no
swapping isn't a blocker here.

By sharing anon pages, in short we're making anonymous vmas nonlinear,
and this isn't supported by the current rmap code. So swapping can't
work unless we mark those anon-vmas nonlinear and we either build the
equivalent of the old pte_chains on demand just for those nonlinear
shared pages, or we do a full scan of all ptes in the nonlinear
anon-vmas. An external rmap infrastructure can allow ksm to build
whatever it wants inside ksm.c to track the nonlinear anon-pages
inside a regular anon-vma and rmap.c can invoke those methods to find
the ptes for those nonlinear pages. The core VM won't get more complex
and ksm can decide if to do a full nonlinear scan of the vma, or to
build the equivalent of pte_chains. This again has to be added later
and once everybody sees ksm, it'll be easier to agree on a
external-rmap API to allow it to swap. While the pte_chains are very
inefficient for reversing the regular anonymous mappings, they're an
efficient solution as an exception for the shared KSM pages that get
scattered over the linear anon-vmas.

It's a bit like the initial kvm that was merged even though it couldn't
swap. Then we added mmu notifiers, and now kvm can swap. So we add ksm
now without swapping and later we build an external-rmap to allow ksm
to swap after we agree ksm is useful and people start using it.

> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?  (And how much would we gain from
> it?)

Zero pages make a lot of difference for windows, but they're totally
useless for linux. With current ksm all guest pagecache is 100% shared
across guests, so when you start an app the .text runs on the same
physical memory on both guests. Works fine and code is quite simple in
this round. Once we add swapping it'll be a bit more complex in VM
terms as it'll have to handle nonlinear anon-vmas.

If we ever decide to share MAP_SHARED pagecache it'll be even more
complicated than just adding the external-rmap... I think this can be
done incrementally if needed at all. OpenVZ if the install is smart
enough could share the pagecache by just hardlinking the equal
binaries.. but AFAIK they don't do that normally. For the anon ram they
need this too, they can't solve equal anon ram in userland as it has
to be handled atomically at runtime.

The folks at CERN LHC (was visiting it last month) badly need KSM too
for certain apps they're running that are allocating huge arrays
(aligned) in anon memory and they're mostly equal for all
processes. They tried to work around it with fork but it's not working
well, KSM would solve their problem (it'd solve it both on the same OS
and across OS with kvm as virtualization engine running on the same host).
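
For reference, the fork workaround mentioned above can be sketched as follows:
the parent initializes the common data once and then forks the workers, so the
children keep reading the parent's physical pages until one of them writes.
This is a minimal user-space illustration under stated assumptions, not the
CERN code; run_forked_workers(), SHARED_BYTES and the 0x5a fill pattern are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the large common data set. */
#define SHARED_BYTES (4u * 1024u * 1024u)

/* Initialize once in the parent, then fork n workers that only read.
 * Returns how many workers saw the expected data, or -1 on allocation
 * failure. */
static int run_forked_workers(int n)
{
    unsigned char *maps = malloc(SHARED_BYTES);
    int ok = 0;
    int status;

    if (!maps)
        return -1;
    memset(maps, 0x5a, SHARED_BYTES);  /* "initialize the geometry" */

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;  /* stop forking; reap what was started */
        if (pid == 0) {
            /* Child: read-only access, so no page is copied. */
            int good = maps[0] == 0x5a &&
                       maps[SHARED_BYTES - 1] == 0x5a;
            _exit(good ? 0 : 1);
        }
    }

    /* Reap every child that was started. */
    while (wait(&status) > 0) {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }

    free(maps);
    return ok;
}
```

The limitation is visible in the structure: sharing only exists between
processes that descend from the one that did the initialization, and only for
pages nobody has written since the fork, which is the gap KSM is meant to
close.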

So I think this is good stuff, and I'd focus discussions and reviews
on the KSM API of /dev/ksm that if merged will be longstanding and
much more troublesome than the rest of the code if changed later (if
we change the ksm internals at any time nobody will notice), and
post-merging we can focus on the external-rmap to make KSM pages first
class citizens in VM terms. But then anything can be changed here, so
suggestions welcome!

Thanks!


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Andrew Morton wrote:
> On Tue, 11 Nov 2008 21:18:23 +0200
> Izik Eidus <[EMAIL PROTECTED]> wrote:
>
> > > hm.
> > >
> > > There has been the occasional discussion about identifying all-zeroes
> > > pages and scavenging them, repointing them at the zero page.  Could
> > > this infrastructure be used for that?  (And how much would we gain from
> > > it?)
> > >
> > > [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> > > thing here ;) ]
>
> ^^ this?
>
> > KSM is separate driver , it doesn't change anything in the VM but adding
> > two helper functions.
>
> What, you mean I should actually read the code?   Oh well, OK.

Andrea i think what is happening here is my fault
i will try to give here much more information about KSM:
first the bad things:
KSM shared pages are right now unswappable (we have a patch that can change
this but we want to wait with it).
this means that the entire memory of the guest is swappable but the pages
that are shared are not.
(when the pages are split back by COW they become anonymous again with
the help of do_wp_page())
the reason that the pages are not swappable is because of the way the Linux
rmap works: it does not allow us to create nonlinear anonymous pages
(we don't want to use a nonlinear vma for kvm, as it would make swapping for
kvm very slow)
the reason that ksm pages need to have nonlinear reverse mapping is that
an identical page can be found at a wholly different offset in one guest
than in another guest
(this is from the userspace VM point of view)

the rest is quite simple:
it is walking over the entire guest memory (or only some of it) and scans
for identical pages using a hash table
it merges the pages into one single write protected page

numbers for ksm are something that i have just for desktops, and just the
numbers i gave you

what i do know is:
big overcommit like 300% is possible only when you take into account
that some of the guest memory will be free
we are sharing mostly the DLLs / KERNEL / ZERO pages; the DLL and
KERNEL pages will likely never break
but ZERO pages will break when windows allocates them and will
come back when windows frees the memory.
(i wouldn't suggest 300% overcommit for server workloads, because you can
end up swapping in that case,
but for desktops, after running in production and passing some serious QA
stress tests, it seems like 300% is a real number that can be used)

i just ran a test on two fedora 8 guests and got these results (using GNOME
in both of them):
9959 root  15   0  730m 537m 281m S    8  3.4   0:44.28 kvm
9956 root  15   0  730m 537m 246m S    4  3.4   0:41.43 kvm
as you can see the physical sharing was 281mb and 246mb (kernel pages
are counted as shared)
there is a small lie in these numbers because a page that was shared across
two guests and was split by a write from guest number 1 will still
have 1 reference count to it
and will still be a kernel page (until the other guest (num 2) writes
to it as well)

anyway i am willing to do much better testing or anything else that is
needed for these patches to be merged.
(just tell me what and i will do it)

besides that you should know that patch 4 is not a must, it is just a nice
optimization...

thanks.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 21:18:23 +0200
Izik Eidus <[EMAIL PROTECTED]> wrote:

> > hm.
> >
> > There has been the occasional discussion about identifying all-zeroes
> > pages and scavenging them, repointing them at the zero page.  Could
> > this infrastructure be used for that?  (And how much would we gain from
> > it?)
> >
> > [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> > thing here ;) ]

^^ this?

> KSM is separate driver , it doesn't change anything in the VM but adding 
> two helper functions.

What, you mean I should actually read the code?   Oh well, OK.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Avi Kivity

Andrew Morton wrote:
> > For kvm, the kernel never knew those pages were shared.  They are loaded
> > from independent (possibly compressed and encrypted) disk images.  These
> > images are different; but some pages happen to be the same because they
> > came from the same installation media.
>
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?

This was considered.  You can't scan the image, because it may be
encrypted/compressed/offset (typical images _are_ offset because the
first partition starts at sector 63...).  The data may come from the
network and not a disk image.  You can't scan in userspace because the
images belong to different users and contain sensitive data.  Pages may
come from several images (multiple disk images per guest) so you end up
with one vma per page.

So you have to scan memory, after the guest has retrieved it from
disk/network/manufactured it somehow, decompressed and decrypted it,
written it to the offset it wants.  You can't scan from userspace since
it's sensitive data, and of course the actual merging needs to be done
atomically, which can only be done from the holy of holies, the vm.

> > For OpenVZ the situation is less clear, but if you allow users to
> > independently upgrade their chroots you will eventually arrive at the
> > same scenario (unless of course you apply the same merging strategy at
> > the filesystem level).
>
> hm.
>
> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?

Yes, trivially.  ksm may be an overkill for this, though.

> (And how much would we gain from
> it?)

A lot of zeros.
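
To make the zero-page idea concrete: finding such pages needs no comparison
against other pages at all, only a test of the page's own content, which is
one reading of why ksm may be overkill for it. A minimal user-space sketch,
assuming a 4096-byte page; page_is_all_zero() and PAGE_SIZE_BYTES are
hypothetical names, not part of any patch in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096  /* assumed page size for this sketch */

/* True if every byte of the page is zero; such a page could in principle
 * be repointed at a shared zero page. */
static bool page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < PAGE_SIZE_BYTES; i++) {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```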


> [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> thing here ;) ]

I sympathize -- us too.  Consider the typical multiuser gnome
minicomputer with all 150 users reading lwn.net at the same time instead
of working.  You could share the firefox rendered page cache, reducing
memory utilization drastically.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 21:07:10 +0200
Izik Eidus <[EMAIL PROTECTED]> wrote:

> we have used KSM in production for about half a year and the numbers that
> came from our QA are:
> using KSM for desktop (KSM was tested just for windows desktop workload)
> you can run as many as
> 52 windows xp guests with 1 GB of ram each on a server with just 16 GB of
> ram. (this is more than 300% overcommit)
> the reason is that most of the kernel/dlls of these guests are shared and
> in addition we are sharing the windows zero pages
> (windows keeps zeroing all its free memory, so every time windows
> releases memory we take the page back to the host)
> there is a slide that gives these numbers, which you can find at:
> http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf
> (slide 27)
> besides that, i gave a presentation about ksm that can be found at:
> http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

OK, 300% isn't chicken feed.

It is quite important that information such as this be prepared, added to
the patch changelogs and maintained.  For a start, without this basic
information, there is no reason for anyone to look at any of the code!


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Andrew Morton wrote:
> On Tue, 11 Nov 2008 20:48:16 +0200
> Avi Kivity <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> > > The whole approach seems wrong to me.  The kernel lost track of these
> > > pages and then we run around post-facto trying to fix that up again. 
> > > Please explain (for the changelog) why the kernel cannot get this right
> > > via the usual sharing, refcounting and COWing approaches.
> >
> > For kvm, the kernel never knew those pages were shared.  They are loaded 
> > from independent (possibly compressed and encrypted) disk images.  These 
> > images are different; but some pages happen to be the same because they 
> > came from the same installation media.
>
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?
>
> > For OpenVZ the situation is less clear, but if you allow users to 
> > independently upgrade their chroots you will eventually arrive at the 
> > same scenario (unless of course you apply the same merging strategy at 
> > the filesystem level).
>
> hm.
>
> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?  (And how much would we gain from
> it?)
>
> [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> thing here ;) ]

KSM is a separate driver; it changes nothing in the VM except for adding 
two helper functions.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 20:48:16 +0200
Avi Kivity <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > The whole approach seems wrong to me.  The kernel lost track of these
> > pages and then we run around post-facto trying to fix that up again. 
> > Please explain (for the changelog) why the kernel cannot get this right
> > via the usual sharing, refcounting and COWing approaches.
> >   
> 
> For kvm, the kernel never knew those pages were shared.  They are loaded 
> from independent (possibly compressed and encrypted) disk images.  These 
> images are different; but some pages happen to be the same because they 
> came from the same installation media.

What userspace-only changes could fix this?  Identify the common data,
write it to a flat file and mmap it, something like that?

> For OpenVZ the situation is less clear, but if you allow users to 
> independently upgrade their chroots you will eventually arrive at the 
> same scenario (unless of course you apply the same merging strategy at 
> the filesystem level).

hm.

There has been the occasional discussion about identifying all-zeroes
pages and scavenging them, repointing them at the zero page.  Could
this infrastructure be used for that?  (And how much would we gain from
it?)

[I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
thing here ;) ]
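
As a rough illustration of the all-zeroes question above, the following Python sketch counts how many page-sized chunks of a buffer contain only zero bytes, which is the set of pages that could in principle be repointed at a single zero page. It is a userspace toy, not kernel code: the 4096-byte PAGE_SIZE, the count_zero_pages() helper and the sample buffer are all assumptions made for the example.

```python
PAGE_SIZE = 4096              # assumed page size for the example
ZERO_PAGE = bytes(PAGE_SIZE)  # one page worth of zero bytes


def count_zero_pages(memory: bytes) -> int:
    """Count PAGE_SIZE-aligned pages in `memory` that contain only zero bytes."""
    count = 0
    # Step through whole pages only; a trailing partial page is ignored.
    for off in range(0, len(memory) - PAGE_SIZE + 1, PAGE_SIZE):
        if memory[off:off + PAGE_SIZE] == ZERO_PAGE:
            count += 1
    return count


# Hypothetical 4-page buffer: zero page, non-zero page, two zero pages.
guest = bytes(PAGE_SIZE) + b"\x01" * PAGE_SIZE + bytes(2 * PAGE_SIZE)
zero = count_zero_pages(guest)
total = len(guest) // PAGE_SIZE
print(f"{zero} of {total} pages are all zeroes "
      f"({zero * PAGE_SIZE} bytes could map to one shared zero page)")
```

How much a real system would gain depends entirely on the workload; the sketch only shows how the candidate pages would be identified.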


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Avi Kivity wrote:
> Andrew Morton wrote:
> > The whole approach seems wrong to me.  The kernel lost track of these
> > pages and then we run around post-facto trying to fix that up again. 
> > Please explain (for the changelog) why the kernel cannot get this right
> > via the usual sharing, refcounting and COWing approaches.
>
> For kvm, the kernel never knew those pages were shared.  They are 
> loaded from independent (possibly compressed and encrypted) disk 
> images.  These images are different; but some pages happen to be the 
> same because they came from the same installation media.

As Avi said, in kvm we cannot know how the guest is going to map its 
pages; we can do nothing but scan for the identical pages
(identical pages can sit at completely different offsets 
inside each guest).

> For OpenVZ the situation is less clear, but if you allow users to 
> independently upgrade their chroots you will eventually arrive at the 
> same scenario (unless of course you apply the same merging strategy at 
> the filesystem level).






Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Izik Eidus

Andrew Morton wrote:
> On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:
>
> > KSM is a linux driver that allows dynamically sharing identical memory pages
> > between one or more processes.
> >
> > unlike traditional page sharing that is made at the allocation of the
> > memory, ksm does it dynamically after the memory was created.
> > Memory is periodically scanned; identical pages are identified and merged.
> > the sharing is unnoticeable by the processes that use this memory.
> > (the shared pages are marked as readonly, and in case of write
> > do_wp_page() takes care to create a new copy of the page)
> >
> > this driver is very useful for KVM, as in cases of running multiple guest
> > operating systems of the same type, many pages are sharable.
> > this driver can be useful for OpenVZ as well.
>
> These benefits should be quantified, please.  Also any benefits to any
> other workloads should be identified and quantified.

Sure,
we have used KSM in production for about half a year, and the numbers that 
came from our QA are:
using KSM for desktops (KSM was tested just for a Windows desktop workload) 
you can run as many as
52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of RAM (this 
is more than 300% overcommit).
The reason is that most of the kernel/DLLs of these guests are shared, and 
in addition we are sharing the Windows zero pages
(Windows keeps zeroing all its free memory, so every time Windows 
releases memory we take the page back to the host).

There is a slide that gives these numbers; you can find it at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf 
(slide 27)

Besides that, I gave a presentation about ksm that can be found at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

If more numbers are wanted for other workloads I can test them.
(The idea of ksm is to run it slowly at low priority and let it 
merge pages when no one needs the CPU.)
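
The arithmetic behind the "more than 300%" figure can be checked in a few lines of Python. The guest count and RAM sizes are the ones quoted above; the variable names are made up for the example, and host-side overhead (the host kernel, the kvm processes themselves) is ignored, so the last number is only a lower bound on what sharing has to absorb.

```python
# Figures quoted in the message above.
guests = 52
ram_per_guest_gb = 1
host_ram_gb = 16

committed_gb = guests * ram_per_guest_gb     # total RAM promised to the guests
ratio = committed_gb / host_ram_gb           # committed RAM relative to host RAM
must_share_gb = committed_gb - host_ram_gb   # what cannot be backed one-to-one

print(f"committed: {committed_gb} GB on a {host_ram_gb} GB host")
print(f"commit ratio: {ratio:.0%}")
print(f"at least {must_share_gb} GB must be absorbed by page sharing")
```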



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Avi Kivity

Andrew Morton wrote:
> The whole approach seems wrong to me.  The kernel lost track of these
> pages and then we run around post-facto trying to fix that up again. 
> Please explain (for the changelog) why the kernel cannot get this right
> via the usual sharing, refcounting and COWing approaches.



For kvm, the kernel never knew those pages were shared.  They are loaded 
from independent (possibly compressed and encrypted) disk images.  These 
images are different; but some pages happen to be the same because they 
came from the same installation media.


For OpenVZ the situation is less clear, but if you allow users to 
independently upgrade their chroots you will eventually arrive at the 
same scenario (unless of course you apply the same merging strategy at 
the filesystem level).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:

> KSM is a linux driver that allows dynamically sharing identical memory pages
> between one or more processes.
> 
> unlike traditional page sharing that is made at the allocation of the
> memory, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and merged.
> the sharing is unnoticeable by the processes that use this memory.
> (the shared pages are marked as readonly, and in case of write
> do_wp_page() takes care to create a new copy of the page)
> 
> this driver is very useful for KVM, as in cases of running multiple guest
> operating systems of the same type, many pages are sharable.
> this driver can be useful for OpenVZ as well.

These benefits should be quantified, please.  Also any benefits to any
other workloads should be identified and quantified.

The whole approach seems wrong to me.  The kernel lost track of these
pages and then we run around post-facto trying to fix that up again. 
Please explain (for the changelog) why the kernel cannot get this right
via the usual sharing, refcounting and COWing approaches.
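
For readers who want the quoted description in concrete terms, here is a toy Python model of the behaviour it describes: scan, merge identical pages into one read-only frame, and copy on write. It is only a sketch under loud assumptions: Frame, Process and scan_and_merge() are hypothetical names, a dictionary keyed on page contents stands in for whatever lookup structure the driver really uses, and the write path merely imitates the role the description gives to do_wp_page(). None of this is taken from the patch itself.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class Frame:
    """A physical page frame with a reference count (hypothetical model)."""
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1
        self.readonly = False


class Process:
    """Maps virtual page numbers (list indices) to frames."""
    def __init__(self, pages):
        self.table = [Frame(p) for p in pages]

    def read(self, vpn: int) -> bytes:
        return bytes(self.table[vpn].data)

    def write(self, vpn: int, offset: int, value: int) -> None:
        frame = self.table[vpn]
        if frame.readonly and frame.refs > 1:
            # Write to a shared page: give this process a private copy first.
            frame.refs -= 1
            frame = Frame(bytes(frame.data))
            self.table[vpn] = frame
        elif frame.readonly:
            frame.readonly = False  # sole remaining owner may write in place
        frame.data[offset] = value


def scan_and_merge(processes) -> int:
    """One scan pass: repoint identical pages at a single read-only frame."""
    seen = {}   # page contents -> canonical frame
    merged = 0
    for proc in processes:
        for vpn, frame in enumerate(proc.table):
            key = bytes(frame.data)
            canon = seen.get(key)
            if canon is None:
                seen[key] = frame
            elif canon is not frame:
                canon.refs += 1
                canon.readonly = True
                frame.refs -= 1          # the duplicate frame is released
                proc.table[vpn] = canon
                merged += 1
    return merged


# The same contents at different offsets in two address spaces.
common = b"\x90" * PAGE_SIZE
a = Process([common, b"a" * PAGE_SIZE])
b = Process([b"b" * PAGE_SIZE, common])

merged = scan_and_merge([a, b])
shared_before_write = a.table[0] is b.table[1]

b.write(1, 0, 0x00)  # b modifies its copy; a must not notice
shared_after_write = a.table[0] is b.table[1]
print(merged, shared_before_write, shared_after_write)
```

The point of the model is that the two address spaces were populated independently, so nothing at allocation time could have told the kernel the pages were identical; only a content scan finds them.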