Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Friday 17 April 2009 17:08:07 Jared Hulbert wrote:
> > As everyone knows, my favourite thing is to say nasty things about any
> > new feature that adds complexity to common code. I feel like crying to
> > hear about how many more instances of MS Office we can all run, if only
> > we apply this patch. And the poorly written HPC app just sounds like
> > scrapings from the bottom of justification barrel.
> >
> > I'm sorry, maybe I'm way off with my understanding of how important
> > this is. There isn't too much help in the changelog. A discussion of
> > where the memory savings comes from, and how far does things like
> > sharing of fs image, or ballooning goes and how much extra savings we
> > get from this... with people from other hypervisors involved as well.
> > Have I missed this kind of discussion?
>
> Nick,
>
> I don't know about other hypervisors, fs and balloonings, but I have
> tried this out. It works. It works on apps I don't consider, "poorly
> written". I'm very excited about this. I got >10% saving in a
> roughly off the shelf embedded system. No user noticeable performance
> impact.

OK well that's what I want to hear. Thanks, that means a lot to me.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code. I feel like crying to
> hear about how many more instances of MS Office we can all run, if only
> we apply this patch. And the poorly written HPC app just sounds like
> scrapings from the bottom of justification barrel.
>
> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings comes from, and how far does things like
> sharing of fs image, or ballooning goes and how much extra savings we
> get from this... with people from other hypervisors involved as well.
> Have I missed this kind of discussion?

Nick,

I don't know about other hypervisors, fs and balloonings, but I have
tried this out. It works. It works on apps I don't consider, "poorly
written". I'm very excited about this. I got >10% saving in a
roughly off the shelf embedded system. No user noticeable performance
impact.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
Nick Piggin wrote:
> On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> > On Thu, 9 Apr 2009 06:58:37 +0300 Izik Eidus wrote:
> > > KSM is a linux driver that allows dynamically sharing identical
> > > memory pages between one or more processes.
> >
> > Generally looks OK to me. But that doesn't mean much. We should rub
> > bottles with words like "hugh" and "nick" on them to be sure.
>
> I haven't looked too closely at it yet sorry. Hugh has a great eye for
> these details, though, hint hint :)
>
> As everyone knows, my favourite thing is to say nasty things about any
> new feature that adds complexity to common code.

The whole idea, and the way I wrote it, is that it won't touch common
code; I didn't change the Linux mm logic anywhere. The worst thing that
we have added is helper functions.

> I feel like crying to hear about how many more instances of MS Office
> we can all run, if only we apply this patch.

And more instances of Linux guests...

> And the poorly written HPC app just sounds like scrapings from the
> bottom of justification barrel.

So suppose you have a big rendering application that loads gigabytes of
geometrical data that is handled by many threads, and each thread
sometimes changes this geometrical data and you don't want the other
threads to notice it. How would you share it in the traditional way?
After the shared data gets COWed once, how will you collect it again
when it becomes identical? KSM does it for applications transparently.

The motivation for writing KSM was indeed KVM, where it is highly
needed. You may check what VMware says about the fact that they have
much better overcommit than Hyper-V / Xen:
http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html

It is important to understand that in virtualization environments there
are cases where memory is much more critical than any other resource
for higher density.

Together with KSM, KVM will have the same memory overcommit abilities
that VMware has.

> I'm sorry, maybe I'm way off with my understanding of how important
> this is. There isn't too much help in the changelog. A discussion of
> where the memory savings comes from,

The memory savings come from identical libraries, identical kernels and
zeroed pages; that is for virtualization. The library code will always
be identical among similar guests, so why have this code in multiple
places in the host memory?

> and how far does things like sharing of fs image, or ballooning goes
> and how much extra savings we get from this...

Ballooning is much worse when it comes to performance, because what it
does is shrink the guest memory. With KSM we find identical pages and
merge them into one page, so we don't get a guest performance loss.

> with people from other hypervisors involved as well.
> Have I missed this kind of discussion?
>
> Careful what you wish for, ay? :)
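The claim above (savings come from identical library, kernel and zeroed pages, and pages that diverge after a write can be collected again later) can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: the page contents, the two "guests" and the helper `pages_saved` are hypothetical, and counting duplicates with a dictionary is not how KSM itself finds matches.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size for this toy model


def pages_saved(pages):
    """Number of page frames freed if every group of identical pages
    were merged into a single shared copy."""
    counts = Counter(pages)
    return sum(n - 1 for n in counts.values())


# Two hypothetical guests: the same "library" page and zeroed pages,
# plus one private page each.
lib = b"\x90" * PAGE_SIZE
zero = bytes(PAGE_SIZE)
guest_a = [lib, zero, zero, b"A" * PAGE_SIZE]
guest_b = [lib, zero, zero, b"B" * PAGE_SIZE]

before = pages_saved(guest_a + guest_b)

# A write to a shared page breaks the sharing (copy-on-write):
# guest_b now holds a private, different copy.
guest_b[0] = b"\x91" + lib[1:]
after_write = pages_saved(guest_a + guest_b)

# If the content later becomes identical again, a periodic rescan
# can merge it once more; sharing set up at allocation time cannot.
guest_b[0] = lib
after_rescan = pages_saved(guest_a + guest_b)

print(before, after_write, after_rescan)
```

The middle figure is the point of the example: sharing arranged once at allocation or fork time stays broken after the first write, while a scanner that keeps revisiting memory can recover it.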
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Wednesday 15 April 2009 08:09:03 Andrew Morton wrote:
> On Thu, 9 Apr 2009 06:58:37 +0300
> Izik Eidus wrote:
>
> > KSM is a linux driver that allows dynamicly sharing identical memory
> > pages between one or more processes.
>
> Generally looks OK to me. But that doesn't mean much. We should rub
> bottles with words like "hugh" and "nick" on them to be sure.

I haven't looked too closely at it yet sorry. Hugh has a great eye for
these details, though, hint hint :)

As everyone knows, my favourite thing is to say nasty things about any
new feature that adds complexity to common code. I feel like crying to
hear about how many more instances of MS Office we can all run, if only
we apply this patch. And the poorly written HPC app just sounds like
scrapings from the bottom of justification barrel.

I'm sorry, maybe I'm way off with my understanding of how important
this is. There isn't too much help in the changelog. A discussion of
where the memory savings comes from, and how far does things like
sharing of fs image, or ballooning goes and how much extra savings we
get from this... with people from other hypervisors involved as well.
Have I missed this kind of discussion?

Careful what you wish for, ay? :)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v3
On Thu, 9 Apr 2009 06:58:37 +0300 Izik Eidus wrote:
> KSM is a linux driver that allows dynamicly sharing identical memory
> pages between one or more processes.

Generally looks OK to me. But that doesn't mean much. We should rub
bottles with words like "hugh" and "nick" on them to be sure.

>
> ...
>
>  include/linux/ksm.h          |   48 ++
>  include/linux/miscdevice.h   |    1 +
>  include/linux/mm.h           |    5 +
>  include/linux/mmu_notifier.h |   34 +
>  include/linux/rmap.h         |   11 +
>  mm/Kconfig                   |    6 +
>  mm/Makefile                  |    1 +
>  mm/ksm.c                     | 1674 ++
>  mm/memory.c                  |   90 +++-
>  mm/mmu_notifier.c            |   20 +
>  mm/rmap.c                    |  139

And it's pretty unobtrusive for what it is. I expect we can get this
into 2.6.31 unless there are some pratfalls which I missed.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
>
> 1)Fixed security issue found by Chris Wright:
>     Ksm was checking if page is a shared page by running !PageAnon.
>     Because Ksm scans only anonymous memory, all !PageAnons
>     inside ksm data structures are shared pages, however there might
>     be a case for do_wp_page() when VM_SHARED is used where
>     do_wp_page() would, instead of copying the page into a new
>     anonymous page, reuse the page. It was fixed by adding a check for
>     the dirty_bit of the virtual addresses pointing into the shared
>     page.
>     I was not finding any VM code that would clear the dirty bit from
>     this virtual address (due to the fact that we allocate the page
>     using page_alloc() - kernel allocated pages), ~but i still want
>     confirmation about this from the vm guys - thanks.~

As far as I can tell this wasn't a bug and this change is unnecessary. I
already checked this bit but I may have missed something, so I ask here
to be sure.

As far as I can tell when VM_SHARED is set, no anonymous page can ever
be allocated in that vma range, hence no KSM page can ever be generated
in that vma either. MAP_SHARED|MAP_ANONYMOUS is only a different API
for /dev/shm, IPCSHM backing, no anonymous pages can live there. It
surely worked like that in older 2.6, reading latest code it seems to
still work like that, but if something has changed Hugh will surely
correct me in a jiffy ;).

I still see this in the file=null path.

	} else if (vm_flags & VM_SHARED) {
		error = shmem_zero_setup(vma);
		if (error)
			goto free_vma;
	}

So you can revert your change for now.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
> Presumably they will probably want to control it to interleave it over
> all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads
to lots of equal data too, in which case ksm is the only way to share
it all.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Nikola Ciprich wrote:
> Hi Izik,
> Is there some user documentation available? (apart from RTFS?:))
> I've compiled kernel with v2 of Your patches, loaded ksm module, did
> echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
> anything, at least no pages were collected..
> Could You advise me a bit?
> thanks a lot in advance...
> I can't wait to try it on our hosts runing 50-60 KVMs :)
> BR
> nik

You need the userspace / kvm patches that I posted together with V1
about 1-2 weeks ago...

What you should do is this:

Patch Linus' kernel git with the ksm patches (V2) (like you just did).
These patches can be found at:
http://lkml.org/lkml/2009/4/4/77

Then patch Avi's kernel git with the kvm patches that were sent
together with V1. The patches can be found at:
http://lkml.org/lkml/2009/3/30/534

and then Avi's git userspace with these patches:
http://lkml.org/lkml/2009/3/30/538

Now, after you finish patching the kernel, load the kvm modules from
Avi's git, and then using the patched userspace you can start using ksm:

set up the speed: (just numbers, you can change them to make it take
less or more cpu)

echo 400 > /sys/kernel/mm/ksm/pages_to_scan
echo 1 > /sys/kernel/mm/ksm/sleep
echo 1 > /sys/kernel/mm/ksm/run

Don't raise all the VMs at once, because then KSM won't be able to keep
up with the memory allocation...
Raise a few VMs, see that their memory gets shared and your host free
memory grows, then raise more VMs and so on...

Enjoy.

(You can check pages_shared for the number of pages that have been
shared; you can run top as well)

> On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> > From v1 to v2:
> >
> > 1)Fixed security issue found by Chris Wright:
> >     Ksm was checking if page is a shared page by running !PageAnon.
> >     Because Ksm scans only anonymous memory, all !PageAnons
> >     inside ksm data structures are shared pages, however there
> >     might be a case for do_wp_page() when VM_SHARED is used where
> >     do_wp_page() would, instead of copying the page into a new
> >     anonymous page, reuse the page. It was fixed by adding a check
> >     for the dirty_bit of the virtual addresses pointing into the
> >     shared page.
> >     I was not finding any VM code that would clear the dirty bit
> >     from this virtual address (due to the fact that we allocate the
> >     page using page_alloc() - kernel allocated pages), ~but i still
> >     want confirmation about this from the vm guys - thanks.~
> >
> > 2)Moved to sysfs to control ksm:
> >     It was requested as a better way to control the ksm scanning
> >     thread than ioctls.
> >     the sysfs api:
> >     dir: /sys/kernel/mm/ksm/
> >
> >     kernel_pages_allocated - information about how many kernel
> >     pages ksm has allocated; these pages are not swappable, and
> >     each such page is used by ksm to share pages with identical
> >     content
> >
> >     pages_shared - how many pages were shared by ksm
> >
> >     run - set to 1 when you want ksm to run, 0 when not
> >
> >     max_kernel_pages - set the maximum amount of kernel pages
> >     to be allocated by ksm, set 0 for unlimited.
> >
> >     pages_to_scan - how many pages to scan before ksm will sleep
> >
> >     sleep - how many usecs ksm will sleep.
> >
> > 3)Add sysfs parameter to control the maximum kernel pages to be
> >   allocated by ksm.
> >
> > 4)Add statistics about how many pages are really shared.
> >
> >
> > One issue still to be discussed:
> > There was a suggestion to use madvise(SHAREABLE) instead of using
> > ioctls to register memory that needs to be scanned by ksm.
> > Such a change is outside the area of ksm.c and would require adding
> > a new madvise api, and changing some parts of the vm and the kernel
> > code, so the first thing to do is to work out whether we really
> > want this.
> >
> > I don't know of any other open issues.
> >
> > Thanks.
> >
> > This is from the first post:
> > (The kvm part, together with the kvm-userspace part, was posted
> >  with V1 about a week ago; whoever wants to test ksm may download
> >  the patch from the lkml archive)
> >
> > KSM is a linux driver that allows dynamically sharing identical
> > memory pages between one or more processes.
> >
> > Unlike traditional page sharing that is made at the allocation of
> > the memory, ksm does it dynamically after the memory was created.
> > Memory is periodically scanned; identical pages are identified and
> > merged.
> > The sharing is unnoticeable by the processes that use this memory.
> > (the shared pages are marked as readonly, and in case of write
> > do_wp_page() takes care to create a new copy of the page)
> >
> > To find identical pages ksm uses an algorithm that is split into
> > three primary levels:
> >
> > 1) Ksm will start scanning the memory and will calculate a checksum
> >    for each page that is registered to be scanned.
> >    (In the first round of the scanning, ksm would only calculate
> >    this checksum for all the pages)
> >
> > 2) Ksm will go again over the whole memory and will recalculate the
> >    checksum of the pages; pages that are found to have the same
> >    checksum value would be considered "pages that most likely
> >    won't change".
> >    Ksm will insert these pages into a sorted-by-page-content
> >    RB-tree that is called the "unstable tree"; the reason that this
> >    tree is called unstable is due to the
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Nick Piggin wrote:
> On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:
> > This driver is very useful for KVM in cases of running multiple
> > guest operating systems of the same type.
> > (For desktop workloads we have achieved more than x2 memory
> > overcommit (more like x3))
>
> Interesting that it is a desirable workload to have multiple guests
> each running MS office.

These numbers were taken from such a workload; it is some kind of weird
script that keeps opening Word / Excel and writing there like a user...
I think in addition it opens Internet Explorer and visits random
sites...

I can search for the script if wanted...

> I wonder, can windows enter a paravirtualised guest mode for KVM?
> And can you detect page allocation/freeing events?

I don't know.

> > This driver has found users other than KVM, for example CERN,
> > Fons Rademakers:
> > "on many-core machines we run one large detector simulation program
> > per core. These simulation programs are identical but run each in
> > their own process and need about 2 - 2.5 GB RAM.
> > We typically buy machines with 2GB RAM per core and so have a
> > problem to run one of these programs per core.
> > Of the 2 - 2.5 GB about 700MB is identical data in the form of
> > magnetic field maps, detector geometry, etc.
> > Currently people have been trying to start one program, initialize
> > the geometry and field maps and then fork it N times, to have the
> > data shared.
> > With KSM this would be done automatically by the system so it
> > sounded extremely attractive when Andrea presented it."
>
> They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED
> etc. Presumably they will probably want to control it to interleave
> it over all numa nodes and use hugepages for it. It would be very
> little work.

I agree about that. I don't know their application too well; I know
they had problems doing it.

> > I am sending another series of patches for kvm kernel and
> > kvm-userspace that would allow users of kvm to test ksm with it.
> > The kvm patches would apply to Avi's git tree.
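The CERN figures quoted above lend themselves to a quick back-of-the-envelope check. The sketch below assumes the upper figure of 2.5 GB per process (taken as 2500 MB), the quoted 700 MB of identical data, perfect merging of that data into one copy, and a hypothetical 8-core machine; the function name `total_ram_mb` and the core count are illustrative only and do not come from the thread.

```python
def total_ram_mb(n_procs, per_proc_mb, shared_mb, merged):
    """Total RAM needed by n_procs identical processes.

    With merged=True, the shared_mb of identical data is assumed to be
    kept as a single copy; everything else stays private per process.
    """
    if merged:
        return shared_mb + n_procs * (per_proc_mb - shared_mb)
    return n_procs * per_proc_mb


CORES = 8            # hypothetical machine size, one process per core
PER_PROC_MB = 2500   # upper end of the quoted "2 - 2.5 GB"
SHARED_MB = 700      # quoted "about 700MB is identical data"

unmerged = total_ram_mb(CORES, PER_PROC_MB, SHARED_MB, merged=False)
merged = total_ram_mb(CORES, PER_PROC_MB, SHARED_MB, merged=True)

print(unmerged, merged, merged / CORES)
```

Under these assumptions the per-core requirement drops from 2500 MB to roughly 1890 MB, which is the difference between not fitting and fitting in the quoted budget of 2 GB per core.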
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Hi Izik,
Is there some user documentation available? (apart from RTFS?:))
I've compiled kernel with v2 of Your patches, loaded ksm module, did
echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
anything, at least no pages were collected..
Could You advise me a bit?
thanks a lot in advance...
I can't wait to try it on our hosts runing 50-60 KVMs :)
BR
nik

On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
> From v1 to v2:
>
> 1)Fixed security issue found by Chris Wright:
>     Ksm was checking if page is a shared page by running !PageAnon.
>     Because Ksm scans only anonymous memory, all !PageAnons
>     inside ksm data structures are shared pages, however there might
>     be a case for do_wp_page() when VM_SHARED is used where
>     do_wp_page() would, instead of copying the page into a new
>     anonymous page, reuse the page. It was fixed by adding a check for
>     the dirty_bit of the virtual addresses pointing into the shared
>     page.
>     I was not finding any VM code that would clear the dirty bit from
>     this virtual address (due to the fact that we allocate the page
>     using page_alloc() - kernel allocated pages), ~but i still want
>     confirmation about this from the vm guys - thanks.~
>
> 2)Moved to sysfs to control ksm:
>     It was requested as a better way to control the ksm scanning
>     thread than ioctls.
>     the sysfs api:
>     dir: /sys/kernel/mm/ksm/
>
>     kernel_pages_allocated - information about how many kernel pages
>     ksm has allocated; these pages are not swappable, and each such
>     page is used by ksm to share pages with identical content
>
>     pages_shared - how many pages were shared by ksm
>
>     run - set to 1 when you want ksm to run, 0 when not
>
>     max_kernel_pages - set the maximum amount of kernel pages
>     to be allocated by ksm, set 0 for unlimited.
>
>     pages_to_scan - how many pages to scan before ksm will sleep
>
>     sleep - how many usecs ksm will sleep.
>
> 3)Add sysfs parameter to control the maximum kernel pages to be
>   allocated by ksm.
>
> 4)Add statistics about how many pages are really shared.
>
>
> One issue still to be discussed:
> There was a suggestion to use madvise(SHAREABLE) instead of using
> ioctls to register memory that needs to be scanned by ksm.
> Such a change is outside the area of ksm.c and would require adding
> a new madvise api, and changing some parts of the vm and the kernel
> code, so the first thing to do is to work out whether we really want
> this.
>
> I don't know of any other open issues.
>
> Thanks.
>
> This is from the first post:
> (The kvm part, together with the kvm-userspace part, was posted with
>  V1 about a week ago; whoever wants to test ksm may download the
>  patch from the lkml archive)
>
> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing that is made at the allocation of the
> memory, ksm does it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the processes that use this memory.
> (the shared pages are marked as readonly, and in case of write
> do_wp_page() takes care to create a new copy of the page)
>
> To find identical pages ksm uses an algorithm that is split into three
> primary levels:
>
> 1) Ksm will start scanning the memory and will calculate a checksum
>    for each page that is registered to be scanned.
>    (In the first round of the scanning, ksm would only calculate
>    this checksum for all the pages)
>
> 2) Ksm will go again over the whole memory and will recalculate the
>    checksum of the pages; pages that are found to have the same
>    checksum value would be considered "pages that most likely
>    won't change".
>    Ksm will insert these pages into a sorted-by-page-content RB-tree
>    that is called the "unstable tree"; the reason that this tree is
>    called unstable is due to the fact that the page contents might
>    change while they are still inside the tree, and therefore the
>    tree would become corrupted.
>    Due to this problem ksm takes two more steps in addition to the
>    checksum calculation:
>    a) Ksm will throw away and recreate the entire unstable tree each
>       round of memory scanning - so if we have corruption, it will be
>       fixed when we rebuild the tree.
>    b) Ksm is using an RB-tree, whose balancing is made by the node
>       color and not by the content, so even if the page gets
>       corrupted, it still would take the same amount of time to
>       search on it.
>
> 3) In addition to the unstable tree, ksm holds another tree that is
>    called the "stable tree" - this tree is an RB-tree that is sorted
>    by the pages' content and all its pages are write protected, and
>    therefore it can't get corrupted.
>    Each time ksm finds two identical pages using the unstable tree,
>    it will create a new write-protected shared page, and this p
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Nick Piggin wrote:
> On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:
> > This driver is very useful for KVM as in cases of runing multiple
> > guests operation system of the same type.
> > (For desktop work loads we have achived more than x2 memory
> > overcommit (more like x3))
>
> Interesting that it is a desirable workload to have multiple guests
> each running MS office.
>
> I wonder, can windows enter a paravirtualised guest mode for KVM?

Windows has some support for paravirtualization, for example it can use
hypercalls instead of tlb flush IPIs.

> And can you detect page allocation/freeing events?

Not that I know of.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Sunday 05 April 2009 01:35:18 Izik Eidus wrote:
> This driver is very useful for KVM as in cases of runing multiple guests
> operation system of the same type.
> (For desktop work loads we have achived more than x2 memory overcommit
> (more like x3))

Interesting that it is a desirable workload to have multiple guests each
running MS office.

I wonder, can windows enter a paravirtualised guest mode for KVM? And can
you detect page allocation/freeing events?

> This driver have found users other than KVM, for example CERN,
> Fons Rademakers:
> "on many-core machines we run one large detector simulation program per
> core. These simulation programs are identical but run each in their own
> process and need about 2 - 2.5 GB RAM.
> We typically buy machines with 2GB RAM per core and so have a problem to
> run one of these programs per core.
> Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic
> field maps, detector geometry, etc.
> Currently people have been trying to start one program, initialize the
> geometry and field maps and then fork it N times, to have the data
> shared.
> With KSM this would be done automatically by the system so it sounded
> extremely attractive when Andrea presented it."

They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
Presumably they will probably want to control it to interleave it over
all numa nodes and use hugepages for it. It would be very little work.

> I am sending another seires of patchs for kvm kernel and kvm-userspace
> that would allow users of kvm to test ksm with it.
> The kvm patchs would apply to Avi git tree.
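The MAP_ANONYMOUS|MAP_SHARED suggestion can be tried from userspace in a few lines. This is a minimal sketch, assuming a Unix system where Python's `mmap` module exposes `MAP_ANONYMOUS`; the function name `share_across_fork` and the five-byte payload are made up for illustration, and the NUMA interleaving and hugepage control mentioned above are not covered.

```python
import mmap
import os


def share_across_fork():
    """Initialise data in a shared anonymous mapping, fork, and show
    that a write made by the child is visible to the parent."""
    # fileno=-1 requests an anonymous mapping; MAP_SHARED means the
    # pages are not copied on write after fork().
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    buf[:5] = b"field"  # stand-in for the data initialised once

    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the same physical page.
        buf[:5] = b"child"
        os._exit(0)

    os.waitpid(pid, 0)
    seen_by_parent = bytes(buf[:5])
    buf.close()
    return seen_by_parent


if __name__ == "__main__":
    print(share_across_fork())
```

Note the trade-off this makes visible: with a shared mapping every process sees every write, so it suits data that is read-only after initialisation. It does not give each process a private copy on write, which is the case the KSM proponents in this thread are arguing about.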
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Thu, 2 Apr 2009, Chris Wright wrote:

> * Jesper Juhl (j...@chaosbits.net) wrote:
> > Do you rely only on the checksum or do you actually compare pages to
> > check they are 100% identical before sharing?
>
> Checksum has absolutely nothing to do w/ finding if two pages match.
> It's only used as a heuristic to suggest whether a single page has
> changed. If that page is changing we won't bother trying to find a
> match for it. Here's an example of the life of a page w.r.t checksum.
>
> 1. checksum = uninitialized
> 2. first time page is found, checksum it (checksum = A).
>    if checksum has changed (uninitialize != A) don't go any further
>    w/ that page
> 3. next time page is found, checksum it (checksum = B).
>    if checksum has change (A != B) don't go any further w/ that page
> 4. next time page is found, checksum it (checksum = B).
>    if checksum has changed (B == B)...it hasn't, continue processing
>    the page
>
> later if a match is found in the tree (which is sorted by _contents_,
> i.e. memcmp) we'll attempt to merge the pages which at it's very core
> does:
>
> 	if (pages_identical(oldpage, newpage))
> 		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);
>
> pages_identical? you guessed it...just does:
>
> 	r = memcmp(addr1, addr2, PAGE_SIZE)
>

Thank you for that explanation, it set my mind at ease :-)

--
Jesper Juhl http://www.chaosbits.net/
Plain text mails only, please http://www.expita.com/nomime.html
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
* Jesper Juhl (j...@chaosbits.net) wrote:
> Do you rely only on the checksum or do you actually compare pages to
> check they are 100% identical before sharing?

Checksum has absolutely nothing to do w/ finding if two pages match.
It's only used as a heuristic to suggest whether a single page has
changed. If that page is changing we won't bother trying to find a
match for it. Here's an example of the life of a page w.r.t. checksum.

1. checksum = uninitialized
2. first time page is found, checksum it (checksum = A).
   if checksum has changed (uninitialized != A) don't go any further
   w/ that page
3. next time page is found, checksum it (checksum = B).
   if checksum has changed (A != B) don't go any further w/ that page
4. next time page is found, checksum it (checksum = B).
   if checksum has changed (B == B)...it hasn't, continue processing
   the page

later if a match is found in the tree (which is sorted by _contents_,
i.e. memcmp) we'll attempt to merge the pages which at its very core
does:

	if (pages_identical(oldpage, newpage))
		ret = replace_page(vma, oldpage, newpage, orig_pte, newprot);

pages_identical? you guessed it...just does:

	r = memcmp(addr1, addr2, PAGE_SIZE)

thanks,
-chris
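The page life cycle described above can be played through in a small userspace model. This is a sketch only, with loud assumptions: `ToyScanner`, `visit` and the page ids are hypothetical names, `zlib.crc32` stands in for whatever checksum the patch uses, a sorted Python list searched with `bisect` stands in for the content-sorted tree, and a bytes equality test stands in for `memcmp`. It is not the kernel code.

```python
import bisect
import zlib


class ToyScanner:
    def __init__(self):
        self.last_checksum = {}  # page id -> checksum from previous visit
        self.tree = []           # sorted (content, page id) pairs
        self.merged = []         # (kept page id, duplicate page id)

    def unchanged_since_last_visit(self, page_id, content):
        """Checksum is used only to ask 'did this page change?'."""
        new = zlib.crc32(content)
        old = self.last_checksum.get(page_id)  # None = uninitialized
        self.last_checksum[page_id] = new
        return old == new

    def visit(self, page_id, content):
        """Return True if the page was merged with an existing one."""
        if not self.unchanged_since_last_visit(page_id, content):
            return False  # page is still changing, don't go further
        # Lookup is ordered by page contents, not by checksum.
        i = bisect.bisect_left(self.tree, (content,))
        if i < len(self.tree) and self.tree[i][0] == content:
            owner = self.tree[i][1]
            if owner != page_id:
                # Full byte-for-byte equality was checked just above.
                self.merged.append((owner, page_id))
                return True
            return False
        self.tree.insert(i, (content, page_id))
        return False


A = bytes(4096)
B = b"\x01" + bytes(4095)

s = ToyScanner()
results = [
    s.visit("q", B),  # first sighting: uninitialized != checksum(B)
    s.visit("q", B),  # unchanged: enters the tree, nothing to match yet
    s.visit("p", A),  # first sighting of another page
    s.visit("p", B),  # changed since last visit, skipped
    s.visit("p", B),  # unchanged: identical content found in the tree
]
print(results, s.merged)
```

A checksum collision in this model could only cause a page to be treated as unchanged when it was not; the decision to merge always rests on the full content comparison, which is the point being made in the message above.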
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Jesper Juhl wrote:
> Hi,
>
> On Tue, 31 Mar 2009, Izik Eidus wrote:
> > KSM is a linux driver that allows dynamically sharing identical
> > memory pages between one or more processes.
> >
> > Unlike traditional page sharing that is made at the allocation of
> > the memory, ksm does it dynamically after the memory was created.
> > Memory is periodically scanned; identical pages are identified and
> > merged.
> > The sharing is unnoticeable by the processes that use this memory.
> > (the shared pages are marked as readonly, and in case of write
> > do_wp_page() takes care to create a new copy of the page)
> >
> > To find identical pages ksm uses an algorithm that is split into
> > three primary levels:
> >
> > 1) Ksm will start scanning the memory and will calculate a checksum
> >    for each page that is registered to be scanned.
> >    (In the first round of the scanning, ksm would only calculate
> >    this checksum for all the pages)
>
> One question; Calculating a checksum is a fine way to find pages that
> are "likely to be identical"

I don't use the checksum as with a hash table; the checksum isn't used
to find identical pages on the basis that they have similar data...
The checksum is used to let me know that the page has not changed for a
while and that it is worth checking for pages identical to it...
In the future we will want to use the page table dirty bit for it, as
taking a checksum is somewhat expensive.

> , but there is no guarantee that two pages with the same checksum
> really are identical - there *will* be checksum collisions eventually.
> So, I really hope that your implementation actually checks that two
> pages that it find that have identical checksums really are 100%
> identical by comparing them bit by bit before throwing one away.

We do that :-)

> If you rely only on a checksum then eventually a user will get bitten
> by a checksum collision and, in the best case, something will crash,
> and in the worst case, data will silently be corrupted.
>
> Do you rely only on the checksum or do you actually compare pages to
> check they are 100% identical before sharing?

I do a 100% compare of the pages before I share them.

> I must admit that I have not read through the patch to find the
> answer, I just read your description and became concerned.

Don't worry, me neither :-)
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Hi, On Tue, 31 Mar 2009, Izik Eidus wrote: > KSM is a linux driver that allows dynamicly sharing identical memory > pages between one or more processes. > > Unlike tradtional page sharing that is made at the allocation of the > memory, ksm do it dynamicly after the memory was created. > Memory is periodically scanned; identical pages are identified and > merged. > The sharing is unnoticeable by the process that use this memory. > (the shared pages are marked as readonly, and in case of write > do_wp_page() take care to create new copy of the page) > > To find identical pages ksm use algorithm that is split into three > primery levels: > > 1) Ksm will start scan the memory and will calculate checksum for each >page that is registred to be scanned. >(In the first round of the scanning, ksm would only calculate > this checksum for all the pages) > One question; Calcolating a checksum is a fine way to find pages that are "likely to be identical", but there is no guarantee that two pages with the same checksum really are identical - there *will* be checksum collisions eventually. So, I really hope that your implementation actually checks that two pages that it find that have identical checksums really are 100% identical by comparing them bit by bit before throwing one away. If you rely only on a checksum then eventually a user will get bitten by a checksum collision and, in the best case, something will crash, and in the worst case, data will silently be corrupted. Do you rely only on the checksum or do you actually compare pages to check they are 100% identical before sharing? I must admit that I have not read through the patch to find the answer, I just read your description and became concerned. 
--
Jesper Juhl               http://www.chaosbits.net/
Plain text mails only, please      http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
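The concern raised above (a checksum can only say "probably equal") and the answer given in the thread (a full compare before sharing) can be sketched in user space as follows. This is only an illustration: `page_checksum()`, `pages_mergeable()` and `PAGE_BYTES` are hypothetical names, and the checksum is deliberately a weak byte sum so that a collision is easy to produce; it is not the hash used by the ksm patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Deliberately weak checksum (plain byte sum), assumed for illustration. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < PAGE_BYTES; i++)
        sum += page[i];
    return sum;
}

/*
 * Returns 1 only when the two pages are byte-for-byte identical.
 * The checksum is used as a cheap filter; memcmp() makes the decision.
 */
static int pages_mergeable(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);

    if (page_checksum(a) != page_checksum(b))
        return 0;                           /* certainly different */
    return memcmp(a, b, PAGE_BYTES) == 0;   /* confirm before sharing */
}
```

Two pages that differ only by swapping two bytes have the same byte sum, so the filter alone would accept them; the final memcmp() is what rejects the pair.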
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Anthony Liguori wrote:
> Izik Eidus wrote:
> > I am sending another series of patches for kvm kernel and kvm-userspace
> > that would allow users of kvm to test ksm with it.
> > The kvm patches would apply to Avi git tree.
>
> Any reason to not take these through upstream QEMU instead of
> kvm-userspace?  In principle, I don't see anything that would prevent
> normal QEMU from almost making use of this functionality.  That would make
> it one less thing to eventually have to merge...

The changes for kvm-userspace were just provided for testing it...
After we have ksm inside the kernel we will send another patch to
qemu-devel that will add support for it.

> Regards,
>
> Anthony Liguori
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Izik Eidus wrote:
> I am sending another series of patches for kvm kernel and kvm-userspace
> that would allow users of kvm to test ksm with it.
> The kvm patches would apply to Avi git tree.

Any reason to not take these through upstream QEMU instead of
kvm-userspace?  In principle, I don't see anything that would prevent
normal QEMU from almost making use of this functionality.  That would make
it one less thing to eventually have to merge...

Regards,

Anthony Liguori
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
> You have implemented second one, but seems it already was patented
> http://www.google.com/patents?vid=USPAT6789156
> I'm not a lawyer but IMHO we have direct conflict here.
> From other point of view they have patented the WHEEL, but at least we
> have to know about this.

It's an old idea and appeared for Linux in March 1998: a little project
from Philipp Reisner called "mergemem".

http://groups.google.com/group/muc.lists.linux-kernel/browse_thread/thread/387af278089c7066?ie=utf-8&oe=utf-8&q=share+identical+pages#b3d4f68fb5dd4f88

So if there is a patent which is relevant (and that's a question for
lawyers and legal patent search people) perhaps the Linux Foundation and
some of the patent busters could take a look at mergemem and
re-examination.

Alan
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Izik Eidus <[EMAIL PROTECTED]> writes:

> (From v1 to v2 the main change is much more documentation)
>
> KSM is a linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing that is made at the allocation of the
> memory, ksm do it dynamically after the memory was created.
> Memory is periodically scanned; identical pages are identified and
> merged.
> The sharing is unnoticeable by the process that use this memory.
> (the shared pages are marked as readonly, and in case of write
> do_wp_page() take care to create new copy of the page)
>
> This driver is very useful for KVM as in cases of running multiple guests
> operation system of the same type.

Hi Izik,

The approach used in the driver is commonly known as content based
search. There are several variants of it; the most common are:
1: with guest TM support
2: w/o guest vm support.
You have implemented the second one, but it seems it was already patented
http://www.google.com/patents?vid=USPAT6789156
I'm not a lawyer but IMHO we have a direct conflict here.
From another point of view they have patented the WHEEL, but at least we
have to know about this.

> (For desktop work loads we have achieved more than x2 memory overcommit
> (more like x3))
>
> This driver have found users other than KVM, for example CERN,
> Fons Rademakers:
> "on many-core machines we run one large detector simulation program per core.
> These simulation programs are identical but run each in their own process and
> need about 2 - 2.5 GB RAM.
> We typically buy machines with 2GB RAM per core and so have a problem to run
> one of these programs per core.
> Of the 2 - 2.5 GB about 700MB is identical data in the form of magnetic field
> maps, detector geometry, etc.
> Currently people have been trying to start one program, initialize the
> geometry
> and field maps and then fork it N times, to have the data shared.
> With KSM this would be done automatically by the system so it sounded
> extremely
> attractive when Andrea presented it."
>
> (We have already started to test KSM on their systems...)
>
> KSM can run as kernel thread or as userspace application or both
>
> example for how to control the kernel thread:
>
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include "ksm.h"
>
> int main(int argc, char *argv[])
> {
>     int fd;
>     int used = 0;
>     int fd_start;
>     struct ksm_kthread_info info;
>
>
>     if (argc < 2) {
>         fprintf(stderr,
>             "usage: %s {start npages sleep | stop | info}\n",
>             argv[0]);
>         exit(1);
>     }
>
>     fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
>     if (fd == -1) {
>         fprintf(stderr, "could not open /dev/ksm\n");
>         exit(1);
>     }
>
>     if (!strncmp(argv[1], "start", strlen(argv[1]))) {
>         used = 1;
>         /* "start" takes three arguments, so argv[4] must exist */
>         if (argc < 5) {
>             fprintf(stderr,
>                 "usage: %s start npages_to_scan max_pages_to_merge sleep\n",
>                 argv[0]);
>             exit(1);
>         }
>         info.pages_to_scan = atoi(argv[2]);
>         info.max_pages_to_merge = atoi(argv[3]);
>         info.sleep = atoi(argv[4]);
>         info.flags = ksm_control_flags_run;
>
>         fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
>         if (fd_start == -1) {
>             fprintf(stderr, "KSM_START_KTHREAD failed\n");
>             exit(1);
>         }
>         printf("created scanner\n");
>     }
>
>     if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
>         used = 1;
>         info.flags = 0;
>         fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
>         printf("stopped scanner\n");
>     }
>
>     if (!strncmp(argv[1], "info", strlen(argv[1]))) {
>         used = 1;
>         ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
>         printf("flags %d, pages_to_scan %d npages_merge %d, sleep_time %d\n",
>             info.flags, info.pages_to_scan, info.max_pages_to_merge, info.sleep);
>     }
>
>     if (!used)
>         fprintf(stderr, "unknown command %s\n", argv[1]);
>
>     return 0;
> }
>
> example of how to register qemu to ksm (or any userspace application)
>
> diff --git a/qemu/vl.c b/qemu/vl.c
> index 4721fdd..7785bf9 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -21,6 +21,7 @@
>   * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>   * DEALINGS IN
>   * THE SOFTWARE.
>   */
> +#include "ksm.h"
>  #include "hw/hw.h"
>  #include "hw/boards.h"
>  #include "hw/usb.h"
> @@ -5799,6 +5800,37 @@ static void termsig_setup(void)
>
>  #endif
>
> +int ksm_register_memory(void)
> +{
> +    int fd;
> +    int ksm_fd;
> +    int r = 1;
> +    struct ksm_memory_region ksm_region;
> +
> +
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
2008/11/20 Izik Eidus <[EMAIL PROTECTED]>:
> Quoting Izik Eidus:
>>
>> Quoting Ryota OZAKI:
>>>
>>> Hi Izik,
>>>
>>> I've tried your patch set, but ksm doesn't work in my machine.
>>>
>>> I compiled linux patched with the four patches and configured with KSM
>>> and KVM enabled. After boot with the linux, I run two VMs running linux
>>> using QEMU with a patch in your mail and started KSM scanner with your
>>> script, then the host linux caused panic with the following oops.
>>>
>>
>> Yes you are right, we are missing pte_unmap(pte); in get_pte()!
>> that will affect just 32bits with highmem so this is why you see it
>> thanks for the reporting, i will fix it for v3
>>
>> below patch should fix it (i can't test it now, will test it for v3)
>>
>> can you report if it fix your problem? thanks
>>
> Thinking about what i just did, it is wrong,
> this patch is the right one (still wasn't tested), but if you are going to
> apply something then use this one.

Great! Applied the 2nd patch, ksm works with both HIGHMEM enabled
and disabled.

Thanks for your quick response,
  ozaki-r

>
> thanks
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 707be52..c842c29 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -569,14 +569,16 @@ out:
>  static int is_present_pte(struct mm_struct *mm, unsigned long addr)
>  {
>          pte_t *ptep;
> +        int r;
>
>          ptep = get_pte(mm, addr);
>          if (!ptep)
>                  return 0;
>
> -        if (pte_present(*ptep))
> -                return 1;
> -        return 0;
> +        r = pte_present(*ptep);
> +        pte_unmap(ptep);
> +
> +        return r;
>  }
>
>  #define PAGEHASH_LEN 128
> @@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
>          if (!orig_ptep)
>                  goto out_unlock;
>          orig_pte = *orig_ptep;
> +        pte_unmap(orig_ptep);
>          if (!pte_present(orig_pte))
>                  goto out_unlock;
>          if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
>
>
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Quoting Izik Eidus:
> Quoting Ryota OZAKI:
> > Hi Izik,
> >
> > I've tried your patch set, but ksm doesn't work in my machine.
> >
> > I compiled linux patched with the four patches and configured with KSM
> > and KVM enabled. After boot with the linux, I run two VMs running linux
> > using QEMU with a patch in your mail and started KSM scanner with your
> > script, then the host linux caused panic with the following oops.
>
> Yes you are right, we are missing pte_unmap(pte); in get_pte()!
> that will affect just 32bits with highmem so this is why you see it
> thanks for the reporting, i will fix it for v3
>
> below patch should fix it (i can't test it now, will test it for v3)
>
> can you report if it fix your problem? thanks

Thinking about what I just did, it is wrong.
This patch is the right one (it still wasn't tested), but if you are going
to apply something then use this one.

thanks

diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..c842c29 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -569,14 +569,16 @@ out:
 static int is_present_pte(struct mm_struct *mm, unsigned long addr)
 {
         pte_t *ptep;
+        int r;

         ptep = get_pte(mm, addr);
         if (!ptep)
                 return 0;

-        if (pte_present(*ptep))
-                return 1;
-        return 0;
+        r = pte_present(*ptep);
+        pte_unmap(ptep);
+
+        return r;
 }

 #define PAGEHASH_LEN 128
@@ -669,6 +671,7 @@ static int try_to_merge_one_page(struct mm_struct *mm,
         if (!orig_ptep)
                 goto out_unlock;
         orig_pte = *orig_ptep;
+        pte_unmap(orig_ptep);
         if (!pte_present(orig_pte))
                 goto out_unlock;
         if (page_to_pfn(oldpage) != pte_pfn(orig_pte))
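The shape of the fix in the patch above (map, read the value, unmap, then return) can be modelled in user space. The sketch below is a toy and not kernel code: `struct map_slot`, `slot_map()`, `slot_unmap()` and `entry_is_present()` are hypothetical names, and it assumes that the reported BUG corresponds to mapping into a slot that is still occupied by an earlier mapping that was never released.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a single temporary-mapping slot (assumption, not kernel API). */
struct map_slot {
    int in_use;          /* 1 while a mapping is outstanding */
    const int *target;   /* what the slot currently maps */
};

static const int *slot_map(struct map_slot *s, const int *target)
{
    assert(!s->in_use);  /* models the failure when the slot was never released */
    s->in_use = 1;
    s->target = target;
    return target;
}

static void slot_unmap(struct map_slot *s)
{
    assert(s->in_use);
    s->in_use = 0;
    s->target = NULL;
}

/* Same shape as the patched is_present_pte(): copy the answer, then unmap. */
static int entry_is_present(struct map_slot *s, const int *entry)
{
    const int *p = slot_map(s, entry);
    int r = (*p != 0);

    slot_unmap(s);
    return r;
}
```

In this model, releasing the slot inside the lookup helper and then handing the pointer back to the caller would leave the caller reading through a mapping that is no longer held, which is presumably why the first attempt (unmapping inside get_pte()) was withdrawn in favour of unmapping in the callers.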
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Quoting Ryota OZAKI:
> Hi Izik,
>
> I've tried your patch set, but ksm doesn't work in my machine.
>
> I compiled linux patched with the four patches and configured with KSM
> and KVM enabled. After boot with the linux, I run two VMs running linux
> using QEMU with a patch in your mail and started KSM scanner with your
> script, then the host linux caused panic with the following oops.

Yes, you are right, we are missing pte_unmap(pte); in get_pte()!
That will affect just 32-bit with highmem, so this is why you see it.
Thanks for reporting, I will fix it for v3.

The patch below should fix it (I can't test it now, will test it for v3).

Can you report if it fixes your problem? Thanks.

> == BEGINNING of OOPS
> kernel BUG at arch/x86/mm/highmem_32.c:87!
> invalid opcode: [#1] SMP
> last sysfs file: /sys/class/net/vnet-ssh2/address
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in: netconsole autofs4 nf_conntrack_ipv4 nf_defrag_ipv4
> xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables
> x_tables loop kvm_intel kvm iTCO_wdt iTCO_vendor_support igb netxen_nic
> button ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last
> unloaded: microcode]
>
> Pid: 343, comm: kksmd Not tainted (2.6.28-rc5-linus-head-20081119-sparsemem #1) X7DWA
> EIP: 0060:[] EFLAGS: 00010206 CPU: 6
> EIP is at kmap_atomic_prot+0x7d/0xeb
> EAX: c0008d94 EBX: c1ff6240 ECX: 0163 EDX: 7e00
> ESI: 0154 EDI: 0055 EBP: f5cdbf10 ESP: f5cdbef8
>  DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> Process kksmd (pid: 343, ti=f5cda000 task=f617b140 task.ti=f5cda000)
> Stack: 7fa12163 f000 c204efbc f50479e8 9eb7e000 c08a34d0 f5cdbf18 c041f07a f5cdbf28 c048339c f5c271e0 f5cdbf30 c04833bc f5cdbfb0 c0483b0d f5cdbf50 c0425845 0064 0009 c08a34d0 f5cdbfb0 c06384c1
> Call Trace:
>  [] ? kmap_atomic+0x13/0x15
>  [] ? get_pte+0x50/0x63
>  [] ? is_present_pte+0xd/0x1f
>  [] ? ksm_scan_start+0x9a/0x7ac
>  [] ? finish_task_switch+0x29/0xa4
>  [] ? schedule+0x6bf/0x719
>  [] ? default_spin_lock_flags+0x8/0xc
>  [] ? finish_wait+0x49/0x4e
>  [] ? kthread_ksm_scan_thread+0x0/0xdc
>  [] ? kthread_ksm_scan_thread+0x3a/0xdc
>  [] ? autoremove_wake_function+0x0/0x38
>  [] ? kthread+0x40/0x66
>  [] ? kthread+0x0/0x66
>  [] ? kernel_thread_helper+0x7/0x10
> Code: 86 00 00 00 64 a1 04 a0 82 c0 6b c0 0d 8d 3c 30 a1 78 b0 77 c0 8d
> 34 bd 00 00 00 00 89 45 ec a1 0c d0 84 c0 29 f0 83 38 00 74 04 <0f> 0b
> eb fe c1 ea 1a 8b 04 d5 80 32 8a c0 83 e0 fc 29 c3 c1 fb
> EIP: [] kmap_atomic_prot+0x7d/0xeb SS:ESP 0068:f5cdbef8
> Kernel panic - not syncing: Fatal exception
> == END of OOPS

diff --git a/mm/ksm.c b/mm/ksm.c
index 707be52..e14448a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -562,6 +562,7 @@ static pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
                 goto out;

         ptep = pte_offset_map(pmd, addr);
+        pte_unmap(ptep);
 out:
         return ptep;
 }
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Izik Eidus wrote:
> Andrew Morton wrote:
> > On Tue, 11 Nov 2008 21:18:23 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:
> >
> > > > hm.
> > > >
> > > > There has been the occasional discussion about identifying all-zeroes
> > > > pages and scavenging them, repointing them at the zero page.  Could
> > > > this infrastructure be used for that?  (And how much would we gain
> > > > from it?)
> > > >
> > > > [I'm looking for reasons why this is more than a
> > > > muck-up-the-vm-for-kvm thing here ;) ]
> >
> > ^^ this?
> >
> > > KSM is separate driver , it doesn't change anything in the VM but
> > > adding two helper functions.
> >
> > What, you mean I should actually read the code?   Oh well, OK.
>
> Andrea i think what is happening here is my fault

Sorry, meant to write here Andrew :-)

> I will try to give much more information about KSM here.
>
> First the bad things:
>
> KSM shared pages are right now unswappable (we have a patch that can
> change this, but we want to wait with it). This means that the entire
> memory of the guest is swappable, but the pages that are shared are not.
> (When the pages are split back by COW they become anonymous again with
> the help of do_wp_page().)
>
> The reason the pages are not swappable is the way the Linux rmap works:
> it does not allow us to create nonlinear anonymous pages. (We don't want
> to use a nonlinear vma for kvm, as it would make swapping for kvm very
> slow.) The reason ksm pages need nonlinear reverse mapping is that an
> identical page can be found at a completely different offset in one
> guest than in another (this is from the userspace VM point of view).
>
> The rest is quite simple: it walks over the entire guest memory (or only
> some of it) and scans for identical pages using a hash table; it merges
> the pages into one single write-protected page.
>
> Numbers for ksm are something I have just for desktops, and just the
> numbers I gave you. What I do know is: a big overcommit like 300% is
> possible only when you take into account that some of the guest memory
> will be free. We are sharing mostly the DLLs / KERNEL / ZERO pages. The
> DLL and KERNEL pages will likely never break, but ZERO pages will break
> when Windows allocates them and will come back when Windows frees the
> memory. (I wouldn't suggest 300% overcommit for server workloads,
> because you can end up swapping in that case, but for desktops, after
> running in production and passing some serious QA stress tests, it seems
> like 300% is a real number that can be used.)
>
> I just ran a test on two Fedora 8 guests and got these results (using
> GNOME in both of them):
>
>  9959 root 15 0 730m 537m 281m S 8 3.4 0:44.28 kvm
>  9956 root 15 0 730m 537m 246m S 4 3.4 0:41.43 kvm
>
> As you can see, the physical sharing was 281 MB and 246 MB (kernel pages
> are counted as shared). There is a small lie in these numbers, because a
> page that was shared across two guests and was split by a write from
> guest number 1 will still have 1 reference count to it and will still be
> a kernel page (until the other guest (number 2) writes to it as well).
>
> Anyway, I am willing to do much better testing, or anything that is
> needed for these patches to be merged (just tell me what and I will do
> it).
>
> Besides that, you should know that patch 4 is not a must; it is just a
> nice optimization...
>
> Thanks.
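The merge and copy-on-write behaviour described in this message, including the "small lie" where a merged page stays counted as shared after one of two guests has written to it, can be illustrated with a toy user-space model. All names here (`toy_page`, `toy_mapping`, `toy_merge()`, `toy_write()`) are hypothetical and stand in for what the kernel does with page tables and do_wp_page(); nothing below is taken from the ksm patches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Toy model only: none of these names come from the ksm patches. */
struct toy_page {
    int refcount;                    /* mappings pointing at this page */
    int is_shared;                   /* set once the page results from a merge */
    unsigned char data[PAGE_BYTES];
};

struct toy_mapping {
    struct toy_page *page;
};

static struct toy_page *toy_page_new(unsigned char fill)
{
    struct toy_page *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->refcount = 1;
    p->is_shared = 0;
    memset(p->data, fill, PAGE_BYTES);
    return p;
}

static void toy_page_put(struct toy_page *p)
{
    if (--p->refcount == 0)
        free(p);
}

/* Merge b into a: both mappings end up on a's page, now treated as read-only. */
static void toy_merge(struct toy_mapping *a, struct toy_mapping *b)
{
    assert(memcmp(a->page->data, b->page->data, PAGE_BYTES) == 0);
    if (a->page == b->page)
        return;
    toy_page_put(b->page);
    b->page = a->page;
    a->page->refcount++;
    a->page->is_shared = 1;
}

/* Write through a mapping; a shared page is copied first (the COW break). */
static int toy_write(struct toy_mapping *m, size_t off, unsigned char value)
{
    assert(off < PAGE_BYTES);
    if (m->page->is_shared) {
        struct toy_page *copy = toy_page_new(0);

        if (!copy)
            return -1;
        memcpy(copy->data, m->page->data, PAGE_BYTES);
        toy_page_put(m->page);   /* old page survives while others map it */
        m->page = copy;
    }
    m->page->data[off] = value;
    return 0;
}
```

After two mappings are merged and the first one writes, the second mapping still points at the merged page with a reference count of 1 and the shared flag set, which mirrors the accounting quirk mentioned above.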
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Hi Andrew,

thanks for looking into this.

On Tue, Nov 11, 2008 at 11:11:10AM -0800, Andrew Morton wrote:
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?

The whole idea is to do something that works transparently and isn't
tailored for kvm. The mmu notifier change_pte method can be dropped as
well if you want (I recommended not to have it in the first submission
but Izik preferred to keep it because it will optimize away a kvm
shadow pte minor fault the first time kvm accesses the page after
sharing it). The page_wrprotect and replace_page can also be embedded
in ksm.

So the idea is that while we could do something specific to ksm that
keeps most of the code in userland, it'd be more tricky as it'd
require some communication with the core VM anyway (we can't just do
it in userland with mprotect, memcpy, mmap(MAP_PRIVATE) as it wouldn't
be atomic and, second, it'd be inefficient in terms of vma-buildup for
the same reason nonlinear-vmas exist), but most important: it wouldn't
work for all other regular processes. With KSM we can share anonymous
memory for the whole system, KVM is just a random user.

This implementation is on the simple side because it can't
swap. Swapping and perhaps the limitation of sharing anonymous memory
is the only trouble here but those will be addressed in the
future. ksm is a new device driver so it's like /dev/mem, so no
swapping isn't a blocker here.

By sharing anon pages, in short we're making anonymous vmas nonlinear,
and this isn't supported by the current rmap code. So swapping can't
work unless we mark those anon-vmas nonlinear and we either build the
equivalent of the old pte_chains on demand just for those nonlinear
shared pages, or we do a full scan of all ptes in the nonlinear
anon-vmas. An external rmap infrastructure can allow ksm to build
whatever it wants inside ksm.c to track the nonlinear anon-pages
inside a regular anon-vma and rmap.c can invoke those methods to find
the ptes for those nonlinear pages. The core VM won't get more complex
and ksm can decide whether to do a full nonlinear scan of the vma, or to
build the equivalent of pte_chains. This again has to be added later
and once everybody sees ksm, it'll be easier to agree on an
external-rmap API to allow it to swap. While the pte_chains are very
inefficient to reverse the regular anonymous mappings, they're an
efficient solution as an exception for the shared KSM pages that get
scattered over the linear anon-vmas.

It's a bit like the initial kvm that was merged even though it couldn't
swap. Then we added mmu notifiers, and now kvm can swap. So we add ksm
now without swapping and later we build an external-rmap to allow ksm
to swap after we agree ksm is useful and people start using it.

> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?  (And how much would we gain from
> it?)

Zero pages make a lot of difference for windows, but they're totally
useless for linux. With current ksm all guest pagecache is 100% shared
across hosts, so when you start an app the .text runs on the same
physical memory on both guests. Works fine and code is quite simple in
this round. Once we add swapping it'll be a bit more complex in VM
terms as it'll have to handle nonlinear anon-vmas.

If we ever decide to share MAP_SHARED pagecache it'll be even more
complicated than just adding the external-rmap... I think this can be
done incrementally if needed at all. OpenVZ if the install is smart
enough could share the pagecache by just hardlinking the equal
binaries.. but AFAIK they don't do that normally. For the anon ram
they need this too, they can't solve equal anon ram in userland as it
has to be handled atomically at runtime.

The folks at CERN LHC (was visiting it last month) badly need KSM too
for certain apps they're running that are allocating huge arrays
(aligned) in anon memory and they're mostly equal for all
processes. They tried to work around it with fork but it's not working
well, KSM would solve their problem (it'd solve it both on the same OS
and across OS with kvm as virtualization engine running on the same
host).

So I think this is good stuff, and I'd focus discussions and reviews
on the KSM API of /dev/ksm that if merged will be longstanding and
much more troublesome than the rest of the code if changed later (if
we change the ksm internals at any time nobody will notice), and
post-merging we can focus on the external-rmap to make KSM pages first
class citizens in VM terms. But then anything can be changed here, so
suggestions welcome!

Thanks!
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
> On Tue, 11 Nov 2008 21:18:23 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:
>
> > > hm.
> > >
> > > There has been the occasional discussion about identifying all-zeroes
> > > pages and scavenging them, repointing them at the zero page.  Could
> > > this infrastructure be used for that?  (And how much would we gain
> > > from it?)
> > >
> > > [I'm looking for reasons why this is more than a
> > > muck-up-the-vm-for-kvm thing here ;) ]
>
> ^^ this?
>
> > KSM is separate driver , it doesn't change anything in the VM but
> > adding two helper functions.
>
> What, you mean I should actually read the code?   Oh well, OK.

Andrea i think what is happening here is my fault

I will try to give much more information about KSM here.

First the bad things:

KSM shared pages are right now unswappable (we have a patch that can
change this, but we want to wait with it). This means that the entire
memory of the guest is swappable, but the pages that are shared are not.
(When the pages are split back by COW they become anonymous again with
the help of do_wp_page().)

The reason the pages are not swappable is the way the Linux rmap works:
it does not allow us to create nonlinear anonymous pages. (We don't want
to use a nonlinear vma for kvm, as it would make swapping for kvm very
slow.) The reason ksm pages need nonlinear reverse mapping is that an
identical page can be found at a completely different offset in one
guest than in another (this is from the userspace VM point of view).

The rest is quite simple: it walks over the entire guest memory (or only
some of it) and scans for identical pages using a hash table; it merges
the pages into one single write-protected page.

Numbers for ksm are something I have just for desktops, and just the
numbers I gave you. What I do know is: a big overcommit like 300% is
possible only when you take into account that some of the guest memory
will be free. We are sharing mostly the DLLs / KERNEL / ZERO pages. The
DLL and KERNEL pages will likely never break, but ZERO pages will break
when Windows allocates them and will come back when Windows frees the
memory. (I wouldn't suggest 300% overcommit for server workloads,
because you can end up swapping in that case, but for desktops, after
running in production and passing some serious QA stress tests, it seems
like 300% is a real number that can be used.)

I just ran a test on two Fedora 8 guests and got these results (using
GNOME in both of them):

 9959 root 15 0 730m 537m 281m S 8 3.4 0:44.28 kvm
 9956 root 15 0 730m 537m 246m S 4 3.4 0:41.43 kvm

As you can see, the physical sharing was 281 MB and 246 MB (kernel pages
are counted as shared). There is a small lie in these numbers, because a
page that was shared across two guests and was split by a write from
guest number 1 will still have 1 reference count to it and will still be
a kernel page (until the other guest (number 2) writes to it as well).

Anyway, I am willing to do much better testing, or anything that is
needed for these patches to be merged (just tell me what and I will do
it).

Besides that, you should know that patch 4 is not a must; it is just a
nice optimization...

Thanks.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 21:18:23 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:

> > hm.
> >
> > There has been the occasional discussion about identifying all-zeroes
> > pages and scavenging them, repointing them at the zero page.  Could
> > this infrastructure be used for that?  (And how much would we gain from
> > it?)
> >
> > [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> > thing here ;) ]

^^ this?

> KSM is separate driver , it doesn't change anything in the VM but adding
> two helper functions.

What, you mean I should actually read the code?   Oh well, OK.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
> > For kvm, the kernel never knew those pages were shared.  They are loaded
> > from independent (possibly compressed and encrypted) disk images.  These
> > images are different; but some pages happen to be the same because they
> > came from the same installation media.
>
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?

This was considered.  You can't scan the image, because it may be
encrypted/compressed/offset (typical images _are_ offset because the
first partition starts at sector 63...).  The data may come from the
network and not a disk image.  You can't scan in userspace because the
images belong to different users and contain sensitive data.  Pages may
come from several images (multiple disk images per guest) so you end up
with one vma per page.

So you have to scan memory, after the guest has retrieved it from
disk/network/manufactured it somehow, decompressed and decrypted it,
written it to the offset it wants.  You can't scan from userspace since
it's sensitive data, and of course the actual merging needs to be done
atomically, which can only be done from the holy of holies, the vm.

> > For OpenVZ the situation is less clear, but if you allow users to
> > independently upgrade their chroots you will eventually arrive at the
> > same scenario (unless of course you apply the same merging strategy at
> > the filesystem level).
>
> hm.
>
> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?

Yes, trivially.  ksm may be an overkill for this, though.

> (And how much would we gain from it?)

A lot of zeros.

> [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> thing here ;) ]

I sympathize -- us too.  Consider the typical multiuser gnome
minicomputer with all 150 users reading lwn.net at the same time instead
of working.  You could share the firefox rendered page cache, reducing
memory utilization drastically.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
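The all-zeroes case discussed above needs no hash table at all: a page qualifies if every byte is zero. A minimal user-space sketch, assuming a 4096-byte page and the hypothetical name `page_is_zero()`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page)
{
    /* Static storage is zero-initialised, so this is a reference zero page. */
    static const unsigned char zero[PAGE_BYTES];

    assert(page != NULL);
    return memcmp(page, zero, PAGE_BYTES) == 0;
}
```

This only covers detection; repointing the mapping at a shared zero page would still have to happen atomically inside the kernel, as the message explains.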
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 21:07:10 +0200 Izik Eidus <[EMAIL PROTECTED]> wrote:

> we have used KSM in production for about half year and the numbers that
> came from our QA is:
> using KSM for desktop (KSM was tested just for windows desktop workload)
> you can run as many as
> 52 windows xp with 1 giga ram each on server with just 16giga ram. (this
> is more than 300% overcommit)
> the reason is that most of the kernel/dlls of this guests is shared and
> in addition we are sharing the windows zero
> (windows keep making all its free memory as zero, so every time windows
> release memory we take the page back to the host)
> there is slide that give this numbers you can find at:
> http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf
>
> (slide 27)
> beside more i gave presentation about ksm that can be found at:
> http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

OK, 300% isn't chicken feed.

It is quite important that information such as this be prepared, added to
the patch changelogs and maintained.  For a start, without this basic
information, there is no reason for anyone to look at any of the code!
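The "more than 300%" figure quoted above follows directly from the numbers in the message: 52 guests with 1 GB each on a 16 GB host. A small sketch of the arithmetic, with `overcommit_percent()` as a hypothetical helper name:

```c
#include <assert.h>

/* Committed guest RAM as a percentage of host RAM (integer arithmetic). */
static unsigned overcommit_percent(unsigned guests, unsigned gb_per_guest,
                                   unsigned host_gb)
{
    assert(host_gb > 0);
    return guests * gb_per_guest * 100u / host_gb;
}
```

With the quoted values, overcommit_percent(52, 1, 16) gives 325, which is 52 GB of guest memory against 16 GB of physical memory.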
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
> On Tue, 11 Nov 2008 20:48:16 +0200 Avi Kivity <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> > > The whole approach seems wrong to me.  The kernel lost track of these
> > > pages and then we run around post-facto trying to fix that up again.
> > > Please explain (for the changelog) why the kernel cannot get this right
> > > via the usual sharing, refcounting and COWing approaches.
> >
> > For kvm, the kernel never knew those pages were shared.  They are loaded
> > from independent (possibly compressed and encrypted) disk images.  These
> > images are different; but some pages happen to be the same because they
> > came from the same installation media.
>
> What userspace-only changes could fix this?  Identify the common data,
> write it to a flat file and mmap it, something like that?
>
> > For OpenVZ the situation is less clear, but if you allow users to
> > independently upgrade their chroots you will eventually arrive at the
> > same scenario (unless of course you apply the same merging strategy at
> > the filesystem level).
>
> hm.
>
> There has been the occasional discussion about identifying all-zeroes
> pages and scavenging them, repointing them at the zero page.  Could
> this infrastructure be used for that?  (And how much would we gain from
> it?)
>
> [I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
> thing here ;) ]

KSM is a separate driver; it doesn't change anything in the VM apart from
adding two helper functions.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 20:48:16 +0200 Avi Kivity <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > The whole approach seems wrong to me.  The kernel lost track of these
> > pages and then we run around post-facto trying to fix that up again.
> > Please explain (for the changelog) why the kernel cannot get this right
> > via the usual sharing, refcounting and COWing approaches.
> >
>
> For kvm, the kernel never knew those pages were shared.  They are loaded
> from independent (possibly compressed and encrypted) disk images.  These
> images are different; but some pages happen to be the same because they
> came from the same installation media.

What userspace-only changes could fix this?  Identify the common data,
write it to a flat file and mmap it, something like that?

> For OpenVZ the situation is less clear, but if you allow users to
> independently upgrade their chroots you will eventually arrive at the
> same scenario (unless of course you apply the same merging strategy at
> the filesystem level).

hm.

There has been the occasional discussion about identifying all-zeroes
pages and scavenging them, repointing them at the zero page.  Could
this infrastructure be used for that?  (And how much would we gain from
it?)

[I'm looking for reasons why this is more than a muck-up-the-vm-for-kvm
thing here ;) ]
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Avi Kivity wrote:
> Andrew Morton wrote:
> > The whole approach seems wrong to me. The kernel lost track of these
> > pages and then we run around post-facto trying to fix that up again.
> > Please explain (for the changelog) why the kernel cannot get this right
> > via the usual sharing, refcounting and COWing approaches.
>
> For kvm, the kernel never knew those pages were shared. They are loaded
> from independent (possibly compressed and encrypted) disk images. These
> images are different; but some pages happen to be the same because they
> came from the same installation media.

As Avi said, in KVM we cannot know how the guest is going to map its pages, so we have no option but to scan for identical pages (identical pages can sit at completely different offsets inside each guest).

> For OpenVZ the situation is less clear, but if you allow users to
> independently upgrade their chroots you will eventually arrive at the
> same scenario (unless of course you apply the same merging strategy at
> the filesystem level).
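The point about identical pages living at different offsets in different guests can be illustrated with a small userspace sketch. It is not how the KSM patch is implemented: it simply groups pages by their exact content. The 4096-byte page size, the guest names and the function `find_identical_pages` are assumptions made for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size


def find_identical_pages(images, page_size=PAGE_SIZE):
    """Group pages with identical content across several memory images.

    `images` maps a guest name to that guest's memory as bytes. Returns a
    list of groups; each group lists the (guest, offset) locations whose
    page content is byte-for-byte equal. Pages with no twin are omitted.
    """
    by_content = defaultdict(list)
    for guest, memory in images.items():
        for offset in range(0, len(memory) - page_size + 1, page_size):
            page = memory[offset:offset + page_size]
            # Keying on the page bytes themselves gives an exact comparison.
            by_content[page].append((guest, offset))
    return [locations for locations in by_content.values() if len(locations) > 1]


if __name__ == "__main__":
    a, b, c = (bytes([value]) * PAGE_SIZE for value in (1, 2, 3))
    # The same page content sits at offset 0 in guest1 and 4096 in guest2.
    print(find_identical_pages({"guest1": a + b, "guest2": c + a}))
```

Nothing in the lookup depends on where a page sits inside its guest, which is why scanning by content can find matches that an offset-based or file-based scheme would miss.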
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
> On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus wrote:
>
> > KSM is a Linux driver that allows dynamically sharing identical memory
> > pages between one or more processes.
> >
> > Unlike traditional page sharing, which is set up when the memory is
> > allocated, KSM does it dynamically after the memory has been created.
> > Memory is periodically scanned; identical pages are identified and merged.
> > The sharing is unnoticeable to the processes that use this memory.
> > (The shared pages are marked read-only, and on a write do_wp_page()
> > takes care of creating a new copy of the page.)
> >
> > This driver is very useful for KVM: when running multiple guest
> > operating systems of the same type, many pages are sharable.
> > This driver can be useful for OpenVZ as well.
>
> These benefits should be quantified, please. Also any benefits to any
> other workloads should be identified and quantified.

Sure. We have used KSM in production for about half a year, and these are the numbers that came from our QA:

Using KSM for desktops (KSM was tested only with a Windows desktop workload), you can run as many as 52 Windows XP guests with 1 GB of RAM each on a server with just 16 GB of RAM (this is more than 300% overcommit). The reason is that most of the kernel and DLLs of these guests are shared, and in addition we share the Windows zero pages (Windows keeps zeroing all of its free memory, so every time Windows releases memory we take the page back to the host).

A slide with these numbers can be found at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_3.pdf
(slide 27)

In addition, I gave a presentation about KSM, which can be found at:
http://kvm.qumranet.com/kvmwiki/KvmForum2008?action=AttachFile&do=get&target=kdf2008_12.pdf

If more numbers are wanted for other workloads, I can test them.

(The idea of KSM is to run it slowly, at low priority, and let it merge pages when no one else needs the CPU.)
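The overcommit figure quoted above can be checked with a little arithmetic. This sketch takes "overcommit" to mean guest memory promised as a percentage of physical host memory, which appears to be the sense used in the message; the helper names are hypothetical, and the second function assumes no swap and no host overhead.

```python
def overcommit_percent(guests: int, ram_per_guest_gb: float, host_ram_gb: float) -> float:
    """Guest memory promised, as a percentage of physical host memory."""
    return 100.0 * guests * ram_per_guest_gb / host_ram_gb


def min_shared_fraction(guests: int, ram_per_guest_gb: float, host_ram_gb: float) -> float:
    """Smallest fraction of guest memory that sharing must absorb for all
    guests to fit in host RAM (assumes no swap and no host overhead)."""
    committed = guests * ram_per_guest_gb
    return max(0.0, (committed - host_ram_gb) / committed)


if __name__ == "__main__":
    # The numbers from the message: 52 guests, 1 GB each, 16 GB host.
    print(overcommit_percent(52, 1, 16))
    print(round(min_shared_fraction(52, 1, 16), 3))
```

For 52 guests of 1 GB on a 16 GB host this gives 325%, consistent with "more than 300%". Under the stated assumptions, and if every guest touched all of its memory, roughly 69% of guest memory (36 GB of 52 GB) would have to be shared or reclaimed for everything to fit.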
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
Andrew Morton wrote:
> The whole approach seems wrong to me. The kernel lost track of these
> pages and then we run around post-facto trying to fix that up again.
> Please explain (for the changelog) why the kernel cannot get this right
> via the usual sharing, refcounting and COWing approaches.

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

For OpenVZ the situation is less clear, but if you allow users to independently upgrade their chroots you will eventually arrive at the same scenario (unless of course you apply the same merging strategy at the filesystem level).

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux
On Tue, 11 Nov 2008 15:21:37 +0200 Izik Eidus wrote:

> KSM is a Linux driver that allows dynamically sharing identical memory
> pages between one or more processes.
>
> Unlike traditional page sharing, which is set up when the memory is
> allocated, KSM does it dynamically after the memory has been created.
> Memory is periodically scanned; identical pages are identified and merged.
> The sharing is unnoticeable to the processes that use this memory.
> (The shared pages are marked read-only, and on a write do_wp_page()
> takes care of creating a new copy of the page.)
>
> This driver is very useful for KVM: when running multiple guest
> operating systems of the same type, many pages are sharable.
> This driver can be useful for OpenVZ as well.

These benefits should be quantified, please. Also any benefits to any other workloads should be identified and quantified.

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.
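The scan, merge and copy-on-write cycle described in the quoted cover letter can be modelled in a few lines. This is a toy model only: the class `ToyMemory` and its methods are hypothetical, writes replace a whole page at once, and the dictionary lookup stands in for whatever comparison structure the real driver uses. It is meant to show why a merged page stays invisible to its users, not how the patch works internally.

```python
class ToyMemory:
    """Toy model: each (process, virtual page number) maps to a frame id."""

    def __init__(self):
        self.frames = {}     # frame id -> page content (bytes)
        self.refcount = {}   # frame id -> number of mappings pointing at it
        self.mapping = {}    # (process, vpn) -> frame id
        self.next_frame = 0

    def _alloc(self, content):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = content
        self.refcount[frame] = 0
        return frame

    def _map(self, key, frame):
        self.mapping[key] = frame
        self.refcount[frame] += 1

    def _unmap(self, key):
        frame = self.mapping.pop(key)
        self.refcount[frame] -= 1
        if self.refcount[frame] == 0:
            # Last user gone: the frame is freed.
            del self.frames[frame], self.refcount[frame]

    def write(self, process, vpn, content):
        """Whole-page write. A shared frame is never modified in place."""
        key = (process, vpn)
        if key in self.mapping:
            frame = self.mapping[key]
            if self.refcount[frame] == 1:
                self.frames[frame] = content  # private page: write in place
                return
            self._unmap(key)  # shared page: break the sharing (copy on write)
        self._map(key, self._alloc(content))

    def read(self, process, vpn):
        return self.frames[self.mapping[(process, vpn)]]

    def merge_pass(self):
        """One scan: repoint mappings with identical content at one frame.

        Returns the number of mappings that were repointed.
        """
        canonical = {}  # page content -> frame chosen to keep
        merged = 0
        for key in list(self.mapping):
            frame = self.mapping[key]
            keep = canonical.setdefault(self.frames[frame], frame)
            if keep != frame:
                self._unmap(key)
                self._map(key, keep)
                merged += 1
        return merged
```

A short usage example: after `m = ToyMemory()`, writing the same content to `("vm1", 0)` and `("vm2", 7)` and calling `m.merge_pass()` leaves both mappings on one frame; a later `m.write("vm2", 7, ...)` gives vm2 a private frame again while vm1 still reads the original content.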