Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-04 Thread Alan Cox
  Taken off list 
 
 Hmmm, list would like to know :-).

That would be my choice too but unfortunately I can't do that
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-03 Thread Pavel Machek
On Tue 2008-12-02 22:10:29, Alan Cox wrote:
 On Tue, 2 Dec 2008 13:24:11 -0800
 Chris Wright [EMAIL PROTECTED] wrote:
 
  * Alan Cox ([EMAIL PROTECTED]) wrote:
   On Tue, 2 Dec 2008 10:07:24 -0800
   Chris Wright [EMAIL PROTECTED] wrote:
* Alan Cox ([EMAIL PROTECTED]) wrote:
  +   r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
  +   mutex_unlock(sha1_lock);
  +   if (r) {
  +   char *old_addr, *new_addr;
  +   old_addr = kmap_atomic(oldpage, KM_USER0);
  +   new_addr = kmap_atomic(newpage, KM_USER1);
  +   r = !memcmp(old_addr+PAGEHASH_LEN, 
  new_addr+PAGEHASH_LEN,
  +   PAGE_SIZE-PAGEHASH_LEN);
 
 NAK - this isn't guaranteed to be robust so you could end up merging
 different pages one provided by a malicious attacker.

I presume you're referring to the digest comparison.  While there's
theoretical concern of hash collision, it's mitigated by hmac(sha1)
so the attacker can't brute force for known collisions.
   
   Using current known techniques. A random collision is just as bad news.
  
  And, just to clarify, your concern would extend to any digest based
  comparison?  Or are you specifically concerned about sha1?
 
 Taken off list 

Hmmm, list would like to know :-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-02 Thread Chris Wright
* Alan Cox ([EMAIL PROTECTED]) wrote:
  +   r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
  +   mutex_unlock(sha1_lock);
  +   if (r) {
  +   char *old_addr, *new_addr;
  +   old_addr = kmap_atomic(oldpage, KM_USER0);
  +   new_addr = kmap_atomic(newpage, KM_USER1);
  +   r = !memcmp(old_addr+PAGEHASH_LEN, new_addr+PAGEHASH_LEN,
  +   PAGE_SIZE-PAGEHASH_LEN);
 
 NAK - this isn't guaranteed to be robust so you could end up merging
 different pages one provided by a malicious attacker.

I presume you're referring to the digest comparison.  While there's
theoretical concern of hash collision, it's mitigated by hmac(sha1)
so the attacker can't brute force for known collisions.

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-02 Thread Alan Cox
On Tue, 2 Dec 2008 10:07:24 -0800
Chris Wright [EMAIL PROTECTED] wrote:

 * Alan Cox ([EMAIL PROTECTED]) wrote:
   + r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
   + mutex_unlock(sha1_lock);
   + if (r) {
   + char *old_addr, *new_addr;
   + old_addr = kmap_atomic(oldpage, KM_USER0);
   + new_addr = kmap_atomic(newpage, KM_USER1);
   + r = !memcmp(old_addr+PAGEHASH_LEN, new_addr+PAGEHASH_LEN,
   + PAGE_SIZE-PAGEHASH_LEN);
  
  NAK - this isn't guaranteed to be robust so you could end up merging
  different pages one provided by a malicious attacker.
 
 I presume you're referring to the digest comparison.  While there's
 theoretical concern of hash collision, it's mitigated by hmac(sha1)
 so the attacker can't brute force for known collisions.

Using current known techniques. A random collision is just as bad news.

This code simply isn't fit for the kernel.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-02 Thread Chris Wright
* Alan Cox ([EMAIL PROTECTED]) wrote:
 On Tue, 2 Dec 2008 10:07:24 -0800
 Chris Wright [EMAIL PROTECTED] wrote:
  * Alan Cox ([EMAIL PROTECTED]) wrote:
+   r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
+   mutex_unlock(sha1_lock);
+   if (r) {
+   char *old_addr, *new_addr;
+   old_addr = kmap_atomic(oldpage, KM_USER0);
+   new_addr = kmap_atomic(newpage, KM_USER1);
+   r = !memcmp(old_addr+PAGEHASH_LEN, 
new_addr+PAGEHASH_LEN,
+   PAGE_SIZE-PAGEHASH_LEN);
   
   NAK - this isn't guaranteed to be robust so you could end up merging
   different pages one provided by a malicious attacker.
  
  I presume you're referring to the digest comparison.  While there's
  theoretical concern of hash collision, it's mitigated by hmac(sha1)
  so the attacker can't brute force for known collisions.
 
 Using current known techniques. A random collision is just as bad news.

And, just to clarify, your concern would extend to any digest based
comparison?  Or are you specifically concerned about sha1?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-02 Thread Jonathan Corbet
On Tue, 2 Dec 2008 13:24:11 -0800
Chris Wright [EMAIL PROTECTED] wrote:

  Using current known techniques. A random collision is just as bad
  news.  
 
 And, just to clarify, your concern would extend to any digest based
 comparison?  Or are you specifically concerned about sha1?

Wouldn't this issue just go away if the code simply compared the full
pages, rather than skipping the hashed 128 bytes at the beginning?
Given the cost of this whole operation (which, it seems, can involve
copying one of the pages before testing for equality), skipping the
comparison of 128 bytes seems like a bit of a premature optimization.

jon
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-12-02 Thread Alan Cox
On Tue, 2 Dec 2008 13:24:11 -0800
Chris Wright [EMAIL PROTECTED] wrote:

 * Alan Cox ([EMAIL PROTECTED]) wrote:
  On Tue, 2 Dec 2008 10:07:24 -0800
  Chris Wright [EMAIL PROTECTED] wrote:
   * Alan Cox ([EMAIL PROTECTED]) wrote:
 + r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
 + mutex_unlock(sha1_lock);
 + if (r) {
 + char *old_addr, *new_addr;
 + old_addr = kmap_atomic(oldpage, KM_USER0);
 + new_addr = kmap_atomic(newpage, KM_USER1);
 + r = !memcmp(old_addr+PAGEHASH_LEN, 
 new_addr+PAGEHASH_LEN,
 + PAGE_SIZE-PAGEHASH_LEN);

NAK - this isn't guaranteed to be robust so you could end up merging
different pages one provided by a malicious attacker.
   
   I presume you're referring to the digest comparison.  While there's
   theoretical concern of hash collision, it's mitigated by hmac(sha1)
   so the attacker can't brute force for known collisions.
  
  Using current known techniques. A random collision is just as bad news.
 
 And, just to clarify, your concern would extend to any digest based
 comparison?  Or are you specifically concerned about sha1?

Taken off list 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver.

2008-11-28 Thread Alan Cox
 + r = !memcmp(old_digest, sha1_item-sha1val, SHA1_DIGEST_SIZE);
 + mutex_unlock(sha1_lock);
 + if (r) {
 + char *old_addr, *new_addr;
 + old_addr = kmap_atomic(oldpage, KM_USER0);
 + new_addr = kmap_atomic(newpage, KM_USER1);
 + r = !memcmp(old_addr+PAGEHASH_LEN, new_addr+PAGEHASH_LEN,
 + PAGE_SIZE-PAGEHASH_LEN);

NAK - this isn't guaranteed to be robust so you could end up merging
different pages one provided by a malicious attacker.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-12 Thread Eric Rannaud
On Tue, 2008-11-11 at 17:40 -0500, [EMAIL PROTECTED] wrote: 
 On Tue, 11 Nov 2008 15:03:45 MST, Jonathan Corbet said:
 Seems reasonably sane to me - only doing the first 128 bytes rather than
 a full 4K page is some 32 times faster.  Yes, you'll have the *occasional*
 case where two pages were identical for 128 bytes but then differed, which is
 why there's buckets.  But the vast majority of the time, at least one bit
 will be different in the first part.

In the same spirit, computing a CRC32 instead of a SHA1 would probably
be faster as well (faster to compute, and faster to compare the
digests). The increased rate of collision should be negligible.

Also, the upcoming SSE4.2 (Core i7) has a CRC32 instruction. (Support is
already in the kernel: arch/x86/crypto/crc32c-intel.c)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Andrew Morton
On Tue, 11 Nov 2008 15:21:40 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

 From: Izik Eidus [EMAIL PROTECTED]
 
 ksm is driver that allow merging identical pages between one or more
 applications in way unvisible to the application that use it.
 pages that are merged are marked as readonly and are COWed when any 
 application
 try to change them.
 
 ksm is working by walking over the memory pages of the applications it scan
 in order to find identical pages.
 it uses an hash table to find in effective way the identical pages.
 
 when ksm find two identical pages, it marked them as readonly and merge them
 into single one page,
 after the pages are marked as readonly and merged into one page, linux
 will treat this pages as normal copy_on_write pages and will fork them
 when write access will happen to them.
 
 ksm scan just memory areas that were registred to be scanned by it.
 

This driver apepars to implement a new userspace interface, in /dev/ksm

Please fully document that interface in the changelog so that we can
review your decisions here.  This is by far the most important
consideration - we can change all the code, but interfaces are for
ever.

 diff --git a/drivers/Kconfig b/drivers/Kconfig
 index d38f43f..c1c701f 100644
 --- a/drivers/Kconfig
 +++ b/drivers/Kconfig
 @@ -105,4 +105,9 @@ source drivers/uio/Kconfig
  source drivers/xen/Kconfig
  
  source drivers/staging/Kconfig
 +
 +config KSM
 + bool KSM driver support
 + help
 + ksm is driver for merging identical pages between applciations

s/is/is a/

applications is misspelt.

  endmenu
 diff --git a/include/linux/ksm.h b/include/linux/ksm.h
 new file mode 100644
 index 000..f873502
 --- /dev/null
 +++ b/include/linux/ksm.h
 @@ -0,0 +1,53 @@
 +#ifndef __LINUX_KSM_H
 +#define __LINUX_KSM_H
 +
 +/*
 + * Userspace interface for /dev/ksm - kvm shared memory
 + */
 +
 +#include asm/types.h
 +#include linux/ioctl.h
 +
 +#define KSM_API_VERSION 1
 +
 +/* for KSM_REGISTER_MEMORY_REGION */
 +struct ksm_memory_region {
 + __u32 npages; /* number of pages to share */
 + __u32 pad;
 + __u64 addr; /* the begining of the virtual address */
 +};
 +
 +struct ksm_user_scan {
 + __u32 pages_to_scan;
 + __u32 max_pages_to_merge;
 +};
 +
 +struct ksm_kthread_info {
 + __u32 sleep; /* number of microsecoends to sleep */
 + __u32 pages_to_scan; /* number of pages to scan */
 + __u32 max_pages_to_merge;
 + __u32 running;
 +};
 +
 +#define KSMIO 0xAB
 +
 +/* ioctls for /dev/ksm */
 +#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
 +#define KSM_CREATE_SHARED_MEMORY_AREA_IO(KSMIO,   0x01) /* return SMA fd 
 */
 +#define KSM_CREATE_SCAN  _IO(KSMIO,   0x02) /* return SCAN 
 fd */
 +#define KSM_START_STOP_KTHREAD_IOW(KSMIO,  0x03,\
 +   struct ksm_kthread_info)
 +#define KSM_GET_INFO_KTHREAD  _IOW(KSMIO,  0x04,\
 +   struct ksm_kthread_info) 
 +
 +
 +/* ioctls for SMA fds */
 +#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
 +   struct ksm_memory_region)
 +#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
 +
 +/* ioctls for SCAN fds */
 +#define KSM_SCAN _IOW(KSMIO,  0x40,\
 +   struct ksm_user_scan)
 +
 +#endif

uh-oh, ioctls.

Please do document that interface for us.

 diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
 index 26433ec..adc2435 100644
 --- a/include/linux/miscdevice.h
 +++ b/include/linux/miscdevice.h
 @@ -30,6 +30,7 @@
  #define TUN_MINOR 200
  #define  HPET_MINOR   228
  #define KVM_MINOR232
 +#define KSM_MINOR233
  
  struct device;
  
 diff --git a/mm/Kconfig b/mm/Kconfig
 index 5b5790f..e7f0061 100644
 --- a/mm/Kconfig
 +++ b/mm/Kconfig
 @@ -222,3 +222,6 @@ config UNEVICTABLE_LRU
  
  config MMU_NOTIFIER
   bool
 +
 +config KSM
 + bool
 diff --git a/mm/Makefile b/mm/Makefile
 index c06b45a..9722afe 100644
 --- a/mm/Makefile
 +++ b/mm/Makefile
 @@ -26,6 +26,7 @@ obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
  obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
  obj-$(CONFIG_SLOB) += slob.o
  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 +obj-$(CONFIG_KSM) += ksm.o
  obj-$(CONFIG_SLAB) += slab.o
  obj-$(CONFIG_SLUB) += slub.o
  obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 diff --git a/mm/ksm.c b/mm/ksm.c
 new file mode 100644
 index 000..977eb37
 --- /dev/null
 +++ b/mm/ksm.c
 @@ -0,0 +1,1202 @@
 +/*
 + * Memory merging driver for Linux
 + *
 + * This module enables dynamic sharing of identical pages found in different
 + * memory areas, even if they are not shared by fork()
 + *
 + * Copyright (C) 2008 Red Hat, Inc.
 + * Authors:
 + *   Izik Eidus
 + *   Andrea Arcangeli
 + *   Chris Wright
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.
 + 

Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Jonathan Corbet
I don't claim to begin to really understand the deep VM side of this
patch, but I can certainly pick nits as I work through it...sorry for
the lack of anything more substantive.

 +static struct list_head slots;

Some of these file-static variable names seem a little..terse...

 +#define PAGECMP_OFFSET 128
 +#define PAGEHASH_SIZE (PAGECMP_OFFSET ? PAGECMP_OFFSET : PAGE_SIZE)
 +/* hash the page */
 +static void page_hash(struct page *page, unsigned char *digest)

So is this really saying that you only hash the first 128 bytes, relying on
full compares for the rest?  I assume there's a perfectly good reason for
doing it that way, but it's not clear to me from reading the code.  Do you
gain performance which is not subsequently lost in the (presumably) higher
number of hash collisions?

 +static int ksm_scan_start(struct ksm_scan *ksm_scan, int scan_npages,
 +   int max_npages)
 +{
 + struct ksm_mem_slot *slot;
 + struct page *page[1];
 + int val;
 + int ret = 0;
 +
 + down_read(slots_lock);
 +
 + scan_update_old_index(ksm_scan);
 +
 + while (scan_npages  0  max_npages  0) {

Should this loop maybe check kthread_run too?  It seems like you could loop
for a long time after kthread_run has been set to zero.

 +static int ksm_dev_open(struct inode *inode, struct file *filp)
 +{
 + try_module_get(THIS_MODULE);
 + return 0;
 +}
 +
 +static int ksm_dev_release(struct inode *inode, struct file *filp)
 +{
 + module_put(THIS_MODULE);
 + return 0;
 +}
 +
 +static struct file_operations ksm_chardev_ops = {
 + .open   = ksm_dev_open,
 + .release= ksm_dev_release,
 + .unlocked_ioctl = ksm_dev_ioctl,
 + .compat_ioctl   = ksm_dev_ioctl,
 +};

Why do you roll your own module reference counting?  Is there a reason you
don't just set .owner and let the VFS handle it?

Given that the KSM_REGISTER_MEMORY_REGION ioctl() creates unswappable
memory, should there be some sort of capability check done there?  A check
for starting/stopping the thread might also make sense.  Or is that
expected to be handled via permissions on /dev/ksm?

Actually, it occurs to me that there's no sanity checks on any of the
values passed in by ioctl().  What happens if the user tells KSM to scan a
bogus range of memory?

Any benchmarks on the runtime cost of having KSM running?

jon
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Andrea Arcangeli
On Tue, Nov 11, 2008 at 12:38:51PM -0800, Andrew Morton wrote:
 Please fully document that interface in the changelog so that we can
 review your decisions here.  This is by far the most important
 consideration - we can change all the code, but interfaces are for
 ever.

Yes, this is the most important point in my view. Even after we make
the ksm pages swappable it'll remain an invisible change to anybody
but us (it'll work better under VM pressure, but that's about it).

 uh-oh, ioctls.

Yes, it's all ioctl based. In very short, it assigns the task and
memory region to scan, and allows to start/stop the kernel thread that
does the scan while selecting how many pages to execute per scan and
how many scans to execute per second. The more pages per scan and the
more scans per second, the higher cpu utilization of the kernel thread.

It would also be possible to submit ksm in a way that has no API at
all (besides kernel module params tunable later by sysfs to set
pages-per-scan and scan-frequency). Doing that would allow us to defer
the API decisions. But then all anonymous memory in the system will be
scanned unconditionally even if there may be little to share for
certain tasks. It would perform quite well, usually the sharable part
is the largest part so the rest wouldn't generate an huge amount of
cpu waste. There's some ram waste too, as some memory has to be
allocated for every page we want to possibly share.

In some ways removing the API would make it simpler to use for non
virtualization environments where they may want to enable it
system-wide.

 ooh, a comment!

8)

  +   kpage = alloc_page(GFP_KERNEL |  __GFP_HIGHMEM);
 
 Stray whitepace.
 
 Replace with GFP_HIGHUSER.

So not a cleanup, but an improvement (I agree highuser is better, this
isn't by far any higher-priority kernel alloc and it deserves to have
lower prio in the watermarks).

 The term shared memory has specific meanings in Linux/Unix.  I'm
 suspecting that its use here was inappropriate but I have insufficient
 information to be able to suggest alternatives.

We could call it create_shared_anonymous_memory but then what if it
ever becomes capable of sharing pagecache too? (I doubt it will, not
in the short term at least ;)

Usually when we do these kind of tricks we use the word cloning, so
perhaps also create_cloned_memory_area, so you can later say cloned
anonymous memory or cloned shared memory page instead of just KSM
page. But then this module would become KCM and not KSM 8). Perhaps we
should just go recursive and use create_ksm_memory_area.

 Generally: this review was rather a waste of your time and of mine
 because the code is inadequately documented.

Well, this was a great review considering how little the code was
documented, definitely not a waste of time on our end, it was very
helpful and the good thing is I don't see any controversial stuff.

The two inner loops are the core of the ksm scan, and they're aren't
very simple I've to say. Documenting won't be trivial but it's surely
possible, Izik already working on it I think! Apologies if any time
was wasted on your side!!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Izik Eidus

Jonathan Corbet wrote:

I don't claim to begin to really understand the deep VM side of this
patch, but I can certainly pick nits as I work through it...sorry for
the lack of anything more substantive.

  

+static struct list_head slots;



Some of these file-static variable names seem a little..terse...

  

+#define PAGECMP_OFFSET 128
+#define PAGEHASH_SIZE (PAGECMP_OFFSET ? PAGECMP_OFFSET : PAGE_SIZE)
+/* hash the page */
+static void page_hash(struct page *page, unsigned char *digest)



So is this really saying that you only hash the first 128 bytes, relying on
full compares for the rest?  I assume there's a perfectly good reason for
doing it that way, but it's not clear to me from reading the code.  Do you
gain performance which is not subsequently lost in the (presumably) higher
number of hash collisions?

  

+static int ksm_scan_start(struct ksm_scan *ksm_scan, int scan_npages,
+ int max_npages)
+{
+   struct ksm_mem_slot *slot;
+   struct page *page[1];
+   int val;
+   int ret = 0;
+
+   down_read(slots_lock);
+
+   scan_update_old_index(ksm_scan);
+
+   while (scan_npages  0  max_npages  0) {



Should this loop maybe check kthread_run too?  It seems like you could loop
for a long time after kthread_run has been set to zero.

  

+static int ksm_dev_open(struct inode *inode, struct file *filp)
+{
+   try_module_get(THIS_MODULE);
+   return 0;
+}
+
+static int ksm_dev_release(struct inode *inode, struct file *filp)
+{
+   module_put(THIS_MODULE);
+   return 0;
+}
+
+static struct file_operations ksm_chardev_ops = {
+   .open   = ksm_dev_open,
+   .release= ksm_dev_release,
+   .unlocked_ioctl = ksm_dev_ioctl,
+   .compat_ioctl   = ksm_dev_ioctl,
+};



Why do you roll your own module reference counting?  Is there a reason you
don't just set .owner and let the VFS handle it?
  


Yes, I am taking get_task_mm() if the module will be removed before i 
free the mms, things will go wrong



Given that the KSM_REGISTER_MEMORY_REGION ioctl() creates unswappable
memory, should there be some sort of capability check done there?  A check
for starting/stopping the thread might also make sense.  Or is that
expected to be handled via permissions on /dev/ksm?
  


Well, I think giving premmision to /dev/ksm default as root is enough.
No?


Actually, it occurs to me that there's no sanity checks on any of the
values passed in by ioctl().  What happens if the user tells KSM to scan a
bogus range of memory?
  


Well get_user_pages() run in context of the process, therefore it should 
fail in bogus range of memory



Any benchmarks on the runtime cost of having KSM running?
  


This one is problematic, ksm can take anything from 0% to 100% cpu
its all depend on how fast you run it.
it have 3 parameters:
number of pages to scan before it go to sleep
maximum number of pages to merge while we scanning the above pages 
(merging is expensive)
time to sleep (when runing from userspace using /dev/ksm, we actually do 
it there (userspace)



jon
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Jonathan Corbet
On Wed, 12 Nov 2008 00:17:39 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

  +static int ksm_dev_open(struct inode *inode, struct file *filp)
  +{
  +  try_module_get(THIS_MODULE);
  +  return 0;
  +}
  +
  +static int ksm_dev_release(struct inode *inode, struct file *filp)
  +{
  +  module_put(THIS_MODULE);
  +  return 0;
  +}
  +
  +static struct file_operations ksm_chardev_ops = {
  +  .open   = ksm_dev_open,
  +  .release= ksm_dev_release,
  +  .unlocked_ioctl = ksm_dev_ioctl,
  +  .compat_ioctl   = ksm_dev_ioctl,
  +};

 
  Why do you roll your own module reference counting?  Is there a
  reason you don't just set .owner and let the VFS handle it?
  
 
 Yes, I am taking get_task_mm() if the module will be removed before i 
 free the mms, things will go wrong

But...if you set .owner, the VFS will do the try_module_get() *before*
calling into your module (as an added bonus, it will actually check the
return value too).  All you've succeeded in doing here is adding a
microscopic race to the module reference counting; otherwise the end
result is the same.

jon
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Izik Eidus

Jonathan Corbet wrote:

On Wed, 12 Nov 2008 00:17:39 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

  

+static int ksm_dev_open(struct inode *inode, struct file *filp)
+{
+   try_module_get(THIS_MODULE);
+   return 0;
+}
+
+static int ksm_dev_release(struct inode *inode, struct file *filp)
+{
+   module_put(THIS_MODULE);
+   return 0;
+}
+
+static struct file_operations ksm_chardev_ops = {
+   .open   = ksm_dev_open,
+   .release= ksm_dev_release,
+   .unlocked_ioctl = ksm_dev_ioctl,
+   .compat_ioctl   = ksm_dev_ioctl,
+};
  


Why do you roll your own module reference counting?  Is there a
reason you don't just set .owner and let the VFS handle it?

  
Yes, I am taking get_task_mm() if the module will be removed before i 
free the mms, things will go wrong



But...if you set .owner, the VFS will do the try_module_get() *before*
calling into your module (as an added bonus, it will actually check the
return value too).  

Ohhh i see what you mean
you are right i had at least needed to check for the return value of 
try_module_get(),

anyway will check this issue for V2.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Izik Eidus

Jonathan Corbet wrote:

[Let's see if I can get through the rest without premature sends...]

On Wed, 12 Nov 2008 00:17:39 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

  

Actually, it occurs to me that there's no sanity checks on any of
the values passed in by ioctl().  What happens if the user tells
KSM to scan a bogus range of memory?

  

Well get_user_pages() run in context of the process, therefore it
should fail in bogus range of memory



But it will fail in a totally silent and mysterious way.  Doesn't it
seem better to verify the values when you can return a meaningful error
code to the caller?

  


Well I dont mind insert it (the above for sure is not a bug)
but even with that, the user can still free the memory that he gave to us
so this check if nice to have check, we have nothing to do but to relay on
get_user_pages return value :)


The other ioctl() calls have the same issue; you can start the thread
with nonsensical values for the number of pages to scan and the sleep
time.
  


well about this i agree, here it make alot of logic to check the values!



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Valdis . Kletnieks
On Tue, 11 Nov 2008 15:03:45 MST, Jonathan Corbet said:

  +#define PAGECMP_OFFSET 128
  +#define PAGEHASH_SIZE (PAGECMP_OFFSET ? PAGECMP_OFFSET : PAGE_SIZE)
  +/* hash the page */
  +static void page_hash(struct page *page, unsigned char *digest)
 
 So is this really saying that you only hash the first 128 bytes, relying on
 full compares for the rest?  I assume there's a perfectly good reason for
 doing it that way, but it's not clear to me from reading the code.  Do you
 gain performance which is not subsequently lost in the (presumably) higher
 number of hash collisions?

Seems reasonably sane to me - only doing the first 128 bytes rather than
a full 4K page is some 32 times faster.  Yes, you'll have the *occasional*
case where two pages were identical for 128 bytes but then differed, which is
why there's buckets.  But the vast majority of the time, at least one bit
will be different in the first part.

In fact, I'd not be surprised if only going for 64 bytes works well...


pgpfDEA8at6Sv.pgp
Description: PGP signature


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Avi Kivity

Jonathan Corbet wrote:

+static struct list_head slots;



Some of these file-static variable names seem a little..terse...
  


While ksm was written to be independent of a certain TLA-named kernel 
subsystem developed two rooms away, they share some naming... this 
refers to kvm 'memory slots' which correspond to DIMM banks.


I guess it should be renamed to merge_ranges or something.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Avi Kivity

Izik Eidus wrote:

Any benchmarks on the runtime cost of having KSM running?
  


This one is problematic, ksm can take anything from 0% to 100% cpu
its all depend on how fast you run it.
it have 3 parameters:
number of pages to scan before it go to sleep
maximum number of pages to merge while we scanning the above pages 
(merging is expensive)
time to sleep (when runing from userspace using /dev/ksm, we actually 
do it there (userspace)


The scan process priority also has its effect.  One strategy would be to 
run it at idle priority as long as you have enough free memory, and 
increase the priority as memory starts depleting.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Andrea Arcangeli
Hi Jonathan,

On Tue, Nov 11, 2008 at 03:30:28PM -0700, Jonathan Corbet wrote:
 But it will fail in a totally silent and mysterious way.  Doesn't it
 seem better to verify the values when you can return a meaningful error
 code to the caller?

I think you're right, but just because find_extend_vma will have the
effect of growing the kernel stack down. We clearly don't set it on a
stack with KVM as there's nothing to share on the stack usually - we
only set it in the guest physical memory range. And things are safe
regardless as get_user_pages is verifying the values for us. Problem
is it's using find_extend_vma because it behaves like a page fault. We
must not behave like a pagefault, we're much closer to follow_page
only than a page fault. Not a big deal, but it can be improved by
avoiding to extend the stack somehow (likely simplest is to call
find_vma twice, first time externally, we hold mmap_sem externally so
all right).

 What about things like cache effects from scanning all those pages?  My
 guess is that, if you're trying to run dozens of Windows guests, cache
 usage is not at the top of your list of concerns, but I could be
 wrong.  Usually am...

Oh that's not an issue. This is all about trading some CPU for lots of
free memory. It pays off big as so many more VM can run. With desktop
virtualization and 1G systems, you reach a RAM bottleneck much quicker
than a CPU bottleneck. Perhaps not so quick on server virtualization
but the point is this is intentional. It may be possible to compute
the jhash (that's where the cpu is spent) with instructions that don't
pollute the cpu cache but I doubt it's going to make much an huge
difference.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] add ksm kernel shared memory driver

2008-11-11 Thread Izik Eidus

Jonathan Corbet wrote:


What about things like cache effects from scanning all those pages?  My
guess is that, if you're trying to run dozens of Windows guests, cache
usage is not at the top of your list of concerns, but I could be
wrong.  Usually am...
  


Ok, ksm does make the cache of the cpu dirty when scanning the pages
(but the scanning happen slowly slowly and cache usually get dirty much 
faster)
But infact it improve the cache by the fact that it make many ptes point 
to the same page
so if before we had 12 process touching 12 diffrent physical page they 
would dirty the page

but now they will touch just one...

so i guess it depend on how you see it...


Thanks,

jon
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html