PCI device assignment and mm, KSM.

2012-07-04 Thread Kamezawa Hiroyuki

I'm sorry if my understanding is incorrect. Here are some topics on
PCI passthrough to guests.

When PCI passthrough is used with KVM, all of the guest's memory is pinned
by an extra reference count taken via get_page(). Those pinned pages can
never be reclaimed, cannot be moved by migration, and cannot be merged by KSM.

Now, the fact that 'the page is pinned by kvm' is represented only by
page_count(). So there are the following problems.

a) The pages stay on the anonymous LRU, so try_to_free_pages() and kswapd
   will hopelessly scan XX GB of unreclaimable pages.

b) KSM cannot recognize such pages in its early stage, so it breaks a
   transparent huge page mapped by kvm into small pages. But the final merge
   fails because of the raised page_count(), so all hugepages are split
   without any benefit.

Two ideas for fixing this:

For a), I guess the pages should go onto the UNEVICTABLE list, but they are
   not mlocked. I think we could use PagePinned() instead and move such
   pages to the UNEVICTABLE list; then kswapd etc. would ignore pinned pages.

For b), at first I thought qemu should call madvise(MADV_UNMERGEABLE). But I
   think the kernel may be able to handle the situation with an extra check,
   either PagePinned() or a flag in mm_struct. Should we avoid this in
   userland or in the kernel?

BTW, I think the pinned pages cannot be freed until the kvm process exits.
Is that right?

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-30 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 09:20:02 -0600 (CST)
Christoph Lameter c...@linux.com wrote:

 On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:
 
I see it as a tradeoff of when to check: add_to_page_cache or when we
want more free memory (due to allocation). It is OK to wake kswapd
while allocating memory; somehow, for this purpose (global page
cache), add_to_page_cache or add_to_page_cache_locked does not seem
the right place to hook into. I'd be open to comments/suggestions
from others as well, though.
 
  I don't like adding a hook here.
  And I don't want to run kswapd, because running kswapd has been a sign
  that there is a memory shortage. (Reusing the code is OK.)

  How about adding a new daemon? Recently khugepaged and ksmd were added
  for managing memory. Adding one more daemon for a special purpose is not
  so bad, I think. Then you can:
   - wake it up without a hook
   - throttle its work
   - balance the whole system rather than a single zone.
 I think per-node balancing is enough...
 
 
 I think we already have enough kernel daemons floating around. They are
 multiplying in an amazing way. What would be useful is to map all
 the memory management background stuff into one process. Maybe call this
 memd instead? Perhaps we can fold khugepaged into kswapd as well, etc.
 

Is it a good thing to make kswapd slower for this additional work, which is
requested by the user, not by the system? If using a dedicated thread is bad,
I think a workqueue works well enough; it scales based on the workload.

Thanks,
-Kame






Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 16:24:19 +0900
Minchan Kim minchan@gmail.com wrote:

 On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh bal...@linux.vnet.ibm.com 
 wrote:
  * MinChan Kim minchan@gmail.com [2011-01-28 14:44:50]:
 
  On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
  bal...@linux.vnet.ibm.com wrote:
   On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim minchan@gmail.com 
   wrote:
   [snip]
  
   index 7b56473..2ac8549 100644
   --- a/mm/page_alloc.c
   +++ b/mm/page_alloc.c
   @@ -1660,6 +1660,9 @@ zonelist_scan:
                          unsigned long mark;
                          int ret;
  
    +                       if (should_reclaim_unmapped_pages(zone))
    +                               wakeup_kswapd(zone, order, classzone_idx);
    +
  
    Do we really need the check in the fast path?
    There are lots of callers of alloc_pages.
    Many of them are not related to mapped pages.
    Could we move the check into add_to_page_cache_locked?
  
    The check is a simple one to see whether the unmapped pages need
    balancing; the reason I placed it here is to allow other
    allocations to benefit as well, if there are some unmapped pages to be
    freed. add_to_page_cache_locked (a check under a critical section) is
    even worse, IMHO.
 
  It just moves the overhead from the general case into a specific one
  (i.e., allocating pages only for the page cache).
  Other cases (i.e., allocating pages for purposes other than the page
  cache, e.g. device drivers or filesystems allocating for internal use)
  aren't affected.
  So it would be better.
 
  The goal of this patch is to reclaim only page cache pages, isn't it?
  So I think we could do the balance check in add_to_page_cache and trigger
  reclaim.
  If we do so, what's the problem?
 
 
  I see it as a tradeoff of when to check: add_to_page_cache or when we
  want more free memory (due to allocation). It is OK to wake kswapd
  while allocating memory; somehow, for this purpose (global page
  cache), add_to_page_cache or add_to_page_cache_locked does not seem
  the right place to hook into. I'd be open to comments/suggestions
  from others as well, though.

I don't like adding a hook here.
And I don't want to run kswapd, because running kswapd has been a sign
that there is a memory shortage. (Reusing the code is OK.)

How about adding a new daemon? Recently khugepaged and ksmd were added
for managing memory. Adding one more daemon for a special purpose is not
so bad, I think. Then you can:
 - wake it up without a hook
 - throttle its work
 - balance the whole system rather than a single zone.
   I think per-node balancing is enough...

                          mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
                          if (zone_watermark_ok(zone, order, mark,
                                      classzone_idx, alloc_flags))
   @@ -4167,8 +4170,12 @@ static void __paginginit 
   free_area_init_core(struct pglist_data *pgdat,
  
                zone->spanned_pages = size;
                zone->present_pages = realsize;
 +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
                zone->min_unmapped_pages = (realsize * sysctl_min_unmapped_ratio) / 100;
 +              zone->max_unmapped_pages = (realsize * sysctl_max_unmapped_ratio) / 100;
 +#endif
  #ifdef CONFIG_NUMA
                zone->node = nid;
                zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
   @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table 
   *table, int write,
          return 0;
    }
  
   +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
    int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int 
   write,
          void __user *buffer, size_t *length, loff_t *ppos)
    {
   @@ -5100,6 +5108,23 @@ int 
   sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
          return 0;
    }
  
 +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 +       void __user *buffer, size_t *length, loff_t *ppos)
 +{
 +       struct zone *zone;
 +       int rc;
 +
 +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
 +       if (rc)
 +               return rc;
 +
 +       for_each_zone(zone)
 +               zone->max_unmapped_pages = (zone->present_pages *
 +                               sysctl_max_unmapped_ratio) / 100;
 +       return 0;
 +}
 +#endif
   +
    #ifdef CONFIG_NUMA
    int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
          void __user *buffer, size_t *length, loff_t *ppos)
   diff --git a/mm/vmscan.c b/mm/vmscan.c
   index 02cc82e..6377411 100644
   --- a/mm/vmscan.c
   +++ b/mm/vmscan.c
   @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
    #define scanning_global_lru(sc)        (1)
    #endif
  
   +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
   +static 

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 13:49:28 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 * KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2011-01-28 16:56:05]:
 
  BTW, it seems this doesn't work when some apps use huge shmem.
  How should we handle that issue?
 
 
 Could you elaborate further? 
 
==
static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
{
unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
zone_page_state(zone, NR_ACTIVE_FILE);

/*
 * It's possible for there to be more file mapped pages than
 * accounted for by the pages on the file LRU lists because
 * tmpfs pages accounted for as ANON can also be FILE_MAPPED
 */
return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
}
==

Did you read this?

Thanks,
-Kame



Re: cgroup limits only affect kvm guest under certain conditions

2011-01-06 Thread KAMEZAWA Hiroyuki
On Thu, 06 Jan 2011 14:15:37 +0100
Dominik Klein d...@in-telegence.net wrote:

 Hi
 
 I am playing with cgroups and try to limit block io for guests.
 
 The proof of concept is:
 
 # mkdir /dev/cgroup/blkio
 # mount -t cgroup -o blkio blkio /dev/cgroup/blkio/
 # cd blkio/
 # mkdir test
 # cd test/
 # ls -l /dev/vdisks/kirk
 lrwxrwxrwx 1 root root 7 2011-01-06 13:46 /dev/vdisks/kirk -> ../dm-5
 # ls -l /dev/dm-5
 brw-rw---- 1 root disk 253, 5 2011-01-06 13:36 /dev/dm-5
 # echo "253:5 1048576" > blkio.throttle.write_bps_device
 # echo $$ > tasks
 # dd if=/dev/zero of=/dev/dm-5 bs=1M count=20
 20+0 records in
 20+0 records out
 20971520 bytes (21 MB) copied, 20.0223 s, 1.0 MB/s
 
 So limit applies to the dd child of my shell.
 
 Now I assign /dev/dm-5 (/dev/vdisks/kirk) to a vm and echo the qemu-kvm
 pid into tasks. Limits are not applied, the guest can happily use max io
 bandwidth.
 

qemu consists of several threads, and cgroups currently work per thread.
Could you double-check that all of qemu's threads are in the cgroup?
I think you have to write every thread ID into the tasks file when you
move qemu after it has started.
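A sketch of that loop. Threads show up as directories under /proc/&lt;pid&gt;/task/; in the real case PID would be qemu-kvm's PID and the target would be the cgroup's tasks file from the session above (/dev/cgroup/blkio/test/tasks). The demo below uses the current shell's PID and a stand-in file so it can be tried without root; those substitutions are assumptions, not verified paths.

```shell
# One write per thread ID: the tasks file only moves the TID you write,
# so every thread must be written separately.
PID=$$                              # stand-in for qemu-kvm's PID
TASKS_FILE=/tmp/blkio-tasks-demo    # stand-in for the cgroup tasks file
: > "$TASKS_FILE"
for tid in /proc/"$PID"/task/*; do
    basename "$tid" >> "$TASKS_FILE"
done
cat "$TASKS_FILE"
```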

Thanks,
-Kame




Re: [PATCH 3/3] Provide control over unmapped pages

2010-12-01 Thread KAMEZAWA Hiroyuki
On Thu,  2 Dec 2010 10:22:16 +0900 (JST)
KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com wrote:

  On Tue, 30 Nov 2010, Andrew Morton wrote:
  
+#define UNMAPPED_PAGE_RATIO 16
  
   Well.  Giving 16 a name didn't really clarify anything.  Attentive
   readers will want to know what this does, why 16 was chosen and what
   the effects of changing it will be.
  
  The meaning is analogous to the other zone reclaim ratios. But yes, it
  should be justified and defined.
  
Reviewed-by: Christoph Lameter c...@linux.com
  
   So you're OK with shoving all this flotsam into 100,000,000 cellphones?
   This was a pretty outrageous patchset!
  
  This is a feature that has been requested over and over for years. Using
  /proc/sys/vm/drop_caches to fix situations where one simply has too many
  page cache pages is not much fun in the long run.
 
 I'm not against a page cache limitation feature at all. But this is
 too ugly and too destructive to the fast path. I hope this patch can
 reduce the negative impact further.
 

And I think min_mapped_unmapped_pages is an ugly name. It should be
unmapped_pagecache_limit or something similar, because it implements a
limitation feature.

Thanks,
-Kame



Re: [PATCH 2/3] Refactor zone_reclaim

2010-11-30 Thread KAMEZAWA Hiroyuki
On Tue, 30 Nov 2010 15:45:55 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 Refactor zone_reclaim, move reusable functionality outside
 of zone_reclaim. Make zone_reclaim_unmapped_pages modular
 
 Signed-off-by: Balbir Singh bal...@linux.vnet.ibm.com

Why is this min_mapped_pages based on the zone (IOW, per-zone)?


Thanks,
-Kame



Re: [PATCH 3/3] Provide control over unmapped pages

2010-11-30 Thread KAMEZAWA Hiroyuki
On Tue, 30 Nov 2010 15:46:31 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 Provide control using zone_reclaim() and a boot parameter. The
 code reuses functionality from zone_reclaim() to isolate unmapped
 pages and reclaim them as a priority, ahead of other mapped pages.
 
 Signed-off-by: Balbir Singh bal...@linux.vnet.ibm.com
 ---
  include/linux/swap.h |5 ++-
  mm/page_alloc.c  |7 +++--
  mm/vmscan.c  |   72 
 +-
  3 files changed, 79 insertions(+), 5 deletions(-)
 
 diff --git a/include/linux/swap.h b/include/linux/swap.h
 index eba53e7..78b0830 100644
 --- a/include/linux/swap.h
 +++ b/include/linux/swap.h
 @@ -252,11 +252,12 @@ extern int vm_swappiness;
  extern int remove_mapping(struct address_space *mapping, struct page *page);
  extern long vm_total_pages;
  
 -#ifdef CONFIG_NUMA
 -extern int zone_reclaim_mode;
  extern int sysctl_min_unmapped_ratio;
  extern int sysctl_min_slab_ratio;
  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 +extern bool should_balance_unmapped_pages(struct zone *zone);
 +#ifdef CONFIG_NUMA
 +extern int zone_reclaim_mode;
  #else
  #define zone_reclaim_mode 0
  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int 
 order)
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 62b7280..4228da3 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -1662,6 +1662,9 @@ zonelist_scan:
   unsigned long mark;
   int ret;
  
 + if (should_balance_unmapped_pages(zone))
 + wakeup_kswapd(zone, order);
 +

Hmm, I'm not sure about the final vision of this feature. Couldn't this
reclaiming feature be called directly via the balloon driver, just before
alloc_page()?

Do you need to keep the page cache small even when there is free memory on
the host?

Thanks,
-Kame



Re: [PATCH 3/3] Provide control over unmapped pages

2010-11-30 Thread KAMEZAWA Hiroyuki
On Wed, 1 Dec 2010 10:52:59 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 * Balbir Singh bal...@linux.vnet.ibm.com [2010-12-01 10:48:16]:
 
  * KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-12-01 10:32:54]:
  
   On Tue, 30 Nov 2010 15:46:31 +0530
   Balbir Singh bal...@linux.vnet.ibm.com wrote:
   
Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh bal...@linux.vnet.ibm.com
---
 include/linux/swap.h |5 ++-
 mm/page_alloc.c  |7 +++--
 mm/vmscan.c  |   72 
+-
 3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..78b0830 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,11 +252,12 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page 
*page);
 extern long vm_total_pages;
 
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+extern bool should_balance_unmapped_pages(struct zone *zone);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 #else
 #define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned 
int order)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62b7280..4228da3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
unsigned long mark;
int ret;
 
+   if (should_balance_unmapped_pages(zone))
+   wakeup_kswapd(zone, order);
+
   
    Hmm, I'm not sure about the final vision of this feature. Couldn't this
    reclaiming feature be called directly via the balloon driver, just
    before alloc_page()?
  
  
  That is a separate patch; this one is a boot-parameter-based control
  approach.
   
    Do you need to keep the page cache small even when there is free memory
    on the host?
  
  
  The goal is to avoid duplication. As you know, the page cache fills itself
  to consume as much memory as possible, and in a consolidated environment
  the host generally does not have a lot of free memory.
 

That's a point. Then why does the guest have to do _extra_ work for the host
even when the host says nothing? I think triggering this from the guests
themselves is not very good.

Thanks,
-Kame





Re: [PATCH 3/3] Provide control over unmapped pages

2010-11-30 Thread KAMEZAWA Hiroyuki
On Wed, 1 Dec 2010 12:10:43 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

  That's a point. Then why does the guest have to do _extra_ work for the
  host even when the host says nothing? I think triggering this from the
  guests themselves is not very good.
 
 I've mentioned it before: the guest keeping free memory without a
 large performance hit helps; the balloon driver can quickly
 retrieve this memory if required, or the guest can use it for
 some other application/task.


 The cached data is mostly already present in the host page cache.

Why? Are there parameters/stats which show this is _true_? How can we
guarantee it, or show it to users?
Please add an interface that shows the sharing rate between guest and host.
Without it, no admin will turn this on, because the file cache status on the
host is a black box for guest admins. I think this patch skips some
important steps.

A second point is perhaps to reduce total host memory usage and to increase
the number of guests on a host. For that, this feature is useful only when
all guests on a host are friendly and devoted to the health of the host's
memory management, because all settings must be done in the guest. (This can
even be passed via qemu's command line arguments.) And there is _no_ benefit
for a guest that reduces its own resources to help host management, because
there is no guarantee the dropped caches are in host memory.


So, for both claims, I want to see an interface that shows the number of
shared pages between host and guests, rather than having to imagine it.

BTW, I don't like this kind of "please give us your victim, please please
please" logic. The host should be able to steal what it wants by force.
Then I think there should be no visible on/off interface; the VM firmware
should tell the guest to turn this on if the administrator of the host
wants it.

BTW2, please test with some other benchmarks (ones which read file caches).
I don't think a kernel make is a good test for this.

Thanks,
-Kame



Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control

2010-06-14 Thread KAMEZAWA Hiroyuki
On Mon, 14 Jun 2010 12:19:55 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:
  - Why don't you believe the LRU? And if the LRU doesn't work well, should
    it be fixed by a knob rather than by a generic approach?
  - No side effects?
 
 I believe in the LRU; it's just that the problem I am trying to solve is
 using double the memory for caching the same data (consider kvm
 running in cache=writethrough or writeback mode: both the hypervisor
 and the guest OS maintain a page cache of the same data). As the VMs
 grow, the overhead is substantial. In my runs I found up to 60%
 duplication in some cases.
 
 
 - Linux VM guys tend to say "free memory is bad memory". OK, then what is
   the free memory created by your patch used for? IOW, I can't see the
   benefit. If the free memory your patch creates is used for more page
   cache, it will soon be dropped again by your patch itself.
 
 Free memory is good for cases when you want to do more on the same
 system. I agree that in a bare metal environment that might only be
 partially true. I don't have a problem with frequently used data being
 cached, but I am targeting a consolidated environment at the moment.
 Moreover, the administrator has control via a boot option, so it is
 non-intrusive in many ways.

It sounds like what you want is to improve performance etc. but to make
sizing the system easy and to help admins. Right?

From a performance perspective, I don't see any advantage in dropping caches
which can be dropped easily anyway; it just uses CPU for something that may
not be necessary.

Thanks,
-Kame



Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control

2010-06-14 Thread KAMEZAWA Hiroyuki
On Mon, 14 Jun 2010 13:06:46 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:
 
  It sounds like what you want is to improve performance etc. but to make
  sizing the system easy and to help admins. Right?
 
 
 Right: to free up the memory wasted by caching the same data twice.
  
Oh, sorry, let me ask again:

It sounds like what you want is _not_ to improve performance etc. but to make it
...

?

-Kame



Re: [RFC][PATCH 1/2] Linux/Guest unmapped page cache control

2010-06-13 Thread KAMEZAWA Hiroyuki
On Mon, 14 Jun 2010 00:01:45 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 * Balbir Singh bal...@linux.vnet.ibm.com [2010-06-08 21:21:46]:
 
  Selectively control Unmapped Page Cache (nospam version)
  
  From: Balbir Singh bal...@linux.vnet.ibm.com
  
  This patch implements unmapped page cache control via preferred
  page cache reclaim. The current patch hooks into kswapd and reclaims
  page cache if the user has requested for unmapped page control.
  This is useful in the following scenario
  
  - In a virtualized environment with cache=writethrough, we see
double caching - (one in the host and one in the guest). As
we try to scale guests, cache usage across the system grows.
The goal of this patch is to reclaim page cache when Linux is running
as a guest and get the host to hold the page cache and manage it.
There might be temporary duplication, but in the long run, memory
in the guests would be used for mapped pages.
  - The option is controlled via a boot option and the administrator
can selectively turn it on, on a need to use basis.
  
  A lot of the code is borrowed from zone_reclaim_mode logic for
  __zone_reclaim(). One might argue that with ballooning and
  KSM this feature is not very useful, but even with ballooning,
  we need extra logic to balloon multiple VMs, and it is hard
  to figure out the correct amount of memory to balloon. With these
  patches applied, each guest has a sufficient amount of free memory
  available, that can be easily seen and reclaimed by the balloon driver.
  The additional memory in the guest can be reused for additional
  applications or used to start additional guests/balance memory in
  the host.
  
  KSM currently does not de-duplicate host and guest page cache. The goal
  of this patch is to help automatically balance unmapped page cache when
  instructed to do so.
  
  There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
  and the number of pages to reclaim when unmapped_page_control argument
  is supplied. These numbers were chosen to avoid aggressiveness in
  reaping page cache ever so frequently, at the same time providing control.
  
  The sysctl for min_unmapped_ratio provides further control from
  within the guest on the amount of unmapped pages to reclaim.
 
 
 Are there any major objections to this patch?
  

This kind of patch needs measurements showing how well it works.

- How did you measure the effect of the patch? kernbench is not enough, of
  course.
- Why don't you believe the LRU? And if the LRU doesn't work well, should it
  be fixed by a knob rather than by a generic approach?
- No side effects?

- Linux VM guys tend to say "free memory is bad memory". OK, then what is the
  free memory created by your patch used for? IOW, I can't see the benefit.
  If the free memory your patch creates is used for more page cache,
  it will soon be dropped again by your patch itself.

  If your patch dropped only duplicated pages that are no longer necessary
  for other kvm guests, I would agree it may increase the available amount
  of page cache. But you just drop unmapped pages.
  Hmm.

Thanks,
-Kame
 



Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control

2010-06-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 Jun 2010 17:07:32 -0700
Dave Hansen d...@linux.vnet.ibm.com wrote:

 On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
   I'm not sure victimizing unmapped cache pages is a good idea.
   Shouldn't page selection use the LRU for recency information instead
   of the cost of guest reclaim?  Dropping a frequently used unmapped
   cache page can be more expensive than dropping an unused text page
   that was loaded as part of some executable's initialization and
   forgotten.
  
  We victimize the unmapped cache only if it is unused (in LRU order).
  We don't force the issue too much. We also have free slab cache to go
  after.
 
 Just to be clear, let's say we have a mapped page (say of /sbin/init)
 that's been unreferenced since _just_ after the system booted.  We also
 have an unmapped page cache page of a file often used at runtime, say
 one from /etc/resolv.conf or /etc/passwd.
 

Hmm, I'm not a fan of estimating the working set size by calculation
based on some numbers, without considering history or feedback.

Can't we use some kind of feedback algorithm, such as high/low watermarks,
random walk, or a GA (or something smarter), to detect the size?

Thanks,
-Kame






Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control

2010-06-10 Thread KAMEZAWA Hiroyuki
On Fri, 11 Jun 2010 10:16:32 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 * KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com [2010-06-11 10:54:41]:
 
  On Thu, 10 Jun 2010 17:07:32 -0700
  Dave Hansen d...@linux.vnet.ibm.com wrote:
  
   On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
 I'm not sure victimizing unmapped cache pages is a good idea.
 Shouldn't page selection use the LRU for recency information instead
 of the cost of guest reclaim?  Dropping a frequently used unmapped
 cache page can be more expensive than dropping an unused text page
 that was loaded as part of some executable's initialization and
 forgotten.

We victimize the unmapped cache only if it is unused (in LRU order).
We don't force the issue too much. We also have free slab cache to go
after.
   
   Just to be clear, let's say we have a mapped page (say of /sbin/init)
   that's been unreferenced since _just_ after the system booted.  We also
   have an unmapped page cache page of a file often used at runtime, say
   one from /etc/resolv.conf or /etc/passwd.
   
  
  Hmm, I'm not a fan of estimating the working set size by calculation
  based on some numbers, without considering history or feedback.
  
  Can't we use some kind of feedback algorithm, such as high/low watermarks,
  random walk, or a GA (or something smarter), to detect the size?
 
 
 Could you please clarify at what level you are suggesting size
 detection? I assume it is outside the OS, right? 
 
OS includes kernel and system programs ;)

I can think of both an in-kernel and a userland approach, and they should
complement each other.

An example of a kernel-based approach:
 1. add a shrinker callback (A) for the balloon-driver-for-guest, acting as a guest kswapd.
 2. add a shrinker callback (B) for the balloon-driver-for-host, acting as a host kswapd.
(I guess the current balloon driver is only for the host. Please imagine.)

(A) increases free memory in Guest.
(B) increases free memory in Host.

This is an example of feedback-based memory resizing between host and guest.

I think (B) is necessary, at least before considering complicated things.

To implement something clever, (A) and (B) should take into account how
frequently memory reclaim in the guest (which requires some I/O) happens.

If done outside the kernel, I think using memcg is better than depending on
the balloon driver. But a cooperative balloon plus memcg may show us
something good.

Thanks,
-Kame




Re: [RFC/T/D][PATCH 2/2] Linux/Guest cooperative unmapped page cache control

2010-06-10 Thread KAMEZAWA Hiroyuki
On Fri, 11 Jun 2010 14:05:53 +0900
KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote:

 I can think of both way in kernel and in user approarh and they should be
 complement to each other.
 
 An example of kernel-based approach is.
  1. add a shrinker callback(A) for balloon-driver-for-guest as guest kswapd.
  2. add a shrinker callback(B) for balloon-driver-for-host as host kswapd.
 (I guess current balloon driver is only for host. Please imagine.)
  
  guest.
Sorry.
-Kame



Re: virsh dump blocking problem

2010-04-06 Thread KAMEZAWA Hiroyuki
On Tue, 06 Apr 2010 09:35:09 +0800
Gui Jianfeng guijianf...@cn.fujitsu.com wrote:

 Hi all,
 
 I'm not sure whether it's appropriate to post this problem here.
 I played with virsh under Fedora 12 and started a KVM Fedora 12 guest
 with the virsh start command. The Fedora 12 guest started successfully.
 Then I ran the following command to dump the guest core:
 #virsh dump 1 mycoredump (domain id is 1)
 
 This command seems to block and never return. According to the strace
 output, virsh dump is blocking in a poll() call. I think
 the following should be the call trace of virsh:
 
 cmdDump()
   -> virDomainCoreDump()
     -> remoteDomainCoreDump()
       -> call()
       -> remoteIO()
       -> remoteIOEventLoop()
         -> poll(fds, ARRAY_CARDINALITY(fds), -1)
 
 
 Has anyone else encountered this problem? Any thoughts?
 

I have hit this too; it seems qemu-kvm keeps counting the number of dirty
pages and never answers libvirt. The guest never runs again and I have to
kill it.

I hit this with 2.6.32 + qemu-0.12.3 + libvirt 0.7.7.1.
When I updated the host kernel to 2.6.33, qemu-kvm did not work at all, so I
moved back to Fedora 12's latest qemu-kvm.

Now, 2.6.34-rc3+ qemu-0.11.0-13.fc12.x86_64 + libvirt 0.7.7.1
# virsh dump   
hangs.

In most cases, I see the following 2 backtraces (with gdb).

(gdb) bt
#0  ram_save_remaining () at /usr/src/debug/qemu-kvm-0.11.0/vl.c:3104
#1  ram_bytes_remaining () at /usr/src/debug/qemu-kvm-0.11.0/vl.c:3112
#2  0x004ab2cf in do_info_migrate (mon=0x16b7970) at migration.c:150
#3  0x00414b1a in monitor_handle_command (mon=<value optimized out>,
cmdline=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/monitor.c:2870
#4  0x00414c6a in monitor_command_cb (mon=0x16b7970,
cmdline=<value optimized out>, opaque=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/monitor.c:3160
#5  0x0048b71b in readline_handle_byte (rs=0x208d6a0,
ch=<value optimized out>) at readline.c:369
#6  0x00414cdc in monitor_read (opaque=<value optimized out>,
buf=0x7fff1b1104b0 "info migrate\r", size=13)
at /usr/src/debug/qemu-kvm-0.11.0/monitor.c:3146
#7  0x004b2a53 in tcp_chr_read (opaque=0x1614c30) at qemu-char.c:2006
#8  0x0040a6c7 in main_loop_wait (timeout=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4188
#9  0x0040eed5 in main_loop (argc=<value optimized out>,
argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4414
#10 main (argc=<value optimized out>, argv=<value optimized out>,
envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.11.0/vl.c:6263


(gdb) bt
#0  0x003c2680e0bd in write () at ../sysdeps/unix/syscall-template.S:82
#1  0x004b304a in unix_write (fd=11, buf=<value optimized out>, len1=40)
at qemu-char.c:512
#2  send_all (fd=11, buf=<value optimized out>, len1=40) at qemu-char.c:528
#3  0x00411201 in monitor_flush (mon=0x16b7970)
at /usr/src/debug/qemu-kvm-0.11.0/monitor.c:131
#4  0x00414cdc in monitor_read (opaque=<value optimized out>,
buf=0x7fff1b1104b0 "info migrate\r", size=13)
at /usr/src/debug/qemu-kvm-0.11.0/monitor.c:3146
#5  0x004b2a53 in tcp_chr_read (opaque=0x1614c30) at qemu-char.c:2006
#6  0x0040a6c7 in main_loop_wait (timeout=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4188
#7  0x0040eed5 in main_loop (argc=<value optimized out>,
argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4414
#8  main (argc=<value optimized out>, argv=<value optimized out>,
envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.11.0/vl.c:6263

And see no dump progress.

I'm sorry if this is not a hang but just very slow. I don't see any
progress for at least 15 minutes, and qemu-kvm continues to use 75% of the cpus.
I'm not sure why the dump command triggers the migration code...

How long does it take to do virsh dump xxx on an idle VM with 2G of memory?
I'm sorry if this is the wrong mailing list to ask.

Thanks,
-Kame



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-31 Thread KAMEZAWA Hiroyuki
On Tue, 31 Mar 2009 15:21:53 +0300
Izik Eidus iei...@redhat.com wrote:

 kpage is actually what is going to be the KsmPage - the shared page...
 
 Right now these pages are not swappable...; after ksm is merged we
 will make these pages swappable as well...
 
sure.

  If so, please
   - show the amount of kpage
   
   - allow users to set limit for usage of kpages. or preserve kpages at boot 
  or
 by user's command.

 
 kpage actually saves memory..., and limiting the number of them would
 limit the number of shared pages...
 

Ah, I'm working on the memory control cgroup, and *KSM* will be out of its control.
It's OK to make the default limit value INFINITY, but please add knobs.

Thanks,
-Kame



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread KAMEZAWA Hiroyuki
On Tue, 31 Mar 2009 02:59:20 +0300
Izik Eidus iei...@redhat.com wrote:

 KSM is a driver that allows merging identical pages between one or more
 applications in a way invisible to the applications that use it.
 Pages that are merged are marked as readonly and are COWed when any
 application tries to change them.
 
 KSM is used for cases where using fork() is not suitable;
 one of these cases is where the pages of the application keep changing
 dynamically and the application cannot know in advance what pages are
 going to be identical.
 
 KSM works by walking over the memory pages of the applications it
 scans in order to find identical pages.
 It uses two sorted data structures, called the stable and unstable trees,
 to find identical pages in an effective way.
 
 When KSM finds two identical pages, it marks them as readonly and merges
 them into a single page;
 after the pages are marked as readonly and merged into one page, Linux
 will treat these pages as normal copy_on_write pages and will copy them
 when a write access happens to them.
 
 KSM scans just the memory areas that were registered to be scanned by it.
 
 Ksm api:
 
 KSM_GET_API_VERSION:
 Give the userspace the api version of the module.
 
 KSM_CREATE_SHARED_MEMORY_AREA:
 Create a shared memory region fd that later allows the user to register
 the memory region to scan by using:
 KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
 
 KSM_START_STOP_KTHREAD:
 Return information about the kernel thread; the information is returned
 using the ksm_kthread_info structure:
 ksm_kthread_info:
 __u32 sleep:
 number of microseconds to sleep between each iteration of
 scanning.
 
 __u32 pages_to_scan:
 number of pages to scan for each iteration of scanning.
 
 __u32 max_pages_to_merge:
 maximum number of pages to merge in each iteration of scanning
 (so even if there are still more pages to scan, we stop this
 iteration)
 
 __u32 flags:
flags to control ksmd (right now just ksm_control_flags_run
 available)
 
 KSM_REGISTER_MEMORY_REGION:
 Register userspace virtual address range to be scanned by ksm.
 This ioctl is using the ksm_memory_region structure:
 ksm_memory_region:
 __u32 npages;
  number of pages to share inside this memory region.
 __u32 pad;
 __u64 addr:
 the beginning of the virtual address of this region.
 
 KSM_REMOVE_MEMORY_REGION:
 Remove memory region from ksm.
 
 Signed-off-by: Izik Eidus iei...@redhat.com
 ---
  include/linux/ksm.h|   69 +++
  include/linux/miscdevice.h |1 +
  mm/Kconfig |6 +
  mm/Makefile|1 +
  mm/ksm.c   | 1431 
 
  5 files changed, 1508 insertions(+), 0 deletions(-)
  create mode 100644 include/linux/ksm.h
  create mode 100644 mm/ksm.c
 
 diff --git a/include/linux/ksm.h b/include/linux/ksm.h
 new file mode 100644
 index 000..5776dce
 --- /dev/null
 +++ b/include/linux/ksm.h
 @@ -0,0 +1,69 @@
 +#ifndef __LINUX_KSM_H
 +#define __LINUX_KSM_H
 +
 +/*
 + * Userspace interface for /dev/ksm - kvm shared memory
 + */
 +
 +#include linux/types.h
 +#include linux/ioctl.h
 +
 +#include asm/types.h
 +
 +#define KSM_API_VERSION 1
 +
 +#define ksm_control_flags_run 1
 +
 +/* for KSM_REGISTER_MEMORY_REGION */
 +struct ksm_memory_region {
 + __u32 npages; /* number of pages to share */
 + __u32 pad;
 + __u64 addr; /* the begining of the virtual address */
 +__u64 reserved_bits;
 +};
 +
 +struct ksm_kthread_info {
 + __u32 sleep; /* number of microsecoends to sleep */
 + __u32 pages_to_scan; /* number of pages to scan */
 + __u32 flags; /* control flags */
 +__u32 pad;
 +__u64 reserved_bits;
 +};
 +
 +#define KSMIO 0xAB
 +
 +/* ioctls for /dev/ksm */
 +
 +#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
 +/*
 + * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
 + */
 +#define KSM_CREATE_SHARED_MEMORY_AREA_IO(KSMIO,   0x01) /* return SMA fd 
 */
 +/*
 + * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
 + * (can stop the kernel thread from working by setting running = 0)
 + */
 +#define KSM_START_STOP_KTHREAD_IOW(KSMIO,  0x02,\
 +   struct ksm_kthread_info)
 +/*
 + * KSM_GET_INFO_KTHREAD - return information about the kernel thread
 + * scanning speed.
 + */
 +#define KSM_GET_INFO_KTHREAD  _IOW(KSMIO,  0x03,\
 +   struct ksm_kthread_info)
 +
 +
 +/* ioctls for SMA fds */
 +
 +/*
 + * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
 + * scanned by kvm.
 + */
 +#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
 +   struct ksm_memory_region)
 +/*
 + * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
 + */
 +#define KSM_REMOVE_MEMORY_REGION 

Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another

2008-11-13 Thread KAMEZAWA Hiroyuki
On Thu, 13 Nov 2008 12:38:07 +0200
Izik Eidus [EMAIL PROTECTED] wrote:
  If KSM pages are on radix-tree, it will be accounted automatically.
  Now, we have Unevictable LRU and mlocked() pages are smartly isolated 
  into its
  own LRU. So, just doing
 
   - inode's radix-tree
   - make all pages mlocked.
   - provide special page fault handler for your purpose

 
 Well, in this version that I am going to merge, the pages aren't going to
 be swappable.
 Later, after KSM is merged, we will make the KsmPages swappable...
good to hear

 so I think working with cgroups would be effective / useful only when
 KsmPages start to be swappable...
 Do you agree?
 (What I am saying is that right now let's not count the KsmPages inside
 the cgroup; let's do it when KsmPages
 are swappable)
 
ok.

 If you feel these pages should be counted in the cgroup, I have no problem
 doing it via hooks, like page migration does.
 
 thanks.
 
  is simple one. But ok, whatever implementation you'll do, I have to check it
  and consider whether it should be tracked or not. Then, add codes to memcg 
  to
  track it or ignore it or comments on your patches ;)
 
  It's helpful to add me to CC: when you post this set again.

 
 Sure will.
 

If necessary, I'll have to add an "ignore in this case" hook in memcg.
(e.g. checking a PageKSM flag in memcg.)

If you suffer from memcg in your tests, the cgroup_disable=memory boot option
will allow you to disable memcg.


Thanks,
-Kame



Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another

2008-11-12 Thread KAMEZAWA Hiroyuki
Thank you for the answers.

On Wed, 12 Nov 2008 13:11:12 +0200
Izik Eidus [EMAIL PROTECTED] wrote:

 Avi Kivity wrote:
  KAMEZAWA Hiroyuki wrote:
  Can I make a question ? (I'm working for memory cgroup.)
 
  Now, we do charge to anonymous page when
  - charge(+1) when it's mapped first (mapcount 0->1)
  - uncharge(-1) when it's fully unmapped (mapcount 1->0) via
  page_remove_rmap().
 
  My question is
   - PageKSM pages are not necessary to be tracked by memory cgroup ?
 When we are replacing a page using page_replace() we have:
 oldpage - the anonymous page that is going to be replaced by newpage
 newpage - a kernel allocated page (KsmPage)
 so for oldpage we are calling page_remove_rmap(), which will notify the cgroup,
 and newpage won't be counted inside the cgroup because it is a file-rmap
 page
 (we are calling page_add_file_rmap), so right now PageKSM won't ever
 be tracked by the cgroup.
 
If not in radix-tree, it's not tracked.
(But we don't want to track non-LRU pages which are not freeable.)


   - Can we know that the page is just replaced and we don't necessary 
  to do
 charge/uncharge.
 
 The caller of page_replace() does know it; the only problem is that
 page_remove_rmap()
 automatically changes the cgroup for anonymous pages.
 If we want it not to change the cgroup, we can:
 increase the cgroup count before page_remove (but in that case what
 happens if we reach the limit???)
 or give a parameter to page_remove_rmap() saying that we don't want the cgroup
 to be changed.

Hmm, the current mem cgroup works via the page_cgroup struct to track pages.

   page - page_cgroup has a one-to-one relationship.

So, exchanging the page itself causes trouble. But I may be able to provide
the necessary hooks to you, as I did for page migration.

 
   - annonymous page from KSM is worth to be tracked by memory cgroup ?
 (IOW, it's on LRU and can be swapped-out ?)
 
 KSM has no anonymous pages (it shares anonymous pages into a KsmPAGE -
 a kernel allocated page without a mapping),
 so it isn't on the LRU and it cannot be swapped; only when KsmPAGEs are
 broken by do_wp_page() will the duplicates be able to be swapped.
 
Ok, thank you for confirmation.


 
  My feeling is that shared pages should be accounted as if they were 
  not shared; that is, a shared page should be accounted for each process 
  that shares it.  Perhaps sharing within a cgroup should be counted as 
  1 page for all the ptes pointing to it.
 
 

If KSM pages are on a radix-tree, they will be accounted automatically.
Now, we have the Unevictable LRU, and mlocked() pages are smartly isolated onto
their own LRU. So, just doing

 - inode's radix-tree
 - make all pages mlocked.
 - provide special page fault handler for your purpose

is a simple one. But OK, whatever implementation you do, I have to check it
and consider whether it should be tracked or not. Then, I'll add code to memcg to
track it or ignore it, or comment on your patches ;)

It's helpful to add me to CC: when you post this set again.

Thanks,
-Kame




Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another

2008-11-11 Thread KAMEZAWA Hiroyuki
On Tue, 11 Nov 2008 23:24:21 +0100
Andrea Arcangeli [EMAIL PROTECTED] wrote:

 On Tue, Nov 11, 2008 at 03:31:18PM -0600, Christoph Lameter wrote:
   ksm need the pte inside the vma to point from anonymous page into 
   filebacked
   page
   can migrate.c do it without changes?
  
  So change anonymous to filebacked page?
 
  Currently page migration assumes that the page will continue to be part
  of the existing file or anon vma.
  
  What you want sounds like assigning a swap pte to an anonymous page? That
  way a anon page gains membership in a file backed mapping.
 
 KSM needs to convert anonymous pages to PageKSM, which means a page
 owned by ksm.c and only known by ksm.c. The Linux VM will free this
 page in munmap but that's about it, all we do is to match the number
 of anon-ptes pointing to the page with the page_count. So besides
 freeing the page when the last user exit()s or cows it, the VM will do
 nothing about it. Initially. Later it can swap it in a nonlinear way.
 
Can I ask a question? (I'm working on the memory cgroup.)

Now, we do charge to an anonymous page when
  - charge(+1) when it's mapped first (mapcount 0->1)
  - uncharge(-1) when it's fully unmapped (mapcount 1->0) via page_remove_rmap().

My question is
 - Do PageKSM pages need to be tracked by the memory cgroup ?
 - Can we know that a page was just replaced, so that we don't need to do
   charge/uncharge ?
 - Is an anonymous page from KSM worth tracking by the memory cgroup ?
   (IOW, is it on the LRU and can it be swapped out ?)

Thanks,
-Kame


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html