[Devel] Re: [PATCH 02/10] memcg: document cgroup dirty memory interfaces

2010-10-04 Thread KAMEZAWA Hiroyuki
On Sun,  3 Oct 2010 23:57:57 -0700
Greg Thelen  wrote:

> Document cgroup dirty memory interfaces and statistics.
> 
> Signed-off-by: Andrea Righi 
> Signed-off-by: Greg Thelen 

Nice.

Acked-by: KAMEZAWA Hiroyuki 





> ---
>  Documentation/cgroups/memory.txt |   37 +
>  1 files changed, 37 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/cgroups/memory.txt 
> b/Documentation/cgroups/memory.txt
> index 7781857..eab65e2 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -385,6 +385,10 @@ mapped_file  - # of bytes of mapped file (includes 
> tmpfs/shmem)
>  pgpgin   - # of pages paged in (equivalent to # of charging 
> events).
>  pgpgout  - # of pages paged out (equivalent to # of uncharging 
> events).
>  swap - # of bytes of swap usage
> +dirty- # of bytes that are waiting to get written back to 
> the disk.
> +writeback- # of bytes that are actively being written back to the disk.
> +nfs  - # of bytes sent to the NFS server, but not yet committed to
> + the actual storage.
>  inactive_anon- # of bytes of anonymous memory and swap cache memory 
> on
>   LRU list.
>  active_anon  - # of bytes of anonymous and swap cache memory on active
> @@ -453,6 +457,39 @@ memory under it will be reclaimed.
>  You can reset failcnt by writing 0 to failcnt file.
>  # echo 0 > .../memory.failcnt
>  
> +5.5 dirty memory
> +
> +Control the maximum amount of dirty pages a cgroup can have at any given 
> time.
> +
> +Limiting dirty memory is like fixing the max amount of dirty (hard to 
> reclaim)
> +page cache used by a cgroup.  So, in case of multiple cgroup writers, they 
> will
> +not be able to consume more than their designated share of dirty pages and 
> will
> +be forced to perform write-out if they cross that limit.
> +
> +The interface is equivalent to the procfs interface: /proc/sys/vm/dirty_*.  
> It
> +is possible to configure a limit to trigger both a direct writeback or a
> +background writeback performed by per-bdi flusher threads.  The root cgroup
> +memory.dirty_* control files are read-only and match the contents of
> +the /proc/sys/vm/dirty_* files.
> +
> +Per-cgroup dirty limits can be set using the following files in the cgroupfs:
> +
> +- memory.dirty_ratio: the amount of dirty memory (expressed as a percentage 
> of
> +  cgroup memory) at which a process generating dirty pages will itself start
> +  writing out dirty data.
> +
> +- memory.dirty_bytes: the amount of dirty memory (expressed in bytes) in the
> +  cgroup at which a process generating dirty pages will start itself writing 
> out
> +  dirty data.
> +
> +- memory.dirty_background_ratio: the amount of dirty memory of the cgroup
> +  (expressed as a percentage of cgroup memory) at which background writeback
> +  kernel threads will start writing out dirty data.
> +
> +- memory.dirty_background_bytes: the amount of dirty memory (expressed in 
> bytes)
> +  in the cgroup at which background writeback kernel threads will start 
> writing
> +  out dirty data.
> +
>  6. Hierarchy support
>  
>  The memory controller supports a deep hierarchy and hierarchical accounting.
> -- 
> 1.7.1
> 
> 
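As a usage note for the four control files documented above: a minimal userspace sketch, assuming the memory controller is mounted at /dev/cgroup and a group named grp0 already exists (both assumptions for illustration, not part of the patch):

/* Hedged sketch: set per-cgroup dirty thresholds through the new files.
 * The mount point, group name and values are illustrative only. */
#include <stdio.h>

static int write_val(const char *path, unsigned long val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%lu\n", val);
	return fclose(f);
}

int main(void)
{
	/* direct write-out kicks in at 10% of the cgroup's memory ... */
	write_val("/dev/cgroup/grp0/memory.dirty_ratio", 10);
	/* ... and per-bdi flusher (background) writeback at 5% */
	write_val("/dev/cgroup/grp0/memory.dirty_background_ratio", 5);
	return 0;
}

Reading the same files back returns the currently effective values, and in the root cgroup they are read-only mirrors of /proc/sys/vm/dirty_*, as described above.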

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 01/10] memcg: add page_cgroup flags for dirty page tracking

2010-10-04 Thread KAMEZAWA Hiroyuki
On Sun,  3 Oct 2010 23:57:56 -0700
Greg Thelen  wrote:

> Add additional flags to page_cgroup to track dirty pages
> within a mem_cgroup.
> 
> Signed-off-by: KAMEZAWA Hiroyuki 
> Signed-off-by: Andrea Righi 
> Signed-off-by: Greg Thelen 

Ack...oh, but it seems I've signed. Thanks.
-Kame

> ---
>  include/linux/page_cgroup.h |   23 +++
>  1 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 5bb13b3..b59c298 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -40,6 +40,9 @@ enum {
>   PCG_USED, /* this object is in use. */
>   PCG_ACCT_LRU, /* page has been accounted for */
>   PCG_FILE_MAPPED, /* page is accounted as "mapped" */
> + PCG_FILE_DIRTY, /* page is dirty */
> + PCG_FILE_WRITEBACK, /* page is under writeback */
> + PCG_FILE_UNSTABLE_NFS, /* page is NFS unstable */
>   PCG_MIGRATION, /* under page migration */
>  };
>  
> @@ -59,6 +62,10 @@ static inline void ClearPageCgroup##uname(struct 
> page_cgroup *pc)  \
>  static inline int TestClearPageCgroup##uname(struct page_cgroup *pc) \
>   { return test_and_clear_bit(PCG_##lname, &pc->flags);  }
>  
> +#define TESTSETPCGFLAG(uname, lname) \
> +static inline int TestSetPageCgroup##uname(struct page_cgroup *pc)   \
> + { return test_and_set_bit(PCG_##lname, &pc->flags);  }
> +
>  TESTPCGFLAG(Locked, LOCK)
>  
>  /* Cache flag is set only once (at allocation) */
> @@ -80,6 +87,22 @@ SETPCGFLAG(FileMapped, FILE_MAPPED)
>  CLEARPCGFLAG(FileMapped, FILE_MAPPED)
>  TESTPCGFLAG(FileMapped, FILE_MAPPED)
>  
> +SETPCGFLAG(FileDirty, FILE_DIRTY)
> +CLEARPCGFLAG(FileDirty, FILE_DIRTY)
> +TESTPCGFLAG(FileDirty, FILE_DIRTY)
> +TESTCLEARPCGFLAG(FileDirty, FILE_DIRTY)
> +TESTSETPCGFLAG(FileDirty, FILE_DIRTY)
> +
> +SETPCGFLAG(FileWriteback, FILE_WRITEBACK)
> +CLEARPCGFLAG(FileWriteback, FILE_WRITEBACK)
> +TESTPCGFLAG(FileWriteback, FILE_WRITEBACK)
> +
> +SETPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
> +CLEARPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
> +TESTPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
> +TESTCLEARPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
> +TESTSETPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
> +
>  SETPCGFLAG(Migration, MIGRATION)
>  CLEARPCGFLAG(Migration, MIGRATION)
>  TESTPCGFLAG(Migration, MIGRATION)
> -- 
> 1.7.1
> 
> 

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 00/10] memcg: per cgroup dirty page accounting

2010-10-04 Thread Greg Thelen
Balbir Singh  writes:
>
> * Greg Thelen  [2010-10-03 23:57:55]:
>
>> This patch set provides the ability for each cgroup to have independent dirty
>> page limits.
>> 
>> Limiting dirty memory is like fixing the max amount of dirty (hard to 
>> reclaim)
>> page cache used by a cgroup.  So, in case of multiple cgroup writers, they 
>> will
>> not be able to consume more than their designated share of dirty pages and 
>> will
>> be forced to perform write-out if they cross that limit.
>> 
>> These patches were developed and tested on mmotm 2010-09-28-16-13.  The 
>> patches
>> are based on a series proposed by Andrea Righi in Mar 2010.
>
> Hi, Greg,
>
> I see a problem with "memcg: add dirty page accounting infrastructure".
>
> The reject is
>
>  enum mem_cgroup_write_page_stat_item {
> MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
> +   MEMCG_NR_FILE_DIRTY, /* # of dirty pages in page cache */
> +   MEMCG_NR_FILE_WRITEBACK, /* # of pages under writeback */
> +   MEMCG_NR_FILE_UNSTABLE_NFS, /* # of NFS unstable pages */
>  };
>
> I don't see mem_cgroup_write_page_stat_item in memcontrol.h. Is this
> based on top of Kame's cleanup.
>
> I am working off of mmotm 28 sept 2010 16:13.

Balbir,

All of the 10 memcg dirty limits patches should apply directly to mmotm
28 sept 2010 16:13 without any other patches.  Any of Kame's cleanup
patches that are not in mmotm are not needed by this memcg dirty limit
series.

The patch you refer to, "[PATCH 05/10] memcg: add dirty page accounting
infrastructure" depends on a change from an earlier patch in the series.
Specifically, "[PATCH 03/10] memcg: create extensible page stat update
routines" contains the addition of mem_cgroup_write_page_stat_item:

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -25,6 +25,11 @@ struct page_cgroup;
 struct page;
 struct mm_struct;
 
+/* Stats that can be updated by kernel. */
+enum mem_cgroup_write_page_stat_item {
+ MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+};
+

Do you have trouble applying patch 5 after applying patches 1-4?
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 00/10] memcg: per cgroup dirty page accounting

2010-10-04 Thread Balbir Singh
* Greg Thelen  [2010-10-03 23:57:55]:

> This patch set provides the ability for each cgroup to have independent dirty
> page limits.
> 
> Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
> page cache used by a cgroup.  So, in case of multiple cgroup writers, they 
> will
> not be able to consume more than their designated share of dirty pages and 
> will
> be forced to perform write-out if they cross that limit.
> 
> These patches were developed and tested on mmotm 2010-09-28-16-13.  The 
> patches
> are based on a series proposed by Andrea Righi in Mar 2010.

Hi, Greg,

I see a problem with "memcg: add dirty page accounting infrastructure".

The reject is

 enum mem_cgroup_write_page_stat_item {
MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+   MEMCG_NR_FILE_DIRTY, /* # of dirty pages in page cache */
+   MEMCG_NR_FILE_WRITEBACK, /* # of pages under writeback */
+   MEMCG_NR_FILE_UNSTABLE_NFS, /* # of NFS unstable pages */
 };

I don't see mem_cgroup_write_page_stat_item in memcontrol.h. Is this
based on top of Kame's cleanup.

I am working off of mmotm 28 sept 2010 16:13.


-- 
Three Cheers,
Balbir
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 00/10] memcg: per cgroup dirty page accounting

2010-10-04 Thread Balbir Singh
* Greg Thelen  [2010-10-03 23:57:55]:

> This patch set provides the ability for each cgroup to have independent dirty
> page limits.
> 
> Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
> page cache used by a cgroup.  So, in case of multiple cgroup writers, they 
> will
> not be able to consume more than their designated share of dirty pages and 
> will
> be forced to perform write-out if they cross that limit.
> 
> These patches were developed and tested on mmotm 2010-09-28-16-13.  The 
> patches
> are based on a series proposed by Andrea Righi in Mar 2010.
> 
> Overview:
> - Add page_cgroup flags to record when pages are dirty, in writeback, or nfs
>   unstable.
> - Extend mem_cgroup to record the total number of pages in each of the 
>   interesting dirty states (dirty, writeback, unstable_nfs).  
> - Add dirty parameters similar to the system-wide  /proc/sys/vm/dirty_*
>   limits to mem_cgroup.  The mem_cgroup dirty parameters are accessible
>   via cgroupfs control files.
> - Consider both system and per-memcg dirty limits in page writeback when
>   deciding to queue background writeback or block for foreground writeback.
> 
> Known shortcomings:
> - When a cgroup dirty limit is exceeded, then bdi writeback is employed to
>   writeback dirty inodes.  Bdi writeback considers inodes from any cgroup, not
>   just inodes contributing dirty pages to the cgroup exceeding its limit.  

I suspect this means that we'll need a bdi controller in the I/O
controller spectrum or make writeback cgroup aware.

> 
> Performance measurements:
> - kernel builds are unaffected unless run with a small dirty limit.
> - all data collected with CONFIG_CGROUP_MEM_RES_CTLR=y.
> - dd has three data points (in secs) for three data sizes (100M, 200M, and 1G).
>   As expected, dd slows when it exceeds its cgroup dirty limit.
> 
>                kernel_build       dd
> mmotm                2:37         0.18, 0.38, 1.65
>   root_memcg
> 
> mmotm                2:37         0.18, 0.35, 1.66
>   non-root_memcg
> 
> mmotm+patches        2:37         0.18, 0.35, 1.68
>   root_memcg
> 
> mmotm+patches        2:37         0.19, 0.35, 1.69
>   non-root_memcg
> 
> mmotm+patches        2:37         0.19, 2.34, 22.82
>   non-root_memcg
>   150 MiB memcg dirty limit
> 
> mmotm+patches        3:58         1.71, 3.38, 17.33
>   non-root_memcg
>   1 MiB memcg dirty limit
> 
> Greg Thelen (10):
>   memcg: add page_cgroup flags for dirty page tracking
>   memcg: document cgroup dirty memory interfaces
>   memcg: create extensible page stat update routines
>   memcg: disable local interrupts in lock_page_cgroup()
>   memcg: add dirty page accounting infrastructure
>   memcg: add kernel calls for memcg dirty page stats
>   memcg: add dirty limits to mem_cgroup
>   memcg: add cgroupfs interface to memcg dirty limits
>   writeback: make determine_dirtyable_memory() static.
>   memcg: check memcg dirty limits in page writeback
> 
>  Documentation/cgroups/memory.txt |   37 
>  fs/nfs/write.c   |4 +
>  include/linux/memcontrol.h   |   78 +++-
>  include/linux/page_cgroup.h  |   31 +++-
>  include/linux/writeback.h|2 -
>  mm/filemap.c |1 +
>  mm/memcontrol.c  |  426 
> ++
>  mm/page-writeback.c  |  211 ---
>  mm/rmap.c|4 +-
>  mm/truncate.c|1 +
>  10 files changed, 672 insertions(+), 123 deletions(-)
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org 
> 

-- 
Three Cheers,
Balbir
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 8/8] net: Implement socketat.

2010-10-04 Thread Eric W. Biederman
jamal  writes:

> One thing still confuses me...
> The app control point is in namespace0. I still want to be able to
> "boot" namespaces first and maybe a few seconds later do a socketat()...
> and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
> would involve:
>  * open /proc/self/ns/net (namespace-name)
>  * unshare the netns
> Is this correct?

Almost.

create should be:
* verify namespace-name is not already in use
* mkdir -p /var/run/netns/
* unshare the netns
* mount --bind /proc/self/ns/net /var/run/netns/

Are you talking about replacing something that used to use the linux
vrf patches that are floating around?

Eric
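To make the four create steps concrete, a minimal C sketch; the function name, the error handling, and the O_EXCL check standing in for "verify the name is not in use" are illustrative, not taken from the patches:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/mount.h>

/* Hedged sketch of the create sequence described above. */
static int create_named_netns(const char *name)
{
	char path[PATH_MAX];
	int fd;

	mkdir("/var/run/netns", 0755);		/* mkdir -p /var/run/netns/ */
	snprintf(path, sizeof(path), "/var/run/netns/%s", name);

	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0444);
	if (fd < 0)				/* name already in use */
		return -1;
	close(fd);

	if (unshare(CLONE_NEWNET) < 0)		/* unshare the netns */
		return -1;

	/* bind-mounting /proc/self/ns/net onto the named file pins the netns */
	return mount("/proc/self/ns/net", path, "none", MS_BIND, NULL);
}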
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 03/10] memcg: create extensible page stat update routines

2010-10-04 Thread Greg Thelen
Ciju Rajan K  writes:

> Greg Thelen wrote:
>> Replace usage of the mem_cgroup_update_file_mapped() memcg
>> statistic update routine with two new routines:
>> * mem_cgroup_inc_page_stat()
>> * mem_cgroup_dec_page_stat()
>>
>> As before, only the file_mapped statistic is managed.  However,
>> these more general interfaces allow for new statistics to be
>> more easily added.  New statistics are added with memcg dirty
>> page accounting.
>>
>>
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 512cb12..f4259f4 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1592,7 +1592,9 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *mem, 
>> gfp_t mask)
>>   * possibility of race condition. If there is, we take a lock.
>>   */
>>
>>   -static void mem_cgroup_update_file_stat(struct page *page, int idx, int
>> val)
>>   
> Not seeing this function in mmotm 28/09. So not able to apply this patch.
> Am I missing anything?

How are you getting mmotm?

I see the mem_cgroup_update_file_stat() routine added in mmotm
(stamp-2010-09-28-16-13) using patch file:
  
http://userweb.kernel.org/~akpm/mmotm/broken-out/memcg-generic-filestat-update-interface.patch

  Author: KAMEZAWA Hiroyuki 
  Date:   Tue Sep 28 21:48:19 2010 -0700
  
  This patch extracts the core logic from mem_cgroup_update_file_mapped() as
  mem_cgroup_update_file_stat() and adds a wrapper.
  
  As a planned future update, memory cgroup has to count dirty pages to
  implement dirty_ratio/limit.  And more, the number of dirty pages is
  required to kick flusher thread to start writeback.  (Now, no kick.)
  
  This patch is preparation for it and makes other statistics implementation
  clearer.  Just a clean up.
  
  Signed-off-by: KAMEZAWA Hiroyuki 
  Acked-by: Balbir Singh 
  Reviewed-by: Greg Thelen 
  Cc: Daisuke Nishimura 
  Signed-off-by: Andrew Morton 

If you are using the zen mmotm repository,
git://zen-kernel.org/kernel/mmotm.git, the commit id of
memcg-generic-filestat-update-interface.patch is
616960dc0cb0172a5e5adc9e2b83e668e1255b50.

>> +void mem_cgroup_update_page_stat(struct page *page,
>> + enum mem_cgroup_write_page_stat_item idx,
>> + int val)
>>  {
>>  struct mem_cgroup *mem;
>>
>>   
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [PATCH 8/8] net: Implement socketat.

2010-10-04 Thread Daniel Lezcano
On 10/03/2010 03:44 PM, jamal wrote:
> Hi Daniel,
>
> Thanks for clarifying this ..
>
> On Sat, 2010-10-02 at 23:13 +0200, Daniel Lezcano wrote:
>
>> Just to clarify this point. You enter the namespace, create the socket
>> and go back to the initial namespace (or create a new one). Further
>> operations can be made against this fd because it is the network
>> namespace stored in the sock struct which is used, not the current
>> process network namespace which is used at the socket creation only.
>>
>> We can actually already do that by unsharing and then create a
>> socket.
>> This socket will pin the namespace and can be used as a control socket
>> for the namespace (assuming the socket domain will be ok for all the
>> operations).
>>
>> Jamal, I don't know what kind of application you want to use but if I
>> assume you want to create a process controlling 1024 netns,
>>  
> At the moment i am looking at 8K on a Nehalem with lots of RAM. They
> will mostly be created at startup but some could be created afterwards.
> Each will have its own netdevs etc. also created at startup (and some
> other config that may happen later).
> Because startup time may accumulate, it is clearly important to me
> to pick whatever scheme that reduces the number of calls...
>

8K ! whow ! :)


>> let's try to identificate what happen with setns and with socketat :
>>
>> With setns:
>>
>>   * open /proc/self/ns/net (1)
>>   * unshare the netns
>>   * open /proc/self/ns/net (2)
>>   * setns (1)
>>   * create a virtual network device
>>   * move the virtual device to (2) (using the set netns by fd)
>>   * unshare the netns
>>   ...
>>
>> With socketat:
>>
>>   * open a socket (1)
>>   * unshare the netns
>>   * open a netlink with socketat(1) =>  (2)
>>   * create a virtual device using (2) (at this point it is
>> init_net_ns)
>>   * move the virtual device to the current netns (using the set
>> netns
>> by pid)
>>   * open a socket (3)
>>   * unshare the netns
>>   ...
>>
>> We have the same number of file descriptors kept opened. Except, with
>> setns we can bind mount the directory somewhere, that will pin the
>> namespace and then we can close the /proc/self/ns/net file descriptors
>> and reopen them later.
>>
>>  
> Ok, so a wrapper such as: create_socket_on(namespaceid)
> will have generally less system calls with socketat()
>

Yes, I think so.

>> If your application has to do a lot of specific network processing,
>> during its life cycle, in different namespaces, the socketat syscall
>> will be better because it will reduce the number of syscalls but at
>> the cost of keeping the file descriptors opened (potentially a big
>> number). Otherwise, setns should fit your needs.
>>  
> Makes sense.
>
> One thing still confuses me...
> The app control point is in namespace0. I still want to be able to
> "boot" namespaces first and maybe a few seconds later do a socketat()...
> and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
> would involve:
>   * open /proc/self/ns/net (namespace-name)
>   * unshare the netns
> Is this correct?
>

Maybe I am misunderstanding, but since you are trying to save some syscalls, 
you should use socketat only and keep an app control socket for namespace0. 
The process will be in the last netns you unshared (maybe you can use 
one setns syscall here to return back to namespace0).

 (1) socketat  :
 * pros : 1 syscall to create a socket
 * cons : a file descriptor per namespace, namespace is only 
manageable via a socket

 (2) setns :
 * pros : namespace is fully manageable with generic code
 * cons : 2 syscalls (or 3 if we want to return to the initial 
netns) to create a socket (setns + socket [ + setns ]), a file descriptor 
per namespace

 (3) setns + bind mount :
 * pros : no file descriptor need to be kept opened
 * cons : startup longer, (unshare + mount --bind), 4 syscalls 
to create a socket in the namespace (open, setns, socket, close), (may 
be 5 syscalls if we want to return to the initial netns).

Depending on the scheme you choose, the startup will be:

 (1) socketat :
  * open /proc/self/ns/net (one time to 'save' and pin the 
initial netns)
 and then

 int create_ns(void)
 {
 unshare(CLONE_NEWNET);
 return socket(...)
 }

 and,

  for (i = 0; i < 8192; i++)
  mynsfd[i] = create_ns();

 (2) setns :
  * open /proc/self/ns/net (one time to 'save' and pin the 
initial netns)
   and then

 int create_ns(void)
 {
 unshare(CLONE_NEWNET);
 return open("/proc/self/ns/net");
 }

 and,

 for (i = 0; i < 8192; i++)
   mynsfd[i] = create_ns();

 (3) setns + mount :

  * open /proc/self/ns/net (one time to 'save' and p
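Picking up scheme (3) from the list above, a hedged sketch of the open/setns/socket/close sequence. The setns(fd, nstype) prototype used here is the one that was eventually exposed to userspace and may differ in detail from the patches under discussion; the paths and error handling are illustrative.

#define _GNU_SOURCE
#include <sched.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>

/* Create a socket inside a named, bind-mounted netns (scheme 3). */
static int socket_in_netns(const char *nspath, int domain, int type, int proto)
{
	int nsfd, sk;

	nsfd = open(nspath, O_RDONLY);		/* e.g. a file under /var/run/netns */
	if (nsfd < 0)
		return -1;
	if (setns(nsfd, CLONE_NEWNET) < 0) {	/* enter the target netns */
		close(nsfd);
		return -1;
	}
	sk = socket(domain, type, proto);	/* the socket pins that netns */
	close(nsfd);
	/* a caller wanting to continue in the initial netns would setns() back here */
	return sk;
}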

[Devel] [PATCH 09/10] writeback: make determine_dirtyable_memory() static.

2010-10-04 Thread Greg Thelen
The determine_dirtyable_memory() function is not used outside of
page writeback.  Make the routine static.  No functional change.
Just a cleanup in preparation for a change that adds memcg dirty
limits consideration into global_dirty_limits().

Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 include/linux/writeback.h |2 -
 mm/page-writeback.c   |  122 ++--
 2 files changed, 61 insertions(+), 63 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 72a5d64..9eacdca 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -105,8 +105,6 @@ extern int vm_highmem_is_dirtyable;
 extern int block_dump;
 extern int laptop_mode;
 
-extern unsigned long determine_dirtyable_memory(void);
-
 extern int dirty_background_ratio_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 820eb66..a0bb3e2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -132,6 +132,67 @@ static struct prop_descriptor vm_completions;
 static struct prop_descriptor vm_dirties;
 
 /*
+ * Work out the current dirty-memory clamping and background writeout
+ * thresholds.
+ *
+ * The main aim here is to lower them aggressively if there is a lot of mapped
+ * memory around.  To avoid stressing page reclaim with lots of unreclaimable
+ * pages.  It is better to clamp down on writers than to start swapping, and
+ * performing lots of scanning.
+ *
+ * We only allow 1/2 of the currently-unmapped memory to be dirtied.
+ *
+ * We don't permit the clamping level to fall below 5% - that is getting rather
+ * excessive.
+ *
+ * We make sure that the background writeout level is below the adjusted
+ * clamping level.
+ */
+
+static unsigned long highmem_dirtyable_memory(unsigned long total)
+{
+#ifdef CONFIG_HIGHMEM
+   int node;
+   unsigned long x = 0;
+
+   for_each_node_state(node, N_HIGH_MEMORY) {
+   struct zone *z =
+   &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
+
+   x += zone_page_state(z, NR_FREE_PAGES) +
+zone_reclaimable_pages(z);
+   }
+   /*
+* Make sure that the number of highmem pages is never larger
+* than the number of the total dirtyable memory. This can only
+* occur in very strange VM situations but we want to make sure
+* that this does not occur.
+*/
+   return min(x, total);
+#else
+   return 0;
+#endif
+}
+
+/**
+ * determine_dirtyable_memory - amount of memory that may be used
+ *
+ * Returns the numebr of pages that can currently be freed and used
+ * by the kernel for direct mappings.
+ */
+static unsigned long determine_dirtyable_memory(void)
+{
+   unsigned long x;
+
+   x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
+
+   if (!vm_highmem_is_dirtyable)
+   x -= highmem_dirtyable_memory(x);
+
+   return x + 1;   /* Ensure that we never return 0 */
+}
+
+/*
  * couple the period to the dirty_ratio:
  *
  *   period/2 ~ roundup_pow_of_two(dirty limit)
@@ -337,67 +398,6 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned max_ratio)
 EXPORT_SYMBOL(bdi_set_max_ratio);
 
 /*
- * Work out the current dirty-memory clamping and background writeout
- * thresholds.
- *
- * The main aim here is to lower them aggressively if there is a lot of mapped
- * memory around.  To avoid stressing page reclaim with lots of unreclaimable
- * pages.  It is better to clamp down on writers than to start swapping, and
- * performing lots of scanning.
- *
- * We only allow 1/2 of the currently-unmapped memory to be dirtied.
- *
- * We don't permit the clamping level to fall below 5% - that is getting rather
- * excessive.
- *
- * We make sure that the background writeout level is below the adjusted
- * clamping level.
- */
-
-static unsigned long highmem_dirtyable_memory(unsigned long total)
-{
-#ifdef CONFIG_HIGHMEM
-   int node;
-   unsigned long x = 0;
-
-   for_each_node_state(node, N_HIGH_MEMORY) {
-   struct zone *z =
-   &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
-
-   x += zone_page_state(z, NR_FREE_PAGES) +
-zone_reclaimable_pages(z);
-   }
-   /*
-* Make sure that the number of highmem pages is never larger
-* than the number of the total dirtyable memory. This can only
-* occur in very strange VM situations but we want to make sure
-* that this does not occur.
-*/
-   return min(x, total);
-#else
-   return 0;
-#endif
-}
-
-/**
- * determine_dirtyable_memory - amount of memory that may be used
- *
- * Returns the numebr of pages that can currently be freed and used
- * by the kernel for direct mappings.
- */
-unsigned long determine_dirtyable_memory(void)
-{
-   unsigned 

[Devel] [PATCH 06/10] memcg: add kernel calls for memcg dirty page stats

2010-10-04 Thread Greg Thelen
Add calls into memcg dirty page accounting.  Notify memcg when pages
transition between clean, file dirty, writeback, and unstable nfs.
This allows the memory controller to maintain an accurate view of
the amount of its memory that is dirty.

Signed-off-by: Greg Thelen 
Signed-off-by: Andrea Righi 
---
 fs/nfs/write.c  |4 
 mm/filemap.c|1 +
 mm/page-writeback.c |4 
 mm/truncate.c   |1 +
 4 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 48199fb..9e206bd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -450,6 +450,7 @@ nfs_mark_request_commit(struct nfs_page *req)
NFS_PAGE_TAG_COMMIT);
nfsi->ncommit++;
spin_unlock(&inode->i_lock);
+   mem_cgroup_inc_page_stat(req->wb_page, MEMCG_NR_FILE_UNSTABLE_NFS);
inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
@@ -461,6 +462,7 @@ nfs_clear_request_commit(struct nfs_page *req)
struct page *page = req->wb_page;
 
if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
+   mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_UNSTABLE_NFS);
dec_zone_page_state(page, NR_UNSTABLE_NFS);
dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
return 1;
@@ -1316,6 +1318,8 @@ nfs_commit_list(struct inode *inode, struct list_head 
*head, int how)
req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
nfs_mark_request_commit(req);
+   mem_cgroup_dec_page_stat(req->wb_page,
+MEMCG_NR_FILE_UNSTABLE_NFS);
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
BDI_RECLAIMABLE);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d4df44..82e0870 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -135,6 +135,7 @@ void __remove_from_page_cache(struct page *page)
 * having removed the page entirely.
 */
if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
+   mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b840afa..820eb66 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1114,6 +1114,7 @@ int __set_page_dirty_no_writeback(struct page *page)
 void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
if (mapping_cap_account_dirty(mapping)) {
+   mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_DIRTY);
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_zone_page_state(page, NR_DIRTIED);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
@@ -1303,6 +1304,7 @@ int clear_page_dirty_for_io(struct page *page)
 * for more comments.
 */
if (TestClearPageDirty(page)) {
+   mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info,
BDI_RECLAIMABLE);
@@ -1333,6 +1335,7 @@ int test_clear_page_writeback(struct page *page)
__dec_bdi_stat(bdi, BDI_WRITEBACK);
__bdi_writeout_inc(bdi);
}
+   mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_WRITEBACK);
}
spin_unlock_irqrestore(&mapping->tree_lock, flags);
} else {
@@ -1360,6 +1363,7 @@ int test_set_page_writeback(struct page *page)
PAGECACHE_TAG_WRITEBACK);
if (bdi_cap_account_writeback(bdi))
__inc_bdi_stat(bdi, BDI_WRITEBACK);
+   mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_WRITEBACK);
}
if (!PageDirty(page))
radix_tree_tag_clear(&mapping->page_tree,
diff --git a/mm/truncate.c b/mm/truncate.c
index ba887bf..551dc23 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -74,6 +74,7 @@ void cancel_dirty_page(struct page *page, unsigned int 
account_size)
if (TestClearPageDirty(page)) {
struct address_space *mapping = page->mapping;
if (mapping && mapping_cap_account_dirty(mapping)) {
+   mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(ma

[Devel] [PATCH 00/10] memcg: per cgroup dirty page accounting

2010-10-04 Thread Greg Thelen
This patch set provides the ability for each cgroup to have independent dirty
page limits.

Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
page cache used by a cgroup.  So, in case of multiple cgroup writers, they will
not be able to consume more than their designated share of dirty pages and will
be forced to perform write-out if they cross that limit.

These patches were developed and tested on mmotm 2010-09-28-16-13.  The patches
are based on a series proposed by Andrea Righi in Mar 2010.

Overview:
- Add page_cgroup flags to record when pages are dirty, in writeback, or nfs
  unstable.
- Extend mem_cgroup to record the total number of pages in each of the 
  interesting dirty states (dirty, writeback, unstable_nfs).  
- Add dirty parameters similar to the system-wide  /proc/sys/vm/dirty_*
  limits to mem_cgroup.  The mem_cgroup dirty parameters are accessible
  via cgroupfs control files.
- Consider both system and per-memcg dirty limits in page writeback when
  deciding to queue background writeback or block for foreground writeback.

Known shortcomings:
- When a cgroup dirty limit is exceeded, then bdi writeback is employed to
  writeback dirty inodes.  Bdi writeback considers inodes from any cgroup, not
  just inodes contributing dirty pages to the cgroup exceeding its limit.  

Performance measurements:
- kernel builds are unaffected unless run with a small dirty limit.
- all data collected with CONFIG_CGROUP_MEM_RES_CTLR=y.
- dd has three data points (in secs) for three data sizes (100M, 200M, and 1G).
  As expected, dd slows when it exceeds its cgroup dirty limit.

               kernel_build       dd
mmotm                2:37         0.18, 0.38, 1.65
  root_memcg

mmotm                2:37         0.18, 0.35, 1.66
  non-root_memcg

mmotm+patches        2:37         0.18, 0.35, 1.68
  root_memcg

mmotm+patches        2:37         0.19, 0.35, 1.69
  non-root_memcg

mmotm+patches        2:37         0.19, 2.34, 22.82
  non-root_memcg
  150 MiB memcg dirty limit

mmotm+patches        3:58         1.71, 3.38, 17.33
  non-root_memcg
  1 MiB memcg dirty limit

Greg Thelen (10):
  memcg: add page_cgroup flags for dirty page tracking
  memcg: document cgroup dirty memory interfaces
  memcg: create extensible page stat update routines
  memcg: disable local interrupts in lock_page_cgroup()
  memcg: add dirty page accounting infrastructure
  memcg: add kernel calls for memcg dirty page stats
  memcg: add dirty limits to mem_cgroup
  memcg: add cgroupfs interface to memcg dirty limits
  writeback: make determine_dirtyable_memory() static.
  memcg: check memcg dirty limits in page writeback

 Documentation/cgroups/memory.txt |   37 
 fs/nfs/write.c   |4 +
 include/linux/memcontrol.h   |   78 +++-
 include/linux/page_cgroup.h  |   31 +++-
 include/linux/writeback.h|2 -
 mm/filemap.c |1 +
 mm/memcontrol.c  |  426 ++
 mm/page-writeback.c  |  211 ---
 mm/rmap.c|4 +-
 mm/truncate.c|1 +
 10 files changed, 672 insertions(+), 123 deletions(-)

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] [PATCH 10/10] memcg: check memcg dirty limits in page writeback

2010-10-04 Thread Greg Thelen
If the current process is in a non-root memcg, then
global_dirty_limits() will consider the memcg dirty limit.
This allows different cgroups to have distinct dirty limits
which trigger direct and background writeback at different
levels.

Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 mm/page-writeback.c |   87 ++-
 1 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a0bb3e2..c1db336 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -180,7 +180,7 @@ static unsigned long highmem_dirtyable_memory(unsigned long 
total)
  * Returns the numebr of pages that can currently be freed and used
  * by the kernel for direct mappings.
  */
-static unsigned long determine_dirtyable_memory(void)
+static unsigned long get_global_dirtyable_memory(void)
 {
unsigned long x;
 
@@ -192,6 +192,58 @@ static unsigned long determine_dirtyable_memory(void)
return x + 1;   /* Ensure that we never return 0 */
 }
 
+static unsigned long get_dirtyable_memory(void)
+{
+   unsigned long memory;
+   s64 memcg_memory;
+
+   memory = get_global_dirtyable_memory();
+   if (!mem_cgroup_has_dirty_limit())
+   return memory;
+   memcg_memory = mem_cgroup_page_stat(MEMCG_NR_DIRTYABLE_PAGES);
+   BUG_ON(memcg_memory < 0);
+
+   return min((unsigned long)memcg_memory, memory);
+}
+
+static long get_reclaimable_pages(void)
+{
+   s64 ret;
+
+   if (!mem_cgroup_has_dirty_limit())
+   return global_page_state(NR_FILE_DIRTY) +
+   global_page_state(NR_UNSTABLE_NFS);
+   ret = mem_cgroup_page_stat(MEMCG_NR_RECLAIM_PAGES);
+   BUG_ON(ret < 0);
+
+   return ret;
+}
+
+static long get_writeback_pages(void)
+{
+   s64 ret;
+
+   if (!mem_cgroup_has_dirty_limit())
+   return global_page_state(NR_WRITEBACK);
+   ret = mem_cgroup_page_stat(MEMCG_NR_WRITEBACK);
+   BUG_ON(ret < 0);
+
+   return ret;
+}
+
+static unsigned long get_dirty_writeback_pages(void)
+{
+   s64 ret;
+
+   if (!mem_cgroup_has_dirty_limit())
+   return global_page_state(NR_UNSTABLE_NFS) +
+   global_page_state(NR_WRITEBACK);
+   ret = mem_cgroup_page_stat(MEMCG_NR_DIRTY_WRITEBACK_PAGES);
+   BUG_ON(ret < 0);
+
+   return ret;
+}
+
 /*
  * couple the period to the dirty_ratio:
  *
@@ -204,7 +256,7 @@ static int calc_period_shift(void)
if (vm_dirty_bytes)
dirty_total = vm_dirty_bytes / PAGE_SIZE;
else
-   dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
+   dirty_total = (vm_dirty_ratio * get_global_dirtyable_memory()) /
100;
return 2 + ilog2(dirty_total - 1);
 }
@@ -410,18 +462,23 @@ void global_dirty_limits(unsigned long *pbackground, 
unsigned long *pdirty)
 {
unsigned long background;
unsigned long dirty;
-   unsigned long available_memory = determine_dirtyable_memory();
+   unsigned long available_memory = get_dirtyable_memory();
struct task_struct *tsk;
+   struct vm_dirty_param dirty_param;
 
-   if (vm_dirty_bytes)
-   dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
+   get_vm_dirty_param(&dirty_param);
+
+   if (dirty_param.dirty_bytes)
+   dirty = DIV_ROUND_UP(dirty_param.dirty_bytes, PAGE_SIZE);
else
-   dirty = (vm_dirty_ratio * available_memory) / 100;
+   dirty = (dirty_param.dirty_ratio * available_memory) / 100;
 
-   if (dirty_background_bytes)
-   background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
+   if (dirty_param.dirty_background_bytes)
+   background = DIV_ROUND_UP(dirty_param.dirty_background_bytes,
+ PAGE_SIZE);
else
-   background = (dirty_background_ratio * available_memory) / 100;
+   background = (dirty_param.dirty_background_ratio *
+ available_memory) / 100;
 
if (background >= dirty)
background = dirty / 2;
@@ -493,9 +550,8 @@ static void balance_dirty_pages(struct address_space 
*mapping,
.range_cyclic   = 1,
};
 
-   nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
-   global_page_state(NR_UNSTABLE_NFS);
-   nr_writeback = global_page_state(NR_WRITEBACK);
+   nr_reclaimable = get_reclaimable_pages();
+   nr_writeback = get_writeback_pages();
 
global_dirty_limits(&background_thresh, &dirty_thresh);
 
@@ -652,6 +708,7 @@ void throttle_vm_writeout(gfp_t gfp_mask)
 {
unsigned long background_thresh;
unsigned long dirty_thresh;
+   unsigned long dirty;
 
 for ( ; ; ) {
global_dirt
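To make the threshold arithmetic above concrete, a small standalone sketch with assumed numbers (a memcg with memory.dirty_bytes = 150 MiB, 4 KiB pages, dirty_background_ratio = 10, and roughly 1 GiB of dirtyable memory; all values are illustrative, not taken from the measurements in the cover letter):

#include <stdio.h>

#define PAGE_SIZE		4096UL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long available  = (1024UL << 20) / PAGE_SIZE;		  /* 262144 pages */
	unsigned long dirty      = DIV_ROUND_UP(150UL << 20, PAGE_SIZE); /* 38400 pages  */
	unsigned long background = 10 * available / 100;		  /* 26214 pages  */

	if (background >= dirty)	/* same clamp as global_dirty_limits() */
		background = dirty / 2;

	printf("dirty thresh:      %lu pages\n", dirty);
	printf("background thresh: %lu pages\n", background);
	return 0;
}

With these inputs the cgroup starts throttling writers at 38400 dirty pages and wakes background writeback at 26214.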

[Devel] [PATCH 03/10] memcg: create extensible page stat update routines

2010-10-04 Thread Greg Thelen
Replace usage of the mem_cgroup_update_file_mapped() memcg
statistic update routine with two new routines:
* mem_cgroup_inc_page_stat()
* mem_cgroup_dec_page_stat()

As before, only the file_mapped statistic is managed.  However,
these more general interfaces allow for new statistics to be
more easily added.  New statistics are added with memcg dirty
page accounting.

Signed-off-by: Greg Thelen 
Signed-off-by: Andrea Righi 
---
 include/linux/memcontrol.h |   31 ---
 mm/memcontrol.c|   17 -
 mm/rmap.c  |4 ++--
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 159a076..7c7bec4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -25,6 +25,11 @@ struct page_cgroup;
 struct page;
 struct mm_struct;
 
+/* Stats that can be updated by kernel. */
+enum mem_cgroup_write_page_stat_item {
+   MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+};
+
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
@@ -121,7 +126,22 @@ static inline bool mem_cgroup_disabled(void)
return false;
 }
 
-void mem_cgroup_update_file_mapped(struct page *page, int val);
+void mem_cgroup_update_page_stat(struct page *page,
+enum mem_cgroup_write_page_stat_item idx,
+int val);
+
+static inline void mem_cgroup_inc_page_stat(struct page *page,
+   enum mem_cgroup_write_page_stat_item idx)
+{
+   mem_cgroup_update_page_stat(page, idx, 1);
+}
+
+static inline void mem_cgroup_dec_page_stat(struct page *page,
+   enum mem_cgroup_write_page_stat_item idx)
+{
+   mem_cgroup_update_page_stat(page, idx, -1);
+}
+
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
gfp_t gfp_mask);
 u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
@@ -293,8 +313,13 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct 
task_struct *p)
 {
 }
 
-static inline void mem_cgroup_update_file_mapped(struct page *page,
-   int val)
+static inline void mem_cgroup_inc_page_stat(struct page *page,
+   enum mem_cgroup_write_page_stat_item idx)
+{
+}
+
+static inline void mem_cgroup_dec_page_stat(struct page *page,
+   enum mem_cgroup_write_page_stat_item idx)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 512cb12..f4259f4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1592,7 +1592,9 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *mem, gfp_t 
mask)
  * possibility of race condition. If there is, we take a lock.
  */
 
-static void mem_cgroup_update_file_stat(struct page *page, int idx, int val)
+void mem_cgroup_update_page_stat(struct page *page,
+enum mem_cgroup_write_page_stat_item idx,
+int val)
 {
struct mem_cgroup *mem;
struct page_cgroup *pc = lookup_page_cgroup(page);
@@ -1615,30 +1617,27 @@ static void mem_cgroup_update_file_stat(struct page 
*page, int idx, int val)
goto out;
}
 
-   this_cpu_add(mem->stat->count[idx], val);
-
switch (idx) {
-   case MEM_CGROUP_STAT_FILE_MAPPED:
+   case MEMCG_NR_FILE_MAPPED:
if (val > 0)
SetPageCgroupFileMapped(pc);
else if (!page_mapped(page))
ClearPageCgroupFileMapped(pc);
+   idx = MEM_CGROUP_STAT_FILE_MAPPED;
break;
default:
BUG();
}
 
+   this_cpu_add(mem->stat->count[idx], val);
+
 out:
if (unlikely(need_unlock))
unlock_page_cgroup(pc);
rcu_read_unlock();
return;
 }
-
-void mem_cgroup_update_file_mapped(struct page *page, int val)
-{
-   mem_cgroup_update_file_stat(page, MEM_CGROUP_STAT_FILE_MAPPED, val);
-}
+EXPORT_SYMBOL(mem_cgroup_update_page_stat);
 
 /*
  * size of first charge trial. "32" comes from vmscan.c's magic value.
diff --git a/mm/rmap.c b/mm/rmap.c
index 8734312..779c0db 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -912,7 +912,7 @@ void page_add_file_rmap(struct page *page)
 {
if (atomic_inc_and_test(&page->_mapcount)) {
__inc_zone_page_state(page, NR_FILE_MAPPED);
-   mem_cgroup_update_file_mapped(page, 1);
+   mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
}
 }
 
@@ -950,7 +950,7 @@ void page_remove_rmap(struct page *page)
__dec_zone_page_state(page, NR_ANON_PAGES);
} else {
__dec_zone_page_state(page, NR_FILE_MAPPED);
-  

[Devel] [PATCH 01/10] memcg: add page_cgroup flags for dirty page tracking

2010-10-04 Thread Greg Thelen
Add additional flags to page_cgroup to track dirty pages
within a mem_cgroup.

Signed-off-by: KAMEZAWA Hiroyuki 
Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 include/linux/page_cgroup.h |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 5bb13b3..b59c298 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -40,6 +40,9 @@ enum {
PCG_USED, /* this object is in use. */
PCG_ACCT_LRU, /* page has been accounted for */
PCG_FILE_MAPPED, /* page is accounted as "mapped" */
+   PCG_FILE_DIRTY, /* page is dirty */
+   PCG_FILE_WRITEBACK, /* page is under writeback */
+   PCG_FILE_UNSTABLE_NFS, /* page is NFS unstable */
PCG_MIGRATION, /* under page migration */
 };
 
@@ -59,6 +62,10 @@ static inline void ClearPageCgroup##uname(struct page_cgroup 
*pc)\
 static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)   \
{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
 
+#define TESTSETPCGFLAG(uname, lname)   \
+static inline int TestSetPageCgroup##uname(struct page_cgroup *pc) \
+   { return test_and_set_bit(PCG_##lname, &pc->flags);  }
+
 TESTPCGFLAG(Locked, LOCK)
 
 /* Cache flag is set only once (at allocation) */
@@ -80,6 +87,22 @@ SETPCGFLAG(FileMapped, FILE_MAPPED)
 CLEARPCGFLAG(FileMapped, FILE_MAPPED)
 TESTPCGFLAG(FileMapped, FILE_MAPPED)
 
+SETPCGFLAG(FileDirty, FILE_DIRTY)
+CLEARPCGFLAG(FileDirty, FILE_DIRTY)
+TESTPCGFLAG(FileDirty, FILE_DIRTY)
+TESTCLEARPCGFLAG(FileDirty, FILE_DIRTY)
+TESTSETPCGFLAG(FileDirty, FILE_DIRTY)
+
+SETPCGFLAG(FileWriteback, FILE_WRITEBACK)
+CLEARPCGFLAG(FileWriteback, FILE_WRITEBACK)
+TESTPCGFLAG(FileWriteback, FILE_WRITEBACK)
+
+SETPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
+CLEARPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
+TESTPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
+TESTCLEARPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
+TESTSETPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)
+
 SETPCGFLAG(Migration, MIGRATION)
 CLEARPCGFLAG(Migration, MIGRATION)
 TESTPCGFLAG(Migration, MIGRATION)
-- 
1.7.1

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] [PATCH 07/10] memcg: add dirty limits to mem_cgroup

2010-10-04 Thread Greg Thelen
Extend mem_cgroup to contain dirty page limits.  Also add routines
allowing the kernel to query the dirty usage of a memcg.

These interfaces are not used by the kernel yet.  A subsequent commit
will add kernel calls to utilize these new routines.

Signed-off-by: Greg Thelen 
Signed-off-by: Andrea Righi 
---
 include/linux/memcontrol.h |   44 +++
 mm/memcontrol.c|  180 +++-
 2 files changed, 223 insertions(+), 1 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6303da1..dc8952d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -19,6 +19,7 @@
 
 #ifndef _LINUX_MEMCONTROL_H
 #define _LINUX_MEMCONTROL_H
+#include 
 #include 
 struct mem_cgroup;
 struct page_cgroup;
@@ -33,6 +34,30 @@ enum mem_cgroup_write_page_stat_item {
MEMCG_NR_FILE_UNSTABLE_NFS, /* # of NFS unstable pages */
 };
 
+/* Cgroup memory statistics items exported to the kernel */
+enum mem_cgroup_read_page_stat_item {
+   MEMCG_NR_DIRTYABLE_PAGES,
+   MEMCG_NR_RECLAIM_PAGES,
+   MEMCG_NR_WRITEBACK,
+   MEMCG_NR_DIRTY_WRITEBACK_PAGES,
+};
+
+/* Dirty memory parameters */
+struct vm_dirty_param {
+   int dirty_ratio;
+   int dirty_background_ratio;
+   unsigned long dirty_bytes;
+   unsigned long dirty_background_bytes;
+};
+
+static inline void get_global_vm_dirty_param(struct vm_dirty_param *param)
+{
+   param->dirty_ratio = vm_dirty_ratio;
+   param->dirty_bytes = vm_dirty_bytes;
+   param->dirty_background_ratio = dirty_background_ratio;
+   param->dirty_background_bytes = dirty_background_bytes;
+}
+
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
@@ -145,6 +170,10 @@ static inline void mem_cgroup_dec_page_stat(struct page 
*page,
mem_cgroup_update_page_stat(page, idx, -1);
 }
 
+bool mem_cgroup_has_dirty_limit(void);
+void get_vm_dirty_param(struct vm_dirty_param *param);
+s64 mem_cgroup_page_stat(enum mem_cgroup_read_page_stat_item item);
+
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
gfp_t gfp_mask);
 u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
@@ -326,6 +355,21 @@ static inline void mem_cgroup_dec_page_stat(struct page 
*page,
 {
 }
 
+static inline bool mem_cgroup_has_dirty_limit(void)
+{
+   return false;
+}
+
+static inline void get_vm_dirty_param(struct vm_dirty_param *param)
+{
+   get_global_vm_dirty_param(param);
+}
+
+static inline s64 mem_cgroup_page_stat(enum mem_cgroup_read_page_stat_item 
item)
+{
+   return -ENOSYS;
+}
+
 static inline
 unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
gfp_t gfp_mask)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f40839f..6ec2625 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -233,6 +233,10 @@ struct mem_cgroup {
atomic_trefcnt;
 
unsigned intswappiness;
+
+   /* control memory cgroup dirty pages */
+   struct vm_dirty_param dirty_param;
+
/* OOM-Killer disable */
int oom_kill_disable;
 
@@ -1132,6 +1136,172 @@ static unsigned int get_swappiness(struct mem_cgroup 
*memcg)
return swappiness;
 }
 
+/*
+ * Returns a snapshot of the current dirty limits which is not synchronized 
with
+ * the routines that change the dirty limits.  If this routine races with an
+ * update to the dirty bytes/ratio value, then the caller must handle the case
+ * where both dirty_[background_]_ratio and _bytes are set.
+ */
+static void __mem_cgroup_get_dirty_param(struct vm_dirty_param *param,
+struct mem_cgroup *mem)
+{
+   if (mem && !mem_cgroup_is_root(mem)) {
+   param->dirty_ratio = mem->dirty_param.dirty_ratio;
+   param->dirty_bytes = mem->dirty_param.dirty_bytes;
+   param->dirty_background_ratio =
+   mem->dirty_param.dirty_background_ratio;
+   param->dirty_background_bytes =
+   mem->dirty_param.dirty_background_bytes;
+   } else {
+   get_global_vm_dirty_param(param);
+   }
+}
+
+/*
+ * Get dirty memory parameters of the current memcg or global values (if memory
+ * cgroups are disabled or querying the root cgroup).
+ */
+void get_vm_dirty_param(struct vm_dirty_param *param)
+{
+   struct mem_cgroup *memcg;
+
+   if (mem_cgroup_disabled()) {
+   get_global_vm_dirty_param(param);
+   return;
+   }
+
+   /*
+* It's possible that "current" may be moved to other cgroup while we
+* access cgroup. But precise check is meaningless because the task can
+* be moved after our access and writeback tends to take lo

[Devel] [PATCH 08/10] memcg: add cgroupfs interface to memcg dirty limits

2010-10-04 Thread Greg Thelen
Add cgroupfs interface to memcg dirty page limits:
  Direct write-out is controlled with:
  - memory.dirty_ratio
  - memory.dirty_bytes

  Background write-out is controlled with:
  - memory.dirty_background_ratio
  - memory.dirty_background_bytes

Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 mm/memcontrol.c |   89 +++
 1 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ec2625..2d45a0a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -100,6 +100,13 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_NSTATS,
 };
 
+enum {
+   MEM_CGROUP_DIRTY_RATIO,
+   MEM_CGROUP_DIRTY_BYTES,
+   MEM_CGROUP_DIRTY_BACKGROUND_RATIO,
+   MEM_CGROUP_DIRTY_BACKGROUND_BYTES,
+};
+
 struct mem_cgroup_stat_cpu {
s64 count[MEM_CGROUP_STAT_NSTATS];
 };
@@ -4292,6 +4299,64 @@ static int mem_cgroup_oom_control_write(struct cgroup 
*cgrp,
return 0;
 }
 
+static u64 mem_cgroup_dirty_read(struct cgroup *cgrp, struct cftype *cft)
+{
+   struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+   bool root;
+
+   root = mem_cgroup_is_root(mem);
+
+   switch (cft->private) {
+   case MEM_CGROUP_DIRTY_RATIO:
+   return root ? vm_dirty_ratio : mem->dirty_param.dirty_ratio;
+   case MEM_CGROUP_DIRTY_BYTES:
+   return root ? vm_dirty_bytes : mem->dirty_param.dirty_bytes;
+   case MEM_CGROUP_DIRTY_BACKGROUND_RATIO:
+   return root ? dirty_background_ratio :
+   mem->dirty_param.dirty_background_ratio;
+   case MEM_CGROUP_DIRTY_BACKGROUND_BYTES:
+   return root ? dirty_background_bytes :
+   mem->dirty_param.dirty_background_bytes;
+   default:
+   BUG();
+   }
+}
+
+static int
+mem_cgroup_dirty_write(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+   int type = cft->private;
+
+   if (cgrp->parent == NULL)
+   return -EINVAL;
+   if ((type == MEM_CGROUP_DIRTY_RATIO ||
+type == MEM_CGROUP_DIRTY_BACKGROUND_RATIO) && val > 100)
+   return -EINVAL;
+   switch (type) {
+   case MEM_CGROUP_DIRTY_RATIO:
+   memcg->dirty_param.dirty_ratio = val;
+   memcg->dirty_param.dirty_bytes = 0;
+   break;
+   case MEM_CGROUP_DIRTY_BYTES:
+   memcg->dirty_param.dirty_bytes = val;
+   memcg->dirty_param.dirty_ratio  = 0;
+   break;
+   case MEM_CGROUP_DIRTY_BACKGROUND_RATIO:
+   memcg->dirty_param.dirty_background_ratio = val;
+   memcg->dirty_param.dirty_background_bytes = 0;
+   break;
+   case MEM_CGROUP_DIRTY_BACKGROUND_BYTES:
+   memcg->dirty_param.dirty_background_bytes = val;
+   memcg->dirty_param.dirty_background_ratio = 0;
+   break;
+   default:
+   BUG();
+   break;
+   }
+   return 0;
+}
+
 static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -4355,6 +4420,30 @@ static struct cftype mem_cgroup_files[] = {
.unregister_event = mem_cgroup_oom_unregister_event,
.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
},
+   {
+   .name = "dirty_ratio",
+   .read_u64 = mem_cgroup_dirty_read,
+   .write_u64 = mem_cgroup_dirty_write,
+   .private = MEM_CGROUP_DIRTY_RATIO,
+   },
+   {
+   .name = "dirty_bytes",
+   .read_u64 = mem_cgroup_dirty_read,
+   .write_u64 = mem_cgroup_dirty_write,
+   .private = MEM_CGROUP_DIRTY_BYTES,
+   },
+   {
+   .name = "dirty_background_ratio",
+   .read_u64 = mem_cgroup_dirty_read,
+   .write_u64 = mem_cgroup_dirty_write,
+   .private = MEM_CGROUP_DIRTY_BACKGROUND_RATIO,
+   },
+   {
+   .name = "dirty_background_bytes",
+   .read_u64 = mem_cgroup_dirty_read,
+   .write_u64 = mem_cgroup_dirty_write,
+   .private = MEM_CGROUP_DIRTY_BACKGROUND_BYTES,
+   },
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
-- 
1.7.1
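One behavioural detail worth calling out from mem_cgroup_dirty_write() above: as with the global vm.dirty_* sysctls, the byte and ratio forms are mutually exclusive, so writing one clears the other. A hedged userspace sketch (the mount point and group name are assumptions for illustration):

#include <stdio.h>

int main(void)
{
	const char *grp = "/dev/cgroup/grp0";	/* assumed memcg mount/group */
	char path[256], buf[32];
	FILE *f;

	/* switch the group to a 150 MiB byte limit ... */
	snprintf(path, sizeof(path), "%s/memory.dirty_bytes", grp);
	f = fopen(path, "w");
	if (!f)
		return 1;
	fprintf(f, "%lu\n", 150UL << 20);
	fclose(f);

	/* ... and memory.dirty_ratio now reads back as 0 */
	snprintf(path, sizeof(path), "%s/memory.dirty_ratio", grp);
	f = fopen(path, "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("dirty_ratio after setting dirty_bytes: %s", buf);
	if (f)
		fclose(f);
	return 0;
}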

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] [PATCH 05/10] memcg: add dirty page accounting infrastructure

2010-10-04 Thread Greg Thelen
Add memcg routines to track dirty, writeback, and unstable_NFS pages.
These routines are not yet used by the kernel to count such pages.
A later change adds kernel calls to these new routines.

Signed-off-by: Greg Thelen 
Signed-off-by: Andrea Righi 
---
 include/linux/memcontrol.h |3 +
 mm/memcontrol.c|   89 
 2 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7c7bec4..6303da1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -28,6 +28,9 @@ struct mm_struct;
 /* Stats that can be updated by kernel. */
 enum mem_cgroup_write_page_stat_item {
MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+   MEMCG_NR_FILE_DIRTY, /* # of dirty pages in page cache */
+   MEMCG_NR_FILE_WRITEBACK, /* # of pages under writeback */
+   MEMCG_NR_FILE_UNSTABLE_NFS, /* # of NFS unstable pages */
 };
 
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 267d774..f40839f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -85,10 +85,13 @@ enum mem_cgroup_stat_index {
 */
MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
MEM_CGROUP_STAT_RSS,   /* # of pages charged as anon rss */
-   MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
MEM_CGROUP_STAT_PGPGIN_COUNT,   /* # of pages paged in */
MEM_CGROUP_STAT_PGPGOUT_COUNT,  /* # of pages paged out */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
+   MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
+   MEM_CGROUP_STAT_FILE_DIRTY, /* # of dirty pages in page cache */
+   MEM_CGROUP_STAT_FILE_WRITEBACK, /* # of pages under writeback */
+   MEM_CGROUP_STAT_FILE_UNSTABLE_NFS,  /* # of NFS unstable pages */
MEM_CGROUP_STAT_DATA, /* end of data requires synchronization */
/* incremented at every  pagein/pageout */
MEM_CGROUP_EVENTS = MEM_CGROUP_STAT_DATA,
@@ -1626,6 +1629,48 @@ void mem_cgroup_update_page_stat(struct page *page,
ClearPageCgroupFileMapped(pc);
idx = MEM_CGROUP_STAT_FILE_MAPPED;
break;
+
+   case MEMCG_NR_FILE_DIRTY:
+   /* Use Test{Set,Clear} to only un/charge the memcg once. */
+   if (val > 0) {
+   if (TestSetPageCgroupFileDirty(pc))
+   /* already set */
+   val = 0;
+   } else {
+   if (!TestClearPageCgroupFileDirty(pc))
+   /* already cleared */
+   val = 0;
+   }
+   idx = MEM_CGROUP_STAT_FILE_DIRTY;
+   break;
+
+   case MEMCG_NR_FILE_WRITEBACK:
+   /*
+* This counter is adjusted while holding the mapping's
+* tree_lock.  Therefore there is no race between settings and
+* clearing of this flag.
+*/
+   if (val > 0)
+   SetPageCgroupFileWriteback(pc);
+   else
+   ClearPageCgroupFileWriteback(pc);
+   idx = MEM_CGROUP_STAT_FILE_WRITEBACK;
+   break;
+
+   case MEMCG_NR_FILE_UNSTABLE_NFS:
+   /* Use Test{Set,Clear} to only un/charge the memcg once. */
+   if (val > 0) {
+   if (TestSetPageCgroupFileUnstableNFS(pc))
+   /* already set */
+   val = 0;
+   } else {
+   if (!TestClearPageCgroupFileUnstableNFS(pc))
+   /* already cleared */
+   val = 0;
+   }
+   idx = MEM_CGROUP_STAT_FILE_UNSTABLE_NFS;
+   break;
+
default:
BUG();
}
@@ -2133,6 +2178,16 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup 
*mem,
memcg_check_events(mem, pc->page);
 }
 
+static void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
+ struct mem_cgroup *to,
+ enum mem_cgroup_stat_index idx)
+{
+   preempt_disable();
+   __this_cpu_dec(from->stat->count[idx]);
+   __this_cpu_inc(to->stat->count[idx]);
+   preempt_enable();
+}
+
 /**
  * __mem_cgroup_move_account - move account of the page
  * @pc:page_cgroup of the page.
@@ -2159,13 +2214,18 @@ static void __mem_cgroup_move_account(struct page_cgroup *pc,
VM_BUG_ON(!PageCgroupUsed(pc));
VM_BUG_ON(pc->mem_cgroup != from);
 
-   if (PageCgroupFileMapped(pc)) {
-   /* Update mapped_file data for mem_cgroup */
-   preempt_disable();
-   
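
For illustration, here is a minimal sketch (not part of this patch; the
hook point and the mem_cgroup_inc_page_stat() wrapper are assumptions
based on the rest of the series) of how a later change could charge a
newly dirtied page cache page to its cgroup:

#include <linux/mm.h>
#include <linux/memcontrol.h>
#include <linux/backing-dev.h>

/*
 * Sketch only: roughly what the generic dirty-accounting path could look
 * like once the memcg counter is wired in.  The memcg call ends up in
 * mem_cgroup_update_page_stat() and bumps MEM_CGROUP_STAT_FILE_DIRTY for
 * the page's cgroup.
 */
static void account_page_dirtied_sketch(struct page *page,
					struct address_space *mapping)
{
	if (mapping_cap_account_dirty(mapping)) {
		/* charge the owning memcg for one more dirty page */
		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_DIRTY);
		__inc_zone_page_state(page, NR_FILE_DIRTY);
	}
}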

[Devel] [PATCH 04/10] memcg: disable local interrupts in lock_page_cgroup()

2010-10-04 Thread Greg Thelen
If pages are being migrated from a memcg, then updates to that
memcg's page statistics are protected by grabbing a bit spin lock
using lock_page_cgroup().  In an upcoming commit, memcg dirty page
accounting will update memcg page statistics (specifically: the
number of writeback pages) from softirq context.  Avoid a deadlocking nested
spin lock attempt by disabling interrupts on the local processor
when grabbing the page_cgroup bit_spin_lock in lock_page_cgroup().
This avoids the following deadlock:
      CPU 0                       CPU 1
                                inc_file_mapped
                                rcu_read_lock
  start move
  synchronize_rcu
                                lock_page_cgroup
                                  softirq
                                  test_clear_page_writeback
                                  mem_cgroup_dec_page_stat(NR_WRITEBACK)
                                  rcu_read_lock
                                  lock_page_cgroup   /* deadlock */
                                  unlock_page_cgroup
                                  rcu_read_unlock
                                unlock_page_cgroup
                                rcu_read_unlock

By disabling interrupts in lock_page_cgroup, nested calls
are avoided.  The softirq would be delayed until after inc_file_mapped
enables interrupts when calling unlock_page_cgroup().

The normal, fast path of memcg page stat updates typically
does not need to call lock_page_cgroup(), so this change does
not affect the performance of the common case page accounting.
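
For illustration, a minimal sketch (the hook and the
MEMCG_NR_FILE_WRITEBACK index are assumptions taken from the rest of
the series, not part of this patch) of the softirq-context caller that
makes the irq-safe locking necessary:

#include <linux/page-flags.h>
#include <linux/memcontrol.h>

/*
 * Sketch only: end_page_writeback() can run from softirq context.  The
 * memcg decrement below may need lock_page_cgroup() when the page's
 * cgroup is being moved, so that lock must be taken with interrupts
 * disabled everywhere to avoid the deadlock shown above.
 */
static int test_clear_page_writeback_sketch(struct page *page)
{
	int ret = TestClearPageWriteback(page);

	if (ret)
		mem_cgroup_dec_page_stat(page, MEMCG_NR_FILE_WRITEBACK);
	return ret;
}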

Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 include/linux/page_cgroup.h |8 +-
 mm/memcontrol.c |   51 +-
 2 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index b59c298..872f6b1 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -117,14 +117,18 @@ static inline enum zone_type page_cgroup_zid(struct page_cgroup *pc)
return page_zonenum(pc->page);
 }
 
-static inline void lock_page_cgroup(struct page_cgroup *pc)
+static inline void lock_page_cgroup(struct page_cgroup *pc,
+   unsigned long *flags)
 {
+   local_irq_save(*flags);
bit_spin_lock(PCG_LOCK, &pc->flags);
 }
 
-static inline void unlock_page_cgroup(struct page_cgroup *pc)
+static inline void unlock_page_cgroup(struct page_cgroup *pc,
+ unsigned long flags)
 {
bit_spin_unlock(PCG_LOCK, &pc->flags);
+   local_irq_restore(flags);
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4259f4..267d774 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1599,6 +1599,7 @@ void mem_cgroup_update_page_stat(struct page *page,
struct mem_cgroup *mem;
struct page_cgroup *pc = lookup_page_cgroup(page);
bool need_unlock = false;
+   unsigned long flags;
 
if (unlikely(!pc))
return;
@@ -1610,7 +1611,7 @@ void mem_cgroup_update_page_stat(struct page *page,
/* pc->mem_cgroup is unstable ? */
if (unlikely(mem_cgroup_stealed(mem))) {
/* take a lock against to access pc->mem_cgroup */
-   lock_page_cgroup(pc);
+   lock_page_cgroup(pc, &flags);
need_unlock = true;
mem = pc->mem_cgroup;
if (!mem || !PageCgroupUsed(pc))
@@ -1633,7 +1634,7 @@ void mem_cgroup_update_page_stat(struct page *page,
 
 out:
if (unlikely(need_unlock))
-   unlock_page_cgroup(pc);
+   unlock_page_cgroup(pc, flags);
rcu_read_unlock();
return;
 }
@@ -2053,11 +2054,12 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
struct page_cgroup *pc;
unsigned short id;
swp_entry_t ent;
+   unsigned long flags;
 
VM_BUG_ON(!PageLocked(page));
 
pc = lookup_page_cgroup(page);
-   lock_page_cgroup(pc);
+   lock_page_cgroup(pc, &flags);
if (PageCgroupUsed(pc)) {
mem = pc->mem_cgroup;
if (mem && !css_tryget(&mem->css))
@@ -2071,7 +2073,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
mem = NULL;
rcu_read_unlock();
}
-   unlock_page_cgroup(pc);
+   unlock_page_cgroup(pc, flags);
return mem;
 }
 
@@ -2084,13 +2086,15 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 struct page_cgroup *pc,
 enum charge_type ctype)
 {
+   unsigned long flags;
+
/* try_charge() can return NULL to *memcg, taking care of it. */
if (!mem)
return;
 
-   lock_page_cgroup(pc);
+   lock_page_cgroup(pc, &flags);
if (unlikely(PageCgroupUsed(pc))) {
-   unlock_page_cgroup(pc);
+   unlock_page_cgroup(pc, flags);
mem_cgroup_cancel_charge(mem);
   

[Devel] [PATCH 02/10] memcg: document cgroup dirty memory interfaces

2010-10-04 Thread Greg Thelen
Document cgroup dirty memory interfaces and statistics.

Signed-off-by: Andrea Righi 
Signed-off-by: Greg Thelen 
---
 Documentation/cgroups/memory.txt |   37 +
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7781857..eab65e2 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -385,6 +385,10 @@ mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
 pgpgin - # of pages paged in (equivalent to # of charging events).
 pgpgout - # of pages paged out (equivalent to # of uncharging events).
 swap   - # of bytes of swap usage
+dirty  - # of bytes that are waiting to get written back to the disk.
+writeback  - # of bytes that are actively being written back to the disk.
+nfs- # of bytes sent to the NFS server, but not yet committed to
+   the actual storage.
 inactive_anon  - # of bytes of anonymous memory and swap cache memory on
LRU list.
 active_anon- # of bytes of anonymous and swap cache memory on active
@@ -453,6 +457,39 @@ memory under it will be reclaimed.
 You can reset failcnt by writing 0 to failcnt file.
 # echo 0 > .../memory.failcnt
 
+5.5 dirty memory
+
+Control the maximum amount of dirty pages a cgroup can have at any given time.
+
+Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
+page cache used by a cgroup.  So, in case of multiple cgroup writers, they will
+not be able to consume more than their designated share of dirty pages and will
+be forced to perform write-out if they cross that limit.
+
+The interface is equivalent to the procfs interface: /proc/sys/vm/dirty_*.  It
+is possible to configure a limit to trigger either a direct writeback or a
+background writeback performed by per-bdi flusher threads.  The root cgroup
+memory.dirty_* control files are read-only and match the contents of
+the /proc/sys/vm/dirty_* files.
+
+Per-cgroup dirty limits can be set using the following files in the cgroupfs:
+
+- memory.dirty_ratio: the amount of dirty memory (expressed as a percentage of
+  cgroup memory) at which a process generating dirty pages will itself start
+  writing out dirty data.
+
+- memory.dirty_bytes: the amount of dirty memory (expressed in bytes) in the
+  cgroup at which a process generating dirty pages will start itself writing out
+  dirty data.
+
+- memory.dirty_background_ratio: the amount of dirty memory of the cgroup
+  (expressed as a percentage of cgroup memory) at which background writeback
+  kernel threads will start writing out dirty data.
+
+- memory.dirty_background_bytes: the amount of dirty memory (expressed in bytes)
+  in the cgroup at which background writeback kernel threads will start writing
+  out dirty data.
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
-- 
1.7.1
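
For illustration, a minimal userspace sketch (the /cgroup/memory mount
point and the cgroup name "foo" are assumptions) that sets the
per-cgroup limit documented above and reads back the new dirty,
writeback and nfs counters from memory.stat:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f;
	char line[256];

	/* allow cgroup "foo" to dirty at most 10% of its memory */
	f = fopen("/cgroup/memory/foo/memory.dirty_ratio", "w");
	if (!f)
		return 1;
	fprintf(f, "10\n");
	fclose(f);

	/* print the dirty/writeback/nfs counters (values are in bytes) */
	f = fopen("/cgroup/memory/foo/memory.stat", "r");
	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "dirty ", 6) ||
		    !strncmp(line, "writeback ", 10) ||
		    !strncmp(line, "nfs ", 4))
			fputs(line, stdout);
	fclose(f);
	return 0;
}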

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel