Add documentation on memcg oom reserves to Documentation/cgroups/memory.txt and give an example of its usage and recommended best practices.

Signed-off-by: David Rientjes <rient...@google.com>
---
 Documentation/cgroups/memory.txt | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -71,6 +71,7 @@ Brief summary of control files.
 				 (See sysctl's vm.swappiness)
  memory.move_charge_at_immigrate # set/show controls of moving charges
  memory.oom_control		 # set/show oom controls.
+ memory.oom_reserve_in_bytes	 # set/show limit of oom memory reserves
  memory.numa_stat		 # show the number of memory usage per numa node
 
  memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
@@ -772,6 +773,31 @@ At reading, current status of OOM is shown.
 	under_oom	 0 or 1 (if 1, the memory cgroup is under OOM, tasks may
 				 be stopped.)
 
+Processes that handle oom conditions in their own memcgs or their child
+memcgs may need to allocate memory themselves to do anything useful,
+including pagefaulting their text or allocating kernel memory to read the
+memcg "tasks" file.  For this reason, memory.oom_reserve_in_bytes is
+provided; it specifies how much memory processes waiting on
+memory.oom_control can allocate above the memcg limit.
+
+The memcg that the oom handler is attached to is charged for the memory
+that it allocates against its own memory.oom_reserve_in_bytes.  This
+memory is therefore only available to processes that are waiting for
+a notification.
+
+For example, if you do
+
+	# echo 2m > memory.oom_reserve_in_bytes
+
+then any process attached to this memcg that is waiting on memcg oom
+notifications anywhere on the system can allocate an additional 2MB
+above memory.limit_in_bytes.
+
+You may still consider doing mlockall(MCL_FUTURE) for processes that
+are waiting on oom notifications to keep this value as minimal as
+possible, or allow it to be large enough so that their text can still
+be pagefaulted in under oom conditions when the value is known.
+
 11. Memory Pressure
 
 The pressure level notifications can be used to monitor the memory