I am sponsoring this fast-track for Menno Lageman.
Thanks,
Jerry
Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
max-processes rctl
1.2. Name of Document Author/Supplier:
Author: Menno Lageman
1.3 Date of This Document:
22 January, 2009
4. Technical Description
1. Summary
This case proposes the addition of two new resource controls to limit the
number of processes in a zone or project at any one time to prevent the
problem described in 6631612 (non-global zone can overrun the process table
of the system)[1]. For observability of resource usage and limits, this
case also adds kstats similar to those introduced by PSARC 2006/598[2].
Patch binding is requested.
2. Description
The max-processes resource controls introduced by this case limit the number
of slots in the process table that a zone or project may occupy. Currently
there is no way to prevent a zone or project from exhausting the process
table (intentionally or by accident), thus allowing a non-global zone to
impact other non-global zones and the system as a whole.
While the existing max-lwps resource controls offer some protection, they
cannot prevent zombie processes from filling up the process table. Zombie
processes by definition do not have any lwps and are thus not limited by
the max-lwps resource controls. They do, however, still take up a slot in
the process table until their exit status is reaped, leading to the problem
described in 6631612 where a misbehaving application in a non-global zone
created thousands of zombies, eventually using up all process table slots
and preventing new processes from being created on the system.
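The following minimal Python model (not kernel code) shows why max-lwps alone does not catch the 6631612 scenario: a zombie still occupies a process table slot but contributes no lwps. The Proc and Zone classes and the fork_allowed() check are hypothetical simplifications, and the model assumes each new child starts with a single lwp.

```python
# Hypothetical model of process table accounting. The class and function
# names are illustrative and are not the Solaris kernel's data structures.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proc:
    pid: int
    nlwps: int  # a zombie has no lwps left, so this is 0


@dataclass
class Zone:
    procs: List[Proc] = field(default_factory=list)

    def nprocs(self) -> int:
        # Every process, zombie or not, occupies one process table slot.
        return len(self.procs)

    def nlwps(self) -> int:
        return sum(p.nlwps for p in self.procs)


def fork_allowed(zone: Zone, max_lwps: int,
                 max_procs: Optional[int] = None) -> bool:
    """Return True if a new single-lwp child may be created in zone."""
    if zone.nlwps() + 1 > max_lwps:
        return False
    if max_procs is not None and zone.nprocs() + 1 > max_procs:
        return False
    return True


# 5000 unreaped zombies: no lwps, but 5000 occupied slots.
zombies = Zone([Proc(pid=i, nlwps=0) for i in range(5000)])
print(fork_allowed(zombies, max_lwps=1000))                  # True
print(fork_allowed(zombies, max_lwps=1000, max_procs=1000))  # False
```

In this model the lwp check passes no matter how many zombies accumulate; only a per-zone process count stops further forks.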
The zone.max-processes and project.max-processes resource controls are
enforced during fork(2). The project.max-processes resource control is
also enforced when switching to another project. Processes running in the
'system' project in the global zone are exempt from both resource controls.
Since the goal of these resource controls is to protect the system and
other non-global zones from a misbehaving non-global zone, all processes
in non-global zones are subject to these resource controls without
exception. Setting these resource controls on the global zone is allowed
but not recommended.
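A minimal sketch of these enforcement points, modeled in Python under stated assumptions: limits and counts are plain dictionaries, None means no privileged limit has been set, and projects are keyed by (zone, project) because a project in one zone is independent of a same-named project in another. The names may_fork() and may_switch_project() are hypothetical. Checking only the destination project on a switch, and applying the exemption to the destination project, are assumptions of this sketch rather than details specified by this case.

```python
# Hypothetical sketch of where max-processes is enforced. All names are
# illustrative; this is not the kernel implementation.
from typing import Dict, Optional, Tuple

GLOBAL_ZONE = "global"
SYSTEM_PROJECT = "system"

ProjKey = Tuple[str, str]  # (zone name, project name)


def exempt(zone: str, project: str) -> bool:
    # Only the 'system' project in the global zone is exempt; every
    # process in a non-global zone is subject to both controls.
    return zone == GLOBAL_ZONE and project == SYSTEM_PROJECT


def under(limit: Optional[int], count: int) -> bool:
    # None means no privileged limit has been set (unlimited).
    return limit is None or count + 1 <= limit


def may_fork(zone_max: Dict[str, Optional[int]], zone_cnt: Dict[str, int],
             proj_max: Dict[ProjKey, Optional[int]],
             proj_cnt: Dict[ProjKey, int],
             zone: str, project: str) -> bool:
    # fork(2): the child counts against both the zone and the project.
    if exempt(zone, project):
        return True
    key = (zone, project)
    return (under(zone_max.get(zone), zone_cnt.get(zone, 0)) and
            under(proj_max.get(key), proj_cnt.get(key, 0)))


def may_switch_project(proj_max: Dict[ProjKey, Optional[int]],
                       proj_cnt: Dict[ProjKey, int],
                       zone: str, new_project: str) -> bool:
    # Project switch (assumption): the zone's process count is unchanged,
    # so only the destination project's limit is checked.
    if exempt(zone, new_project):
        return True
    key = (zone, new_project)
    return under(proj_max.get(key), proj_cnt.get(key, 0))
```

For example, with zone.max-processes set to 200 in zone "web" and 200 processes already present, may_fork() refuses a new child in any project of that zone, while a process in the global zone's 'system' project is always allowed to fork.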
For compatibility with current behavior, the max-processes resource controls
have only an unlimited 'system' limit by default. To limit the number of
processes, an administrator must add a 'privileged' limit. The zonecfg(1M)
utility will be enhanced to support setting the limit for a zone using
the short form 'set max-processes=n' or the long form 'add rctl ...'. To
prevent an administrator from setting the limit too low, zonecfg(1M) will
enforce a safe lower limit of 100 processes (cf. max-lwps).
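The default limits and the short-form property can be pictured with the following hypothetical Python model. It represents each resource control value as a (privilege, limit) pair, omits the action field, and assumes that 'set max-processes=n' replaces any existing privileged value. set_max_processes() and effective_limit() are illustrative names, not zonecfg(1M) internals.

```python
# Hypothetical model of 'set max-processes=n' in zonecfg(1M). A resource
# control value is modeled as (privilege, limit); None stands for the
# default unlimited 'system' value. Action fields are omitted.
from typing import List, Optional, Tuple

Value = Tuple[str, Optional[int]]

MIN_MAX_PROCESSES = 100  # safe lower limit stated in this case
DEFAULT: List[Value] = [("system", None)]


def set_max_processes(values: List[Value], n: int) -> List[Value]:
    if n < MIN_MAX_PROCESSES:
        raise ValueError(f"max-processes must be at least {MIN_MAX_PROCESSES}")
    # Assumption: replace any existing privileged value, keep the system value.
    kept = [v for v in values if v[0] != "privileged"]
    return kept + [("privileged", n)]


def effective_limit(values: List[Value]) -> Optional[int]:
    # Simplification: the lowest finite value is the one reached first.
    finite = [limit for _, limit in values if limit is not None]
    return min(finite) if finite else None
```

With only the default value, effective_limit() returns None (unlimited), which matches current behavior; after set_max_processes(DEFAULT, 2000) the enforced limit becomes 2000, and a request below 100 is rejected.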
The max-processes resource control only limits the number of process table
slots used; setting the max-lwps resource control is still required to
limit the number of lwps. Rather than require the administrator to always
configure both resource controls, the zonecfg(1M) max-processes property
will provide a single point of control for both resource controls. If the
max-processes property is set but the max-lwps property is not set, the
max-lwps resource control will be configured automatically at zone boot
with a limit derived by multiplying the max-processes value by 10. This
multiplier was chosen based on analysis of data collected from customer
systems by the Explorer tool.
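A hypothetical sketch of the boot-time rule, assuming the derived value is max-processes multiplied by 10 and that an explicitly configured max-lwps is always left unchanged. boot_rctls() is an illustrative name, not the actual zone boot code.

```python
# Hypothetical sketch of the boot-time derivation of max-lwps. The
# function name is illustrative; the multiplier of 10 is from this case.
from typing import Dict, Optional

LWPS_PER_PROCESS = 10


def boot_rctls(max_processes: Optional[int] = None,
               max_lwps: Optional[int] = None) -> Dict[str, int]:
    """Return the zone-wide privileged limits applied at zone boot."""
    rctls: Dict[str, int] = {}
    if max_processes is not None:
        rctls["zone.max-processes"] = max_processes
        if max_lwps is None:
            # Only derive max-lwps when the administrator has not set it.
            max_lwps = max_processes * LWPS_PER_PROCESS
    if max_lwps is not None:
        rctls["zone.max-lwps"] = max_lwps
    return rctls
```

Under these assumptions, a zone configured with only max-processes=2000 boots with zone.max-lwps set to 20000, while a zone that sets both properties keeps the administrator's max-lwps value.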
The kstats will have three statistics:
usage: The current quantity of resource consumed.
value: The current enforced cap.
zonename: The name of the zone.
The global zone will see kstats for all zones; non-global zones can only
observe their own kstats.
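To make the naming and visibility rules concrete, here is a hypothetical Python model of the caps kstats. The module, instance and name format follow the caps:{zoneid}:nprocs_zone_{zoneid} and caps:{zoneid}:nprocs_project_{projectid} interfaces listed in this case; the classes and helpers are illustrative, and treating zoneid 0 as the global zone reflects the usual Solaris convention.

```python
# Hypothetical model of the caps kstats and their visibility. The naming
# follows the Exported Interfaces of this case; classes are illustrative.
from dataclasses import dataclass
from typing import List

GLOBAL_ZONEID = 0


@dataclass(frozen=True)
class CapsKstat:
    instance: int   # zoneid of the zone that owns the zone or project
    name: str
    usage: int      # current quantity of resource consumed
    value: int      # current enforced cap
    zonename: str

    def fullname(self) -> str:
        return f"caps:{self.instance}:{self.name}"


def zone_kstat(zoneid: int, zonename: str, usage: int, cap: int) -> CapsKstat:
    return CapsKstat(zoneid, f"nprocs_zone_{zoneid}", usage, cap, zonename)


def project_kstat(zoneid: int, zonename: str, projid: int,
                  usage: int, cap: int) -> CapsKstat:
    return CapsKstat(zoneid, f"nprocs_project_{projid}", usage, cap, zonename)


def visible(kstats: List[CapsKstat], viewer_zoneid: int) -> List[CapsKstat]:
    # The global zone sees every zone's kstats; a non-global zone sees
    # only the kstats whose instance is its own zoneid.
    return [k for k in kstats
            if viewer_zoneid == GLOBAL_ZONEID or k.instance == viewer_zoneid]
```

In this model a zone with zoneid 3 sees caps:3:nprocs_zone_3 and its own project kstats, but none of the global zone's entries.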
3. Exported Interfaces
INTERFACE                                 TYPE                  COMMITMENT
project.max-processes                     resource control      Committed
zone.max-processes                        resource control      Committed
max-processes                             zonecfg(1M) property  Committed
caps:{zoneid}:nprocs_zone_{zoneid}        kstat                 Uncommitted
caps:{zoneid}:nprocs_project_{projectid}  kstat                 Uncommitted
4. References
[1] 6631612 non-global zone can overrun the process table of the system
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6631612
[2] PSARC 2006/598 Swap resource control; locked memory RM improvements
http://opensolaris.org/os/community/arc/caselog/2006/598/
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open