I am sponsoring this fast-track for Menno Lageman. Thanks, Jerry
Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         max-processes rctl
    1.2. Name of Document Author/Supplier:
         Author: Menno Lageman
    1.3  Date of This Document:
         22 January, 2009

4. Technical Description

1. Summary

    This case proposes the addition of two new resource controls that
    limit the number of processes in a zone or project at any one time,
    to prevent the problem described in 6631612 (non-global zone can
    overrun the process table of the system)[1]. For observability of
    resource usage and limits, this case also adds kstats similar to
    those introduced by PSARC 2006/598[2].

    Patch binding is requested.

2. Description

    The max-processes resource controls introduced by this case limit
    the number of slots in the process table that a zone or project may
    occupy. Currently there is no way to prevent a zone or project from
    exhausting the process table (intentionally or by accident), which
    allows a non-global zone to impact other non-global zones and the
    system as a whole.

    While the existing max-lwps resource controls offer some
    protection, they cannot prevent zombie processes from filling up
    the process table. Zombie processes by definition do not have any
    lwps and are thus not limited by the max-lwps resource controls.
    They do, however, still occupy a slot in the process table until
    their exit status is reaped. This leads to the problem described in
    6631612, where a misbehaving application in a non-global zone
    created thousands of zombies, eventually using up all process table
    slots and preventing new processes from being created on the
    system.

    The zone.max-processes and project.max-processes resource controls
    are enforced during fork(2). The project.max-processes resource
    control is also enforced when switching to another project.

    Processes running in the 'system' project in the global zone are
    exempt from both resource controls.
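    The enforcement rules above can be pictured with a small user-level
    model. This is only an illustrative sketch in Python, not the kernel
    implementation: the class and method names (ProcessTableModel, fork,
    switch_project, reap) are hypothetical, and the choice to count
    exempt 'system' project processes toward usage while never denying
    them is an assumption that this case does not spell out.

```python
from collections import Counter
from typing import Dict, Tuple

GLOBAL_ZONE = "global"
SYSTEM_PROJECT = "system"

ProjectKey = Tuple[str, str]  # (zone name, project name)


class ProcessTableModel:
    """Toy accounting of process-table slots per zone and per project."""

    def __init__(self) -> None:
        # A missing entry means no 'privileged' limit has been added,
        # so only the unlimited 'system' limit applies.
        self.zone_limit: Dict[str, int] = {}
        self.project_limit: Dict[ProjectKey, int] = {}
        self.zone_usage: Counter = Counter()
        self.project_usage: Counter = Counter()

    @staticmethod
    def _exempt(zone: str, project: str) -> bool:
        # Only the 'system' project in the global zone is exempt.
        return zone == GLOBAL_ZONE and project == SYSTEM_PROJECT

    def _zone_has_room(self, zone: str) -> bool:
        limit = self.zone_limit.get(zone)
        return limit is None or self.zone_usage[zone] < limit

    def _project_has_room(self, zone: str, project: str) -> bool:
        limit = self.project_limit.get((zone, project))
        return limit is None or self.project_usage[(zone, project)] < limit

    def fork(self, zone: str, project: str) -> bool:
        """Take one slot; return False if either limit would be exceeded."""
        if not self._exempt(zone, project):
            if not (self._zone_has_room(zone)
                    and self._project_has_room(zone, project)):
                return False
        # Assumption: exempt processes are still counted toward usage.
        self.zone_usage[zone] += 1
        self.project_usage[(zone, project)] += 1
        return True

    def switch_project(self, zone: str, old: str, new: str) -> bool:
        """Move one process between projects; only the project limit is re-checked."""
        if not self._exempt(zone, new) and not self._project_has_room(zone, new):
            return False
        self.project_usage[(zone, old)] -= 1
        self.project_usage[(zone, new)] += 1
        return True

    def reap(self, zone: str, project: str) -> None:
        """Release the slot. A zombie keeps its slot until it is reaped."""
        self.zone_usage[zone] -= 1
        self.project_usage[(zone, project)] -= 1
```

    In this model a process that has exited but has not been reaped
    still counts against the limit, which is the zombie case that
    max-lwps cannot cover.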
    Since the goal of these resource controls is to protect the system
    and other non-global zones from a misbehaving non-global zone, all
    processes in non-global zones are subject to these resource
    controls without exception. Setting these resource controls on the
    global zone is allowed but not recommended.

    For compatibility with current behavior, the max-processes resource
    controls have only an unlimited 'system' limit by default. To limit
    the number of processes, an administrator must add a 'privileged'
    limit.

    The zonecfg(1M) utility will be enhanced to support setting the
    limit for a zone using the short form 'set max-processes=n' or the
    long form 'add rctl ...'. To prevent an administrator from setting
    the limit too low, zonecfg(1M) will enforce a safe lower limit of
    100 processes (cf. max-lwps).

    The max-processes resource control limits only the number of
    process table slots used; setting the max-lwps resource control is
    still required to limit the number of lwps. Rather than require the
    administrator to always configure both resource controls, the
    zonecfg(1M) max-processes property will provide a single point of
    control for both. If the max-processes property is set but the
    max-lwps property is not, the max-lwps resource control will be
    configured automatically at zone boot with a reasonable limit
    derived from the value of max-processes and a multiplier of 10.
    This multiplier was chosen based on analysis of data collected on
    customer systems by the Explorer tool.

    The kstats will have three statistics:

        usage:    The current quantity of resource consumed.
        value:    The current enforced cap.
        zonename: The name of the zone.

    The global zone will see kstats for all zones; non-global zones can
    observe only their own kstats.

3. Exported Interfaces

    INTERFACE                                 TYPE                  COMMITMENT
    project.max-processes                     resource control      Committed
    zone.max-processes                        resource control      Committed
    max-processes                             zonecfg(1M) property  Committed
    caps:{zoneid}:nprocs_zone_{zoneid}        kstat                 Uncommitted
    caps:{zoneid}:nprocs_project_{projectid}  kstat                 Uncommitted

4. References

    [1] 6631612 non-global zone can overrun the process table of the
        system
        http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6631612
    [2] PSARC 2006/598 Swap resource control; locked memory RM
        improvements
        http://opensolaris.org/os/community/arc/caselog/2006/598/

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
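
    As an illustration of the zonecfg(1M) behavior described in section
    2 (Description), the interaction between the max-processes and
    max-lwps properties might be sketched as follows. This is a
    hypothetical Python model, not zonecfg code: the function name
    effective_limits is invented for the example, and it assumes that
    "derived from the value of max-processes and a multiplier of 10"
    means a simple product.

```python
from typing import Optional, Tuple

MIN_MAX_PROCESSES = 100  # lower limit enforced by zonecfg(1M), per this case
LWPS_PER_PROCESS = 10    # multiplier stated in this case


def effective_limits(
    max_processes: Optional[int], max_lwps: Optional[int]
) -> Tuple[Optional[int], Optional[int]]:
    """Return the (max-processes, max-lwps) pair applied at zone boot.

    None means the property is not set in the zone configuration.
    """
    if max_processes is None:
        # Nothing to derive from; max-lwps is left exactly as configured.
        return None, max_lwps
    if max_processes < MIN_MAX_PROCESSES:
        raise ValueError(
            f"max-processes must be at least {MIN_MAX_PROCESSES}"
        )
    if max_lwps is None:
        # Assumption: the derived limit is the plain product.
        max_lwps = max_processes * LWPS_PER_PROCESS
    return max_processes, max_lwps
```

    Under these assumptions, setting only max-processes=1000 yields a
    max-lwps limit of 10000, while an explicitly configured max-lwps
    value is left untouched.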