I am sponsoring this case on behalf of Eric Saxe. The release binding
is patch/micro. We believe that this is a self review, but it can be
changed to a regular fasttrack should someone request it.
Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
cpupm keyword mode extensions
1.2. Name of Document Author/Supplier:
Author: Eric Saxe
1.3 Date of This Document:
18 December, 2008
4. Technical Description
1. Introduction
1.1. Project/Component Working Name: cpupm keyword mode extensions
1.2. Name of Document Author/Supplier: Eric Saxe
1.3. Date of This Document: 12/1/2008
1.4. Name of Major Document Customer(s)/Consumer(s): PSARC
1.5. Email Aliases:
1.5.1. Responsible Manager: darrin.johnson at sun.com
1.5.2. Responsible Engineer: eric.saxe at sun.com
1.5.4. Interest List: tesla-dev at opensolaris.org
2. Project Summary
2.1. Project Description:
The Power Aware Dispatcher Project seeks to provide an event
based CPU power management architecture, where CPU power
state changes (Frequency/Voltage scaling) are triggered by
changes in processor utilization tracked by the dispatcher's
broader efforts to optimize thread placement across Chip
Multi-Threading and NUMA system architectures.
As part of this effort, the project seeks to extend the cpupm keyword
documented in power.conf(4), to support an optional "mode" enabling
customers to explicitly choose event based or polling based CPU power
management.
2.2. Risks and Assumptions:
It is anticipated that the vast majority of customers won't want
or need to explicitly specify polling vs. event mode behavior,
in which case the system will default to one mode or the other.
3. Business Summary
3.1. Problem Area:
This proposal describes extensions to an interface supported by
power.conf(4) used in configuring the system's CPU power management
behavior, which in turn relates to the power efficiency of
[Open]Solaris based systems.
3.3. Business Justification:
Allowing customers to express a preference for event based vs. polling
based CPUPM provides a way to fall back on the existing polling
implementation, should cases arise where polling based CPUPM is
preferred or where it best meets the customer's power/performance
objectives.
3.4. Competitive Analysis:
It is believed that an event based, dispatcher driven/assisted CPUPM
architecture will drive improved power/performance in [Open]Solaris.
A patch is available for Linux that provides for scheduler aware CPU
power management.
See: http://lesswatts.org/tips/cpu.php#smpsched
3.5. Opportunity Window/Exposure:
Now.
3.6. How will you know when you are done?:
When the work described in this proposal is complete, and integrated
into the release.
4. Technical Description:
4.1. Details:
pmconfig(1M) parses and interprets /etc/power.conf, and configures
the system's power management features via ioctl(2)s to the pm kernel
module. Where the cpupm keyword is encountered, the driver's
pm_ioctl() interface configures the system's CPU power management
features. As documented in power.conf(4):
| A cpupm entry may be used to enable or disable Power Manage-
| ment of CPUs on a system-wide basis, independent of autopm.
| The format of the cpupm entry is:
|
| cpupm behavior
|
| Current acceptable behavior values and their meanings are :
|
| enable CPU Power Management will be started when this
| entry is encountered.
|
|
| disable CPU Power Management will be stopped when this
| entry is encountered.
This work will allow cpupm to take an additional optional argument
"mode" when the enable behavior is specified:
cpupm enable [mode]
where "mode" is one of:
event-mode CPU power state transitions will be driven by thread
scheduler/dispatcher events. The cpu-threshold and
system-threshold directives are not used for CPUs when
operating in event-mode.
poll-mode The Power Management framework will poll the idleness of
the system's CPUs, and will manage their power once
idle for the period of time specified by either the
system-threshold or cpu-threshold.
Where the "enable" behavior is specified without a "mode" argument,
CPU Power Management will be enabled and the system will default to
using one of the modes.
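The grammar above can be sketched as a small parser. This is an
illustration only, not the pmconfig(1M) implementation: the function name
parse_cpupm and the use of None to mean "no mode given, system default
applies" are assumptions made for the example.

```python
VALID_BEHAVIORS = ("enable", "disable")
VALID_MODES = ("event-mode", "poll-mode")


def parse_cpupm(line):
    """Parse one power.conf line (hypothetical helper).

    Returns (behavior, mode) for a cpupm entry, or None for any other
    line. mode is None when no mode argument was given, in which case
    the system picks its default mode.
    """
    fields = line.split()
    if not fields or fields[0] != "cpupm":
        return None  # not a cpupm entry
    if len(fields) < 2 or fields[1] not in VALID_BEHAVIORS:
        raise ValueError("cpupm requires a behavior of enable or disable")
    behavior = fields[1]
    extra = fields[2:]
    if not extra:
        return (behavior, None)
    # The optional mode argument is only defined for "enable".
    if behavior != "enable":
        raise ValueError("a mode is only accepted with the enable behavior")
    if len(extra) > 1 or extra[0] not in VALID_MODES:
        raise ValueError("mode must be event-mode or poll-mode")
    return (behavior, extra[0])
```

Under these assumptions, "cpupm enable" yields ("enable", None), while
"cpupm enable event-mode" yields ("enable", "event-mode").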
Some existing keywords supported by power.conf(4) assume a polling
based CPU power management architecture. cpu-threshold is one
example:
| If a system has power manageable CPUs, these may be managed
| independently of the system idleness threshold by using one
| of the following entries:
|
| cpu-threshold threshold
|
|
| cpu-threshold always-on
|
| where threshold is the value of the CPU idleness threshold
| in hours, minutes or seconds as indicated by a trailing h, m
| or s (defaulting to seconds if only a number is given). If
| always-on is specified, then by default, all CPUs will be
| left at full power.
This work provides an opportunity for us to document in
power.conf(4) the availability of the event based architecture,
contrast it against the polling based architecture, and indicate that
where the event based architecture is used, the time based cpu-threshold
keyword has no effect, even if specified.
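As a rough illustration of the cpu-threshold value syntax quoted above,
and of its interaction with the proposed modes, consider the following
sketch. The helper names parse_cpu_threshold and cpu_threshold_applies are
hypothetical, as is the choice to represent always-on as None; none of
this is taken from the pmconfig(1M) source.

```python
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_cpu_threshold(value):
    """Return the idleness threshold in seconds, or None for always-on."""
    if value == "always-on":
        return None
    multiplier = 1  # defaults to seconds if only a number is given
    if value and value[-1] in UNIT_SECONDS:
        multiplier = UNIT_SECONDS[value[-1]]
        value = value[:-1]
    if not (value.isascii() and value.isdigit()):
        raise ValueError("threshold must be a number with optional h, m or s")
    return int(value) * multiplier


def cpu_threshold_applies(mode):
    """cpu-threshold is consulted only when cpupm runs in poll-mode."""
    if mode not in ("event-mode", "poll-mode"):
        raise ValueError("mode must be event-mode or poll-mode")
    return mode == "poll-mode"
```

With this sketch, "10m" parses to 600 seconds, and the resulting value is
used in poll-mode but ignored in event-mode.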
4.2. Bug/RFE Number(s): 6567156
4.4. Out of Scope:
The broader "power aware dispatcher" RFE (6567156) is outside the
scope of this proposal, which captures only the power.conf(4)
interface changes.
4.5. Interfaces:
This project will import these existing interfaces.
Interface stability will be "committed".
Import:
power.conf(4) (PSARC/1992/202)
pmconfig(1M) (PSARC/1992/202)
Export:
"mode" argument for the "cpupm enable" keyword/behavior
power.conf(4) man page addition under section describing cpupm
keyword behaviors:
| Where the behavior is "enable", an optional "mode" argument may be
| specified:
|
| cpupm enable mode
|
| Acceptable mode values and their meanings are:
|
| event-mode CPU power state transitions will be driven by thread
| scheduler/dispatcher events. The cpu-threshold and
| system-threshold keywords are not used for CPUs in
| this mode.
|
| poll-mode The Power Management framework will poll the idleness
| of the system's CPUs, and will manage their power
| once idle for the period of time specified by either
| the system-threshold or cpu-threshold.
power.conf(4) man page addition under section for the cpu-threshold
keyword:
| The cpu-threshold keyword is used only when CPU Power Management has
| been configured to operate in poll-mode, which is expressed through
| the cpupm keyword.
power.conf(4) man page addition under section for the
system-threshold keyword:
| The system-threshold is applicable to CPU Power Management only when
| CPU Power Management has been configured to operate in poll-mode,
| which is expressed through the cpupm keyword.
+========================================================================+
|Release Binding: Patch/Micro                                            |
|                                                                        |
|Imports:                                                                |
|                                                                        |
|Name                    Case                    Classification          |
|------------------------------------------------------------------------|
|power.conf(4)           PSARC/1992/202          Committed               |
|pmconfig(1M)            PSARC/1992/202          Committed               |
|                                                                        |
|                                                                        |
|Exports:                                                                |
|                                                                        |
|Name            Classification      Comments                            |
|------------------------------------------------------------------------|
|Optional "mode" argument for "cpupm enable":                            |
|poll-mode       Committed           Polling based CPU power management  |
|event-mode      Committed           Dispatcher / Event based CPUPM      |
+========================================================================+
4.6. Doc Impact:
power.conf man page. See above.
4.7. Admin/Config Impact:
It is the intention of this proposal to be consistent with
the "principle of least surprise". That is, existing power.conf(4)
configuration files will continue to work, and the system's
power/performance characteristics will transparently improve.
Where administrators wish to fall back on the existing CPUPM
mechanism, and continue to use the time based CPU utilization
thresholds supported by power.conf, poll-mode will provide this.
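For an administrator who wants to keep the time based thresholds, the
fallback amounts to adding poll-mode to an existing "cpupm enable" entry.
A minimal sketch of that edit follows; the helper name pin_poll_mode is
hypothetical, and the code is not part of the proposed delivery.

```python
def pin_poll_mode(conf_text):
    """Return conf_text with bare "cpupm enable" entries set to poll-mode.

    Entries that already name a mode, and all other lines, are left
    untouched.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.split() == ["cpupm", "enable"]:
            # Preserve the original line ending, if any.
            ending = line[len(line.rstrip("\r\n")):]
            line = "cpupm enable poll-mode" + ending
        out.append(line)
    return "".join(out)
```

A file containing "cpupm enable" followed by "cpu-threshold 10s" would
then read "cpupm enable poll-mode" with the threshold entry unchanged, so
the existing threshold continues to govern CPU power management.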
4.8. HA Impact: None.
4.9. I18N/L10N Impact: No.
4.10. Packaging & Delivery:
This change will be delivered as part of the Power Aware Dispatcher
RFE. These changes will be made at the same time:
kernel package
cpudrv package
power.conf package
pmconfig package
4.11. Security Impact: None.
4.12. Dependencies: power.conf, pmconfig(1M)
5. Reference Documents:
Advanced Configuration and Power Interface:
http://www.acpi.info/
6. Resources and Schedule:
6.1. Projected Availability: Winter 2008
6.4. Product Approval Committee requested information:
6.4.3. Type of CPT Review and Approval expected: BugFix
6.4.5. Is this a necessary project for OEM agreements: Yes.
A deliverable of the Intel/SUN master collaboration agreement.
6.4.7. Target RTI Date/Release:
RTI around onnv_106 and S10U8
6.4.8. Target Code Design Review Date: 10/10/2008
6.4.9. Update approval addition: No.
6.5. ARC review type: SelfReview
7. Prototype Availability:
7.1. Prototype Availability:
Prototype has been available via OpenSolaris since October 2008.
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open