The Project
This project proposes extensions to the fault management architecture
(FMA) to support a sensor abstraction layer for the collection and
analysis of sensor-based telemetry that can be used in fault and
resource management.
The Problem
How do we manage raw telemetry data kept, maintained and exported by
disparate sources for the purposes of fault management, resource
management and budgeting? Today, there are a number of sensor
collection mechanisms
exported by the hardware and software. For the most part, the
information they export is haphazardly presented and accessed
through ad-hoc operating system interfaces, per-platform methods or
per-subsystem industry standards (SMBus, SMART and IPMI). Using this
data for fault or resource management is clumsy and typically requires
low-level system knowledge baked into higher-level management applications.
Key Objectives
As part of an overall sensor abstraction layer based on our current
fault management architecture, we can solve the problem described
above and provide a better understanding of the overall health and
usage of a system through more sophisticated diagnosis technologies and
fine-grained observability of sensor data via common access methods. A
sensor abstraction layer must possess:
1. the ability to alert the administrator to conditions observed by
platform sensors that may impact the operational state of the
platform.
2. the ability to alert the administrator to conditions that resolve
themselves as observed by platform sensors.
3. the ability to watch one or more sensors and correlate the data for
predictive fault analysis or resource management.
4. the ability to continuously record sensor data and retrieve it from
systems for offline analysis, future system design or development of
more advanced diagnosis algorithms.
5. the ability for administrators and service personnel to manually
inspect sensor values without having to understand the exact
implementation (e.g. IPMI or SMBus).
6. the ability to connect sensor data to higher-level diagnosis (e.g.
SMART disk data to SCSI and ZFS diagnosis engines).
7. the ability to understand and observe performance and power budgets
based on raw sensor data.
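
To make objectives 1, 2, 4 and 5 a little more concrete, here is a
minimal Python sketch of what such a layer might look like. It is not
FMA code and does not reflect any existing interface: the names Sensor,
SensorLayer, register, read and poll, the event tuples, and the single
upper threshold per sensor are all assumptions made for illustration.
Each backend (IPMI, SMBus, SMART) is reduced to a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Sensor:
    """One sensor as seen through the abstraction layer (hypothetical)."""
    name: str                  # layer-wide name, e.g. "cpu0-temp"
    source: str                # backing mechanism: "ipmi", "smbus", "smart"
    units: str                 # e.g. "degC"
    read: Callable[[], float]  # backend-specific reader, hidden from callers
    upper_critical: float      # assumed single threshold, for illustration


# ("assert" | "clear", sensor name, observed value)
Event = Tuple[str, str, float]


class SensorLayer:
    """Common access point for sensors from disparate sources (sketch)."""

    def __init__(self) -> None:
        self._sensors: Dict[str, Sensor] = {}
        self._asserted: Set[str] = set()   # sensors currently over threshold
        self.history: List[Tuple[str, float]] = []

    def register(self, sensor: Sensor) -> None:
        self._sensors[sensor.name] = sensor

    def read(self, name: str) -> float:
        # Objective 5: inspect a value without knowing the backend.
        return self._sensors[name].read()

    def poll(self) -> List[Event]:
        """Read every sensor once and report threshold transitions."""
        events: List[Event] = []
        for name, sensor in self._sensors.items():
            value = sensor.read()
            self.history.append((name, value))           # objective 4
            over = value >= sensor.upper_critical
            if over and name not in self._asserted:
                self._asserted.add(name)
                events.append(("assert", name, value))   # objective 1
            elif not over and name in self._asserted:
                self._asserted.remove(name)
                events.append(("clear", name, value))    # objective 2
        return events
```

In this sketch a caller would register one Sensor per physical sensor
and call poll() on a timer. An "assert" event is reported only on the
transition over the threshold and a "clear" event only on the
transition back, so a condition that resolves itself is reported
exactly once. Correlation across sensors (objective 3) and hand-off to
diagnosis engines (objective 6) would sit on top of the recorded
history and are not shown.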
Cindi
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org