GitHub user govind-menon commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2845#discussion_r220219222
  
    --- Diff: docs/ClusterMetrics.md ---
    @@ -0,0 +1,256 @@
    +---
    +title: Cluster Metrics
    +layout: documentation
    +documentation: true
    +---
    +
    +#Cluster Metrics
    +
    +There are lots of metrics to help you monitor a running cluster.  Many of these metrics are still a work in progress and so is the metrics system itself so any of them may change, even between minor version releases.  We will try to keep them as stable as possible, but they should all be considered somewhat unstable. Some of the metrics may also be for experimental features, or features that are not complete yet, so please read the description of the metric before using it for monitoring or alerting.
    +
    +Also be aware that depending on the metrics system you use, the names are likely to be translated into a different format that is compatible with the system.  Typically this means that the ':' separating character will be replaced with a '.' character.
    +
    +Most metrics should have the units that they are reported in as a part of the description.  For Timers often this is configured by the reporter that is uploading them to your system.  Pay attention because even if the metric name has a time unit in it, it may be false.
    +
    +Also most metrics, except for gauges and counters, are a collection of numbers, and not a single value.  Often these result in multiple metrics being uploaded to a reporting system, such as percentiles for a histogram, or rates for a meter.  It is dependent on the configured metrics reporter how this happens, or how the name here corresponds to the metric in your reporting system.
    +
    +## Cluster Metrics (From Nimbus)
    +
    +These are metrics that come from the active nimbus instance and report the state of the cluster as a whole, as seen by nimbus.
    +
    +| Metric Name | Type | Description |
    +|-------------|------|-------------|
    +| cluster:num-nimbus-leaders | gauge | Number of nimbuses marked as a leader. This should really only ever be 1 in a health cluster, or 0 for a short period of time while a failover happens. |
    --- End diff --
    
    Nit: "health cluster" should be "healthy cluster".
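    For illustration only, a hypothetical Java sketch of how an alert could encode the rule described for `cluster:num-nimbus-leaders`: exactly 1 leader is healthy, and 0 is tolerated only for a short window while a failover happens. The class name, the grace-period parameter, and the assumption that the caller tracks how long the gauge has read 0 are all made up for this example and are not part of Storm's API.

```java
import java.time.Duration;

/**
 * Hypothetical health check for the cluster:num-nimbus-leaders gauge.
 * Exactly 1 leader is healthy; 0 is tolerated only within an assumed
 * failover grace period; any other value is treated as unhealthy.
 */
public class NimbusLeaderCheck {
    // Assumed tolerance for a leaderless period during failover.
    private final Duration failoverGrace;

    public NimbusLeaderCheck(Duration failoverGrace) {
        this.failoverGrace = failoverGrace;
    }

    /**
     * @param numLeaders latest value reported by the gauge
     * @param timeAtZero how long the gauge has continuously read 0
     *                   (Duration.ZERO if it is not currently 0)
     * @return true if the reading is consistent with a healthy cluster
     */
    public boolean isHealthy(long numLeaders, Duration timeAtZero) {
        if (numLeaders == 1) {
            return true;
        }
        if (numLeaders == 0) {
            // A brief gap is expected while a failover happens.
            return timeAtZero.compareTo(failoverGrace) <= 0;
        }
        // More than one leader (or a negative value) should never happen.
        return false;
    }

    public static void main(String[] args) {
        NimbusLeaderCheck check = new NimbusLeaderCheck(Duration.ofSeconds(30));
        System.out.println(check.isHealthy(1, Duration.ZERO));          // true
        System.out.println(check.isHealthy(0, Duration.ofSeconds(10))); // true
        System.out.println(check.isHealthy(0, Duration.ofMinutes(5)));  // false
        System.out.println(check.isHealthy(2, Duration.ZERO));          // false
    }
}
```

    The grace period is left as a constructor argument because the description only says "a short period of time"; what counts as short is a deployment decision.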

