[ https://issues.apache.org/jira/browse/AMBARI-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hurley resolved AMBARI-8569.
-------------------------------------
Resolution: Fixed
> Alert JSON Files Need Descriptions
> ----------------------------------
>
> Key: AMBARI-8569
> URL: https://issues.apache.org/jira/browse/AMBARI-8569
> Project: Ambari
> Issue Type: Task
> Components: alerts
> Affects Versions: 2.0.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Fix For: 2.0.0
>
>
> BUG-28018 adds a new {{description}} field to an alert definition. The
> {{alerts.json}} files for every service in every stack should be updated to
> have this field for each alert definition.
> ||Alert||Service||Description||
> |DataNode Process | HDFS | This host-level alert is triggered if an
> individual DataNode process cannot be confirmed to be up and listening on
> the network for the configured critical threshold.|
> |NameNode Process | HDFS | This host-level alert is triggered if the NameNode
> process cannot be confirmed to be up and listening on the network for the
> configured critical threshold.|
> |NameNode Host CPU Utilization | HDFS |This host-level alert is triggered if
> CPU utilization of the NameNode exceeds the configured warning and critical
> thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad
> property.|
> |NameNode Blocks Health | HDFS | This service-level alert is triggered if the
> number of corrupt or missing blocks exceeds the configured critical
> threshold.|
> |DataNode Storage| HDFS | This host-level alert is triggered if storage
> capacity is full on the DataNode. It checks the DataNode JMX Servlet for the
> Capacity and Remaining properties.|
> |NameNode Web UI | HDFS | This host-level alert is triggered if the NameNode
> Web UI is unreachable.|
> |Percent DataNodes With Available Space | HDFS | This service-level alert is
> triggered if the percentage of DataNodes with full storage exceeds the
> configured warning and critical thresholds.|
> |Percent DataNodes Available | HDFS | This alert is triggered if the number
> of down DataNodes in the cluster is greater than the configured critical
> threshold. It aggregates the results of DataNode process checks.|
> |NameNode RPC Latency | HDFS |This host-level alert is triggered if the
> NameNode operations RPC latency exceeds the configured critical threshold.
> Typically an increase in the RPC processing time increases the RPC queue
> length, causing the average queue wait time to increase for NameNode
> operations.|
> |HDFS Capacity Utilization | HDFS |This service-level alert is triggered if
> the HDFS capacity utilization exceeds the configured warning and critical
> thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and
> CapacityRemaining properties.|
> |DataNode Web UI | HDFS | This host-level alert is triggered if the DataNode
> Web UI is unreachable.|
> |Secondary NameNode Process | HDFS | This host-level alert is triggered if
> the Secondary NameNode process cannot be confirmed to be up and listening on
> the network for the configured critical threshold.|
> |JournalNode Process | HDFS |This host-level alert is triggered if the
> JournalNode process cannot be confirmed to be up and listening on the network
> for the configured critical threshold.|
> |ZooKeeper Failover Controller Process | HDFS | This host-level alert is
> triggered if the ZooKeeper Failover Controller process cannot be confirmed to
> be up and listening on the network for the configured critical threshold.|
> |Percent JournalNodes Available | HDFS | This alert is triggered if the
> number of down JournalNodes in the cluster is greater than the configured
> critical threshold. It aggregates the results of JournalNode process checks.|
> |NameNode High Availability Health | HDFS | This service-level alert is
> triggered if either the Active NameNode or the Standby NameNode is not running.|
> |History Server Process | MAPREDUCE2 | This host-level alert is
> triggered if the HistoryServer process cannot be confirmed to be up and
> listening on the network for the configured critical threshold.|
> |History Server RPC Latency | MAPREDUCE2 |This host-level alert is triggered
> if the HistoryServer operations RPC latency exceeds the configured critical
> threshold. Typically an increase in the RPC processing time increases the RPC
> queue length, causing the average queue wait time to increase for
> HistoryServer operations.|
> |History Server CPU Utilization | MAPREDUCE2 | This host-level alert is
> triggered if the percent of CPU utilization on the HistoryServer exceeds the
> configured critical threshold.|
> |History Server Web UI | MAPREDUCE2 | This host-level alert is triggered if
> the HistoryServer Web UI is unreachable. |
> |ZooKeeper Server Process | ZOOKEEPER | This host-level alert is
> triggered if the ZooKeeper server process cannot be determined to be up and
> listening on the network for the configured critical threshold.|
> |Percent ZooKeeper Servers Available | ZOOKEEPER |This service-level alert is
> triggered if the configured percentage of ZooKeeper processes cannot be
> determined to be up and listening on the network for the configured critical
> threshold. It aggregates the results of ZooKeeper process checks.|
> |ResourceManager RPC Latency | YARN | This host-level alert is triggered if
> the ResourceManager operations RPC latency exceeds the configured critical
> threshold. Typically an increase in the RPC processing time increases the RPC
> queue length, causing the average queue wait time to increase for
> ResourceManager operations.|
> |ResourceManager CPU Utilization | YARN | This host-level alert is triggered
> if CPU utilization of the ResourceManager exceeds the configured warning and
> critical thresholds. It checks the ResourceManager JMX Servlet for the
> SystemCPULoad property.|
> |NodeManager Health | YARN | This host-level alert checks the node health
> property available from the NodeManager component.|
> |Percent NodeManagers Available | YARN | This alert is triggered if the
> number of down NodeManagers in the cluster is greater than the configured
> critical threshold. It aggregates the results of NodeManager process checks.|
> |ResourceManager Web UI | YARN | This host-level alert is triggered if
> the ResourceManager Web UI is unreachable.|
> |App Timeline Web UI | YARN | This host-level alert is triggered if the App
> Timeline Server Web UI is unreachable.|
> |NodeManager Web UI | YARN |This host-level alert is triggered if the
> NodeManager Web UI is unreachable.|
> |NameNode Last Checkpoint | HDFS |This alert checks the last time that the
> NameNode performed a checkpoint. The alert script also checks the number of
> uncommitted transactions.|
> |NameNode Directory Status | HDFS |This alert checks the NameNode JMX Servlet
> for the NameDirStatuses metric to determine whether any directories report a
> failure.|
> |Percent RegionServers process|HBASE|This service-level alert is triggered if
> the configured percentage of RegionServer processes cannot be determined to
> be up and listening on the network for the configured warning and critical
> thresholds. It aggregates the results of RegionServer process down checks.|
> |Percent HBase Master process|HBASE|This alert is triggered if the HBase
> Master processes cannot be confirmed to be up and listening on the network
> for the configured critical threshold, given in seconds.|
> |HBase Master Web UI|HBASE|This host-level alert is triggered if the HBase
> Master Web UI is unreachable.|
> |Percent HBase Master CPU utilization|HBASE|This host-level alert is
> triggered if CPU utilization of the HBase Master exceeds the configured warning
> and critical thresholds. It checks the HBase Master JMX Servlet for the
> SystemCPULoad property.|
> |RegionServer process|HBASE|This host-level alert is triggered if the
> RegionServer processes cannot be confirmed to be up and listening on the
> network for the configured critical threshold, given in seconds.|
> |Hive Metastore status|HIVE|This host-level alert is triggered if the Hive
> Metastore process cannot be determined to be up and listening on the network
> for the configured critical threshold.|
> |WebHCat Server process|HIVE|This host-level alert is triggered if the
> WebHCat server cannot be determined to be up and responding to client
> requests.|
> |Oozie Server process|OOZIE|This host-level alert is triggered if the Oozie
> server cannot be determined to be up and responding to client requests.|
> |Knox Gateway process|KNOX|This host-level alert is triggered if the Knox
> Gateway cannot be determined to be up.|
> |Kafka Broker process|KAFKA|This host-level alert is triggered if the Kafka
> Broker cannot be determined to be up.|
> |Falcon Server Web UI|FALCON|This host-level alert is triggered if the Falcon
> Server Web UI is unreachable.|
> |Falcon Server process|FALCON|This host-level alert is triggered if the
> Falcon Server cannot be determined to be up.|
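One way to confirm that every alert definition picked up the new {{description}} field is to scan the stack definitions for entries that still lack it. The sketch below is a hypothetical helper, not part of Ambari. It assumes each {{alerts.json}} maps a service name to groups (either {{service}} or a component name such as {{NAMENODE}}), and that each group holds a list of definition objects with a {{name}} key; adjust the traversal if the actual layout differs.

```python
import json
from pathlib import Path


def find_missing_descriptions(alerts):
    """Return (service, group, name) for each definition with no non-empty description.

    Assumed layout (hypothetical): {SERVICE: {"service" or COMPONENT: [definition, ...]}}.
    """
    missing = []
    for service, groups in alerts.items():
        for group, definitions in groups.items():
            for definition in definitions:
                # A missing key, null, or whitespace-only value all count as "no description".
                description = definition.get("description") or ""
                if not str(description).strip():
                    missing.append((service, group, definition.get("name", "<unnamed>")))
    return missing


def scan_stack(root):
    """Walk every alerts.json below root; map file path -> list of missing entries."""
    report = {}
    for path in sorted(Path(root).rglob("alerts.json")):
        with path.open(encoding="utf-8") as handle:
            missing = find_missing_descriptions(json.load(handle))
        if missing:
            report[str(path)] = missing
    return report
```

Pointing {{scan_stack}} at a stack's services directory returns a mapping from each offending file to its (service, group, name) entries; an empty mapping means every definition found under that directory has a description.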
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)