[jira] [Resolved] (AMBARI-8569) Alert JSON Files Need Descriptions

Jonathan Hurley (JIRA) Mon, 08 Dec 2014 08:21:34 -0800

     [ 
https://issues.apache.org/jira/browse/AMBARI-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Hurley resolved AMBARI-8569.
-------------------------------------
    Resolution: Fixed

> Alert JSON Files Need Descriptions
> ----------------------------------
>
>                 Key: AMBARI-8569
>                 URL: https://issues.apache.org/jira/browse/AMBARI-8569
>             Project: Ambari
>          Issue Type: Task
>          Components: alerts
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>             Fix For: 2.0.0
>
>
> BUG-28018 adds a new {{description}} field to an alert definition. The 
> {{alerts.json}} files for every service in every stack should be updated to 
> have this field for each alert definition.
> |DataNode Process | HDFS | This host-level alert is triggered if the 
> individual DataNode processes cannot be established to be up and listening on 
> the network for the configured critical threshold.|
> |NameNode Process | HDFS | This host-level alert is triggered if the NameNode 
> process cannot be confirmed to be up and listening on the network for the 
> configured critical threshold.|               
> |NameNode Host CPU Utilization | HDFS |This host-level alert is triggered if 
> CPU utilization of the NameNode exceeds certain warning and critical 
> thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad 
> property. |
> |NameNode Blocks Health | HDFS | This service-level alert is triggered if the 
> number of corrupt or missing blocks exceeds the configured critical 
> threshold.|
> |DataNode Storage| HDFS | This host-level alert is triggered if storage 
> capacity is full on the DataNode. It checks the DataNode JMX Servlet for the 
> Capacity and Remaining properties. |
> |NameNode Web UI | HDFS | This host-level alert is triggered if the NameNode 
> Web UI is unreachable.|  
> |Percent DataNodes With Available Space | HDFS | This service-level alert is 
> triggered if the percentage of DataNodes whose storage is full exceeds the 
> warning and critical thresholds. |
> |Percent DataNodes Available | HDFS | This alert is triggered if the number 
> of down DataNodes in the cluster is greater than the configured critical 
> threshold. It aggregates the results of DataNode process checks.|
> |NameNode RPC Latency | HDFS |This host-level alert is triggered if the 
> NameNode operations RPC latency exceeds the configured critical threshold. 
> Typically an increase in the RPC processing time increases the RPC queue 
> length, causing the average queue wait time to increase for NameNode 
> operations.|
> |HDFS Capacity Utilization | HDFS |This service-level alert is triggered if 
> the HDFS capacity utilization exceeds the configured warning and critical 
> thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and 
> CapacityRemaining properties.|
> |DataNode Web UI | HDFS | This host-level alert is triggered if the DataNode 
> Web UI is unreachable.|
> |Secondary NameNode Process | HDFS | This host-level alert is triggered if 
> the Secondary NameNode process cannot be confirmed to be up and listening on 
> the network for the configured critical threshold.|
> |JournalNode Process | HDFS |This host-level alert is triggered if the 
> JournalNode process cannot be confirmed to be up and listening on the network 
> for the configured critical threshold.|
> |ZooKeeper Failover Controller Process | HDFS | This host-level alert is 
> triggered if the ZooKeeper Failover Controller process cannot be confirmed to 
> be up and listening on the network for the configured critical threshold.|
> |Percent JournalNodes Available | HDFS | This alert is triggered if the 
> number of down JournalNodes in the cluster is greater than the configured 
> critical threshold. It aggregates the results of JournalNode process checks.|
> |NameNode High Availability Health | HDFS | This service-level alert is 
> triggered if either the Active NameNode or Standby NameNode is not running.|
> |History Server Process | MAPREDUCE2 | This host-level alert is 
> triggered if the HistoryServer process cannot be established to be up and 
> listening on the network for the configured critical threshold.|
> |History Server RPC Latency | MAPREDUCE2 |This host-level alert is triggered 
> if the HistoryServer operations RPC latency exceeds the configured critical 
> threshold. Typically an increase in the RPC processing time increases the RPC 
> queue length, causing the average queue wait time to increase for operations.|
> |History Server CPU Utilization | MAPREDUCE2 | This host-level alert is 
> triggered if the percent of CPU utilization on the HistoryServer exceeds the 
> configured critical threshold.|
> |History Server Web UI | MAPREDUCE2 | This host-level alert is triggered if 
> the HistoryServer Web UI is unreachable.  |       
> |ZooKeeper Server Process | ZOOKEEPER | This host-level alert is 
> triggered if the ZooKeeper server process cannot be determined to be up and 
> listening on the network for the configured critical threshold.|   
> |Percent ZooKeeper Servers Available | ZOOKEEPER |This service-level alert is 
> triggered if the configured percentage of ZooKeeper processes cannot be 
> determined to be up and listening on the network for the configured critical 
> threshold. It aggregates the results of ZooKeeper process checks.|
> |ResourceManager RPC Latency | YARN | This host-level alert is triggered if 
> the ResourceManager operations RPC latency exceeds the configured critical 
> threshold. Typically an increase in the RPC processing time increases the RPC 
> queue length, causing the average queue wait time to increase for 
> ResourceManager operations.|
> |ResourceManager CPU Utilization | YARN | This host-level alert is triggered 
> if CPU utilization of the ResourceManager exceeds certain warning and 
> critical thresholds. It checks the ResourceManager JMX Servlet for the 
> SystemCPULoad property.|
> |NodeManager Health | YARN | This host-level alert checks the node health 
> property available from the NodeManager component.|
> |Percent NodeManagers Available | YARN | This alert is triggered if the 
> number of down NodeManagers in the cluster is greater than the configured 
> critical threshold. It aggregates the results of NodeManager process checks.|
> |ResourceManager Web UI | YARN | This host-level alert is triggered if 
> the ResourceManager Web UI is unreachable.|
> |App Timeline Web UI | YARN | This host-level alert is triggered if the App 
> Timeline Server Web UI is unreachable.|
> |NodeManager Web UI | YARN |This host-level alert is triggered if the 
> NodeManager Web UI is unreachable.|
> |NameNode Last Checkpoint | HDFS |Checks the last time that the NameNode 
> performed a checkpoint. This script will also check for the number of 
> uncommitted transactions.|
> |NameNode Directory Status | HDFS |It checks the NameNode JMX Servlet for the 
> NameDirStatuses metric to see if any directories report a failure.|
> |Percent RegionServers process|HBASE|This service-level alert is triggered if 
> the configured percentage of Region Server processes cannot be determined to 
> be up and listening on the network for the configured warning and critical 
> thresholds. It aggregates the results of RegionServer process down checks.|
> |Percent HBase Master process|HBASE|This alert is triggered if the HBase 
> master processes cannot be confirmed to be up and listening on the network 
> for the configured critical threshold, given in seconds. |
> |HBase Master Web UI|HBASE|This host-level alert is triggered if the HBase 
> Master Web UI is unreachable.|
> |Percent HBase Master CPU utilization|HBASE|This host-level alert is 
> triggered if CPU utilization of the HBase Master exceeds certain warning and 
> critical thresholds. It checks the HBase Master JMX Servlet for the 
> SystemCPULoad property.|
> |RegionServer process|HBASE|This host-level alert is triggered if the 
> RegionServer processes cannot be confirmed to be up and listening on the 
> network for the configured critical threshold, given in seconds.|
> |Hive Metastore status|HIVE|This host-level alert is triggered if the Hive 
> Metastore process cannot be determined to be up and listening on the network 
> for the configured critical threshold.|
> |WebHCat Server process|HIVE|This host-level alert is triggered if the 
> WebHCat server cannot be determined to be up and responding to client 
> requests.|
> |Oozie Server process|OOZIE|This host-level alert is triggered if the Oozie 
> server cannot be determined to be up and responding to client requests.|
> |Knox Gateway process|KNOX|This host-level alert is triggered if the Knox 
> Gateway cannot be determined to be up.|
> |Kafka Broker process|KAFKA|This host-level alert is triggered if the Kafka 
> Broker cannot be determined to be up.|
> |Falcon Server Web UI|FALCON|This host-level alert is triggered if the Falcon 
> Server Web UI is unreachable.|
> |Falcon Server process|FALCON|This host-level alert is triggered if the 
> Falcon Server cannot be determined to be up.|
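
One way to verify that every definition picked up the new field is to walk an {{alerts.json}} file and list the definitions that still lack a {{description}}. The sketch below is only an illustration, not part of the Ambari code base. It assumes that a definition is any JSON object carrying both a {{name}} and a {{source}} key, nested under service and component keys; the helper name find_missing_descriptions and the sample definitions are hypothetical.

```python
import json


def find_missing_descriptions(node, path=""):
    """Yield the path of every alert definition that has no description."""
    if isinstance(node, dict):
        # Assumption: an object with both "name" and "source" is an alert definition.
        if "name" in node and "source" in node:
            if not str(node.get("description", "")).strip():
                yield f"{path}/{node['name']}"
            return
        # Otherwise treat the object as a container (service or component level).
        for key, value in node.items():
            yield from find_missing_descriptions(value, f"{path}/{key}")
    elif isinstance(node, list):
        for item in node:
            yield from find_missing_descriptions(item, path)


# Hypothetical sample shaped like a per-service alerts.json file.
sample = json.loads("""
{
  "HDFS": {
    "NAMENODE": [
      {
        "name": "namenode_webui",
        "label": "NameNode Web UI",
        "description": "This host-level alert is triggered if the NameNode Web UI is unreachable.",
        "source": {"type": "WEB"}
      },
      {
        "name": "namenode_process",
        "label": "NameNode Process",
        "source": {"type": "PORT"}
      }
    ]
  }
}
""")

missing = list(find_missing_descriptions(sample))
for entry in missing:
    print("missing description:", entry)
```

Running the same function over each stack's {{alerts.json}} would flag any definition that was overlooked. A description consisting only of whitespace is reported as missing too.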



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
