-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30566/#review71525
-----------------------------------------------------------

Committed; please close the review.

- Jonathan Hurley


On Feb. 5, 2015, 11:13 a.m., Yurii Shylov wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30566/
> -----------------------------------------------------------
>
> (Updated Feb. 5, 2015, 11:13 a.m.)
>
>
> Review request for Ambari, Jonathan Robie and Srimanth Gunturi.
>
>
> Bugs: AMBARI-9458
>     https://issues.apache.org/jira/browse/AMBARI-9458
>
>
> Repository: ambari
>
>
> Description
> -------
>
> When a slave component, such as a DataNode, encounters some catastrophic
> problem like a heap allocation error and can no longer perform its work, the
> NameNode marks this DataNode as being unhealthy.
>
> The current alert definitions only check for the DataNode process being
> alive, which it technically still is. We need to add new alert definitions
> for:
>
> - HDFS/DataNode (runs on NameNode, query is to NameNode JMX)
> - YARN/NodeManager (runs on ResourceManager, query is to ResourceManager JMX)
> - HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)
>
> These will check for slaves that are in some sort of bad state. Depending on
> the JMX structures that need to be queried, these can either be METRIC or
> SCRIPT style alert definitions.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json fa911e1
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json b8a20ac
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json dc4fafd
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py PRE-CREATION
>
> Diff: https://reviews.apache.org/r/30566/diff/
>
>
> Testing
> -------
>
> In progress
>
>
> Thanks,
>
> Yurii Shylov
>
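
To illustrate the SCRIPT-style check described in the quoted request, here is a minimal sketch. It is not the code from the patch. It assumes the ResourceManager JMX response has already been fetched and decoded from JSON, and that the first bean carries a `NumUnhealthyNMs` attribute. The function name, the thresholds, and the state strings are hypothetical and chosen only for illustration.

```python
# Hypothetical thresholds; these values are NOT taken from the patch under review.
WARNING_UNHEALTHY = 1
CRITICAL_UNHEALTHY = 3


def evaluate_nodemanagers(jmx_payload):
    """Classify a decoded ResourceManager JMX response by unhealthy NodeManagers.

    jmx_payload is assumed to be a dict of the form {"beans": [{...}]}.
    The attribute name NumUnhealthyNMs is an assumption about which JMX
    value such an alert would read.
    Returns a (state, message) tuple.
    """
    beans = jmx_payload.get("beans", [])
    if not beans or "NumUnhealthyNMs" not in beans[0]:
        # The process may be alive but the metric is missing: report UNKNOWN
        # rather than guessing at the slaves' health.
        return ("UNKNOWN", "NumUnhealthyNMs not found in JMX response")

    unhealthy = int(beans[0]["NumUnhealthyNMs"])
    if unhealthy >= CRITICAL_UNHEALTHY:
        state = "CRITICAL"
    elif unhealthy >= WARNING_UNHEALTHY:
        state = "WARNING"
    else:
        state = "OK"
    return (state, "{0} NodeManager(s) unhealthy".format(unhealthy))
```

The point of the sketch is the one made in the description: the check reads the master's view of its slaves instead of testing whether each slave process is alive. The same shape would apply to the NameNode and HBase Master queries, with different attribute names.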