[ https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739667#comment-17739667 ]
ASF GitHub Bot commented on HDFS-17055:
---------------------------------------

goiri merged PR #5790:
URL: https://github.com/apache/hadoop/pull/5790


> Export HAState as a metric from Namenode for monitoring
> -------------------------------------------------------
>
>                 Key: HDFS-17055
>                 URL: https://issues.apache.org/jira/browse/HDFS-17055
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0, 3.3.9
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> We'd like to measure the uptime of our NameNodes: the percentage of time
> during which an active/standby/observer node is available (up and running).
> We could monitor the NameNode from an external service, such as ZKFC, but
> that would require the external service itself to be available 100% of the
> time. Whenever this third-party monitoring service is down, we would have no
> information on whether our NameNodes are still up.
>
> We propose a different approach: emit the NameNode state directly from the
> NameNode itself. Whenever a data point for this metric is missing, we
> consider the corresponding NameNode to be down (unavailable). In other
> words, we assume that the metric collection/monitoring infrastructure is
> 100% reliable.
>
> One implementation detail: Hadoop has the _NameNodeMetrics_ class, which is
> currently used to emit all metrics for _NameNode.java_. However, we don't
> think that is a good place to emit the NameNode HAState. HAState is stored
> in NameNode.java, so we should emit it directly from NameNode.java;
> otherwise we would duplicate this information in two classes and have to
> keep them in sync. Besides, the _NameNodeMetrics_ class does not have a
> reference to the _NameNode_ object it belongs to: a _NameNodeMetrics_
> instance is created by the _static_ function _initMetrics()_ in
> _NameNode.java_.
>
> We shouldn't emit the HA state from FSNamesystem.java either, as it is
> initialized from NameNode.java and all state transitions are implemented in
> NameNode.java.
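
To make the proposal concrete, here is a minimal Java sketch of its two halves, under stated assumptions: the node object owns the only copy of its HA state and exposes it through a callback gauge (so nothing has to be kept in sync), and a monitor treats any collection interval without a data point as downtime. Every name here (HaState, GaugeRegistry, SketchNameNode, Uptime, the "HAState" gauge key) is hypothetical. This is not Hadoop's metrics2 API, the HAServiceState enum, or the code merged in PR #5790, and the integer encoding of states is chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical stand-in for the NameNode HA states; NOT Hadoop's HAServiceState enum.
enum HaState { INITIALIZING, ACTIVE, STANDBY, OBSERVER }

// Hypothetical, minimal metrics registry: gauges are callbacks read at snapshot time.
class GaugeRegistry {
  private final Map<String, IntSupplier> gauges = new LinkedHashMap<>();

  void register(String name, IntSupplier gauge) {
    gauges.put(name, gauge);
  }

  // Each snapshot calls every gauge, so it reports the owner's current value.
  Map<String, Integer> snapshot() {
    Map<String, Integer> values = new LinkedHashMap<>();
    gauges.forEach((name, gauge) -> values.put(name, gauge.getAsInt()));
    return values;
  }
}

// Hypothetical node class that owns its HA state, as the issue proposes for NameNode.java.
class SketchNameNode {
  // The only copy of the state; the metric reads it instead of mirroring it.
  private volatile HaState state = HaState.INITIALIZING;

  void registerMetrics(GaugeRegistry registry) {
    registry.register("HAState", () -> state.ordinal());
  }

  void transitionTo(HaState next) {
    state = next;
  }
}

// Monitoring side: an interval with no data point is counted as downtime.
final class Uptime {
  private Uptime() {}

  // samples[i] is the value collected in interval i, or null if none arrived.
  static double percent(Integer[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("empty window");
    }
    long present = 0;
    for (Integer s : samples) {
      if (s != null) {
        present++;
      }
    }
    return 100.0 * present / samples.length;
  }
}

class HaStateMetricSketch {
  public static void main(String[] args) {
    GaugeRegistry registry = new GaugeRegistry();
    SketchNameNode nn = new SketchNameNode();
    nn.registerMetrics(registry);
    System.out.println(registry.snapshot()); // {HAState=0}
    nn.transitionTo(HaState.STANDBY);
    System.out.println(registry.snapshot()); // {HAState=2}

    Integer[] window = {2, 2, null, 1}; // one missed collection interval
    System.out.println(Uptime.percent(window)); // 75.0
  }
}
```

Because the gauge is a callback evaluated at snapshot time, a state transition becomes visible at the next collection without a second copy to update, which is the duplication the issue avoids by not routing HAState through _NameNodeMetrics_. The uptime figure is only as trustworthy as the issue's assumption that the collection pipeline itself never drops samples.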