[ https://issues.apache.org/jira/browse/HDFS-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683240#comment-17683240 ]
ASF GitHub Bot commented on HDFS-16902:
---------------------------------------

tomscut commented on code in PR #5334:
URL: https://github.com/apache/hadoop/pull/5334#discussion_r1094080750

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode/datanode.html:
##########
@@ -81,6 +81,7 @@
 <thead>
   <tr>
     <th>Namenode Address</th>
+    <th>Namenode HA state</th>

Review Comment:
   nit: `state` -> `State`.

> Add Namenode status to BPServiceActor metrics and improve logging in offerservice
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-16902
>                 URL: https://issues.apache.org/jira/browse/HDFS-16902
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> Recently came across a k8s environment where some datanode pods randomly
> fail to stay connected to all namenode pods (e.g. the last heartbeat time
> sometimes stays higher than 2 hr). When a standby namenode becomes active,
> any datanode that has not been heartbeating to it for quite some time is
> unable to send any further block reports, leading to missing replicas
> immediately after namenode failover, which can only be resolved with a
> datanode pod restart.
> While the issue seems environment-specific, BPServiceActor's offer service
> could use some logging improvements. It is also good to expose the namenode
> status in BPServiceActorInfo, to identify any lag on the datanode side in
> recognizing the updated Active namenode status from heartbeats.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org