[ https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027943#comment-17027943 ]
Ahmed Hussein edited comment on HDFS-14651 at 2/1/20 2:31 AM:
--------------------------------------------------------------

Hi [~leosun08] and [~linyiqun]!

I am looking at the timeouts caused by the changes in this patch HDFS-15149. I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, every call to {{checkDeadNodes()}} changes the state to {{IDLE}}, forcing the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So why do we need {{deadNodeDetectInterval}} if the actual time gap between checks is {{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: {{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop the deadNodeDetector thread, but it looks like the implementation of the runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and never checks for a termination flag. Therefore, the caller will just hang for 3 seconds waiting to join.

was (Author: ahussein):

Hi [~leosun08] and [~linyiqun]!

I am looking at the timeouts caused by the changes in this patch HDFS-15147. I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, every call to {{checkDeadNodes()}} changes the state to {{IDLE}}, forcing the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So why do we need {{deadNodeDetectInterval}} if the actual time gap between checks is {{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: {{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop the deadNodeDetector thread, but it looks like the implementation of the runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and never checks for a termination flag. Therefore, the caller will just hang for 3 seconds waiting to join.
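
To illustrate the second question, here is a minimal Java sketch of a background loop that can be stopped promptly. It is not the actual {{DeadNodeDetector}} code: the class name {{StoppableDetector}}, the {{shouldRun}} flag, the {{stop(long)}} method and the 10-second sleep value are all hypothetical names and numbers chosen for illustration. The idea is that the loop checks a termination flag and exits on {{InterruptedException}} instead of swallowing it, so a caller that interrupts and then joins should not have to wait out the full join timeout.

```java
/** Hypothetical sketch; NOT the actual DeadNodeDetector implementation. */
public class StoppableDetector implements Runnable {
  // Assumed value standing in for IDLE_SLEEP_MS.
  private static final long IDLE_SLEEP_MS = 10_000;

  // Termination flag checked on every iteration of the loop.
  private volatile boolean shouldRun = true;
  private final Thread thread = new Thread(this, "detector-sketch");

  @Override
  public void run() {
    while (shouldRun) {
      try {
        // Placeholder for the periodic work (for example, checking dead nodes).
        Thread.sleep(IDLE_SLEEP_MS);
      } catch (InterruptedException e) {
        // Restore the interrupt status and leave the loop
        // instead of suppressing the interrupt.
        Thread.currentThread().interrupt();
        break;
      }
    }
  }

  public void start() {
    thread.setDaemon(true);
    thread.start();
  }

  /** Requests termination; returns true if the thread ended within timeoutMs. */
  public boolean stop(long timeoutMs) throws InterruptedException {
    shouldRun = false;
    thread.interrupt();
    thread.join(timeoutMs);
    return !thread.isAlive();
  }
}
```

With a loop like this, a call such as {{stop(3000)}} would be expected to return almost immediately, whereas a runnable that catches and ignores the interrupt would keep the caller blocked in {{join}} for the whole timeout.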
> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
>                 Key: HDFS-14651
>                 URL: https://issues.apache.org/jira/browse/HDFS-14651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch,
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch,
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead nodes periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode.
> If the access is successful, the node is removed from
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is
> necessary: a DataNode needs to rejoin the cluster after a service
> restart or machine repair, and it may be permanently excluded if there is
> no added probe mechanism.
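
The re-probing behaviour described in the issue summary can be sketched roughly as follows. This is an assumption-laden illustration, not Hadoop's implementation: the class {{DeadNodeReprobe}}, its methods, and the use of a {{Predicate<String>}} as the probe are all hypothetical. Each probe round tests every node currently in the dead set and removes the ones that respond, so a restarted DataNode is not excluded forever.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch of re-probing dead nodes; NOT Hadoop's implementation. */
public class DeadNodeReprobe {
  // Nodes currently considered dead (identified by a plain string here).
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  // Assumed probe: returns true when the node can be accessed again.
  private final Predicate<String> probe;

  public DeadNodeReprobe(Predicate<String> probe) {
    this.probe = probe;
  }

  public void markDead(String node) {
    deadNodes.add(node);
  }

  public boolean isDead(String node) {
    return deadNodes.contains(node);
  }

  /** One probe round: every node that responds is removed from the dead set. */
  public void probeOnce() {
    deadNodes.removeIf(probe);
  }
}
```

In a real client, {{probeOnce()}} would presumably be driven by a timer or a background thread at some configured interval; that scheduling is left out of the sketch.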