[jira] [Comment Edited] (HDFS-14651) DeadNodeDetector checks dead node periodically

Ahmed Hussein (Jira) Fri, 31 Jan 2020 18:32:46 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027943#comment-17027943
 ]


Ahmed Hussein edited comment on HDFS-14651 at 2/1/20 2:31 AM:
--------------------------------------------------------------

Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15149

I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, 
every call to {{checkDeadNodes()}} changes the state to {{IDLE}}, forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So why do we need 
{{deadNodeDetectInterval}} if the actual time gap between checks is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread, but it looks like the implementation of the 
runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 
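
To illustrate the second question, here is a minimal sketch of a periodic runnable that does terminate when asked. It is not the actual {{DeadNodeDetector}} code: the class {{StoppableDetector}}, its {{running}} flag, and the helper {{stopsPromptly}} are hypothetical names made up for this example, and the counter increment merely stands in for {{checkDeadNodes()}}. The idea is to check a termination flag on every iteration and to restore the interrupt status rather than swallow it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for a periodic detector thread.
 * This is an illustration only, not the DeadNodeDetector implementation.
 */
class StoppableDetector implements Runnable {
    private final long idleSleepMs;                 // assumed analogue of IDLE_SLEEP_MS
    private final AtomicInteger checks = new AtomicInteger();
    private volatile boolean running = true;        // termination flag read on every iteration

    StoppableDetector(long idleSleepMs) {
        this.idleSleepMs = idleSleepMs;
    }

    @Override
    public void run() {
        // Exit as soon as either the flag is cleared or the thread is interrupted.
        while (running && !Thread.currentThread().isInterrupted()) {
            checks.incrementAndGet();               // placeholder for checkDeadNodes()
            try {
                Thread.sleep(idleSleepMs);
            } catch (InterruptedException e) {
                // Restore the interrupt status so the loop condition can observe it.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Asks the loop to stop at its next condition check. */
    void shutdown() {
        running = false;
    }

    int checkCount() {
        return checks.get();
    }

    /**
     * Starts a detector, requests a stop, and reports whether the thread
     * exited within joinTimeoutMs instead of leaving the caller waiting.
     */
    static boolean stopsPromptly(long idleSleepMs, long joinTimeoutMs) {
        StoppableDetector detector = new StoppableDetector(idleSleepMs);
        Thread thread = new Thread(detector, "detector-sketch");
        thread.setDaemon(true);
        thread.start();

        detector.shutdown();                        // clear the flag
        thread.interrupt();                         // wake the thread if it is sleeping
        try {
            thread.join(joinTimeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !thread.isAlive();
    }
}
```

With a loop like this, a caller that sets the flag, interrupts the thread, and then joins should return almost immediately, even when the sleep interval is much longer than the join timeout.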


was (Author: ahussein):
Hi [~leosun08] and [~linyiqun]!
I am looking at the timeouts caused by the changes in this patch HDFS-15147

I have a couple of questions:
# What is the purpose of {{deadNodeDetectInterval}}? As far as I understand, 
every call to {{checkDeadNodes()}} changes the state to {{IDLE}}, forcing 
the {{DeadNodeDetector}} to sleep for {{IDLE_SLEEP_MS}}. So why do we need 
{{deadNodeDetectInterval}} if the actual time gap between checks is 
{{IDLE_SLEEP_MS}}?
# Correct me if I am wrong: 
{{stopDeadNodeDetectorThread.stopDeadNodeDetectorThread()}} is supposed to stop 
the deadNodeDetector thread, but it looks like the implementation of the 
runnable never terminates. {{DeadNodeDetector}} suppresses all interrupts and 
never checks for a termination flag. Therefore, the caller will just hang for 3 
seconds waiting to join. 

> DeadNodeDetector checks dead node periodically
> ----------------------------------------------
>
>                 Key: HDFS-14651
>                 URL: https://issues.apache.org/jira/browse/HDFS-14651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch, 
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode. 
> If the access is successful, the node is removed from 
> DeadNodeDetector#deadnode. Continuous detection of dead nodes is 
> necessary because a DataNode may need to rejoin the cluster after a service 
> restart or machine repair. Without an added probe mechanism, the DataNode 
> may be permanently excluded.


