[ https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972990#comment-16972990 ]
Lisheng Sun commented on HDFS-14648: ------------------------------------ hi [~linyiqun] {quote} DFSInputStream.java I haven't seen the call dfsClient.addNodeToDeadNodeDetector added in method createBlockReader under this class {quote} i do not find the method DFSInputStream#createBlockReader. createBlockReader should be in DFSStripedInputStream. > DeadNodeDetector basic model > ---------------------------- > > Key: HDFS-14648 > URL: https://issues.apache.org/jira/browse/HDFS-14648 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Lisheng Sun > Assignee: Lisheng Sun > Priority: Major > Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, > HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, > HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, > HDFS-14648.009.patch, HDFS-14648.010.patch > > > This Jira constructs DeadNodeDetector state machine model. The function it > implements as follow: > # When a DFSInputstream is opened, a BlockReader is opened. If some DataNode > of the block is found to inaccessible, put the DataNode into > DeadNodeDetector#deadnode.(HDFS-14649) will optimize this part. Because when > DataNode is not accessible, it is likely that the replica has been removed > from the DataNode.Therefore, it needs to be confirmed by re-probing and > requires a higher priority processing. > # DeadNodeDetector will periodically detect the Node in > DeadNodeDetector#deadnode, If the access is successful, the Node will be > moved from DeadNodeDetector#deadnode. Continuous detection of the dead node > is necessary. The DataNode need rejoin the cluster due to a service > restart/machine repair. The DataNode may be permanently excluded if there is > no added probe mechanism. > # DeadNodeDetector#dfsInputStreamNodes Record the DFSInputstream using > DataNode. When the DFSInputstream is closed, it will be moved from > DeadNodeDetector#dfsInputStreamNodes. > # Every time get the global deanode, update the DeadNodeDetector#deadnode. > The new DeadNodeDetector#deadnode Equals to the intersection of the old > DeadNodeDetector#deadnode and the Datanodes are by > DeadNodeDetector#dfsInputStreamNodes. > # DeadNodeDetector has a switch that is turned off by default. When it is > closed, each DFSInputstream still uses its own local deadnode. > # This feature has been used in the XIAOMI production environment for a long > time. Reduced hbase read stuck, due to node hangs. > # Just open the DeadNodeDetector switch and you can use it directly. No > other restrictions. Don't want to use DeadNodeDetector, just close it. > {code:java} > if (sharedDeadNodesEnabled && deadNodeDetector == null) { > deadNodeDetector = new DeadNodeDetector(name); > deadNodeDetectorThr = new Daemon(deadNodeDetector); > deadNodeDetectorThr.start(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org