[ https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968588#comment-14968588 ]
Walter Su commented on HDFS-9275:
---------------------------------

||DN0||DN1||DN2||DN3||DN4||DN5||DN6||DN7||DN8||DN9||DN10||DN11||
|*|*|*|*|*|*|*|*|*| | | | <-- BlockGroup_0
| | |*|*|*|*|*|*|*|*|*| | <-- BlockGroup_1

The test case only tests the last block group. Suppose DN8~10 are shut down. ReplicationMonitor will schedule a recovery. First, it needs to call BlockPlacementPolicy to choose targets. DN2~DN10 are excluded because they already have internal blocks on them. To recover 3 blocks, it must choose DN0, DN1, and DN11. But DN1 has a block belonging to BlockGroup_0. The last time DN1 sent a heartbeat, it reported that its {{xceiverCount}} was 3. {{xceiverCount}} equals the number of active threads in DataNode.threadGroup, as shown below.

{noformat}
DatanodeRegistration(127.0.0.1:47705, datanodeUuid=43e5be32-2066-4057-9b25-8544d2d542bc, infoPort=43445, infoSecurePort=0, ipcPort=34036, storageInfo=lv=-56;cid=testClusterID;nsid=23260287;c=1445489667626)
java.lang.ThreadGroup[name=dataXceiverServer,maxpri=10]
    Thread[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@6aa03871,5,dataXceiverServer]
    Thread[DataXceiver for client DFSClient_NONMAPREDUCE_-1867405584_1 at /127.0.0.1:56717 [Receiving block BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001],5,dataXceiverServer]
    Thread[PacketResponder: BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001, type=LAST_IN_PIPELINE, downstreams=0:[],5,dataXceiverServer]
{noformat}

An {{xceiverCount}} of 3 is larger than the average, so DN1 is excluded by {{chooseRandom()}}. BlockGroup_1 can then recover only 2 blocks.

As discussed [here|https://issues.apache.org/jira/browse/HDFS-8220?focusedCommentId=14518931&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14518931], for now PlacementPolicy doesn't support returning two identical storages, i.e., no 2 replicas (internal blocks) on the same storage. We could simply add more DNs to fix the test.
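To make the exclusion concrete, here is a minimal sketch of the scenario above. It is not the actual BlockPlacementPolicy code: the class {{ConsiderLoadSketch}}, its {{Node}} type, and the {{loadFactor}} parameter are hypothetical, candidates are scanned in order rather than picked randomly, and every live DN other than DN1 is assumed to report an {{xceiverCount}} of 0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and the threshold rule are assumptions.
final class ConsiderLoadSketch {

    /** Hypothetical stand-in for what a DataNode reported in its last heartbeat. */
    static final class Node {
        final String name;
        final int xceiverCount;

        Node(String name, int xceiverCount) {
            this.name = name;
            this.xceiverCount = xceiverCount;
        }
    }

    /**
     * Picks up to 'wanted' targets from the live nodes, skipping excluded nodes
     * and, when considerLoad is true, nodes whose xceiverCount exceeds
     * loadFactor times the average over the live nodes.
     */
    static List<String> chooseTargets(List<Node> live, Set<String> excluded,
                                      int wanted, boolean considerLoad,
                                      double loadFactor) {
        double total = 0;
        for (Node n : live) {
            total += n.xceiverCount;
        }
        double maxLoad = live.isEmpty() ? 0 : loadFactor * total / live.size();

        List<String> chosen = new ArrayList<>();
        for (Node n : live) {
            if (chosen.size() == wanted) {
                break;
            }
            if (excluded.contains(n.name)) {
                continue; // already holds an internal block of this group
            }
            if (considerLoad && n.xceiverCount > maxLoad) {
                continue; // rejected as too busy
            }
            chosen.add(n.name);
        }
        return chosen;
    }

    /** Rebuilds the 12-DN layout from the comment: DN8~10 down, DN1 busy. */
    static List<String> scenario(boolean considerLoad) {
        List<Node> live = new ArrayList<>();
        Set<String> hasInternalBlock = new HashSet<>();
        for (int i = 0; i <= 11; i++) {
            String name = "DN" + i;
            if (i >= 2 && i <= 10) {
                hasInternalBlock.add(name); // BlockGroup_1 lives on DN2~DN10
            }
            if (i >= 8 && i <= 10) {
                continue; // shut down, so not a candidate
            }
            live.add(new Node(name, i == 1 ? 3 : 0));
        }
        return chooseTargets(live, hasInternalBlock, 3, considerLoad, 1.0);
    }

    public static void main(String[] args) {
        System.out.println("considerLoad=true:  " + scenario(true));
        System.out.println("considerLoad=false: " + scenario(false));
    }
}
```

With the load check enabled, the sketch finds only DN0 and DN11, one target short of the 3 needed. With the check disabled, DN1 becomes eligible and all 3 targets are found.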
Alternatively, we can set {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY}} to false in the test case, so that DN1 is no longer rejected for its load.

The 02 patch includes some cleanup. Kindly review. Thanks.

> Fix TestRecoverStripedFile
> --------------------------
>
> Key: HDFS-9275
> URL: https://issues.apache.org/jira/browse/HDFS-9275
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)