[ 
https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968588#comment-14968588
 ] 

Walter Su commented on HDFS-9275:
---------------------------------

||DN0||DN1||DN2||DN3||DN4||DN5||DN6||DN7||DN8||DN9||DN10||DN11||
| |*|*|*|*|*|*|*|*|*| | |   <-- BlockGroup_0
| | |*|*|*|*|*|*|*|*|*| |   <-- BlockGroup_1

The test case only tests the last block group (BlockGroup_1). Suppose DN8~DN10 
are shut down. ReplicationMonitor will schedule a recovery, and it first needs 
to call BlockPlacementPolicy to choose targets. DN2~DN10 are excluded because 
they already have internal blocks on them. To recover 3 blocks, it must choose 
DN0, DN1 and DN11.
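
To make the arithmetic concrete, here is a minimal sketch of the exclusion step. It is a toy model, not the real BlockPlacementPolicy code: the class {{TargetCandidatesSketch}} and its methods are hypothetical names, and DataNodes are represented only by their index in the table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only (hypothetical names); this is not the HDFS BlockPlacementPolicy API. */
class TargetCandidatesSketch {

    /** Returns the indices of DataNodes that do not already hold an internal block of the group. */
    static List<Integer> candidates(int numDataNodes, Set<Integer> holders) {
        List<Integer> result = new ArrayList<>();
        for (int dn = 0; dn < numDataNodes; dn++) {
            if (!holders.contains(dn)) {
                result.add(dn);
            }
        }
        return result;
    }

    /** Builds the set of DataNode indices first..last, inclusive. */
    static Set<Integer> range(int first, int last) {
        Set<Integer> indices = new TreeSet<>();
        for (int i = first; i <= last; i++) {
            indices.add(i);
        }
        return indices;
    }

    public static void main(String[] args) {
        // BlockGroup_1 has internal blocks on DN2..DN10, so those nodes are excluded.
        Set<Integer> blockGroup1 = range(2, 10);
        System.out.println(candidates(12, blockGroup1)); // prints [0, 1, 11]
    }
}
```

With 12 DNs and 9 of them excluded, exactly 3 candidates remain for the 3 blocks to recover, so losing any one candidate leaves the recovery short.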

But DN1 has a block belonging to BlockGroup_0. The last time DN1 sent a 
heartbeat, it reported an {{xceiverCount}} of 3. {{xceiverCount}} equals the 
number of active threads in DataNode.threadGroup, as shown below.

{noformat}
DatanodeRegistration(127.0.0.1:47705, datanodeUuid=43e5be32-2066-4057-9b25-8544d2d542bc, infoPort=43445, infoSecurePort=0, ipcPort=34036, storageInfo=lv=-56;cid=testClusterID;nsid=23260287;c=1445489667626)
java.lang.ThreadGroup[name=dataXceiverServer,maxpri=10]
    Thread[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@6aa03871,5,dataXceiverServer]
    Thread[DataXceiver for client DFSClient_NONMAPREDUCE_-1867405584_1 at /127.0.0.1:56717 [Receiving block BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001],5,dataXceiverServer]
    Thread[PacketResponder: BP-1612020377-9.96.1.34-1445489667626:blk_-9223372036854775791_1001, type=LAST_IN_PIPELINE, downstreams=0:[],5,dataXceiverServer]
{noformat}
An {{xceiverCount}} of 3 is higher than the average across DataNodes, so DN1 
is excluded by {{chooseRandom()}}. BlockGroup_1 can then recover only 2 
blocks. As discussed 
[here|https://issues.apache.org/jira/browse/HDFS-8220?focusedCommentId=14518931&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14518931], 
PlacementPolicy currently does not support returning two identical storages, 
i.e. it will not place 2 replicas (internal blocks) on the same storage. 

We could simply add more DNs to fix the test. Alternatively, we can set 
{{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY}} to false in the test case.
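
The load check and both proposed fixes can be illustrated with a small sketch. This is a toy model under stated assumptions, not the actual {{chooseRandom()}} logic: the class {{LoadCheckSketch}}, the {{maxLoad}} threshold of 2.0, the extra node DN12, and the {{xceiverCount}} of 1 for DN0 and DN11 are all hypothetical. Only DN1's count of 3 comes from the log above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (hypothetical names); this is not the HDFS BlockPlacementPolicy API. */
class LoadCheckSketch {

    /**
     * Returns the candidates that pass the load check. When considerLoad is true,
     * a node whose xceiverCount exceeds maxLoad (an assumed threshold) is skipped.
     */
    static List<String> usableTargets(Map<String, Integer> xceiverCounts,
                                      boolean considerLoad, double maxLoad) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : xceiverCounts.entrySet()) {
            if (considerLoad && entry.getValue() > maxLoad) {
                continue; // node is treated as too busy
            }
            targets.add(entry.getKey());
        }
        return targets;
    }

    /** The three remaining candidates; the counts for DN0 and DN11 are assumed. */
    static Map<String, Integer> scenario() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("DN0", 1);
        counts.put("DN1", 3); // busy writing a block of BlockGroup_0
        counts.put("DN11", 1);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = scenario();
        // Load check on: DN1 is skipped, only 2 targets remain for 3 blocks.
        System.out.println(usableTargets(counts, true, 2.0));  // prints [DN0, DN11]
        // Fix 1: disable the load check in the test.
        System.out.println(usableTargets(counts, false, 2.0)); // prints [DN0, DN1, DN11]
        // Fix 2: add one more (hypothetical) DataNode.
        counts.put("DN12", 1);
        System.out.println(usableTargets(counts, true, 2.0));  // prints [DN0, DN11, DN12]
    }
}
```

Either fix brings the number of usable targets back to 3. Disabling the load check keeps the cluster size unchanged, while adding DNs keeps the default placement behaviour under test.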

The 02 patch includes some cleanup. Kindly review. Thanks.

> Fix TestRecoverStripedFile
> --------------------------
>
>                 Key: HDFS-9275
>                 URL: https://issues.apache.org/jira/browse/HDFS-9275
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
