[jira] [Commented] (HDFS-10720) Fix intermittent test failure of TestDataNodeErasureCodingMetrics#testEcTasks

Kai Zheng (JIRA) Thu, 04 Aug 2016 15:58:18 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408613#comment-15408613
 ]


Kai Zheng commented on HDFS-10720:
----------------------------------

Thanks [~rakeshr] for the ping! The analysis looks correct and the fix is good. 
The failure happens because we have an extra datanode, so it is not guaranteed 
that every datanode in the mini cluster holds a block location in the group. 

A comment:
Could we use the same loop to figure out both {{workerDn}} and {{toCorruptDn}}? 
Thanks!
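
A minimal sketch of what a single loop could look like. It assumes that {{workerDn}} is a datanode holding no block of the group and that {{toCorruptDn}} is one that does; the thread does not spell this out. Plain strings stand in for the real DataNode objects, and the class and method names are hypothetical, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the test's datanode selection; strings replace
// the DataNode objects used by the real test.
final class SingleLoopSelection {

    /** Returns {workerDn, toCorruptDn}; an entry is null if no match was found. */
    static String[] select(List<String> clusterDns, List<String> blockLocations) {
        String workerDn = null;    // assumed: a datanode with no block of the group
        String toCorruptDn = null; // a datanode that holds a block of the group
        for (String dn : clusterDns) {
            if (blockLocations.contains(dn)) {
                if (toCorruptDn == null) {
                    toCorruptDn = dn;
                }
            } else if (workerDn == null) {
                workerDn = dn;
            }
            if (workerDn != null && toCorruptDn != null) {
                break; // both found, no need to scan further
            }
        }
        return new String[] {workerDn, toCorruptDn};
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList(
            "Dn1", "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        List<String> locations = Arrays.asList(
            "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        String[] picked = select(cluster, locations);
        System.out.println("workerDn=" + picked[0] + ", toCorruptDn=" + picked[1]);
    }
}
```

With the ten-datanode layout from the issue description, this prints workerDn=Dn1, toCorruptDn=Dn2.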

> Fix intermittent test failure of TestDataNodeErasureCodingMetrics#testEcTasks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10720
>                 URL: https://issues.apache.org/jira/browse/HDFS-10720
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-10720-00.patch, HDFS-10720-01.patch
>
>
> The test picks the datanode to corrupt incorrectly. Instead of choosing a 
> datanode that appears in the block locations, it simply takes a datanode 
> from the cluster, which may not be a datanode present in the block 
> locations.
> {code}
>     byte[] indices = lastBlock.getBlockIndices();
>     //corrupt the first block
>     DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
> {code}
> For example, suppose the datanodes in the cluster.getDataNodes() array are 
> indexed as follows: 
> 0->Dn1, 1->Dn2, 2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 
> 9->Dn10
> Assume the datanodes that are part of the block locations are Dn2, Dn3, Dn4, 
> Dn5, Dn6, Dn7, Dn8, Dn9, Dn10. In the failing scenario, the test picks 
> cluster.getDataNodes().get(0) as the datanode to corrupt, which is Dn1. 
> Corrupting this datanode does not result in ECWork, so the test fails. 
> Ideally, the test should pick a datanode from the block locations and corrupt 
> it, which will trigger ECWork.
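
The mismatch described above can be reproduced with a small model. This is only a sketch under stated assumptions: strings stand in for datanodes, the first block index is assumed to be 0 as in the example, and the class and method names are hypothetical rather than the code in the attached patches.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the lookup described in the issue; not the patch itself.
final class CorruptTargetSketch {

    // Mirrors the failing lookup: a block index is used as an index into the
    // cluster's datanode list, which is unrelated to the block locations.
    static String pickByClusterIndex(List<String> clusterDns, byte[] indices) {
        return clusterDns.get(indices[0]);
    }

    // Picks the target from the block locations, so the chosen datanode is
    // guaranteed to hold a block of the group.
    static String pickFromLocations(List<String> blockLocations) {
        return blockLocations.get(0);
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList(
            "Dn1", "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        List<String> locations = Arrays.asList(
            "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        byte[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8}; // assumed block indices

        String wrong = pickByClusterIndex(cluster, indices);
        String right = pickFromLocations(locations);
        System.out.println(wrong + " in block locations: " + locations.contains(wrong));
        System.out.println(right + " in block locations: " + locations.contains(right));
    }
}
```

Here the index-based lookup returns Dn1, which holds no block of the group, while the location-based lookup returns Dn2.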


