[ 
https://issues.apache.org/jira/browse/HDFS-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407242#comment-15407242
 ] 

Rakesh R commented on HDFS-10434:
---------------------------------

I think I have found the cause of the failure. The test picks the wrong 
datanode to corrupt. Instead of choosing a datanode that appears in the block 
locations, it uses the block index to fetch a datanode from the cluster's 
datanode list, and that datanode may not be one of the block locations.
{code}
    byte[] indices = lastBlock.getBlockIndices();
    //corrupt the first block
    DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
{code}

For example, suppose the {{cluster.getDataNodes()}} list is indexed as follows: 
0->Dn1, 1->Dn2, 2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 9->Dn10

Assume the datanodes in the block locations are Dn2, Dn3, Dn4, Dn5, Dn6, Dn7, 
Dn8, Dn9 and Dn10. In the failing scenario, the test picks 
{{cluster.getDataNodes().get(0)}}, which is Dn1, as the datanode to corrupt. 
Dn1 holds no part of the block, so corrupting it does not trigger any ECWork 
and the test fails. Ideally, the test should pick a datanode from the block 
locations.
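
To make the mismatch concrete, here is a minimal plain-Java sketch of the 
example above. It does not use the Hadoop test APIs: the class, the method 
names and the string datanode labels are hypothetical stand-ins, and the block 
indices are assumed to be 0..8 for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class PickDatanodeSketch {

    // Mirrors the problematic lookup: the block index is used as a position
    // in the cluster-wide datanode list, which is unrelated to the block.
    static String pickByClusterIndex(List<String> clusterDns, byte[] blockIndices) {
        return clusterDns.get(blockIndices[0]);
    }

    // Mirrors the intended behaviour: choose a datanode that is actually
    // one of the block locations.
    static String pickFromLocations(List<String> blockLocations) {
        return blockLocations.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical cluster list: 0->Dn1, 1->Dn2, ..., 9->Dn10
        List<String> clusterDns = Arrays.asList(
            "Dn1", "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        // Datanodes that hold the block in this example (Dn1 is not among them)
        List<String> blockLocations = Arrays.asList(
            "Dn2", "Dn3", "Dn4", "Dn5", "Dn6", "Dn7", "Dn8", "Dn9", "Dn10");
        // Assumed block indices, one per block location
        byte[] blockIndices = {0, 1, 2, 3, 4, 5, 6, 7, 8};

        String wrong = pickByClusterIndex(clusterDns, blockIndices);
        String right = pickFromLocations(blockLocations);

        System.out.println(wrong + " in block locations? " + blockLocations.contains(wrong));
        System.out.println(right + " in block locations? " + blockLocations.contains(right));
    }
}
```

In this model the index-based lookup returns Dn1, which is not in the block 
locations, while the location-based lookup returns Dn2, which is.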

Basically, there are two problems in this test case. The first was fixed as 
part of this jira. For the second, I think I will raise another jira and fix it 
there, as the two problems are unrelated.

> Fix intermittent test failure of TestDataNodeErasureCodingMetrics
> -----------------------------------------------------------------
>
>                 Key: HDFS-10434
>                 URL: https://issues.apache.org/jira/browse/HDFS-10434
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10434-00.patch, HDFS-10434-01.patch
>
>
> This jira is to fix the test case failure.
> Reference : 
> [Build15485_TestDataNodeErasureCodingMetrics_testEcTasks|https://builds.apache.org/job/PreCommit-HDFS-Build/15485/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeErasureCodingMetrics/testEcTasks/]
> {code}
> Error Message
> Bad value for metric EcReconstructionTasks expected:<1> but was:<0>
> Stacktrace
> java.lang.AssertionError: Bad value for metric EcReconstructionTasks 
> expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at 
> org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
>       at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics.testEcTasks(TestDataNodeErasureCodingMetrics.java:92)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

