[ 
https://issues.apache.org/jira/browse/HDFS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986818#comment-14986818
 ] 

Walter Su commented on HDFS-6101:
---------------------------------

The test failure is possibly because the stopped DN isn't removed from the 
cluster map, and {{sleepSeconds(5)}} doesn't guarantee that it has been 
removed.

1. Please don't remove this. It's intentional. After sleeping, we want some 
writers to have NOT yet started.
{code}
-      // Some of them are too slow and will be not yet started. 
-      sleepSeconds(1);
{code}

2. Instead of hardcoding a 5s sleep, we can use 
{{GenericTestUtils.waitFor(..)}} to check the block replication. The 
wait/notify is unnecessary.

3. After
{code}
cluster.stopDataNode(AppendTestUtil.nextInt(REPLICATION));
{code}
We should call {{cluster.setDataNodeDead(..)}} to remove it from the cluster map.
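
To illustrate the polling idea in point 2, here is a minimal, self-contained sketch. It does not use Hadoop: the {{waitFor}} helper below is a hypothetical stand-in that only mimics the poll-until-true-or-timeout behavior, and {{liveReplicas}} is an invented counter standing in for the real block replication check.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a waitFor-style test helper; not the Hadoop class.
class WaitForSketch {

    // Polls check every checkEveryMillis until it returns true,
    // or throws TimeoutException once waitForMillis has elapsed.
    static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new TimeoutException("Condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        final int expectedReplication = 3; // assumed target for this sketch

        // Invented counter that stands in for the observed replica count.
        AtomicInteger liveReplicas = new AtomicInteger(2);

        // Simulates a replacement replica showing up after a short delay.
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            liveReplicas.set(expectedReplication);
        });
        recovery.start();

        // Returns as soon as the condition holds, instead of always sleeping 5s.
        waitFor(() -> liveReplicas.get() == expectedReplication, 10, 5000);
        recovery.join();
        System.out.println("replication reached " + liveReplicas.get());
    }
}
```

In the actual test, the condition would be the block replication check and the helper would be {{GenericTestUtils.waitFor(..)}}; the interval and timeout values here are arbitrary. The advantage over a fixed sleep is that the test proceeds as soon as the condition holds and fails with a clear timeout if it never does.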

> TestReplaceDatanodeOnFailure fails occasionally
> -----------------------------------------------
>
>                 Key: HDFS-6101
>                 URL: https://issues.apache.org/jira/browse/HDFS-6101
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Arpit Agarwal
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-6101.001.patch, HDFS-6101.002.patch, 
> HDFS-6101.003.patch, TestReplaceDatanodeOnFailure.log
>
>
> Exception details in a comment below.
> The failure repros on both OS X and Linux if I run the test ~10 times in a 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
