[ https://issues.apache.org/jira/browse/HDFS-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480523#comment-13480523 ]
Jing Zhao commented on HDFS-4061:
---------------------------------

Nicholas, I checked the test output, and my guess is that the test failure is caused by the following race.

The NameNode invalidates a block on datanode D1 and removes the datanode-block pair from the blockMap. Before the invalidation request is sent to D1, BlockManager#computeDataNodeWork also runs and schedules a replication to D1, so the invalidation and replication requests are sent to D1 at the same time. D1 then ignores the replication request (it also throws a ReplicaAlreadyExistsException) and deletes the replica. As a result, the NN never receives the blockReceived message from D1.

The test case then times out after 5 minutes, which is shorter than the timeout of the PendingReplication request (usually 5 to 10 minutes).

I can file another JIRA to fix the test case if you think this analysis is correct.

> TestBalancer and TestUnderReplicatedBlocks need timeouts
> --------------------------------------------------------
>
>                 Key: HDFS-4061
>                 URL: https://issues.apache.org/jira/browse/HDFS-4061
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>             Fix For: 2.0.3-alpha
>
>         Attachments: hdfs-4061.txt
>
>
> Saw TestBalancer and TestUnderReplicatedBlocks timeout hard on a jenkins job
> recently, let's annotate the relevant tests with timeouts.
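
The interleaving described in the comment can be sketched as a small, self-contained Java simulation. This is only a toy model under stated assumptions: ToyDataNode, Outcome, and simulate() are hypothetical names invented for illustration and are not HDFS classes, the block name and the three-replica setup are made up, and the real BlockManager logic is considerably more involved. The sketch only shows why no blockReceived message comes back once both requests reach D1 together.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the race; none of these types are real HDFS classes.
class ReplicationRaceSketch {

    /** Toy datanode that only tracks which block replicas it stores. */
    static class ToyDataNode {
        final Set<String> replicas = new HashSet<>();

        /** Returns true if a blockReceived report would go back to the NameNode. */
        boolean handleReplicate(String block) {
            if (replicas.contains(block)) {
                // Stands in for ReplicaAlreadyExistsException: the request is ignored.
                return false;
            }
            replicas.add(block);
            return true;
        }

        void handleInvalidate(String block) {
            replicas.remove(block);
        }
    }

    /** State of the toy cluster after D1 has processed both requests. */
    static class Outcome {
        final boolean blockReceivedSent;
        final boolean replicaOnD1;
        final boolean replicationStillPending;

        Outcome(boolean blockReceivedSent, boolean replicaOnD1,
                boolean replicationStillPending) {
            this.blockReceivedSent = blockReceivedSent;
            this.replicaOnD1 = replicaOnD1;
            this.replicationStillPending = replicationStillPending;
        }
    }

    static Outcome simulate() {
        final String block = "blk_1";      // made-up block name
        final int replicationFactor = 3;   // assumed for this sketch

        ToyDataNode d1 = new ToyDataNode();
        d1.replicas.add(block);

        // The NameNode's view of which datanodes hold the block (toy blockMap entry).
        Set<String> locations = new HashSet<>();
        locations.add("D1");
        locations.add("D2");
        locations.add("D3");

        // Step 1: the NN invalidates the replica on D1 and drops the pair from its
        // map, but the invalidation request has not reached D1 yet.
        locations.remove("D1");
        boolean invalidateQueuedForD1 = true;

        // Step 2: replication work runs, sees too few replicas, and picks D1 as the
        // target because the NN no longer believes D1 holds the block.
        boolean replicationPending =
                locations.size() < replicationFactor && !locations.contains("D1");

        // Step 3: both requests reach D1 together.
        boolean blockReceivedSent = false;
        if (replicationPending) {
            blockReceivedSent = d1.handleReplicate(block);
        }
        if (invalidateQueuedForD1) {
            d1.handleInvalidate(block);
        }

        // The pending entry is cleared only when blockReceived arrives.
        if (blockReceivedSent) {
            replicationPending = false;
        }
        return new Outcome(blockReceivedSent, d1.replicas.contains(block),
                replicationPending);
    }

    public static void main(String[] args) {
        Outcome o = simulate();
        System.out.println("blockReceived sent: " + o.blockReceivedSent);
        System.out.println("replica on D1: " + o.replicaOnD1);
        System.out.println("replication still pending: " + o.replicationStillPending);
    }
}
```

In this model the replication stays pending until its own timeout expires, so a test whose timeout is shorter than the PendingReplication timeout gives up before the NN reschedules the work.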