[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903896#comment-13903896 ]
dan dan zheng commented on HDFS-5892:
-------------------------------------

Here's a patch which addresses the issue.

The cause of the intermittent failure is that the test sets name services in the configuration when starting the federation, but MiniDFSNNTopology generates the name service ids without considering the ones set in the configuration. So the BPOfferServices that are started are actually for ns1 and ns2, not for the ids set by the test ("namesServerId1,namesServerId2").

Later on, the test refreshes the name services using the id namesServerId2, which starts that service for the first time. Because ns1 and ns2 are no longer in the refresh list, they are stopped. The test fails when it tries to create the file /gamma before namesServerId2 has completely started. This race condition is why the failure is intermittent.

The current log shows the issue:

2014-02-13 22:14:02,489 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for nameservices: ns1,ns2
2014-02-13 22:14:02,491 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: ns1,ns2
2014-02-13 22:51:40,326 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for nameservices: ns1,ns2

After applying the patch, MiniDFSNNTopology reads the name services from the configuration correctly, so the BPOfferServices are started for the correct nameservices.
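
To illustrate the idea only (this is not the attached patch), here is a minimal standalone sketch of the selection rule described above: prefer the name service ids from the configuration, and fall back to generated ns1..nsN only when none are configured. The class name NameServiceIds and the method resolve are hypothetical and do not exist in Hadoop; the generated "ns" + i naming is inferred from the ids seen in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: shows how a topology builder
// could honor configured name service ids instead of always generating them.
final class NameServiceIds {

    // configured: comma-separated ids as the test would set them, or null.
    // numNameservices: how many ids to generate when nothing is configured.
    public static List<String> resolve(String configured, int numNameservices) {
        List<String> ids = new ArrayList<>();
        if (configured != null) {
            for (String id : configured.split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(trimmed);
                }
            }
        }
        if (!ids.isEmpty()) {
            // Configured ids win, so the services started at cluster startup
            // match the ids used by a later refresh.
            return ids;
        }
        // Fallback: generated ids (ns1, ns2, ...), as seen in the failing log.
        for (int i = 1; i <= numNameservices; i++) {
            ids.add("ns" + i);
        }
        return ids;
    }
}
```

With this rule, resolve("namesServerId1,namesServerId2", 2) yields the two configured ids, so a later refresh to namesServerId2 keeps an already running service instead of starting a new one while the test is about to write.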

The correct log should be:

2014-02-13 22:14:02,489 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for nameservices: namesServerId1,namesServerId2
2014-02-13 22:14:02,491 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for nameservices: namesServerId1,namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for nameservices: namesServerId1

> TestDeleteBlockPool fails in branch-2
> -------------------------------------
>
>                 Key: HDFS-5892
>                 URL: https://issues.apache.org/jira/browse/HDFS-5892
>             Project: Hadoop HDFS
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)  Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)