[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903896#comment-13903896
 ] 

dan dan zheng commented on HDFS-5892:
-------------------------------------

Here's a patch that addresses the issue. The intermittent failure occurs because 
the test sets name services in the configuration when starting the federation, 
but MiniDFSTopology generates the service IDs without considering the name 
services set in the configuration. So the BPOfferServices that are started are 
actually for ns1 and ns2, not the ones set by the test 
("namesServerId1,namesServerId2"). Later on, the test refreshes the services 
with the ID namesServerId2, which starts that service for the first time. 
Because ns1 and ns2 are no longer in the refresh list, they are stopped. The 
test fails when it tries to create the file /gamma before namesServerId2 has 
fully started. This race condition is why the failure is intermittent.

The current log shows the issue:
2014-02-13 22:14:02,489 INFO  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for 
nameservices: ns1,ns2
2014-02-13 22:14:02,491 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: ns1,ns2 
2014-02-13 22:51:40,326 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for 
nameservices: ns1,ns2
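
The start/stop decisions in the log above can be reproduced with a small 
model. This is only a sketch under assumptions: the class RefreshPlanSketch 
and its methods toStart and toStop are hypothetical names made up for 
illustration, and the set arithmetic is a simplification, not the actual 
BlockPoolManager code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of a nameservice refresh.
// It is NOT the real BlockPoolManager implementation.
class RefreshPlanSketch {

    // Requested nameservices that are not running yet would be started.
    static Set<String> toStart(Set<String> running, Set<String> requested) {
        Set<String> result = new TreeSet<>(requested);
        result.removeAll(running);
        return result;
    }

    // Running nameservices missing from the request would be stopped.
    static Set<String> toStop(Set<String> running, Set<String> requested) {
        Set<String> result = new TreeSet<>(running);
        result.removeAll(requested);
        return result;
    }

    public static void main(String[] args) {
        // The refresh request issued by the test.
        Set<String> requested = new TreeSet<>(Arrays.asList("namesServerId2"));

        // Without the patch, the services actually running are ns1 and ns2.
        Set<String> before = new TreeSet<>(Arrays.asList("ns1", "ns2"));
        System.out.println("before patch: start=" + toStart(before, requested)
                + " stop=" + toStop(before, requested));

        // With the patch, the running services match the configured IDs.
        Set<String> after = new TreeSet<>(
                Arrays.asList("namesServerId1", "namesServerId2"));
        System.out.println("after patch:  start=" + toStart(after, requested)
                + " stop=" + toStop(after, requested));
    }
}
```

In this model, the unpatched case starts namesServerId2 from scratch during 
the refresh, which is the window in which the test can race ahead. In the 
patched case nothing new has to start, and only namesServerId1 is stopped.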

After applying the patch, MiniDFSTopology reads the name services from the 
configuration, so the BPOfferServices are started for the correct nameservices.

The correct log should look like this:
2014-02-13 22:14:02,489 INFO  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(148)) - Refresh request received for 
nameservices: namesServerId1,namesServerId2
2014-02-13 22:14:02,491 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(193)) - Starting BPOfferServices for 
nameservices: namesServerId1,namesServerId2
2014-02-13 22:51:40,327 INFO  datanode.DataNode 
(BlockPoolManager.java:doRefreshNamenodes(211)) - Stopping BPOfferServices for 
nameservices: namesServerId1



> TestDeleteBlockPool fails in branch-2
> -------------------------------------
>
>                 Key: HDFS-5892
>                 URL: https://issues.apache.org/jira/browse/HDFS-5892
>             Project: Hadoop HDFS
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
