[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251244#comment-14251244
 ] 

Binglin Chang commented on HDFS-7527:
-------------------------------------

Make sense, looks like the behavior is changed at some point. 
Update the patch to partially support dfs.datanode.hostname(if it is an ip 
address, or the hostname resolve to a proper ip address). 
And add change to test to properly wait for the excluded datanode become back 
again(using Datanode.isDatanodeFullyStarted rather than checking ALIVE node 
count).
Note that too fully restore the old behavior requires a lot more changes, 
currently I only made minimal changes.

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-7527
>                 URL: https://issues.apache.org/jira/browse/HDFS-7527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, test
>            Reporter: Yongjun Zhang
>            Assignee: Binglin Chang
>         Attachments: HDFS-7527.001.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
>       at java.lang.Thread.sleep(Native Method)
>       at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
> (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
> the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
> datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
> infoSecurePort=0, ipcPort=43726, 
> storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>       at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
> (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>       at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> Hadoop-Hdfs-trunk -n 5 | tee bt.log
> ****Recently FAILED builds in url: 
> https://builds.apache.org//job/Hadoop-Hdfs-trunk
>     THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
> as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
> (2014-12-15 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
> (2014-12-13 10:32:27)
>     Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport 
> (2014-12-13 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport 
> (2014-12-11 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> Among 6 runs examined, all failed tests <#failedRuns: testName>:
>     3: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     2: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     1: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to