[ 
https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250767#comment-14250767
 ] 

Colin Patrick McCabe commented on HDFS-7527:
--------------------------------------------

I am -1 for removing this test right now, until we understand this issue 
better.  Putting "registration names" in the host include and exclude files 
used to work.  If it stopped working, then that's a bug that we should fix.  
Or, alternately, we should have a JIRA to remove registration names entirely.  
Last time we proposed that, it got rejected, though.  See HDFS-5237.

One example of where you might want to set registration names is if you're on 
an AWS instance with internal and external IP interfaces.  On each datanode, 
you would set {{dfs.datanode.hostname}} to the internal IP address to ensure 
that traffic flowed over the internal interface, rather than the (expensive) 
external interfaces.  In this case, you should be able to specify what nodes 
are in the cluster using these same registration names, even if doing reverse 
DNS on the datanode hostnames returns another IP address as the first entry.

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-7527
>                 URL: https://issues.apache.org/jira/browse/HDFS-7527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, test
>            Reporter: Yongjun Zhang
>            Assignee: Binglin Chang
>         Attachments: HDFS-7527.001.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
>       at java.lang.Thread.sleep(Native Method)
>       at 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode 
> (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565 Datanode denied communication with namenode because 
> the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
> datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, 
> infoSecurePort=0, ipcPort=43726, 
> storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>       at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode 
> (BPServiceActor.java:run(841)) - Initialization failed for Block pool 
> BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to 
> localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>       at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j 
> Hadoop-Hdfs-trunk -n 5 | tee bt.log
> ****Recently FAILED builds in url: 
> https://builds.apache.org//job/Hadoop-Hdfs-trunk
>     THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, 
> as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport 
> (2014-12-15 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport 
> (2014-12-13 10:32:27)
>     Failed test: 
> org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport 
> (2014-12-13 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport 
> (2014-12-11 03:30:01)
>     Failed test: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     Failed test: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> Among 6 runs examined, all failed tests <#failedRuns: testName>:
>     3: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     2: 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     1: 
> org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to