[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464177#comment-16464177 ]

Wei-Chiu Chuang commented on HDFS-7527:
---

Hmm, somehow this test always times out, both before and after the patch. How come it passed Hadoop precommit?

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---
>
> Key: HDFS-7527
> URL: https://issues.apache.org/jira/browse/HDFS-7527
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, test
> Reporter: Yongjun Zhang
> Assignee: Binglin Chang
> Priority: Major
> Labels: flaky-test
> Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch, HDFS-7527.003.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 36 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 36 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
>   at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
>   at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
>   at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting.
> java.io.IOException: DN shut down before block pool connected
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log
> Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
> THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01)
>     Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27)
>     Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01)
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396364#comment-16396364 ]

genericqa commented on HDFS-7527:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 54s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 34s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HDFS-7527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914172/HDFS-7527.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux db9fd30c0c31 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 39a5fba |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23429/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23429/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23429/testReport/ |
| Max. process+thread count | 3076 (vs. ulimit of 1) |
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396197#comment-16396197 ]

Ajay Kumar commented on HDFS-7527:
--

Patch v3 rebased with current trunk. Removed {{HostFileManager}} change as it is already included and reduced datanode heartbeat time to 250ms.
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181807#comment-16181807 ]

Hadoop QA commented on HDFS-7527:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-7527 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-7527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12687971/HDFS-7527.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21373/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806018#comment-15806018 ]

Hadoop QA commented on HDFS-7527:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HDFS-7527 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-7527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12687971/HDFS-7527.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18084/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251415#comment-14251415 ]

Hadoop QA commented on HDFS-7527:
-

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12687971/HDFS-7527.002.patch
against trunk revision 1050d42.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

    org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration

The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

    org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9070//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9070//console

This message is automatically generated.
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252501#comment-14252501 ]

Colin Patrick McCabe commented on HDFS-7527:

Thanks for looking at this, [~decster] and [~wheat9]. It's a difficult and frustrating area of the code, in my opinion.

Unfortunately, I don't think this latest patch is exactly what we need. Last time we proposed adding more DNS lookups in the {{DatanodeManager}}, the Yahoo guys said this was unacceptable from a performance point of view. Caching DNS lookups, so that we didn't have to do them all the time, is a big part of what the {{HostFileManager}} was created to do.

[~daryn], [~eli], do you have any ideas here?
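The point about caching DNS lookups in the comment above can be pictured with a small, self-contained sketch. It is not the actual {{HostFileManager}} code, which does not appear in this thread; the class name {{CachedResolverSketch}}, its methods, and the injected lookup function are all hypothetical, and the lookup is a plain function so the example runs without touching real DNS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration only; not the real HostFileManager. */
public class CachedResolverSketch {
    // Assumed stand-in for a DNS lookup: host name in, address string out.
    private final Function<String, String> lookup;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedResolverSketch(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached address, calling the lookup only on a cache miss. */
    String addressOf(String host) {
        String cached = cache.get(host);
        if (cached != null) {
            return cached;
        }
        String resolved = lookup.apply(host);
        cache.put(host, resolved);
        return resolved;
    }

    /** Drops every cached entry, for example when the hosts file is re-read. */
    void refresh() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        CachedResolverSketch hosts = new CachedResolverSketch(host -> {
            lookups.incrementAndGet();   // count how often the "DNS" is hit
            return "10.0.0.1";
        });
        hosts.addressOf("dn1.example");
        hosts.addressOf("dn1.example");  // served from the cache
        System.out.println("lookups performed: " + lookups.get()); // prints "lookups performed: 1"
    }
}
```

The trade-off this sketch makes visible is the one raised in the comment: registration-time checks stay cheap because they read the cache, but entries can go stale until the next refresh.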
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250767#comment-14250767 ]

Colin Patrick McCabe commented on HDFS-7527:

I am -1 for removing this test right now, until we understand this issue better. Putting registration names in the host include and exclude files used to work. If it stopped working, then that's a bug that we should fix. Or, alternately, we should have a JIRA to remove registration names entirely. Last time we proposed that, it got rejected, though. See HDFS-5237.

One example of where you might want to set registration names is if you're on an AWS instance with internal and external IP interfaces. On each datanode, you would set {{dfs.datanode.hostname}} to the internal IP address to ensure that traffic flowed over the internal interface, rather than the (expensive) external interfaces. In this case, you should be able to specify what nodes are in the cluster using these same registration names, even if doing reverse DNS on the datanode hostnames returns another IP address as the first entry.
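
As a rough illustration of the behavior described in this comment, the sketch below admits a datanode when either its IP address or its registration name appears in the include list. The class {{IncludeListSketch}} and its methods are hypothetical names made up for this example; this is not the DatanodeManager implementation, only one way the expected semantics could look.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical include-list check; not the actual DatanodeManager logic. */
class IncludeListSketch {
    private final Set<String> entries = new HashSet<>();

    /** Adds one entry from the include file: an IP address or a registration name. */
    void add(String entry) {
        entries.add(entry.trim().toLowerCase(Locale.ROOT));
    }

    /**
     * Returns true if the node may register. An empty include list admits
     * every node; otherwise either the IP or the registration name must match.
     */
    boolean isIncluded(String ipAddr, String registrationName) {
        if (entries.isEmpty()) {
            return true;
        }
        return entries.contains(ipAddr.toLowerCase(Locale.ROOT))
            || entries.contains(registrationName.toLowerCase(Locale.ROOT));
    }
}
```

With a list containing only {{dn1.internal}} (an invented name), a node that registers as {{dn1.internal}} would be admitted whatever IP address the namenode sees for it, which is the AWS case above.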
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251244#comment-14251244 ]

Binglin Chang commented on HDFS-7527:

Makes sense; it looks like the behavior changed at some point. I updated the patch to partially support dfs.datanode.hostname (if it is an IP address, or the hostname resolves to a proper IP address). I also changed the test to properly wait for the excluded datanode to come back (using Datanode.isDatanodeFullyStarted rather than checking the ALIVE node count). Note that fully restoring the old behavior requires a lot more changes; for now I only made minimal changes.
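
The partial support described in this comment (accept the registration name when it is an IP address or resolves to one) could look roughly like the sketch below. {{RegistrationNameResolver}} and {{toIpAddress}} are hypothetical names for illustration; the attached patch may do this differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

/** Hypothetical helper; illustrates the idea only, not the code in the patch. */
class RegistrationNameResolver {
    /**
     * Returns the IP address for a registration name, or empty if it cannot
     * be resolved. For an IP literal, getByName only checks the format; for
     * a host name it performs a name-service lookup.
     */
    static Optional<String> toIpAddress(String registrationName) {
        try {
            return Optional.of(InetAddress.getByName(registrationName).getHostAddress());
        } catch (UnknownHostException e) {
            // Unresolvable names are the case that stays unsupported.
            return Optional.empty();
        }
    }
}
```

A caller would then compare the returned address against the IP addresses in the include list, and treat an empty result as "not included".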
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248411#comment-14248411 ]

Binglin Chang commented on HDFS-7527:

I read some of the related code. The test is intended to check that the dfs.hosts list supports dfs.datanode.hostname (e.g. if you set a datanode's name to host1, and the dfs.hosts file contains host1, this datanode should be able to connect to the namenode). But after reading the code, it turns out DatanodeManager checks the dfs.hosts list using only the IP address, not the hostname (the namenode resolves all hostnames in the dfs.hosts file to IP addresses), so failing is the expected behavior for this test. The test passes most of the time only because the code is missing a proper wait to make sure the old datanode has expired.

{code}
refreshNodes(cluster.getNamesystem(0), hdfsConf);
cluster.restartDataNode(0);
// There should be some wait time here before the original datanode becomes dead,
// or the following check will always succeed, because the old datanode is still alive.

// Wait for the DN to come back.
while (true) {
  DatanodeInfo info[] = client.datanodeReport(DatanodeReportType.LIVE);
  if (info.length == 1) {
    Assert.assertFalse(info[0].isDecommissioned());
    Assert.assertFalse(info[0].isDecommissionInProgress());
    assertEquals(registrationName, info[0].getHostName());
    break;
  }
  LOG.info("Waiting for datanode to come back");
  Thread.sleep(HEARTBEAT_INTERVAL * 1000);
}
{code}

I added some sleep time at the comment above, and the test then always fails, which verifies my theory. Since the test is not valid, I think we should just remove it.
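
The missing wait could be expressed with a bounded polling helper instead of an unbounded {{while (true)}} loop. The following is a minimal sketch in plain Java; {{WaitSketch.waitFor}} is a hypothetical helper written for this example and is not part of the test or the patch.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests; names are illustrative only. */
class WaitSketch {
    /**
     * Polls the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition was met,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

In a test like the one quoted above, one such wait would first confirm that the old datanode registration is gone (for example, the live report is empty), and a second wait would then check that the restarted datanode has come back, so the assertions cannot be satisfied by the stale entry.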
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248784#comment-14248784 ]

Hadoop QA commented on HDFS-7527:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12687508/HDFS-7527.001.patch
  against trunk revision 07bb0b0.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9052//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9052//console

This message is automatically generated.
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249005#comment-14249005 ]

Haohui Mai commented on HDFS-7527:

+1
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249014#comment-14249014 ]

Haohui Mai commented on HDFS-7527:

It looks like the test is flaky. [~cmccabe], do you have any comments on this, or any idea how we can improve the test?