[jira] [Created] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
Wei-Chiu Chuang created HDFS-9744:
-

Summary: TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
Key: HDFS-9744
URL: https://issues.apache.org/jira/browse/HDFS-9744
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Priority: Minor

I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/

Looking at the log, it does not look like the test got stuck. On my local machine, this test took 219 seconds. It is likely that this test takes more than 300 seconds to complete on a busy Jenkins slave. I think it is reasonable to set a longer timeout value, or to reduce the number of blocks so the test finishes sooner.

Error Message
{noformat}
test timed out after 30 milliseconds
{noformat}
Stacktrace
{noformat}
java.lang.Exception: test timed out after 30 milliseconds
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:503)
  at org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
  at org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
  at org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
  at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
  at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
  at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
  at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
  at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
  at org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
  at org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HDFS-9648) Test TestStartup.testImageChecksum keeps failing
Wei-Chiu Chuang created HDFS-9648:
-

Summary: Test TestStartup.testImageChecksum keeps failing
Key: HDFS-9648
URL: https://issues.apache.org/jira/browse/HDFS-9648
Project: Hadoop HDFS
Issue Type: Bug
Environment: Jenkins
Reporter: Wei-Chiu Chuang

The Jenkins log shows that TestStartup.testImageChecksum has failed five times in a row.
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2724/testReport/org.apache.hadoop.hdfs.server.namenode/TestStartup/testImageChecksum/
[jira] [Created] (HDFS-9640) Remove hsftp from DistCp
Wei-Chiu Chuang created HDFS-9640:
-

Summary: Remove hsftp from DistCp
Key: HDFS-9640
URL: https://issues.apache.org/jira/browse/HDFS-9640
Project: Hadoop HDFS
Issue Type: Bug
Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

Per the discussion in HDFS-9638: after HDFS-5570, hftp/hsftp are removed from Hadoop 3.0.0, but DistCp still references hsftp via the parameter -mapredSslConf. This parameter is useless as of Hadoop 3.0.0, so it should be removed and the change documented. This JIRA tracks the status of the code/docs change involved in removing hsftp from DistCp.
[jira] [Created] (HDFS-9638) Improve DistCp Help and documentation
Wei-Chiu Chuang created HDFS-9638:
-

Summary: Improve DistCp Help and documentation
Key: HDFS-9638
URL: https://issues.apache.org/jira/browse/HDFS-9638
Project: Hadoop HDFS
Issue Type: Improvement
Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor

For example, the help text for -mapredSslConf ("Configuration for ssl config file, to use with hftps://") does not clearly state that this ssl config file must be on the classpath.

http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html says:
"When using the hsftp protocol with a source, the security-related properties may be specified in a config-file and passed to DistCp. needs to be in the classpath."
It is also not clear from the context whether this ssl_conf_file should be located at the client issuing the command (I think the answer is yes).

Also, the same page says:
"The following is an example of the contents of the contents of a SSL Configuration file:"
where "of the contents" is duplicated.
[jira] [Created] (HDFS-9631) Restarting namenode after deleting a directory with snapshot will fail
Wei-Chiu Chuang created HDFS-9631:
-

Summary: Restarting namenode after deleting a directory with snapshot will fail
Key: HDFS-9631
URL: https://issues.apache.org/jira/browse/HDFS-9631
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

I found that a number of TestOpenFilesWithSnapshot tests fail quite frequently. These tests (testParentDirWithUCFileDeleteWithSnapshot, testOpenFilesWithRename, testWithCheckpoint) are unable to reconnect to the namenode after a restart. It looks like the reconnection fails due to an EOFException between the data node and the name node.

All three tests call doWriteAndAbort(), which creates files and then aborts them; the tests then take a snapshot of the parent directory and delete it. Interestingly, if the parent directory does not have a snapshot, the tests do not fail. The following test fails intermittently:
{code:java}
public void testDeleteParentDirWithSnapShot() throws Exception {
  Path path = new Path("/test");
  fs.mkdirs(path);
  fs.allowSnapshot(path);
  Path file = new Path("/test/test/test2");
  FSDataOutputStream out = fs.create(file);
  for (int i = 0; i < 2; i++) {
    long count = 0;
    while (count < 1048576) {
      out.writeBytes("hell");
      count += 4;
    }
  }
  ((DFSOutputStream) out.getWrappedStream()).hsync(EnumSet
      .of(SyncFlag.UPDATE_LENGTH));
  DFSTestUtil.abortStream((DFSOutputStream) out.getWrappedStream());
  Path file2 = new Path("/test/test/test3");
  FSDataOutputStream out2 = fs.create(file2);
  for (int i = 0; i < 2; i++) {
    long count = 0;
    while (count < 1048576) {
      out2.writeBytes("hell");
      count += 4;
    }
  }
  ((DFSOutputStream) out2.getWrappedStream()).hsync(EnumSet
      .of(SyncFlag.UPDATE_LENGTH));
  DFSTestUtil.abortStream((DFSOutputStream) out2.getWrappedStream());
  fs.createSnapshot(path, "s1");
  // delete parent directory
  fs.delete(new Path("/test/test"), true);
  cluster.restartNameNode();
}
{code}
I am not sure if it's a test case issue, or something to do with snapshots.
[jira] [Created] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode
Wei-Chiu Chuang created HDFS-9619:
-

Summary: DataNode sometimes can not find blockpool for the correct namenode
Key: HDFS-9619
URL: https://issues.apache.org/jira/browse/HDFS-9619
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

We sometimes see TestBalancerWithMultipleNameNodes.testBalancer fail to replicate a file, because a data node is excluded.
{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
  at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}
Relevant logs suggest the root cause is a block pool that is not found.
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: /127.0.0.1:49997
java.io.IOException: Non existent blockpool BP-1927700312-172.26.2.1-145188790
  at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
  at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
  at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
  at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
  at java.lang.Thread.run(Thread.java:745)
{noformat}
For a bit more context: this test starts a cluster with two name nodes and one data node. The block pools are added, but one of them is not found afterwards. The root cause is undetected concurrent access to a hash map in SimulatedFSDataset. The solution would be to use a thread-safe class instead, such as ConcurrentHashMap.
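The proposed ConcurrentHashMap fix can be sketched with plain JDK classes. This is a minimal, self-contained illustration, not SimulatedFSDataset's actual code: the map layout and method names below are hypothetical stand-ins. The point is that concurrent writers and readers of the blockpool map see each other's insertions when the map is a ConcurrentHashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockPoolMapDemo {
    // Hypothetical stand-in for the blockpool map: multiple threads
    // (e.g. heartbeat and DataXceiver threads) may add and look up
    // pools concurrently; a plain HashMap can lose or hide insertions,
    // while ConcurrentHashMap is safe for this access pattern.
    static final Map<String, Map<Long, String>> poolMap = new ConcurrentHashMap<>();

    static void addBlockPool(String bpid) {
        poolMap.putIfAbsent(bpid, new ConcurrentHashMap<>());
    }

    static boolean hasBlockPool(String bpid) {
        return poolMap.containsKey(bpid);
    }

    public static void main(String[] args) throws InterruptedException {
        // Add block pools from several threads at once.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final String bpid = "BP-" + i;
            threads[i] = new Thread(() -> addBlockPool(bpid));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Every pool added by any thread must now be visible.
        for (int i = 0; i < threads.length; i++) {
            if (!hasBlockPool("BP-" + i)) {
                throw new AssertionError("missing BP-" + i);
            }
        }
        System.out.println("all " + threads.length + " block pools visible");
    }
}
```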
[jira] [Created] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.
Wei-Chiu Chuang created HDFS-9612:
-

Summary: DistCp worker threads are not terminated after jobs are done.
Key: HDFS-9612
URL: https://issues.apache.org/jira/browse/HDFS-9612
Project: Hadoop HDFS
Issue Type: Bug
Components: distcp
Affects Versions: 2.8.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

In HADOOP-11827, a producer-consumer style thread pool was introduced to parallelize the task of listing files/directories. We have a use case where a DistCp job is run during the commit phase of an MR2 job. However, it turns out DistCp does not terminate its ProducerConsumer thread pools properly, and because the threads are not terminated, those MR2 jobs never finish. In the more typical use case where DistCp runs as a standalone job, the threads are terminated forcefully when the Java process exits, so the leaked threads never became a problem.
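The shape of the fix can be sketched with java.util.concurrent. This is a minimal illustration, not DistCp's actual ProducerConsumer code: once all listing tasks are submitted and consumed, the pool must be shut down explicitly, otherwise its non-daemon worker threads keep the enclosing job alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerPoolShutdownDemo {
    // Submit some work, then shut the pool down and wait for the
    // worker threads to exit. Returns true when all workers terminated.
    public static boolean runAndShutdown() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            pool.submit(() -> {
                // placeholder for "list one directory"
            });
        }
        // Without shutdown(), the idle worker threads linger and a
        // non-standalone caller (e.g. an MR2 commit phase) never finishes.
        pool.shutdown();
        return pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated: " + runAndShutdown());
    }
}
```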
[jira] [Created] (HDFS-9594) DataNode threw NullPointerException
Wei-Chiu Chuang created HDFS-9594:
-

Summary: DataNode threw NullPointerException
Key: HDFS-9594
URL: https://issues.apache.org/jira/browse/HDFS-9594
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

In a precommit Jenkins run, I saw multiple exceptions.
https://builds.apache.org/job/PreCommit-HDFS-Build/13984/testReport/org.apache.hadoop.hdfs/TestDFSShell/testSymLinkReserved/
One of them is a NullPointerException in the datanode.
{noformat}
2015-12-23 13:26:50,337 [DataNode: [[[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/1/dfs/data/data1/, [DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/1/dfs/data/data2/]] heartbeating to localhost/127.0.0.1:38151] WARN datanode.DataNode (BPServiceActor.java:run(859)) - Unexpected exception in block pool Block pool BP-1060337608-172.17.0.3-1450877209942 (Datanode Uuid 6b120576-5c02-402f-ab38-079295bda597) service to localhost/127.0.0.1:38151
java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1391)
  at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:360)
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:796)
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:231)
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
  at java.lang.Thread.run(Thread.java:745)
{noformat}
[jira] [Created] (HDFS-9597) TestReplicationPolicyConsiderLoad#testChooseTargetWithDecomNodes is failing
Wei-Chiu Chuang created HDFS-9597:
-

Summary: TestReplicationPolicyConsiderLoad#testChooseTargetWithDecomNodes is failing
Key: HDFS-9597
URL: https://issues.apache.org/jira/browse/HDFS-9597
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

It seems that HDFS-9034 broke this test. The test has been failing since yesterday.
[jira] [Created] (HDFS-9591) FSImage.loadEdits threw NullPointerException
Wei-Chiu Chuang created HDFS-9591:
-

Summary: FSImage.loadEdits threw NullPointerException
Key: HDFS-9591
URL: https://issues.apache.org/jira/browse/HDFS-9591
Project: Hadoop HDFS
Issue Type: Bug
Components: fs, ha, namenode
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

https://builds.apache.org/job/PreCommit-HDFS-Build/13963/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestFailureToReadEdits/testCheckpointStartingMidEditsFile_0_/
{noformat}
Error Message
Expected non-empty /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005
Stacktrace
java.lang.AssertionError: Expected non-empty /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.apache.hadoop.hdfs.server.namenode.FSImageTestUtil.assertNNHasCheckpoints(FSImageTestUtil.java:470)
  at org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil.waitForCheckpoint(HATestUtil.java:235)
  at org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits.testCheckpointStartingMidEditsFile(TestFailureToReadEdits.java:240)
{noformat}
{noformat}
Exception in thread "Edit log tailer" org.apache.hadoop.util.ExitUtil$ExitException: java.lang.NullPointerException
  at com.google.common.base.Joiner.join(Joiner.java:226)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:818)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:812)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:257)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:371)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
  at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)
  at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
  at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:385)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
  at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
  at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)
{noformat}
[jira] [Created] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
Wei-Chiu Chuang created HDFS-9583:
-

Summary: TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
Key: HDFS-9583
URL: https://issues.apache.org/jira/browse/HDFS-9583
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang

https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/

Looking at the code, the test expects that replacing a block from one data node to another will issue a delete request to FsDatasetAsyncDiskService.deleteAsync(), which should print the log message "Scheduling ... file ... for deletion"; the test waits for 3 seconds, but the message never appeared. I think the test needs a better way to determine whether the delete request was executed, rather than a fixed timeout.
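A condition-polling wait along those lines can be sketched in plain Java. This is an illustrative helper, not the test's actual code; Hadoop's test utilities (e.g. GenericTestUtils.waitFor) offer similar functionality. Instead of sleeping a fixed 3 seconds and hoping the event happened, the test polls until the condition holds or a deadline passes.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForDemo {
    // Poll a condition every intervalMs until it holds, failing with a
    // TimeoutException only if timeoutMs elapses first. This succeeds as
    // soon as the event occurs and fails loudly when it never does.
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~200 ms; the wait returns promptly
        // rather than sleeping for a fixed worst-case duration.
        waitFor(() -> System.currentTimeMillis() - start > 200, 10, 5000);
        System.out.println("condition observed");
    }
}
```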
[jira] [Created] (HDFS-9565) TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes is flaky
Wei-Chiu Chuang created HDFS-9565:
-

Summary: TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes is flaky
Key: HDFS-9565
URL: https://issues.apache.org/jira/browse/HDFS-9565
Project: Hadoop HDFS
Issue Type: Bug
Components: fs, test
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor

TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes occasionally fails with the following error:
https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/699/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testLocatedFileStatusStorageIdsTypes/
{noformat}
FAILED: org.apache.hadoop.hdfs.TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes
Error Message:
Unexpected num storage ids expected:<2> but was:<1>
Stack Trace:
java.lang.AssertionError: Unexpected num storage ids expected:<2> but was:<1>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.apache.hadoop.hdfs.TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes(TestDistributedFileSystem.java:855)
{noformat}
It appears that this test fails due to a race condition: it does not wait for file replication to finish before checking the file's status. The flakiness can be fixed by using DFSTestUtil.waitForReplication().
[jira] [Created] (HDFS-9549) TestCacheDirectives#testExceedsCapacity is flaky
Wei-Chiu Chuang created HDFS-9549:
-

Summary: TestCacheDirectives#testExceedsCapacity is flaky
Key: HDFS-9549
URL: https://issues.apache.org/jira/browse/HDFS-9549
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0
Environment: Jenkins
Reporter: Wei-Chiu Chuang

I have observed that this test (TestCacheDirectives.testExceedsCapacity) fails quite frequently in Jenkins (trunk, trunk-Java8).

Error Message
Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
Stacktrace
java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
  at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)
[jira] [Created] (HDFS-9515) NPE in TestDFSZKFailoverController due to binding exception in MiniDFSCluster.initMiniDFSCluster()
Wei-Chiu Chuang created HDFS-9515:
-

Summary: NPE in TestDFSZKFailoverController due to binding exception in MiniDFSCluster.initMiniDFSCluster()
Key: HDFS-9515
URL: https://issues.apache.org/jira/browse/HDFS-9515
Project: Hadoop HDFS
Issue Type: Bug
Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor

If the MiniDFSCluster constructor throws an exception, the cluster object is never assigned, so shutdown() cannot be called on it. I saw a recent Jenkins job where a bind error threw an exception, and later the NPE from cluster.shutdown() hid the real cause of the test failure. HDFS-9333 has a patch that fixes the bind error.
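The usual guard for this pattern can be sketched as follows. FakeCluster is a hypothetical stand-in for MiniDFSCluster, used only so the sketch is self-contained: a null check in the cleanup path keeps the original constructor failure visible instead of masking it with an NPE.

```java
public class ClusterShutdownDemo {
    // Hypothetical stand-in for MiniDFSCluster: its constructor can
    // throw (e.g. a port bind error), leaving the local variable null.
    static class FakeCluster {
        FakeCluster(boolean failBind) {
            if (failBind) {
                throw new IllegalStateException("Port in use");
            }
        }
        void shutdown() {
            // release resources
        }
    }

    // Returns the outcome the test harness would observe.
    static String runTest(boolean failBind) {
        FakeCluster cluster = null;
        try {
            cluster = new FakeCluster(failBind);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage(); // the real failure, not a later NPE
        } finally {
            if (cluster != null) { // guard: constructor may have thrown
                cluster.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runTest(true));
        System.out.println(runTest(false));
    }
}
```

Without the null check, the finally block would throw a NullPointerException that hides the bind error.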
[jira] [Created] (HDFS-9517) Make TestDistCpUtils.testUnpackAttributes testable
Wei-Chiu Chuang created HDFS-9517:
-

Summary: Make TestDistCpUtils.testUnpackAttributes testable
Key: HDFS-9517
URL: https://issues.apache.org/jira/browse/HDFS-9517
Project: Hadoop HDFS
Issue Type: Bug
Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor

The testUnpackAttributes() method in TestDistCpUtils has no @Test annotation, so it is never run as a test. I searched around and saw no discussion of why the annotation was omitted, so I assume it was unintentional.
[jira] [Resolved] (HDFS-9508) Fix NPE in MiniKMS.start()
[ https://issues.apache.org/jira/browse/HDFS-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDFS-9508.
-
Resolution: Invalid

This should be filed under Hadoop Common.

> Fix NPE in MiniKMS.start()
> --
>
> Key: HDFS-9508
> URL: https://issues.apache.org/jira/browse/HDFS-9508
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Labels: supportability
>
> Sometimes, the KMS resource file cannot be loaded. When this happens, an
> InputStream variable will be null, which subsequently causes an NPE.
> This is a supportability JIRA that makes the error message more explicit and
> explains why the NPE is thrown, ultimately helping us understand why the
> resource files cannot be loaded.
[jira] [Created] (HDFS-9508) Fix NPE in MiniKMS.start()
Wei-Chiu Chuang created HDFS-9508:
-

Summary: Fix NPE in MiniKMS.start()
Key: HDFS-9508
URL: https://issues.apache.org/jira/browse/HDFS-9508
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang

Sometimes, the KMS resource file cannot be loaded. When this happens, an InputStream variable will be null, which subsequently causes an NPE. This is a supportability JIRA that makes the error message more explicit and explains why the NPE is thrown, ultimately helping us understand why the resource files cannot be loaded.
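The kind of fail-fast check described above can be sketched as follows. This is an illustration of the pattern, not MiniKMS's actual code; the resource name and message wording are hypothetical. Instead of letting a null InputStream surface later as a bare NPE, the loader throws an exception that names the missing resource.

```java
import java.io.IOException;
import java.io.InputStream;

public class ResourceLoadDemo {
    // Fail fast with an explicit message when a classpath resource is
    // missing, rather than returning null and causing an NPE downstream.
    static InputStream openResource(String name) throws IOException {
        InputStream in = ResourceLoadDemo.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Unable to load resource '" + name + "' from the classpath");
        }
        return in;
    }

    // Returns either a success marker or the explicit error message.
    static String tryOpen(String name) {
        try (InputStream in = openResource(name)) {
            return "loaded " + name;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A resource that does not exist produces an explanatory message
        // instead of a NullPointerException somewhere else.
        System.out.println(tryOpen("no-such-file.properties"));
    }
}
```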
[jira] [Created] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
Wei-Chiu Chuang created HDFS-9476:
-

Summary: TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
Key: HDFS-9476
URL: https://issues.apache.org/jira/browse/HDFS-9476
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Wei-Chiu Chuang

This test occasionally fails. For example, the most recent failure is:
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/

Error Message
{noformat}
Cannot obtain block length for LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; getBlockSize()=1024; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
{noformat}
Stacktrace
{noformat}
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; getBlockSize()=1024; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
  at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399)
  at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343)
  at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
  at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:265)
  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046)
  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011)
  at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177)
  at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213)
  at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228)
  at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600)
  at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622)
{noformat}
[jira] [Created] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
Wei-Chiu Chuang created HDFS-9466:
-

Summary: TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
Key: HDFS-9466
URL: https://issues.apache.org/jira/browse/HDFS-9466
Project: Hadoop HDFS
Issue Type: Bug
Components: fs, hdfs-client
Reporter: Wei-Chiu Chuang

This test is flaky and fails quite frequently in trunk.

Error Message
expected:<1> but was:<2>
Stacktrace
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
  at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
  at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
  at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
{noformat}
Thanks to [~xiaochen] for identifying the issue.
[jira] [Resolved] (HDFS-9361) Default block placement policy causes TestReplaceDataNodeOnFailure to fail intermittently
[ https://issues.apache.org/jira/browse/HDFS-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDFS-9361.
-
Resolution: Not A Problem

I spent some time discussing the issue with [~walter.k.su] and I also agree this is not a problem. The test can be configured to ignore the load factor.

> Default block placement policy causes TestReplaceDataNodeOnFailure to fail
> intermittently
> -
>
> Key: HDFS-9361
> URL: https://issues.apache.org/jira/browse/HDFS-9361
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: HDFS
> Reporter: Wei-Chiu Chuang
>
> TestReplaceDatanodeOnFailure sometimes fails (see HDFS-6101).
> (For background: the test case sets up a cluster with three data nodes, adds two
> more data nodes, removes one data node, and verifies that clients can correctly
> recover from the failure and end up with three replicas.)
> I traced it down and found that sometimes a client sets up a pipeline with
> only two data nodes, one fewer than configured in the test case, even
> though the test case is configured to always replace failed nodes.
> Digging into the log, I saw:
> {noformat}
> 2015-11-02 12:07:38,634 [IPC Server handler 8 on 50673] WARN
> blockmanagement.BlockPlacementPolicy
> (BlockPlacementPolicyDefault.java:chooseTarget(355)) - Failed to place enough
> replicas, still in need of 1 to reach 3 (unavailableStorages=[],
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: [
> Node /rack0/127.0.0.1:32931 [
> Datanode 127.0.0.1:32931 is not chosen since the rack has too many chosen nodes.
> ]
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:723)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:624)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:429)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:342)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:220)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:105)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:120)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1727)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2457)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:796)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> So from the log, it seems the policy causes the pipeline selection to give up
> on the data node.
> I wonder whether this is appropriate or not. If the load factor exceeds a
> certain threshold but the file has insufficient replicas, should it accept
> the file as is, or should it attempt to acquire more replicas?
> I am filing
[jira] [Created] (HDFS-9451) TestFsPermission#testDeprecatedUmask is broken
Wei-Chiu Chuang created HDFS-9451:
-

Summary: TestFsPermission#testDeprecatedUmask is broken
Key: HDFS-9451
URL: https://issues.apache.org/jira/browse/HDFS-9451
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Wei-Chiu Chuang

I noticed this test has failed consistently since yesterday. The first failed Jenkins job is https://builds.apache.org/job/Hadoop-common-trunk-Java8/723/changes, and from the change log:
{noformat}
Changes:
[wheat9] HDFS-9402. Switch DataNode.LOG to use slf4j. Contributed by Walter Su.
[wheat9] HADOOP-11218. Add TLSv1.1,TLSv1.2 to KMS, HttpFS, SSLFactory.
[wheat9] HADOOP-12467. Respect user-defined JAVA_LIBRARY_PATH in Windows Hadoop
[wheat9] HDFS-8914. Document HA support in the HDFS HdfsDesign.md. Contributed by
[wheat9] HDFS-9153. Pretty-format the output for DFSIO. Contributed by Kai Zheng.
[wheat9] HDFS-7796. Include X-editable for slick contenteditable fields in the
[wheat9] HDFS-3302. Review and improve HDFS trash documentation. Contributed by
[wheat9] HADOOP-12294. Remove the support of the deprecated dfs.umask.
{noformat}
HADOOP-12294 looks to be the most likely cause.
[jira] [Created] (HDFS-9358) TestNodeCount#testNodeCount timed out
Wei-Chiu Chuang created HDFS-9358: - Summary: TestNodeCount#testNodeCount timed out Key: HDFS-9358 URL: https://issues.apache.org/jira/browse/HDFS-9358 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang I have seen this test failure occur a few times in trunk:
Error Message
{noformat}
Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 2 msec. Last counts: live = 2, excess = 0, corrupt = 0
{noformat}
Stacktrace
{noformat}
java.util.concurrent.TimeoutException: Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 2 msec. Last counts: live = 2, excess = 0, corrupt = 0
at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)
{noformat}
[jira] [Created] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong
Wei-Chiu Chuang created HDFS-9347: - Summary: Invariant assumption in TestQuorumJournalManager.shutdown() is wrong Key: HDFS-9347 URL: https://issues.apache.org/jira/browse/HDFS-9347 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang The code
{code:title=TestQuorumJournalManager.java|borderStyle=solid}
@After
public void shutdown() throws IOException {
  IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0]));

  // Should not leak clients between tests -- this can cause flaky tests.
  // (See HDFS-4643)
  GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");

  if (cluster != null) {
    cluster.shutdown();
  }
}
{code}
implicitly assumes that when the call returns from IOUtils.cleanup() (which calls close() on the QuorumJournalManager object), all IPC client connection threads have terminated. However, no internal implementation enforces this assumption. Even if the bug reported in HADOOP-12532 is fixed, the internal code still only ensures IPC connections are terminated, but not the threads.
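Since close() gives no guarantee that the "IPC Client" threads have already exited, one option is for the teardown to poll for them to die within a bounded wait instead of asserting immediately. The sketch below is a standalone analogue of that idea; waitForNoThreadsMatching is a hypothetical helper, not an existing Hadoop API.

```java
import java.util.concurrent.TimeUnit;

// Hedged sketch: poll until no live thread name matches the pattern, or a
// deadline passes. This tolerates threads that are still winding down after
// close() returns, which is exactly the gap described in this report.
public final class ThreadDrain {
    static boolean anyThreadMatches(String regex) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().matches(regex)) {
                return true;
            }
        }
        return false;
    }

    /** Returns true once no live thread matches, or false on timeout. */
    static boolean waitForNoThreadsMatching(String regex, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (!anyThreadMatches(regex)) {
                return true;
            }
            Thread.sleep(50); // brief back-off between checks
        }
        return !anyThreadMatches(regex);
    }
}
```

A teardown would then call waitForNoThreadsMatching(".*IPC Client.*", timeout) and fail only if the threads are still alive after the grace period.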
[jira] [Created] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()
Wei-Chiu Chuang created HDFS-9309: - Summary: Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig() Key: HDFS-9309 URL: https://issues.apache.org/jira/browse/HDFS-9309 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor When KeyStoreUtil.setupSSLConfig() is called, several files are created (ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). However, if they are not deleted upon exit, weird things can happen in any subsequent tests. For example, if ssl-client.xml is not deleted but trustKS.jks is, TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
{noformat}
java.io.IOException: Unable to load OAuth2 connection factory.
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:81)
at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
at org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
at org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
{noformat}
There are currently several tests that do not clean up:
{noformat}
$ grep -rnw . -e 'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 | xargs grep -L "KeyStoreTestUtil\.cleanupSSLConfig"
./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
{noformat}
This JIRA is the effort to fix these tests.
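The intended setup/cleanup pairing can be sketched as a standalone fixture. This is not the Hadoop KeyStoreTestUtil API; it is an illustrative analogue that tracks every artifact created during setup so teardown can remove all of them, leaving nothing to poison later tests. The file names mirror those listed in the report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch: every setup() must be paired with a cleanup(), just as each
// KeyStoreTestUtil.setupSSLConfig() should be paired with cleanupSSLConfig().
public final class SslConfigFixture {
    private final Deque<Path> created = new ArrayDeque<>();

    /** Stand-in for setupSSLConfig(): creates the SSL artifacts. */
    void setup(Path dir) throws IOException {
        for (String name : new String[] {
                "ssl-server.xml", "ssl-client.xml",
                "trustKS.jks", "clientKS.jks", "serverKS.jks"}) {
            created.push(Files.createFile(dir.resolve(name)));
        }
    }

    /** Stand-in for cleanupSSLConfig(): deletes everything setup created. */
    void cleanup() throws IOException {
        while (!created.isEmpty()) {
            Files.deleteIfExists(created.pop());
        }
    }
}
```

In a JUnit test the two calls would live in @Before and @After respectively, so cleanup runs even when the test body fails.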
[jira] [Created] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space
Wei-Chiu Chuang created HDFS-9296: - Summary: ShellBasedUnixGroupMapping should support group names with space Key: HDFS-9296 URL: https://issues.apache.org/jira/browse/HDFS-9296 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang In a typical configuration, group names are obtained from AD through SSSD/LDAP. AD permits group names with spaces (e.g. "Domain Users"). Unfortunately, the present implementation of ShellBasedUnixGroupMapping parses the output of the shell command "id -Gn", and assumes group names are separated by spaces. Support for such names could be achieved by using a combination of shell commands, for example:
{noformat}
bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut -d":" -f1'
{noformat}
But I am still looking for a more compact, and potentially more efficient, form.
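The parsing bug can be shown without a shell at all. Splitting the output of "id -Gn" on whitespace splinters a name like "Domain Users" into two bogus groups, whereas splitting the numeric gid list from "id -G" is unambiguous, since gids never contain spaces, and each gid can then be resolved to one name (the map below stands in for a "getent group <gid>" lookup; all names here are illustrative).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hedged illustration of the ShellBasedUnixGroupMapping parsing problem.
public final class GroupParsing {
    /** The buggy approach: group names containing spaces are torn apart. */
    static List<String> splitOnSpace(String idGnOutput) {
        return Arrays.asList(idGnOutput.trim().split("\\s+"));
    }

    /** Safer: split the gid list, then resolve each gid to exactly one name. */
    static List<String> resolveGids(String idGOutput, Map<Integer, String> groupDb) {
        List<String> names = new ArrayList<>();
        for (String gid : idGOutput.trim().split("\\s+")) {
            names.add(groupDb.get(Integer.parseInt(gid)));
        }
        return names;
    }
}
```

For a user in "Domain Users" and "staff", splitOnSpace reports three groups instead of two, while the gid-based path recovers both names intact.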
[jira] [Resolved] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space
[ https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-9296. --- Resolution: Duplicate I filed in the wrong category. A new one is filed as HADOOP-12505
> ShellBasedUnixGroupMapping should support group names with space
>
> Key: HDFS-9296
> URL: https://issues.apache.org/jira/browse/HDFS-9296
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
>
> In a typical configuration, group name is obtained from AD through SSSD/LDAP.
> AD permits group names with space (e.g. "Domain Users").
> Unfortunately, the present implementation of ShellBasedUnixGroupMapping
> parses the output of shell command "id -Gn", and assumes group names are
> separated by space.
> This could be achieved by using a combination of shell scripts, for example,
> bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut -d":" -f1'
> But I am still looking for a more compact form, and potentially more
> efficient one.
[jira] [Created] (HDFS-9286) HttpFs does not parse ACL syntax correctly for operation REMOVEACLENTRIES
Wei-Chiu Chuang created HDFS-9286: - Summary: HttpFs does not parse ACL syntax correctly for operation REMOVEACLENTRIES Key: HDFS-9286 URL: https://issues.apache.org/jira/browse/HDFS-9286 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Output from WebHdfs:
{noformat}
curl -X PUT "http://weichiu.vpc.cloudera.com:50070/webhdfs/v1/a?aclspec=group:user:&op=REMOVEACLENTRIES&user.name=weichiu"
{noformat}
Output from HttpFs:
{noformat}
curl -X PUT "http://weichiu.vpc.cloudera.com:14000/webhdfs/v1/a?aclspec=group:user:&op=REMOVEACLENTRIES&user.name=weichiu"
{"RemoteException":{"message":"Invalid : group:user:","exception":"HadoopIllegalArgumentException","javaClassName":"org.apache.hadoop.HadoopIllegalArgumentException"}}
{noformat}
Effectively, this means the behavior of HttpFs is not consistent with that of WebHdfs. The bug is reproducible if httpfs and acl are enabled, and is reproducible on a single-node cluster configuration. To reproduce, add into core-site.xml:
{noformat}
dfs.webhdfs.enabled = true
dfs.namenode.acls.enabled = true
hadoop.proxyuser.#HTTPFSUSER#.hosts = httpfs-host.foo.com
hadoop.proxyuser.#HTTPFSUSER#.groups = *
{noformat}
then restart the name node, data node and httpfs daemon. Credit to [~romainr] for reporting the issue.
[jira] [Created] (HDFS-9285) testTruncateWithDataNodesRestartImmediately occasionally fails
Wei-Chiu Chuang created HDFS-9285: - Summary: testTruncateWithDataNodesRestartImmediately occasionally fails Key: HDFS-9285 URL: https://issues.apache.org/jira/browse/HDFS-9285 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Priority: Minor https://builds.apache.org/job/Hadoop-Hdfs-trunk/2462/testReport/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestartImmediately/ Note that this is similar to, but appears to be a different failure than, HDFS-8729.
Error Message
{noformat}
inode should complete in ~3 ms. Expected: is but: was
{noformat}
Stacktrace
{noformat}
java.lang.AssertionError: inode should complete in ~3 ms. Expected: is but: was
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:865)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1192)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1176)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1171)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:798)
{noformat}
Log excerpt:
{noformat}
2015-10-22 06:34:47,281 [IPC Server handler 8 on 8020] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test/testTruncateWithDataNodesRestartImmediately dst=null perm=null proto=rpc
2015-10-22 06:34:47,382 [IPC Server handler 9 on 8020] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test/testTruncateWithDataNodesRestartImmediately dst=null perm=null proto=rpc
2015-10-22 06:34:47,484 [IPC Server handler 0 on 8020] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test/testTruncateWithDataNodesRestartImmediately dst=null perm=null proto=rpc
2015-10-22 06:34:47,585 [IPC Server handler 1 on 8020] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test/testTruncateWithDataNodesRestartImmediately dst=null perm=null proto=rpc
2015-10-22 06:34:47,689 [main] INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1889)) - Shutting down the Mini HDFS Cluster
2015-10-22 06:34:47,690 [main] INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdownDataNodes(1935)) - Shutting down DataNode 2
2015-10-22 06:34:47,690 [main] WARN datanode.DirectoryScanner (DirectoryScanner.java:shutdown(529)) - DirectoryScanner: shutdown has been called
{noformat}
[jira] [Created] (HDFS-9268) JVM crashes when attempting to update a file in fuse file system using vim
Wei-Chiu Chuang created HDFS-9268: - Summary: JVM crashes when attempting to update a file in fuse file system using vim Key: HDFS-9268 URL: https://issues.apache.org/jira/browse/HDFS-9268 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor The JVM crashes when users attempt to use vi to update a file on a fuse file system with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to reproduce the bug, but the same bug is reproducible in trunk.) The root cause is a segfault in a fuse-dfs method. To reproduce it, do as follows:
{noformat}
mkdir /mnt/fuse
chmod 777 /mnt/fuse
ulimit -c unlimited   # to enable coredump
hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
touch /mnt/fuse/y
chmod 600 /mnt/fuse/y
vim /mnt/fuse/y
(in vim, :w to save the file)
{noformat}
{noformat}
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x127ad6] __tls_get_addr@@GLIBC_2.3+0x127ad6
#
# Core dump written. Default location: /home/weichiu/core or core.26606
#
# An error report file with more information is saved as:
# /home/weichiu/hs_err_pid26606.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
/usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
{noformat}
The coredump shows the segfault comes from:
{noformat}
(gdb) bt
#0 0x003b82e328e5 in raise () from /lib64/libc.so.6
#1 0x003b82e340c5 in abort () from /lib64/libc.so.6
#2 0x7f66fc924d75 in os::abort(bool) () from /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#3 0x7f66fcaa76d7 in VMError::report_and_die() () from /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#4 0x7f66fc929c8f in JVM_handle_linux_signal () from /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
#7 0x004039a0 in hdfsConnTree_RB_FIND ()
#8 0x00403e8f in fuseConnect ()
#9 0x004046db in dfs_chown ()
#10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
#11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
#12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
#13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
#14 0x003b82ee894d in clone () from /lib64/libc.so.6
{noformat}
[jira] [Created] (HDFS-9269) Need to update the documentation and wrapper for hdfs-dfs
Wei-Chiu Chuang created HDFS-9269: - Summary: Need to update the documentation and wrapper for hdfs-dfs Key: HDFS-9269 URL: https://issues.apache.org/jira/browse/HDFS-9269 Project: Hadoop HDFS Issue Type: Improvement Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the wrapper script of hdfs-fuse, but found them super outdated. (the wrapper was last updated four years ago, and the hadoop project layout has dramatically changed since then) I am creating this JIRA to track the status of the update.
[jira] [Reopened] (HDFS-7464) TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8
[ https://issues.apache.org/jira/browse/HDFS-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reopened HDFS-7464: --- I am seeing this today with Java 7:
{noformat}
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
{noformat}
{noformat}
Running org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.205 sec <<< FAILURE! - in org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
testRefreshSuperUserGroupsConfiguration(org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA) Time elapsed: 0.808 sec <<< FAILURE!
java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File Exception between local host is: "weichiu-MBP.local/172.16.1.61"; destination host is: "localhost":10872; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:235)

Results :

Failed tests:
TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration:235 refreshSuperUserGroupsConfiguration: End of File Exception between local host is: "weichiu-MBP.local/172.16.1.61"; destination host is: "localhost":10872; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>
{noformat}
> TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8
> ---
>
> Key: HDFS-7464
> URL: https://issues.apache.org/jira/browse/HDFS-7464
> Project: Hadoop HDFS
> Issue Type: Test
> Reporter: Ted Yu
> Priority: Minor
>
> From https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/ :
> {code}
> REGRESSION:
> org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration
> Error Message:
> refreshSuperUserGroupsConfiguration: End of File Exception between local host
> is: "asf908.gq1.ygridcore.net/67.195.81.152"; destination host is:
> "localhost":12700; : java.io.EOFException; For more details see:
> http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>
> Stack Trace:
> java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File
> Exception between local host is: "asf908.gq1.ygridcore.net/67.195.81.152";
> destination host is: "localhost":12700; : java.io.EOFException; For more
> details see: http://wiki.apache.org/hadoop/EOFException expected:<0> but
> was:<-1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:228)
> {code}
[jira] [Created] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.
Wei-Chiu Chuang created HDFS-9249: - Summary: NPE thrown if an IOException is thrown in NameNode. Key: HDFS-9249 URL: https://issues.apache.org/jira/browse/HDFS-9249 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor This issue was found when running test case TestBackupNode.testCheckpointNode, but upon closer look, the problem is not due to the test case. Looks like an IOException was thrown in
{code}
try {
  initializeGenericKeys(conf, nsId, namenodeId);
  initialize(conf);
  try {
    haContext.writeLock();
    state.prepareToEnterState(haContext);
    state.enterState(haContext);
  } finally {
    haContext.writeUnlock();
  }
{code}
causing the namenode to stop, but the namesystem was not yet properly instantiated, causing an NPE. I tried to reproduce locally, but to no avail. Because I could not reproduce the bug, and the log does not indicate what caused the IOException, I suggest making this a supportability JIRA to log the exception for future improvement.
Stacktrace
{noformat}
java.lang.NullPointerException: null
at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
{noformat}
The last few lines of the log:
{noformat}
2015-10-14 19:45:07,807 INFO namenode.NameNode (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started (again)
2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is hdfs://localhost:37835
2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use localhost:37835 to access this namenode/service.
2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
2015-10-14 19:45:07,810 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 2 1
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
2015-10-14 19:45:07,822 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
2015-10-14 19:45:07,835 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(169)) - Shutting down CacheReplicationMonitor
2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping server on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC Server listener on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC Server Responder
2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager (BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager (DecommissionManager.java:run(78)) - M
{noformat}
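The supportability fix proposed above amounts to logging the startup exception before tearing down partially built state, so the root cause is never lost. The standalone sketch below illustrates that pattern; the Service interface and method names are hypothetical, not the actual NameNode API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hedged sketch: record a startup failure before stopping, then rethrow so
// callers still see the original exception. Illustrative names only.
public final class StartupLogging {
    private static final Logger LOG =
            Logger.getLogger(StartupLogging.class.getName());

    interface Service {
        void initialize() throws Exception;
        void stop();
    }

    static void start(Service svc) throws Exception {
        try {
            svc.initialize();
        } catch (Exception e) {
            // Log first: stop() may itself fail on half-initialized state
            // (as the NPE in this report shows), hiding the real cause.
            LOG.log(Level.SEVERE, "Initialization failed; stopping service", e);
            svc.stop();
            throw e;
        }
    }
}
```

With this shape, even when stop() trips over uninitialized fields, the log already contains the IOException that started the cascade.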
[jira] [Created] (HDFS-9243) TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
Wei-Chiu Chuang created HDFS-9243: - Summary: TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout Key: HDFS-9243 URL: https://issues.apache.org/jira/browse/HDFS-9243 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor This is happening on trunk, in org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks. On my local Linux machine, this test case times out 6 out of 10 times. When it does not time out, the test takes about 20 seconds; otherwise it takes more than 60 seconds and then times out.
[jira] [Created] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown
Wei-Chiu Chuang created HDFS-9181: - Summary: Better handling of exceptions thrown during upgrade shutdown Key: HDFS-9181 URL: https://issues.apache.org/jira/browse/HDFS-9181 Project: Hadoop HDFS Issue Type: Improvement Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor Previously in HDFS-7533, a bug was fixed by suppressing exceptions during upgrade shutdown. That may be appropriate as a temporary fix, but it would be better if the exception were handled in some way. One way to handle it is by emitting a warning message; there could be other ways as well. This JIRA is created to discuss how to handle this case better.
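The "emit a warning" option discussed above can be sketched as a small wrapper: instead of silently swallowing a shutdown-step failure (the HDFS-7533 stopgap), log it at WARN and keep shutting down. This is an illustrative pattern, not the actual HDFS shutdown code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hedged sketch: a shutdown step that fails is logged but does not abort the
// rest of the shutdown sequence. Names are illustrative.
public final class QuietShutdown {
    private static final Logger LOG =
            Logger.getLogger(QuietShutdown.class.getName());

    /** Runs one shutdown step; returns false (and warns) if it failed. */
    static boolean shutdownQuietly(Runnable step, String name) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            // Surface the problem without breaking the shutdown path.
            LOG.log(Level.WARNING, "Exception while shutting down " + name, e);
            return false;
        }
    }
}
```

The return value also lets a caller decide afterwards whether any step failed, which plain suppression cannot do.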
[jira] [Created] (HDFS-9123) Validation of a path ended with a '/'
Wei-Chiu Chuang created HDFS-9123: - Summary: Validation of a path ended with a '/' Key: HDFS-9123 URL: https://issues.apache.org/jira/browse/HDFS-9123 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang Priority: Minor HDFS forbids copying from a directory to its subdirectory (e.g. hdfs dfs -cp /abc /abc/xyz), as otherwise it could cause infinite copying (/abc/xyz, /abc/xyz/xyz, /abc/xyz/xyz/xyz, ... etc.). However, if the source path ends with a '/' path separator, the existing validation for sub-directories fails. For example, copying from / to /abc would cause infinite copying, until the disk space is filled up.
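The missing normalization can be sketched in isolation: strip any trailing '/' from the source path before the "is the destination under the source?" check, so "/abc/" is treated the same as "/abc" and "/" still works as the root. isCopyIntoItself is an illustrative helper, not the actual HDFS validation code.

```java
// Hedged sketch of the trailing-slash-safe subdirectory check described above.
public final class PathCheck {
    /** True if dst is src itself or lies under src. */
    static boolean isCopyIntoItself(String src, String dst) {
        // Normalize a trailing separator ("/abc/" -> "/abc"; keep root "/").
        String s = src;
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        String prefix = s.equals("/") ? "/" : s + "/";
        return dst.equals(s) || dst.startsWith(prefix);
    }
}
```

Under this check, both of the report's failure cases are caught: "/abc/" to "/abc/xyz" and "/" to "/abc" are rejected, while an unrelated sibling like "/abd" is still allowed.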