[jira] [Created] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-02-02 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9744:
-

 Summary: TestDirectoryScanner#testThrottling occasionally time out 
after 300 seconds
 Key: HDFS-9744
 URL: https://issues.apache.org/jira/browse/HDFS-9744
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Priority: Minor


I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/

Looking at the log, it does not look like the test got stuck. On my local 
machine, this test took 219 seconds. It is likely that this test takes more 
than 300 seconds to complete on a busy Jenkins slave. I think it is reasonable 
to set a longer timeout value, or to reduce the number of blocks so the test 
finishes sooner.
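
If the timeout is raised, a minimal sketch of the change (the 600-second value 
is illustrative, not a vetted number) would be:

{code:java}
// Hedged sketch: double the JUnit timeout so a loaded Jenkins slave
// can still finish the ~220-second run.
@Test(timeout = 600000)
public void testThrottling() throws Exception {
  // ... existing test body unchanged ...
}
{code}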

Error Message
{noformat}
test timed out after 300000 milliseconds
{noformat}
Stacktrace
{noformat}
java.lang.Exception: test timed out after 300000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at 
org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
at 
org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
at 
org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
at 
org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
at 
org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
at 
org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9648) Test TestStartup.testImageChecksum keeps failing

2016-01-13 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9648:
-

 Summary: Test TestStartup.testImageChecksum keeps failing 
 Key: HDFS-9648
 URL: https://issues.apache.org/jira/browse/HDFS-9648
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Jenkins
Reporter: Wei-Chiu Chuang


I saw in the Jenkins log that TestStartup.testImageChecksum has failed five 
consecutive times.

https://builds.apache.org/job/Hadoop-Hdfs-trunk/2724/testReport/org.apache.hadoop.hdfs.server.namenode/TestStartup/testImageChecksum/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9640) Remove hsftp from DistCp

2016-01-11 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9640:
-

 Summary: Remove hsftp from DistCp
 Key: HDFS-9640
 URL: https://issues.apache.org/jira/browse/HDFS-9640
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


Per discussion in HDFS-9638:
with HDFS-5570, hftp/hsftp are removed in Hadoop 3.0.0. But DistCp still 
references hsftp via the parameter -mapredSslConf. This parameter will be 
useless in Hadoop 3.0.0, so it should be removed and the change documented.

This JIRA is intended to track the status of the code/docs change involving the 
removal of hsftp in DistCp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9638) Improve DistCp Help and documentation

2016-01-11 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9638:
-

 Summary: Improve DistCp Help and documentation
 Key: HDFS-9638
 URL: https://issues.apache.org/jira/browse/HDFS-9638
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


For example,
-mapredSslConf <ssl_conf_file>   Configuration for ssl config file, to use with
hftps://

But this ssl config file should be in the classpath, which is not clearly 
stated.

http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
"When using the hsftp protocol with a source, the security- related properties 
may be specified in a config-file and passed to DistCp.  needs 
to be in the classpath. "

It is also not clear from the context whether this ssl_conf_file should be on 
the client issuing the command. (I think the answer is yes.)

Also, in http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html:
"The following is an example of the contents of the contents of a SSL 
Configuration file:"
the phrase "of the contents" is duplicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9631) Restarting namenode after deleting a directory with snapshot will fail

2016-01-07 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9631:
-

 Summary: Restarting namenode after deleting a directory with 
snapshot will fail
 Key: HDFS-9631
 URL: https://issues.apache.org/jira/browse/HDFS-9631
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


I found that a number of TestOpenFilesWithSnapshot tests fail quite frequently. 
These tests (testParentDirWithUCFileDeleteWithSnapshot, 
testOpenFilesWithRename, testWithCheckpoint) are unable to reconnect to the 
namenode after restart. It looks like the reconnection fails due to an 
EOFException between the data node and the name node.

It appears that these three tests all call doWriteAndAbort(), which creates 
files and then aborts them, then takes a snapshot of the parent directory, and 
then deletes the parent directory.

Interestingly, if the parent directory does not have a snapshot, the tests do 
not fail.

The following test will fail intermittently:
{code:java}
public void testDeleteParentDirWithSnapShot() throws Exception {
  Path path = new Path("/test");
  fs.mkdirs(path);
  fs.allowSnapshot(path);
  Path file = new Path("/test/test/test2");
  FSDataOutputStream out = fs.create(file);
  for (int i = 0; i < 2; i++) {
    long count = 0;
    while (count < 1048576) {
      out.writeBytes("hell");
      count += 4;
    }
  }
  ((DFSOutputStream) out.getWrappedStream()).hsync(
      EnumSet.of(SyncFlag.UPDATE_LENGTH));
  DFSTestUtil.abortStream((DFSOutputStream) out.getWrappedStream());

  Path file2 = new Path("/test/test/test3");
  FSDataOutputStream out2 = fs.create(file2);
  for (int i = 0; i < 2; i++) {
    long count = 0;
    while (count < 1048576) {
      out2.writeBytes("hell");
      count += 4;
    }
  }
  ((DFSOutputStream) out2.getWrappedStream()).hsync(
      EnumSet.of(SyncFlag.UPDATE_LENGTH));
  DFSTestUtil.abortStream((DFSOutputStream) out2.getWrappedStream());

  fs.createSnapshot(path, "s1");
  // delete parent directory
  fs.delete(new Path("/test/test"), true);
  cluster.restartNameNode();
}
{code}

I am not sure if it's a test case issue, or something to do with snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9619) DataNode sometimes can not find blockpool for the correct namenode

2016-01-06 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9619:
-

 Summary: DataNode sometimes can not find blockpool for the correct 
namenode
 Key: HDFS-9619
 URL: https://issues.apache.org/jira/browse/HDFS-9619
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


We sometimes see TestBalancerWithMultipleNameNodes.testBalancer fail to 
replicate a file because a data node is excluded.

{noformat}
File /tmp.txt could only be replicated to 0 nodes instead of minReplication 
(=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this 
operation.
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1745)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2390)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:797)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
{noformat}

Relevant logs suggest the root cause is that the block pool was not found.
{noformat}
2016-01-03 22:11:43,174 [DataXceiver for client 
DFSClient_NONMAPREDUCE_849671738_1 at /127.0.0.1:47318 [Receiving block 
BP-1927700312-172.26.2.1-145188790:blk_1073741825_1001]] ERROR 
datanode.DataNode (DataXceiver.java:run(280)) - host0.foo.com:49997:DataXceiver 
error processing WRITE_BLOCK operation src: /127.0.0.1:47318 dst: 
/127.0.0.1:49997
java.io.IOException: Non existent blockpool 
BP-1927700312-172.26.2.1-145188790
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.getMap(SimulatedFSDataset.java:583)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createTemporary(SimulatedFSDataset.java:955)
at 
org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.createRbw(SimulatedFSDataset.java:941)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:203)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1235)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:678)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
at java.lang.Thread.run(Thread.java:745)
{noformat}

For a bit more context, this test starts a cluster with two name nodes and one 
data node. The block pools are added, but one of them cannot be found after 
being added. The root cause is an undetected concurrent access to a hash map in 
SimulatedFSDataset. The solution would be to use a thread-safe class instead, 
such as ConcurrentHashMap.
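
A minimal sketch of the fix, under the assumption that the map in question keys 
block pools by their ID (names here are illustrative, not the exact 
SimulatedFSDataset fields):

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BlockPoolMapSketch {
  // A plain HashMap can miss entries when two BPServiceActor threads
  // add/read block pools concurrently; ConcurrentHashMap cannot.
  private final Map<String, Map<Long, Object>> blockPoolMap =
      new ConcurrentHashMap<>();

  void addBlockPool(String bpid) {
    blockPoolMap.putIfAbsent(bpid, new ConcurrentHashMap<Long, Object>());
  }

  Map<Long, Object> getMap(String bpid) throws IOException {
    Map<Long, Object> map = blockPoolMap.get(bpid);
    if (map == null) {
      throw new IOException("Non existent blockpool " + bpid);
    }
    return map;
  }
}
{code}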



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9612) DistCp worker threads are not terminated after jobs are done.

2016-01-04 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9612:
-

 Summary: DistCp worker threads are not terminated after jobs are 
done.
 Key: HDFS-9612
 URL: https://issues.apache.org/jira/browse/HDFS-9612
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.8.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


In HADOOP-11827, a producer-consumer style thread pool was introduced to 
parallelize the task of listing files/directories.

We have a use case where a distcp job is run during the commit phase of an MR2 
job. However, it was found that distcp does not terminate its ProducerConsumer 
thread pools properly. Because the threads are not terminated, those MR2 jobs 
never finish.

In the more typical use case where distcp is run as a standalone job, those 
threads are terminated forcefully when the Java process exits, so the leaked 
threads did not become a problem.
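
A minimal sketch of the missing step, assuming a plain ExecutorService stands 
in for the ProducerConsumer pool (names are illustrative, not the exact distcp 
API):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ListingPoolSketch {
  // Threads from a default ExecutorService are non-daemon, so the
  // embedding JVM (here, the MR2 task running distcp in its commit
  // phase) cannot exit until the pool is shut down explicitly.
  private final ExecutorService workers = Executors.newFixedThreadPool(10);

  void shutdown() throws InterruptedException {
    workers.shutdownNow();                          // interrupt the workers
    workers.awaitTermination(30, TimeUnit.SECONDS); // bound the wait
  }
}
{code}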



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9594) DataNode threw NullPointerException

2015-12-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9594:
-

 Summary: DataNode threw NullPointerException
 Key: HDFS-9594
 URL: https://issues.apache.org/jira/browse/HDFS-9594
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


In a precommit Jenkins run, I saw multiple exceptions.
https://builds.apache.org/job/PreCommit-HDFS-Build/13984/testReport/org.apache.hadoop.hdfs/TestDFSShell/testSymLinkReserved/

One of which is a null pointer exception in datanode.
{noformat}
2015-12-23 13:26:50,337 [DataNode: 
[[[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/1/dfs/data/data1/,
 
[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/1/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:38151] WARN  datanode.DataNode 
(BPServiceActor.java:run(859)) - Unexpected exception in block pool Block pool 
BP-1060337608-172.17.0.3-1450877209942 (Datanode Uuid 
6b120576-5c02-402f-ab38-079295bda597) service to localhost/127.0.0.1:38151
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1391)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:360)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:796)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:231)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9597) TestReplicationPolicyConsiderLoad#testChooseTargetWithDecomNodes is failing

2015-12-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9597:
-

 Summary: 
TestReplicationPolicyConsiderLoad#testChooseTargetWithDecomNodes is failing
 Key: HDFS-9597
 URL: https://issues.apache.org/jira/browse/HDFS-9597
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


It seems that HDFS-9034 broke this test.
It has been failing since yesterday.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9591) FSImage.loadEdits threw NullPointerException

2015-12-21 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9591:
-

 Summary: FSImage.loadEdits threw NullPointerException
 Key: HDFS-9591
 URL: https://issues.apache.org/jira/browse/HDFS-9591
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs, ha, namenode
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


https://builds.apache.org/job/PreCommit-HDFS-Build/13963/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestFailureToReadEdits/testCheckpointStartingMidEditsFile_0_/

{noformat}
Error Message

Expected non-empty 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005

Stacktrace

java.lang.AssertionError: Expected non-empty 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/name-0-3/current/fsimage_005
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageTestUtil.assertNNHasCheckpoints(FSImageTestUtil.java:470)
at 
org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil.waitForCheckpoint(HATestUtil.java:235)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits.testCheckpointStartingMidEditsFile(TestFailureToReadEdits.java:240)

{noformat}

{noformat}
Exception in thread "Edit log tailer" 
org.apache.hadoop.util.ExitUtil$ExitException: java.lang.NullPointerException
at com.google.common.base.Joiner.join(Joiner.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:818)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:812)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:371)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)

at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:385)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails

2015-12-20 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9583:
-

 Summary: TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit 
occasionally fails
 Key: HDFS-9583
 URL: https://issues.apache.org/jira/browse/HDFS-9583
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang


https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/

Looking at the code, the test expects that replacing a block from one data node 
to another will issue a delete request to 
FsDatasetAsyncDiskService.deleteAsync(), which should print the log message 
"Scheduling ... file ... for deletion", and it waits 3 seconds for that to 
happen. However, it never occurred.

I think the test needs a better way to determine whether the delete request was 
executed, rather than using a fixed timeout.
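
For example (a hedged sketch; isBlockFileDeleted() is a placeholder for 
whatever check the test can make, and the waitFor() signature may vary across 
branches):

{code:java}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Poll for the async deletion instead of sleeping a fixed 3 seconds.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return isBlockFileDeleted();  // placeholder predicate
  }
}, 100 /* poll every 100 ms */, 30000 /* give up after 30 s */);
{code}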



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9565) TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes is flaky

2015-12-16 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9565:
-

 Summary: 
TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes is flaky
 Key: HDFS-9565
 URL: https://issues.apache.org/jira/browse/HDFS-9565
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs, test
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes occasionally 
fails with the following error:
https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/699/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testLocatedFileStatusStorageIdsTypes/
{noformat}
FAILED:  
org.apache.hadoop.hdfs.TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes

Error Message:
Unexpected num storage ids expected:<2> but was:<1>

Stack Trace:
java.lang.AssertionError: Unexpected num storage ids expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.hdfs.TestDistributedFileSystem.testLocatedFileStatusStorageIdsTypes(TestDistributedFileSystem.java:855)

{noformat}

It appears that this test fails due to a race condition: it does not wait for 
the file replication to finish before checking the file's status.

This flaky test can be fixed by using DFSTestUtil.waitForReplication().
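
A minimal sketch (the file, replication factor, and timeout values are 
illustrative, and the exact DFSTestUtil.waitForReplication() overload may 
differ):

{code:java}
// Block until the file actually reaches the expected replication
// before asserting on its storage IDs/types.
DFSTestUtil.waitForReplication(fs, testFile, (short) 2, 30000);
// ... then run the existing block-location assertions ...
{code}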



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9549) TestCacheDirectives#testExceedsCapacity is flaky

2015-12-11 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9549:
-

 Summary: TestCacheDirectives#testExceedsCapacity is flaky
 Key: HDFS-9549
 URL: https://issues.apache.org/jira/browse/HDFS-9549
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: Jenkins
Reporter: Wei-Chiu Chuang


I have observed that this test (TestCacheDirectives.testExceedsCapacity) fails 
quite frequently in Jenkins (trunk, trunk-Java8).

Error Message

Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, 
replication=1, mark=true}]

Stacktrace

java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not empty, 
[{blockId=1073741841, replication=1, mark=true}]
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9515) NPE in TestDFSZKFailoverController due to binding exception in MiniDFSCluster.initMiniDFSCluster()

2015-12-07 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9515:
-

 Summary: NPE in TestDFSZKFailoverController due to binding 
exception in MiniDFSCluster.initMiniDFSCluster()
 Key: HDFS-9515
 URL: https://issues.apache.org/jira/browse/HDFS-9515
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Jenkins
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


If the MiniDFSCluster constructor throws an exception, the cluster object is 
not assigned, so shutdown() cannot be called on the object.

I saw a recent Jenkins job where a binding error threw an exception, and later 
the NPE from cluster.shutdown() hid the real cause of the test failure.

HDFS-9333 has a patch that fixes the bind error.
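
The defensive pattern, as a minimal sketch of the usual MiniDFSCluster test 
idiom:

{code:java}
MiniDFSCluster cluster = null;
try {
  // If the Builder throws (e.g. a BindException), 'cluster' stays null.
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  // ... test body ...
} finally {
  if (cluster != null) {  // guard keeps an NPE from masking the real failure
    cluster.shutdown();
  }
}
{code}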



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9517) Make TestDistCpUtils.testUnpackAttributes testable

2015-12-07 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9517:
-

 Summary: Make TestDistCpUtils.testUnpackAttributes testable
 Key: HDFS-9517
 URL: https://issues.apache.org/jira/browse/HDFS-9517
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


The testUnpackAttributes() test method in TestDistCpUtils does not have a @Test 
annotation, so it is never run.

I searched around and saw no discussion of why the annotation was omitted, so I 
assume it was unintentional.
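
The fix is essentially one line:

{code:java}
@Test  // previously missing, so JUnit never ran this method
public void testUnpackAttributes() throws Exception {
  // ... existing body unchanged ...
}
{code}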




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9508) Fix NPE in MiniKMS.start()

2015-12-04 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9508.
---
Resolution: Invalid

This should be filed under Hadoop Common.

> Fix NPE in MiniKMS.start()
> --
>
> Key: HDFS-9508
> URL: https://issues.apache.org/jira/browse/HDFS-9508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>        Reporter: Wei-Chiu Chuang
>    Assignee: Wei-Chiu Chuang
>  Labels: supportability
>
> Sometimes, the KMS resource file can not be loaded. When this happens, an 
> InputStream variable will be null, which subsequently causes an NPE.
> This is a supportability JIRA that makes the error message more explicit and 
> explains why the NPE is thrown, ultimately helping us understand why the 
> resource files can not be loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9508) Fix NPE in MiniKMS.start()

2015-12-04 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9508:
-

 Summary: Fix NPE in MiniKMS.start()
 Key: HDFS-9508
 URL: https://issues.apache.org/jira/browse/HDFS-9508
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


Sometimes, the KMS resource file can not be loaded. When this happens, an 
InputStream variable will be null, which subsequently causes an NPE.

This is a supportability JIRA that makes the error message more explicit and 
explains why the NPE is thrown, ultimately helping us understand why the 
resource files can not be loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail

2015-11-29 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9476:
-

 Summary: TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage 
occasionally fail
 Key: HDFS-9476
 URL: https://issues.apache.org/jira/browse/HDFS-9476
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


This test occasionally fails. For example, the most recent failure is:
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/

Error Message
{noformat}
Cannot obtain block length for 
LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
 getBlockSize()=1024; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
{noformat}
Stacktrace
{noformat}
java.io.IOException: Cannot obtain block length for 
LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
 getBlockSize()=1024; corrupt=false; offset=0; 
locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
at 
org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:265)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky

2015-11-24 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9466:
-

 Summary: 
TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
 Key: HDFS-9466
 URL: https://issues.apache.org/jira/browse/HDFS-9466
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs, hdfs-client
Reporter: Wei-Chiu Chuang


This test is flaky and fails quite frequently in trunk.
Error Message
expected:<1> but was:<2>
Stacktrace
{noformat}
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
at 
org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
at 
org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
{noformat}

Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9361) Default block placement policy causes TestReplaceDataNodeOnFailure to fail intermittently

2015-11-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9361.
---
Resolution: Not A Problem

I spent some time discussing the issue with [~walter.k.su] and I also agree 
this is not a problem. The test can be configured to ignore load factor.

> Default block placement policy causes TestReplaceDataNodeOnFailure to fail 
> intermittently
> -
>
> Key: HDFS-9361
> URL: https://issues.apache.org/jira/browse/HDFS-9361
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS
>    Reporter: Wei-Chiu Chuang
>
> TestReplaceDatanodeOnFailure sometimes fail (See HDFS-6101).
> (For background information, the test case sets up a cluster with three data 
> nodes, adds two more data nodes, removes one data node, and verifies that 
> clients can correctly recover from the failure and set up three replicas)
> I traced it down and found that sometimes a client only sets up a pipeline 
> with two data nodes, which is one fewer than configured in the test case, even 
> though the test case is configured to always replace failed nodes.
> Digging into the log, I saw:
> {noformat}
> 2015-11-02 12:07:38,634 [IPC Server handler 8 on 50673] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(355)) - Failed to place enough 
> replicas, still in need of 1 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>  [
> Node /rack0/127.0.0.1:32931 [
>   Datanode 127.0.0.1:32931 is not chosen since the rack has too many chosen 
> nodes .
> ]
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:723)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:624)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:429)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:342)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:220)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:105)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:120)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1727)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:299)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2457)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:796)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2305)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2301)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2299)
> {noformat}
> So from the log, it seems the policy causes the pipeline selection to give up 
> on the data node.
> I wonder whether this is appropriate or not. If the load factor exceeds a 
> certain threshold, but the file has insufficient replicas, should it accept 
> the file as is, or should it attempt to acquire more replicas? 
> I am filing 

[jira] [Created] (HDFS-9451) TestFsPermission#testDeprecatedUmask is broken

2015-11-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9451:
-

 Summary: TestFsPermission#testDeprecatedUmask is broken
 Key: HDFS-9451
 URL: https://issues.apache.org/jira/browse/HDFS-9451
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


I noticed this test has failed consistently since yesterday. The first failed 
Jenkins job is 
https://builds.apache.org/job/Hadoop-common-trunk-Java8/723/changes, and from 
the change log:

{noformat}
Changes:

[wheat9] HDFS-9402. Switch DataNode.LOG to use slf4j. Contributed by Walter Su.

[wheat9] HADOOP-11218. Add TLSv1.1,TLSv1.2 to KMS, HttpFS, SSLFactory.

[wheat9] HADOOP-12467. Respect user-defined JAVA_LIBRARY_PATH in Windows Hadoop

[wheat9] HDFS-8914. Document HA support in the HDFS HdfsDesign.md. Contributed 
by

[wheat9] HDFS-9153. Pretty-format the output for DFSIO. Contributed by Kai 
Zheng.

[wheat9] HDFS-7796. Include X-editable for slick contenteditable fields in the

[wheat9] HDFS-3302. Review and improve HDFS trash documentation. Contributed by

[wheat9] HADOOP-12294. Remove the support of the deprecated dfs.umask.
{noformat}

HADOOP-12294 looks to be the most likely cause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-02 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9358:
-

 Summary: TestNodeCount#testNodeCount timed out
 Key: HDFS-9358
 URL: https://issues.apache.org/jira/browse/HDFS-9358
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


I have seen this test failure occur a few times in trunk:

Error Message

Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0

Stacktrace

java.util.concurrent.TimeoutException: Timeout: excess replica count not equal 
to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live = 2, 
excess = 0, corrupt = 0
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9347) Invariant assumption in TestQuorumJournalManager.shutdown() is wrong

2015-10-30 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9347:
-

 Summary: Invariant assumption in 
TestQuorumJournalManager.shutdown() is wrong
 Key: HDFS-9347
 URL: https://issues.apache.org/jira/browse/HDFS-9347
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


The code
{code:title=TestQuorumJournalManager.java|borderStyle=solid}
@After
public void shutdown() throws IOException {
  IOUtils.cleanup(LOG, toClose.toArray(new Closeable[0]));

  // Should not leak clients between tests -- this can cause flaky tests.
  // (See HDFS-4643)
  GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");

  if (cluster != null) {
    cluster.shutdown();
  }
}
{code}
implicitly assumes that when the call returns from IOUtils.cleanup() (which 
calls close() on the QuorumJournalManager object), all IPC client connection 
threads are terminated. However, nothing in the implementation enforces this 
assumption. Even if the bug reported in HADOOP-12532 is fixed, the internal 
code still only ensures that IPC connections are terminated, not the threads.
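
If the no-leak assertion is to be kept, one option (a hedged sketch using only 
standard JDK thread introspection, not an existing Hadoop helper) is to wait 
for the IPC client threads to actually exit before asserting:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Returns true once no ".*IPC Client.*" thread remains; a test could poll
// this (e.g. with GenericTestUtils.waitFor) before the existing
// assertNoThreadsMatching() call.
static boolean ipcClientThreadsGone() {
  ThreadMXBean bean = ManagementFactory.getThreadMXBean();
  for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds(), 0)) {
    if (info != null && info.getThreadName().matches(".*IPC Client.*")) {
      return false;
    }
  }
  return true;
}
{code}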



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()

2015-10-26 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9309:
-

 Summary: Tests that use KeyStoreUtil must call 
KeyStoreUtil.cleanupSSLConfig()
 Key: HDFS-9309
 URL: https://issues.apache.org/jira/browse/HDFS-9309
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


When KeyStoreTestUtil.setupSSLConfig() is called, several files are created 
(ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
However, if they are not deleted upon exit, weird things can happen to any 
subsequent tests.

For example, if ssl-client.xml is not deleted, but trustKS.jks is deleted, 
TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
{noformat}
java.io.IOException: Unable to load OAuth2 connection factory.
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:81)
at 
org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
at 
org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
{noformat}

There are currently several tests that do not clean up:

{noformat}

130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
"KeyStoreTestUtil\.cleanupSSLConfig"
./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
{noformat}

This JIRA is the effort to fix the bug.
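
The fix pattern, as a hedged sketch (the directory variables, conf, and MyTest 
are placeholders):

{code:java}
import java.io.File;
import org.apache.hadoop.security.ssl.KeyStoreTestUtil;

private static String keystoresDir;
private static String sslConfDir;

@BeforeClass
public static void setUp() throws Exception {
  keystoresDir = new File(System.getProperty("test.build.dir",
      "target/test-dir")).getAbsolutePath();
  sslConfDir = KeyStoreTestUtil.getClasspathDir(MyTest.class);
  KeyStoreTestUtil.setupSSLConfig(keystoresDir, sslConfDir, conf, false);
}

@AfterClass
public static void tearDown() throws Exception {
  // The missing step: remove ssl-server.xml, ssl-client.xml and the
  // *.jks files so they cannot poison subsequent tests.
  KeyStoreTestUtil.cleanupSSLConfig(keystoresDir, sslConfDir);
}
{code}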



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space

2015-10-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9296:
-

 Summary: ShellBasedUnixGroupMapping should support group names 
with space
 Key: HDFS-9296
 URL: https://issues.apache.org/jira/browse/HDFS-9296
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


In a typical configuration, group names are obtained from AD through SSSD/LDAP. 
AD permits group names with spaces (e.g. "Domain Users").

Unfortunately, the present implementation of ShellBasedUnixGroupMapping parses 
the output of the shell command "id -Gn", and assumes group names are separated 
by spaces.

Support for such names could be achieved using a combination of shell commands, 
for example:

bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut -d":" 
-f1'

But I am still looking for a more compact, and potentially more efficient, 
form.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space

2015-10-23 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9296.
---
Resolution: Duplicate

I filed in the wrong category. A new one is filed as HADOOP-12505

> ShellBasedUnixGroupMapping should support group names with space
> 
>
> Key: HDFS-9296
> URL: https://issues.apache.org/jira/browse/HDFS-9296
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Wei-Chiu Chuang
>        Assignee: Wei-Chiu Chuang
>
> In a typical configuration, group names are obtained from AD through 
> SSSD/LDAP. AD permits group names with spaces (e.g. "Domain Users").
> Unfortunately, the present implementation of ShellBasedUnixGroupMapping 
> parses the output of the shell command "id -Gn", and assumes group names are 
> separated by spaces. 
> Support for such names could be achieved using a combination of shell 
> commands, for example: 
> bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut 
> -d":" -f1'
> But I am still looking for a more compact, and potentially more efficient, 
> form.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9286) HttpFs does not parse ACL syntax correctly for operation REMOVEACLENTRIES

2015-10-22 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9286:
-

 Summary: HttpFs does not parse ACL syntax correctly for operation 
REMOVEACLENTRIES
 Key: HDFS-9286
 URL: https://issues.apache.org/jira/browse/HDFS-9286
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


Output from WebHdfs:
curl -X PUT 
"http://weichiu.vpc.cloudera.com:50070/webhdfs/v1/a?aclspec=group:user:&op=REMOVEACLENTRIES&user.name=weichiu"

Output from HttpFs:
curl -X PUT 
"http://weichiu.vpc.cloudera.com:14000/webhdfs/v1/a?aclspec=group:user:&op=REMOVEACLENTRIES&user.name=weichiu"

{"RemoteException":{"message":"Invalid  : 
group:user:","exception":"HadoopIllegalArgumentException","javaClassName":"org.apache.hadoop.HadoopIllegalArgumentException"}}

Effectively, what this means is that the behavior of HttpFs is not consistent 
with that of WebHdfs.

The bug is reproducible if HttpFS and ACLs are enabled, and it reproduces on a 
single-node cluster configuration.

To reproduce, add the following to core-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>

<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
  <value>*</value>
</property>

Restart the name node, data node, and HttpFS daemon.

Credit to [~romainr] for reporting the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9285) testTruncateWithDataNodesRestartImmediately occasionally fails

2015-10-22 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9285:
-

 Summary: testTruncateWithDataNodesRestartImmediately occasionally 
fails
 Key: HDFS-9285
 URL: https://issues.apache.org/jira/browse/HDFS-9285
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Priority: Minor


https://builds.apache.org/job/Hadoop-Hdfs-trunk/2462/testReport/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestartImmediately/
Note that this is similar to, but appears to be a different failure from, HDFS-8729.

Error Message

inode should complete in ~3 ms.
Expected: is <true>
 but: was <false>
Stacktrace

java.lang.AssertionError: inode should complete in ~3 ms.
Expected: is <true>
 but: was <false>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:865)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1192)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1176)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1171)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:798)


Log excerpt:
2015-10-22 06:34:47,281 [IPC Server handler 8 on 8020] INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(7358)) - allowed=true  ugi=jenkins 
(auth:SIMPLE)   ip=/127.0.0.1   cmd=open
src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
perm=null   proto=rpc
2015-10-22 06:34:47,382 [IPC Server handler 9 on 8020] INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(7358)) - allowed=true  ugi=jenkins 
(auth:SIMPLE)   ip=/127.0.0.1   cmd=open
src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
perm=null   proto=rpc
2015-10-22 06:34:47,484 [IPC Server handler 0 on 8020] INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(7358)) - allowed=true  ugi=jenkins 
(auth:SIMPLE)   ip=/127.0.0.1   cmd=open
src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
perm=null   proto=rpc
2015-10-22 06:34:47,585 [IPC Server handler 1 on 8020] INFO  FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(7358)) - allowed=true  ugi=jenkins 
(auth:SIMPLE)   ip=/127.0.0.1   cmd=open
src=/test/testTruncateWithDataNodesRestartImmediately   dst=null
perm=null   proto=rpc
2015-10-22 06:34:47,689 [main] INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1889)) - Shutting down the Mini HDFS Cluster
2015-10-22 06:34:47,690 [main] INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdownDataNodes(1935)) - Shutting down DataNode 2
2015-10-22 06:34:47,690 [main] WARN  datanode.DirectoryScanner 
(DirectoryScanner.java:shutdown(529)) - DirectoryScanner: shutdown has been 
called



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9268) JVM crashes when attempting to update a file in fuse file system using vim

2015-10-20 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9268:
-

 Summary: JVM crashes when attempting to update a file in fuse file 
system using vim
 Key: HDFS-9268
 URL: https://issues.apache.org/jira/browse/HDFS-9268
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


The JVM crashes when users attempt to use vi to update a file on a fuse file 
system with insufficient permission. (I used CDH's hadoop-fuse-dfs wrapper 
script to generate the bug, but the same bug is reproducible in trunk.)

The root cause is a segfault in a fuse-dfs method.

To reproduce it, do as follows:
mkdir /mnt/fuse
chmod 777 /mnt/fuse
ulimit -c unlimited   # to enable coredump
hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
touch /mnt/fuse/y
chmod 600 /mnt/fuse/y
vim /mnt/fuse/y
(in vim, :w to save the file)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
#
# Core dump written. Default location: /home/weichiu/core or core.26606
#
# An error report file with more information is saved as:
# /home/weichiu/hs_err_pid26606.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
/usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core dumped) 
env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@

===
The coredump shows the segfault comes from 
(gdb) bt
#0  0x003b82e328e5 in raise () from /lib64/libc.so.6
#1  0x003b82e340c5 in abort () from /lib64/libc.so.6
#2  0x7f66fc924d75 in os::abort(bool) () from 
/etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
/etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
/etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
#5  <signal handler called>
#6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
#7  0x004039a0 in hdfsConnTree_RB_FIND ()
#8  0x00403e8f in fuseConnect ()
#9  0x004046db in dfs_chown ()
#10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
#11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
#12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
#13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
#14 0x003b82ee894d in clone () from /lib64/libc.so.6





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9269) Need to update the documentation and wrapper for hdfs-dfs

2015-10-20 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9269:
-

 Summary: Need to update the documentation and wrapper for hdfs-dfs
 Key: HDFS-9269
 URL: https://issues.apache.org/jira/browse/HDFS-9269
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


To reproduce the bug in HDFS-9268, I followed the wiki and the docs and read 
the wrapper script of hdfs-fuse, but found them super outdated. (The wrapper 
was last updated four years ago, and the Hadoop project layout has dramatically 
changed since then.) I am creating this JIRA to track the status of the update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-7464) TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 8

2015-10-19 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-7464:
---

I am seeing this today with Java 7
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

Running org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.205 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
testRefreshSuperUserGroupsConfiguration(org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA)
  Time elapsed: 0.808 sec  <<< FAILURE!
java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File 
Exception between local host is: "weichiu-MBP.local/172.16.1.61"; destination 
host is: "localhost":10872; : java.io.EOFException; For more details see:  
http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:235)


Results :

Failed tests:
  TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration:235 
refreshSuperUserGroupsConfiguration: End of File Exception between local host 
is: "weichiu-MBP.local/172.16.1.61"; destination host is: "localhost":10872; : 
java.io.EOFException; For more details see:  
http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>


> TestDFSAdminWithHA#testRefreshSuperUserGroupsConfiguration fails against Java 
> 8
> ---
>
> Key: HDFS-7464
> URL: https://issues.apache.org/jira/browse/HDFS-7464
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> From https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/ :
> {code}
> REGRESSION:  
> org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration
> Error Message:
> refreshSuperUserGroupsConfiguration: End of File Exception between local host 
> is: "asf908.gq1.ygridcore.net/67.195.81.152"; destination host is: 
> "localhost":12700; : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException expected:<0> but was:<-1>
> Stack Trace:
> java.lang.AssertionError: refreshSuperUserGroupsConfiguration: End of File 
> Exception between local host is: "asf908.gq1.ygridcore.net/67.195.81.152"; 
> destination host is: "localhost":12700; : java.io.EOFException; For more 
> details see:  http://wiki.apache.org/hadoop/EOFException expected:<0> but 
> was:<-1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at 
> org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testRefreshSuperUserGroupsConfiguration(TestDFSAdminWithHA.java:228)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-10-15 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9249:
-

 Summary: NPE thrown if an IOException is thrown in NameNode.
 Key: HDFS-9249
 URL: https://issues.apache.org/jira/browse/HDFS-9249
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


This issue was found when running test case TestBackupNode.testCheckpointNode, 
but upon closer look, the problem is not due to the test case.

Looks like an IOException was thrown in
{code:java}
try {
  initializeGenericKeys(conf, nsId, namenodeId);
  initialize(conf);
  try {
    haContext.writeLock();
    state.prepareToEnterState(haContext);
    state.enterState(haContext);
  } finally {
    haContext.writeUnlock();
  }
{code}
causing the namenode to stop, but the namesystem was not yet properly 
instantiated, causing the NPE.

I tried to reproduce locally, but to no avail.

Because I could not reproduce the bug, and the log does not indicate what 
caused the IOException, I suggest making this a supportability JIRA to log the 
exception for future improvement.
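
A minimal sketch of the proposed change (the catch-block placement mirrors the 
constructor fragment quoted above; the message text is illustrative):

{code:java}
try {
  initializeGenericKeys(conf, nsId, namenodeId);
  initialize(conf);
  // ...
} catch (IOException e) {
  // Log before stop() so the root cause is not hidden by the NPE
  // thrown during the partial shutdown.
  LOG.error("Failed to start namenode.", e);
  this.stop();
  throw e;
}
{code}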

Stacktrace
{noformat}
java.lang.NullPointerException: null
at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
{noformat}
The last few lines of log:
2015-10-14 19:45:07,807 INFO namenode.NameNode 
(NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
(MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
(again)
2015-10-14 19:45:07,808 INFO namenode.NameNode 
(NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
hdfs://localhost:37835
2015-10-14 19:45:07,808 INFO namenode.NameNode 
(NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
localhost:37835 to access this namenode/service.
2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
(FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
active state
2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
(FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
(FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
(FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
syncs: 4 SyncTimes(ms): 2 1 
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
(FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
(FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
 -> 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
(FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
 -> 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor 
(CacheReplicationMonitor.java:run(169)) - Shutting down CacheReplicationMonitor
2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping 
server on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC 
Server listener on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC 
Server Responder
2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager 
(BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager 
(DecommissionManager.java:run(78)) - M

[jira] [Created] (HDFS-9243) TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout

2015-10-14 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9243:
-

 Summary: 
TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
 Key: HDFS-9243
 URL: https://issues.apache.org/jira/browse/HDFS-9243
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


This is happening on trunk in 
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.

On my local Linux machine, this test case times out 6 out of 10 times. When it 
does not time out, the test takes about 20 seconds; otherwise it takes more 
than 60 seconds and then times out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown

2015-09-30 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9181:
-

 Summary: Better handling of exceptions thrown during upgrade 
shutdown
 Key: HDFS-9181
 URL: https://issues.apache.org/jira/browse/HDFS-9181
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


Previously, in HDFS-7533, a bug was fixed by suppressing exceptions during 
upgrade shutdown. That may be appropriate as a temporary fix, but it would be 
better if the exception were handled in some way.

One way to handle it is to emit a warning message; there may be other ways to 
handle it. This JIRA is created to discuss how to handle this case better.
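
As a sketch, the warning-message option could look like this (the shutdown 
call shown is illustrative of the suppression site fixed in HDFS-7533):

{code:java}
try {
  dataXceiverServer.kill();  // illustrative shutdown step during upgrade
} catch (RuntimeException e) {
  // Record the exception instead of silently swallowing it.
  LOG.warn("Exception encountered while shutting down for upgrade", e);
}
{code}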



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9123) Validation of a path ended with a '/'

2015-09-22 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9123:
-

 Summary: Validation of a path ended with a '/'
 Key: HDFS-9123
 URL: https://issues.apache.org/jira/browse/HDFS-9123
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


HDFS forbids copying from a directory to its subdirectory (e.g. hdfs dfs -cp 
/abc /abc/xyz), as otherwise it could cause infinite copying (/abc/xyz/xyz, 
/abc/xyz/xyz/xyz, /abc/xyz/xyz/xyz/xyz, ... etc.)

However, if the source path ends with a '/' path separator, the existing 
validation for sub-directories fails. For example, copying from / to /abc would 
cause infinite copying until the disk space is filled up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

