[jira] [Updated] (HDFS-6143) HftpFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HDFS-6143: Attachment: HDFS-6143.v02.patch [~ste...@apache.org], thanks for the review. I tried to keep the patch as small as possible. Here is an updated v02 of the patch to accommodate your comments. I noticed that there are more issues with how server-side exceptions are translated in FileDataServlet, and made that handling more elaborate. HftpFileSystem open should throw FileNotFoundException for non-existing paths - Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch HftpFileSystem.open incorrectly handles non-existing paths. - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that is broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
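For illustration, a minimal, hedged sketch of the client-side contract argued for above — not the actual HDFS-6143 patch; the helper name is an assumption introduced for the sketch, while {{FileSystem#getFileStatus}} and {{FileSystem#open}} are real Hadoop APIs:
{code}
// Sketch only (not the HDFS-6143 patch): probe the server at open time so a
// non-existing path fails with FileNotFoundException immediately, mirroring
// POSIX open() returning ENOENT.
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EagerOpenCheck {
  // getFileStatus() contacts the server and throws FileNotFoundException
  // for a missing path, instead of deferring the error to the first read().
  static FSDataInputStream openOrFailFast(FileSystem fs, Path p)
      throws IOException {
    fs.getFileStatus(p); // throws FileNotFoundException if p does not exist
    return fs.open(p);
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = openOrFailFast(fs, new Path("/no/such/file"))) {
      System.out.println("opened, bytes available: " + in.available());
    } catch (FileNotFoundException e) {
      System.out.println("FNFE at open time, as expected: " + e.getMessage());
    }
  }
}
{code}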
[jira] [Created] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
Vinayakumar B created HDFS-6147: --- Summary: New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
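To make the proposed fix concrete, here is a small self-contained sketch — illustrative names, not the actual patch — in which both the add and delete paths share one comparison of time-since-last-verification against the scan period, so {{bytesLeft}} cannot drift negative:
{code}
// Illustrative sketch of the invariant proposed above (not the actual
// patch): both paths use the same "needs scanning in this period" predicate,
// otherwise bytesLeft can go negative when old blocks are deleted.
class ScanAccounting {
  private long bytesLeft = 0;
  private final long scanPeriodMs;

  ScanAccounting(long scanPeriodMs) { this.scanPeriodMs = scanPeriodMs; }

  // Single shared predicate: a block still needs scanning if it was last
  // verified more than one scan period ago.
  private boolean needsScan(long lastScanTimeMs, long nowMs) {
    return nowMs - lastScanTimeMs >= scanPeriodMs;
  }

  // Called when a block is added (len > 0) or deleted (len < 0); because
  // add and delete share needsScan(), the running total stays consistent.
  synchronized void updateBytesToScan(long len, long lastScanTimeMs) {
    if (needsScan(lastScanTimeMs, System.currentTimeMillis())) {
      bytesLeft += len;
    }
  }

  synchronized long getBytesLeft() { return bytesLeft; }
}
{code}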
[jira] [Updated] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6147: Attachment: HDFS-6147.patch New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6147: Status: Patch Available (was: Open) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6133) Make Balancer support exclude specified path
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-6133: --- Attachment: (was: HDFS-6133.patch) Make Balancer support exclude specified path Key: HDFS-6133 URL: https://issues.apache.org/jira/browse/HDFS-6133 Project: Hadoop HDFS Issue Type: Improvement Components: balancer, namenode Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-6133.patch Currently, running Balancer will destroy the RegionServer's data locality. If getBlocks could exclude blocks belonging to files with a specific path prefix, like /hbase, then we could run Balancer without destroying the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6133) Make Balancer support exclude specified path
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-6133: --- Attachment: HDFS-6133.patch This patch supports excluding multiple paths. Make Balancer support exclude specified path Key: HDFS-6133 URL: https://issues.apache.org/jira/browse/HDFS-6133 Project: Hadoop HDFS Issue Type: Improvement Components: balancer, namenode Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-6133.patch Currently, running Balancer will destroy the RegionServer's data locality. If getBlocks could exclude blocks belonging to files with a specific path prefix, like /hbase, then we could run Balancer without destroying the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.2#6252)
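As an illustration of the getBlocks-side filtering described above (class and method names are assumptions for the sketch, not the patch's actual interface), the core of the change amounts to a prefix test applied before a file's blocks are offered to the balancer:
{code}
// Minimal sketch of the path-prefix exclusion idea (illustrative names,
// not the actual HDFS-6133 patch): skip any file whose path starts with
// one of the excluded prefixes before handing its blocks to the balancer.
import java.util.Arrays;
import java.util.List;

public class ExcludePathFilter {
  private final List<String> excludedPrefixes;

  public ExcludePathFilter(List<String> excludedPrefixes) {
    this.excludedPrefixes = excludedPrefixes;
  }

  // Returns true if the file's blocks should be excluded from balancing.
  public boolean isExcluded(String filePath) {
    for (String prefix : excludedPrefixes) {
      if (filePath.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    ExcludePathFilter f = new ExcludePathFilter(Arrays.asList("/hbase"));
    System.out.println(f.isExcluded("/hbase/region1/cf/file")); // true
    System.out.println(f.isExcluded("/user/data/file"));        // false
  }
}
{code}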
[jira] [Updated] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HDFS-5196: - Attachment: HDFS-5196-8.patch I am attaching the new patch file. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) HftpFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944884#comment-13944884 ] Hadoop QA commented on HDFS-6143: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636306/HDFS-6143.v02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6471//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6471//console This message is automatically generated. HftpFileSystem open should throw FileNotFoundException for non-existing paths - Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Attachments: HDFS-6143.v01.patch, HDFS-6143.v02.patch HftpFileSystem.open incorrectly handles non-existing paths. - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that is broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6147) New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..)
[ https://issues.apache.org/jira/browse/HDFS-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944949#comment-13944949 ] Hadoop QA commented on HDFS-6147: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636312/HDFS-6147.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks org.apache.hadoop.hdfs.server.datanode.TestMultipleNNDataBlockScanner org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport org.apache.hadoop.hdfs.TestDatanodeBlockScanner {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6472//console This message is automatically generated. New blocks scanning will be delayed due to issue in BlockPoolSliceScanner#updateBytesToScan(..) --- Key: HDFS-6147 URL: https://issues.apache.org/jira/browse/HDFS-6147 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-6147.patch New blocks' scanning will be delayed if old blocks are deleted after a datanode restart. Steps: 1. Write some blocks and wait till all scans are over 2. Restart the datanode 3. Delete some of the blocks 4. Write new blocks which are smaller in size than the deleted blocks. Problem: {{BlockPoolSliceScanner#updateBytesToScan(..)}} updates {{bytesLeft}} based on the following comparison {code}if (lastScanTime < currentPeriodStart) { bytesLeft += len; }{code} But in {{BlockPoolSliceScanner#assignInitialVerificationTimes()}} {{bytesLeft}} is decremented using the comparison below {code}if (now - entry.verificationTime < scanPeriod) {{code} Hence when the old blocks are deleted, {{bytesLeft}} goes negative, and new blocks will not be scanned until it becomes positive again. So in both places the verification time should be compared against the scan period. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-4929) [NNBench mark] Lease mismatch error when running with multiple mappers
[ https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-4929: -- Assignee: Brahma Reddy Battula [NNBench mark] Lease mismatch error when running with multiple mappers -- Key: HDFS-4929 URL: https://issues.apache.org/jira/browse/HDFS-4929 Project: Hadoop HDFS Issue Type: Bug Components: benchmarks Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Command: ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10 Trace:
2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
-- This message was sent by Atlassian JIRA (v6.2#6252)
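The two client names in the trace show two different map attempts (m_48 and m_84) contending for the lease on the same file name, which the hostname-based -baseDir alone cannot prevent when multiple mappers run per host. As a purely illustrative sketch under that assumption — not the actual NNBench fix; class and method names are hypothetical — one remedy is to fold each task's ID into the generated file name:
{code}
// Hypothetical sketch (not the committed fix): make each mapper's data file
// unique by embedding its task ID, so two map attempts can never contend
// for the lease on the same path.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptID;

public class UniqueNNBenchPath {
  // baseDir is the user-supplied -baseDir; the task ID disambiguates
  // mappers that would otherwise derive identical file names.
  static Path dataFileFor(String baseDir, TaskAttemptID attempt, int fileIdx) {
    return new Path(baseDir + "/data",
        "file_" + attempt.getTaskID().getId() + "_" + fileIdx);
  }
}
{code}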
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944970#comment-13944970 ] Hadoop QA commented on HDFS-5196: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636316/HDFS-5196-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6473//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6473//console This message is automatically generated. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945142#comment-13945142 ] Yu Li commented on HDFS-6010: - The UT failure is caused by a bug in TestBalancer; here is a detailed analysis. Let's look into the code logic of testUnevenDistribution: if the number of datanodes in the mini-cluster is 3 (or larger), the replication factor will be set to 2 (or more), and generateBlocks will generate a file with it, so the block number will equal (targetSize/replicationFactor)/blockSize. Then distributeBlock will double the block number through the code below:
{code}
for (int i = 0; i < blocks.length; i++) {
  for (int j = 0; j < replicationFactor; j++) {
    boolean notChosen = true;
    while (notChosen) {
      int chosenIndex = r.nextInt(usedSpace.length);
      if (usedSpace[chosenIndex] > 0) {
        notChosen = false;
        blockReports.get(chosenIndex).add(blocks[i].getLocalBlock());
        usedSpace[chosenIndex] -= blocks[i].getNumBytes();
      }
    }
  }
}
{code}
Notice that this distribution cannot prevent replicas of the same block from landing on the same datanode. Then, when the MiniDFSCluster#injectBlocks (actually SimulatedFSDataset#injectBlocks) method is invoked, the duplicated blocks get removed, per the code segment below:
{code:title=SimulatedFSDataset#injectBlocks}
public synchronized void injectBlocks(String bpid, Iterable<Block> injectBlocks) throws IOException {
  ExtendedBlock blk = new ExtendedBlock();
  if (injectBlocks != null) {
    for (Block b : injectBlocks) {
      // if any blocks in list is bad, reject list
      if (b == null) {
        throw new NullPointerException("Null blocks in block list");
      }
      blk.set(bpid, b);
      if (isValidBlock(blk)) {
        throw new IOException("Block already exists in block list");
      }
    }
    Map<Block, BInfo> map = blockMap.get(bpid);
    if (map == null) {
      map = new HashMap<Block, BInfo>();
      blockMap.put(bpid, map);
    }
    for (Block b : injectBlocks) {
      BInfo binfo = new BInfo(bpid, b, false);
      map.put(binfo.theBlock, binfo);
    }
  }
}
{code}
This causes the used space to be less than expected, which makes the test fail. The issue was hidden because *in the existing tests the datanode number was never set larger than 2*. It is easy to reproduce the issue simply by increasing the datanode number in TestBalancer#testBalancer1Internal from 2 to 3, like
{code:title=TestBalancer#testBalancer1Internal}
void testBalancer1Internal(Configuration conf) throws Exception {
  initConf(conf);
  testUnevenDistribution(conf,
      new long[] {90*CAPACITY/100, 50*CAPACITY/100, 10*CAPACITY/100},
      new long[] {CAPACITY, CAPACITY, CAPACITY},
      new String[] {RACK0, RACK1, RACK2});
}
{code}
I've tried to refine the distribution method; however, I found it hard to make it general. To make sure no duplicated blocks are assigned to the same datanode, we must make sure the largest distribution is less than the sum of the other distributions. On second thought, I don't even think it is necessary to involve the replication factor in the balancer testing. Maybe the UT designer was thinking about testing the balancer's behavior while replication is also ongoing, but unfortunately the current design cannot reveal this.
So personally, I propose to always set the replication factor to 1 in TestBalancer. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
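For completeness, if one did want to keep a replication factor greater than 1 in the test, a duplicate-free variant of the distribution loop quoted above is possible whenever the largest distribution is smaller than the sum of the others, as noted in the analysis. A hedged sketch with illustrative names, not a proposed patch:
{code}
// Sketch only: choose replicationFactor *distinct* datanodes per block, so
// injectBlocks never collapses replicas. Assumes the largest remaining
// capacity is smaller than the sum of the others (see analysis above);
// otherwise the inner loop could starve looking for a distinct node.
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DistinctDistribution {
  static void distribute(long[] usedSpace, long blockSize, int blocks,
                         int replicationFactor, long seed) {
    Random r = new Random(seed);
    for (int i = 0; i < blocks; i++) {
      Set<Integer> chosen = new HashSet<Integer>();
      while (chosen.size() < replicationFactor) {
        int idx = r.nextInt(usedSpace.length);
        // require remaining capacity AND a node not already holding
        // a replica of this block
        if (usedSpace[idx] > 0 && chosen.add(idx)) {
          usedSpace[idx] -= blockSize;
        }
      }
    }
  }
}
{code}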
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Attachment: HDFS-6010-trunk_V2.patch Attaching the new patch with a fix for the UT failure mentioned above, and resubmitting the patch for Hadoop QA to test. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Patch Available (was: Open) Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HDFS-6010: Status: Open (was: Patch Available) Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945192#comment-13945192 ] Kihwal Lee commented on HDFS-3087: -- +1 The patch looks good. Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Vujic updated HDFS-5846: --- Attachment: hdfs-5846.patch Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945207#comment-13945207 ] Nikola Vujic commented on HDFS-5846: Hi Chris, I have fixed the patch according to your comments. Now I have two lines longer than 80 characters; that is because of the long names of the constants. Is this OK? I have implemented a unit test (testRejectUnresolvedDatanodes) in the TestDatanodeManager class, since it seems a more appropriate place for testing this particular thing. Also, in the same class I have changed the logger (org.mortbay.log.Log was used; the patch now uses org.apache.commons.logging.Log). Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
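To make the intended behavior concrete, here is a minimal sketch of how the {{resolveNetworkLocation}} body quoted above could reject the registration instead of defaulting the rack — illustrative only, not the committed change; a plain IOException stands in for whatever dedicated exception type the actual patch introduces:
{code}
// Sketch only (not the committed HDFS-5846 change): fail fast on an
// unresolved topology mapping so the DN registration is rejected, rather
// than silently assigning NetworkTopology.DEFAULT_RACK and breaking
// fault-domain guarantees.
List<String> rName = dnsToSwitchMapping.resolve(names);
if (rName == null || rName.size() != names.size()) {
  // covers both failure modes: script crash, or wrong number of values
  throw new IOException("Unresolved topology mapping for host " + names
      + "; rejecting registration instead of using "
      + NetworkTopology.DEFAULT_RACK);
}
return rName.get(0);
{code}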
[jira] [Updated] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3087: - Assignee: Rushabh S Shah Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3087: - Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for working on the fix, Rushabh. Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3087) Decommissioning on NN restart can complete without blocks being replicated
[ https://issues.apache.org/jira/browse/HDFS-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945239#comment-13945239 ] Hudson commented on HDFS-3087: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5386 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5386/]) HDFS-3087. Decommissioning on NN restart can complete without blocks being replicated. Contributed by Rushabh S Shah. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580886) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java Decommissioning on NN restart can complete without blocks being replicated - Key: HDFS-3087 URL: https://issues.apache.org/jira/browse/HDFS-3087 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Kihwal Lee Assignee: Rushabh S Shah Priority: Critical Fix For: 3.0.0, 2.5.0 Attachments: HDFS-3087.patch If a data node is added to the exclude list and the name node is restarted, the decommissioning happens right away on the data node registration. At this point the initial block report has not been sent, so the name node thinks the node has zero blocks and the decommissioning completes very quickly, without replicating the blocks on that node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945342#comment-13945342 ] Hadoop QA commented on HDFS-6010: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636334/HDFS-6010-trunk_V2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6474//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6474//console This message is automatically generated. Make balancer able to balance data among specified servers -- Key: HDFS-6010 URL: https://issues.apache.org/jira/browse/HDFS-6010 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor Labels: balancer Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch Currently, the balancer tool balances data among all datanodes. However, in some particular cases, we would need to balance data only among specified nodes instead of the whole set. In this JIRA, a new -servers option would be introduced to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6148) LeaseManager crashes while initiating block recovery
Kihwal Lee created HDFS-6148: Summary: LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Description: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
was: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Description: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$
ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.
initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
was: While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery.
{panel}
Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$
ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.
initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
at java.lang.Thread.run(Thread.java:722)
{panel}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945405#comment-13945405 ] Haohui Mai commented on HDFS-5196: -- The patch generally looks good. There are some minor comments that need to be addressed:
{code}
+'array_length' : function (v) {
+  var cnt = 0;
+  for (var i in v) {
+    cnt++;
+  }
+  return cnt;
{code}
You can follow what dust.js does (https://github.com/linkedin/dustjs/wiki/Dust-Tutorial#size_keyxxx___size_helper_Available_in_Dust_V11_release), and derive a new version that evaluates the key. Otherwise, I think that the following code will print {{Snapshottable directories:}} instead of {{Snapshottable directories:0}}
{code}
+<div class="page-header"><h1><small>Snapshottable directories: {SnapshotStats.directory|array_length}</small></h1></div>
{code}
I guess for this version let's just remove {{array_length}} and
{code}
+<div class="page-header"><h1><small>Snapshottable directories: {SnapshotStats.directory|array_length}</small></h1></div>
+<div class="page-header"><h1><small>Snapshotted directories: {SnapshotStats.snapshots|array_length}</small></h1></div>
{code}
Let's address it in a separate jira.
{code}
+var HELPERS = {
+  'helper_to_permission': function (chunk, ctx, bodies, params) {
+    var p = ctx.current().permission;
+    var symbols = [ '---', '--x', '-w-', '-wx', 'r--', 'r-x', 'rw-', 'rwx' ];
+    var sticky = p > 1000;
+
+    var res = "";
+    res = symbols[(p >> 6) & 7] + symbols[(p >> 3) & 7] + symbols[p & 7];
+
+    if (sticky) {
+      var otherExec = ((ctx.current().permission % 10) & 1) == 1;
+      res = res.substr(0, res.length - 1) + (otherExec ? 't' : 'T');
+    }
+
+    chunk.write('d' + res);
+    return chunk;
+  }
+};
+
{code}
You can move it to the filter object in {{dfs-dust.js}} and remove the duplicated one in {{explorer.js}}. Nit: there are some trailing white spaces. Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196-3.patch, HDFS-5196-4.patch, HDFS-5196-5.patch, HDFS-5196-6.patch, HDFS-5196-7.patch, HDFS-5196-8.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
[ https://issues.apache.org/jira/browse/HDFS-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945410#comment-13945410 ] Haohui Mai commented on HDFS-5978: -- bq. Just to be clear, this is a WebHDFS operation over HTTP; it is not part of the WebHDFS FileSystem HTTP REST API, right? I'm not exactly sure what you mean. The patch provides an offline image viewer that creates an HTTP server and exposes the same APIs as WebHDFS, so that users can use other tools (like {{hadoop dfs -ls}} or the web UI) to inspect the fsimage. Other than that, it has nothing to do with WebHDFS. Thanks for bringing it up; however, it might be better to update the description to avoid the confusion. Create a tool to take fsimage and expose read-only WebHDFS API -- Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-5978.2.patch, HDFS-5978.3.patch, HDFS-5978.4.patch, HDFS-5978.patch Suggested in HDFS-5975. Add an option that exposes the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tools, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system, figuring out what went wrong. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6149) Running UTs with testKerberos profile has failures.
Jinghui Wang created HDFS-6149: -- Summary: Running UTs with testKerberos profile has failures. Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945431#comment-13945431 ] Hadoop QA commented on HDFS-5846: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636341/hdfs-5846.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6475//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6475//console This message is automatically generated. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned a wrong number of values (other than expected). The critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. It means that if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Description: UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. was: UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does require credentials. Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945449#comment-13945449 ] Jinghui Wang commented on HDFS-6149: The test that cancels the delegation token in testDelegationTokenHttpFSAccess assumes the operation does not require credentials. However, the following block of code in HttpFSKerberosAuthenticationHandler shows that if there is no token, the CANCELDELEGATIONTOKEN operation is not performed, causing an assertion error: the test expects a response code of 200 but gets 401 instead.
{code}
else if (dtOp.requiresKerberosCredentials() && token == null) {
  response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
      MessageFormat.format(
          "Operation [{0}] requires SPNEGO authentication established", dtOp));
  requestContinues = false;
}
{code}
Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
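A hedged sketch of what the corrected test expectation could look like, given the server code quoted above. The URL, port, and token variable are assumptions for illustration; the actual change is in HDFS-6149.patch:
{code}
// Hypothetical test sketch: without SPNEGO credentials, CANCELDELEGATIONTOKEN
// should be rejected with 401 rather than expected to succeed with 200.
URL url = new URL("http://localhost:14000/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&token="
    + tokenStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
Assert.assertEquals(HttpServletResponse.SC_UNAUTHORIZED, conn.getResponseCode());
{code}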
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945473#comment-13945473 ] Kihwal Lee commented on HDFS-6148: -- It may have something to do with loading fsimage + edits and processing under-construction files. LeaseManager crashes one hour after NN start-up. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Attachment: hdfs-5846.patch +1 for the patch. The test failure looks unrelated to me. I couldn't reproduce it locally. I'm re-uploading the same patch just to kick off another Jenkins run to confirm. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned the wrong number of values (other than expected). Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. Existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. If we somehow got NULL, then the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. Wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Component/s: namenode Target Version/s: 3.0.0, 2.4.0 Affects Version/s: 3.0.0 2.3.0 Hadoop Flags: Reviewed Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases: • An error occurred during topology script execution (the script crashed). • The script returned the wrong number of values (other than expected). Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. Existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem. If we somehow got NULL, then the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. Wrong topology means that fault domains are not honored. For the end user, it means that two data replicas can end up in the same fault domain, and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. We can notice that something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945426#comment-13945426 ] Kihwal Lee commented on HDFS-6148: -- replicas.size() was non-zero and there was a corresponding ReplicaUnderConstruction, but its expectedLocation seemed to be null. This can happen if setExpectedStorageLocations() was called with an array of nulls. This might happen if a last block with null locations is turned into a BlockInfoUnderConstruction. There might be other ways, though. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
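A hedged sketch of the kind of defensive check the HDFS-6148 NPE suggests, assuming the fix is to treat a replica with a null expected location as not alive. The field and method names follow the stack trace but are otherwise an assumption, not the committed fix:
{code}
// Hypothetical sketch inside ReplicaUnderConstruction: a null expectedLocation
// (e.g. set from an array of nulls) must not be dereferenced by isAlive().
boolean isAlive() {
  return expectedLocation != null
      && expectedLocation.getDatanodeDescriptor() != null
      && expectedLocation.getDatanodeDescriptor().isAlive;
}
{code}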
[jira] [Updated] (HDFS-6149) Running UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Attachment: HDFS-6149.patch Running UTs with testKerberos profile has failures. --- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 Attachments: HDFS-6149.patch UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945530#comment-13945530 ] Kihwal Lee commented on HDFS-6148: -- Sorry, it was seen on a 2.3 cluster. I will verify whether we still have this bug in 2.4. LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6148) LeaseManager crashes while initiating block recovery
[ https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6148: - Affects Version/s: (was: 2.4.0) 2.3.0 LeaseManager crashes while initiating block recovery Key: HDFS-6148 URL: https://issues.apache.org/jira/browse/HDFS-6148 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee Priority: Blocker While running branch-2.4, the LeaseManager crashed with an NPE. This does not always happen on block recovery. {panel} Exception in thread org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction. initializeBlockRecovery(BlockInfoUnderConstruction.java:286) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474) at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68) at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945560#comment-13945560 ] Tsz Wo Nicholas Sze commented on HDFS-5138: --- +1 the branch-2 patch looks good. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945576#comment-13945576 ] Alejandro Abdelnur commented on HDFS-6134: -- (Cross-posting HADOOP-10150 and HDFS-6134) [~avik_...@yahoo.com], I’ve just looked at the MAR/21 proposal in HADOOP-10150 (the patches uploaded on MAR/21 do not apply on trunk cleanly, so I cannot look at them easily. It seems to have missing pieces, like getXAttrs() and wiring to the KeyProvider API. Would it be possible to rebase them so they apply to trunk?)
bq. do we need a new proposal for the work already being done on HADOOP-10150?
HADOOP-10150 aims to provide encryption for any filesystem implementation as a decorator filesystem, while HDFS-6134 aims to provide encryption for HDFS. The 2 approaches differ in the level of transparency you get. The comparison table in the HDFS Data at Rest Encryption attachment (https://issues.apache.org/jira/secure/attachment/12635964/HDFSDataAtRestEncryption.pdf) highlights the differences. Particularly, the things I’m most concerned about with HADOOP-10150 are:
* All clients (doing encryption/decryption) must have access to the key management service.
* Secure key propagation to tasks running in the cluster (i.e. mapper and reducer tasks).
* Use of AES-CTR (instead of an authenticated encryption mode such as AES-GCM).
* Not clear how hflush() will be handled.
bq. are there design choices in this proposal that are superior to the patch already provided on HADOOP-10150?
IMO, a consolidated access/distribution of keys by the NN (as opposed to every client) improves the security of the system.
bq. do you have additional requirement listed in this JIRA that could be incorporated in to HADOOP-10150,
They are enumerated in the HDFS Data at Rest Encryption attachment. The ones I don’t see addressed in HADOOP-10150 are: #6 and #8.A. And it is not clear how #4 and #5 can be achieved.
bq. so we can collaborate and not duplicate?
Definitely, I want to work together with you guys to leverage as much as possible, either by unifying the 2 proposals or by sharing common code if we think both approaches have merits and we decide to move forward with both. Happy to jump on a call to discuss things and then report back to the community if you think that will speed up the discussion. -- By looking at the latest design doc of HADOOP-10150 I can see that things have been modified a bit (from the original design doc), bringing it a bit closer to some of the HDFS-6134 requirements. Still, it is not clear how transparency will be achieved for existing applications: HDFS URI changes; clients must connect to the key store to retrieve the encryption key (clients will need key store principals); and the encryption key must be propagated to job tasks (i.e. Mapper/Reducer processes). Requirement #4, "Can decorate HDFS and all other file systems in Hadoop, and will not modify existing structure of file system, such as namenode and datanode structure if the wrapped file system is HDFS", is contradicted by the design: in "Storage of IV and data key" it is stated "So we implement extended information based on INode feature, and use it to store data key and IV". Requirement #5, "Admin can configure encryption policies, such as which directory will be encrypted", seems driven by the HDFS client configuration file (hdfs-site.xml). This is not really admin-driven, as clients could break this by configuring their hdfs-site.xml file. Restrictions of move operations for files within an encrypted directory: the original design had something about it (not entirely correct); now it is gone. (Mentioned before) how will hflush() operations be handled, given that the encryption block will be cut short? How is this handled on writes? How is this handled on reads? Explicit auditing of encrypted file access does not seem to be handled. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via the Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation
[jira] [Commented] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945614#comment-13945614 ] Hadoop QA commented on HDFS-6135: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636293/HDFS-6135.002.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6476//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6476//console This message is automatically generated.
In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56): {code} 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown: Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55. at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203) at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156) at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.init(JNStorage.java:73) at org.apache.hadoop.hdfs.qjournal.server.Journal.init(Journal.java:142) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228) {code} Looks like, for rollback, a JN with old software cannot handle a future layout version brought by new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
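Since HDFS layout versions are negative and decrease as the layout evolves, the HDFS-6135 failure above is easy to misread. A small illustration of the comparison that trips the JN, with the numbers taken from the exception and the variable names invented:
{code}
// Illustration only: -56 was written by the new software and is "newer" than
// -55, the newest version the old JN software knows, so the strict check in
// StorageInfo.setLayoutVersion() rejects the directory during rollback.
int reported = -56;   // layout version found in the storage directory
int expected = -55;   // newest layout version this JN supports
boolean accepted = reported >= expected;  // false -> "Unexpected version" error
{code}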
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945628#comment-13945628 ] Tsz Wo Nicholas Sze commented on HDFS-6130: ---
bq. Apache release also has this issue. Apache 1.0.4 upgrade to the trunk, you can reproduce this issue.
Hi Fengdong, I have just tried it but cannot reproduce the NPE. There were a lot of changes since Apache 1.0.4. I was using 1.3.0 in my test. Could you also try it? NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB 14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false 14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected. 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360) 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176 / {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945642#comment-13945642 ] Tsz Wo Nicholas Sze commented on HDFS-6130: --- I believe that this is a duplicate of HDFS-5988. Hi [~wheat9], the stack trace posted here is indeed different from the one posted in HDFS-6021 (a dup of HDFS-5988). So it seems that this is a different issue. In this bug, FSImageFormatPBINode somehow passes a null inode to FSDirectory. Could you take a look? - Stack trace posted here {noformat} 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) ... 
{noformat} - Stack trace posted in HDFS-6021 (a dup of HDFS-5988) {noformat} 2014-02-26 17:03:11,755 FATAL [main] namenode.NameNode (NameNode.java:main(1351)) - Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:227) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:169) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:225) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:802) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:792) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:624) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:593) at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:331) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:251) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:641) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:435) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:647) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:632) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1280) ... {noformat} NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM
[jira] [Created] (HDFS-6150) Add inode id information in the logs to make debugging easier
Suresh Srinivas created HDFS-6150: - Summary: Add inode id information in the logs to make debugging easier Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
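A hedged sketch of what attaching the inode id to such messages might look like. The message shape is an assumption for illustration; the actual change is in HDFS-6150.patch:
{code}
// Hypothetical sketch: include the inode id alongside the path so that
// concurrent rename/delete races can be traced to a specific inode.
throw new FileNotFoundException("File does not exist: " + src
    + " (inodeId=" + inode.getId() + ")");
{code}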
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.patch Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Status: Patch Available (was: Open) Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to using INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6135: -- Component/s: journal-node Hadoop Flags: Reviewed +1 patch looks good. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56): {code} 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown: Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55. at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203) at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156) at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202) at org.apache.hadoop.hdfs.qjournal.server.JNStorage.init(JNStorage.java:73) at org.apache.hadoop.hdfs.qjournal.server.Journal.init(Journal.java:142) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228) {code} Looks like, for rollback, a JN with old software cannot handle a future layout version brought by new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945653#comment-13945653 ] Haohui Mai commented on HDFS-6130: -- It would be very helpful if the corresponding fsimage were available. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but if HA is enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits' {code} 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis 14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache 14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit 14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB 14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries 14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false 14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176 14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected. 14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes. 14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243) at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653) at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360) 14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1 14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176 / {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945654#comment-13945654 ] Suresh Srinivas commented on HDFS-5910: --- [~benoyantony], any updates on this? Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but communicate in plain text with some other clients and data nodes. A sample use case will be that any data transfer inside a firewall can be in plain text, whereas any data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Summary: NFS does not handle exceptions correctly in a few places (was: NFS OpenFileCtx does not handle exceptions correctly) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
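For context, a hedged sketch of the logging pattern the HDFS-6050 cleanup typically applies at the call sites listed above; the surrounding statement is invented for illustration and is not the attached patch:
{code}
try {
  channel.write(response);  // hypothetical NFS write-back call
} catch (Exception e) {
  // Instead of e.printStackTrace() (which goes to stderr, outside the log
  // files) or logging only e.getMessage() (which loses the stack), pass the
  // Throwable to the logger so the full stack trace is retained.
  LOG.error("Error writing back to the NFS client", e);
}
{code}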
[jira] [Created] (HDFS-6151) HDFS should refuse to cache blocks >=2GB
Andrew Wang created HDFS-6151: - Summary: HDFS should refuse to cache blocks >=2GB Key: HDFS-6151 URL: https://issues.apache.org/jira/browse/HDFS-6151 Project: Hadoop HDFS Issue Type: Bug Components: caching, datanode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang If you try to cache a block that's >=2GB, the DN will silently fail to cache it, since {{MappedByteBuffer}} uses a signed int to represent size. Blocks this large are rare, but we should log or alert the user somehow. -- This message was sent by Atlassian JIRA (v6.2#6252)
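A hedged sketch of the guard HDFS-6151 implies on the DN's mmap path; the placement and variable names are assumptions for illustration:
{code}
// Hypothetical sketch: FileChannel.map() takes at most Integer.MAX_VALUE
// bytes, so refuse blocks of 2 GB or more loudly instead of failing silently.
long length = block.getNumBytes();
if (length > Integer.MAX_VALUE) {
  LOG.warn("Cannot cache block " + block + ": length " + length
      + " exceeds the 2 GB mmap limit");
  return;
}
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);
{code}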
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945674#comment-13945674 ] Brandon Li commented on HDFS-6050: -- Thank you, Haohui, for the review. I've committed the patch. NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945673#comment-13945673 ] Brandon Li commented on HDFS-6050: -- {quote} preOpAttr is always null here {quote} This may not be true later so I think it's better to keep preOpAttr. NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Attachment: HDFS-5840.001.patch Rebased [~atm]'s patch. Also added storage recovery logic for the JN based on the discussion in the comments. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Issue Type: Improvement (was: Bug) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6050: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6050) NFS does not handle exceptions correctly in a few places
[ https://issues.apache.org/jira/browse/HDFS-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945680#comment-13945680 ] Hudson commented on HDFS-6050: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5391 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5391/]) HDFS-6050. NFS does not handle exceptions correctly in a few places. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581055) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3FileAttributes.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/Portmap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/mount/RpcProgramMountd.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/AsyncDataService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS does not handle exceptions correctly in a few places Key: HDFS-6050 URL: https://issues.apache.org/jira/browse/HDFS-6050 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brock Noland Assignee: Brandon Li Attachments: HDFS-6050.002.patch, HDFS-6050.patch I noticed this file does not log exceptions appropriately in multiple locations. Not logging the stack of Throwable: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L364 Printing exceptions to stderr: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1160 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1149 Not logging the stack trace: https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L1062 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L966 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L961 https://github.com/apache/hadoop-common/blob/f567f09091368fe800f3f70605da38a69c953fe3/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java#L680 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945683#comment-13945683 ] Suresh Srinivas commented on HDFS-5840: --- [~jingzhao], thanks for doing the patch. Should the JN throw an exception saying that an upgrade is already in progress if pre-upgrade or upgrade is retried, so that the journal node must be restarted before attempting the namenode upgrade again? Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945688#comment-13945688 ] Jing Zhao commented on HDFS-5138: - Thanks for the quick review, Nicholas! I will commit the backport patch to branch-2 and 2.4. We can continue to fix remaining issues in HDFS-5840 and HDFS-6135. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6149) Running Httpfs UTs with testKerberos profile has failures.
[ https://issues.apache.org/jira/browse/HDFS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6149: --- Summary: Running Httpfs UTs with testKerberos profile has failures. (was: Running UTs with testKerberos profile has failures.) Running Httpfs UTs with testKerberos profile has failures. -- Key: HDFS-6149 URL: https://issues.apache.org/jira/browse/HDFS-6149 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 2.2.0 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 2.3.0 Attachments: HDFS-6149.patch UT failures in TestHttpFSWithKerberos. Tests using testDelegationTokenWithinDoAs fail because of the statically set keytab file. Test testDelegationTokenHttpFSAccess also fails due to the incorrect assumption that CANCELDELEGATIONTOKEN does not require credentials. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945710#comment-13945710 ] Hadoop QA commented on HDFS-5846: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636402/hdfs-5846.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6477//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6477//console This message is automatically generated. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
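For illustration, a self-contained sketch of the fail-fast alternative: surface the resolution failure as an exception instead of silently assigning DEFAULT_RACK. The exception class name matches the one added by the patch, but its constructor, the simplified mapping interface, and the method shape here are assumptions for the sketch:
{code}
import java.io.IOException;
import java.util.List;

public class ResolveSketch {
  // The patch adds a class with this name; this constructor is assumed.
  static class UnresolvedTopologyException extends IOException {
    UnresolvedTopologyException(String msg) { super(msg); }
  }

  // Simplified stand-in for the Hadoop DNSToSwitchMapping interface.
  interface DNSToSwitchMapping {
    List<String> resolve(List<String> names);
  }

  static String resolveNetworkLocation(DNSToSwitchMapping mapping, List<String> names)
      throws UnresolvedTopologyException {
    List<String> rName = mapping.resolve(names);
    if (rName == null || rName.isEmpty()) {
      // Failing registration here keeps the cluster from operating with a
      // wrong topology where fault domains are not honored.
      throw new UnresolvedTopologyException(
          "Unresolved topology mapping for host " + names);
    }
    return rName.get(0);
  }
}
{code}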
[jira] [Resolved] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-5138. - Resolution: Fixed Fix Version/s: 2.4.0 I've merged this to branch-2 and branch-2.4.0. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6135: Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! I've committed this to trunk, branch-2, and branch-2.4.0. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Fix For: 2.4.0 Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with an HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}
It looks like, for rollback, a JN running the old software cannot handle the future layout version written by the new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
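A self-contained sketch of the tolerance idea behind HDFS-6135 (not the actual patch; the class, method, and parameters are invented for illustration):
{code}
import java.io.IOException;

// Layout versions are negative and decrease as the format evolves, so a
// reported version smaller than the software's means the storage was written
// by newer software. For the canRollBack check only the "previous" directory
// matters, so a future version in "current" need not be a hard failure.
public class RollbackVersionCheckSketch {
  static void checkVersion(int reportedLv, int softwareLv) throws IOException {
    if (reportedLv < softwareLv) {
      // e.g. reported -56, software -55: written by the new software.
      // Tolerate it while deciding whether rollback is possible.
      return;
    }
    if (reportedLv > softwareLv) {
      throw new IOException("Unexpected version of storage directory. Reported: "
          + reportedLv + ". Expecting = " + softwareLv);
    }
    // reportedLv == softwareLv: normal case, nothing to do.
  }
}
{code}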
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945728#comment-13945728 ] Jing Zhao commented on HDFS-5840: - Thanks for the comments, Suresh! Will update the patch to address the comments. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened HDFS-5807: - [~airbots], I found this test failing again in our nightly builds. Could you take a look at it again?
{noformat}
Error Message

Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.

Stacktrace

java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
{noformat}
TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time.
{noformat}
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec.
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945748#comment-13945748 ] Hudson commented on HDFS-5138: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5392 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5392/]) Move HDFS-5138 to 2.4.0 section in CHANGES.txt (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581074) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt With HA enabled, the NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
[ https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945749#comment-13945749 ] Hudson commented on HDFS-6135: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5392 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5392/]) HDFS-6135. In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581070) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back -- Key: HDFS-6135 URL: https://issues.apache.org/jira/browse/HDFS-6135 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Fix For: 2.4.0 Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, HDFS-6135.002.patch, HDFS-6135.test.txt While doing an HDFS upgrade with an HA setup, if the layout version gets changed in the upgrade, the rollback may trigger the following exception in JournalNodes (suppose the new software bumped the layout version from -55 to -56):
{code}
14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if roll back possible for one or more JournalNodes. 1 exceptions thrown:
Unexpected version of storage directory /grid/1/tmp/journal/mycluster. Reported: -56. Expecting = -55.
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
	at org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
	at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
{code}
It looks like, for rollback, a JN running the old software cannot handle the future layout version written by the new software. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945782#comment-13945782 ] Hudson commented on HDFS-5846: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5393 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5393/]) HDFS-5846. Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently. Contributed by Nikola Vujic. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581091) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnresolvedTopologyException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945788#comment-13945788 ] Chen He commented on HDFS-5807: --- Working on it. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time. {noformat} java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945790#comment-13945790 ] Larry McCay commented on HDFS-6134: --- Hi [~tucu00] - I like what I see here. We should file JIRAs for the KeyProvider API work that you mention in your document and discuss some of those aspects there. We have a number of common interests in that area. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
[ https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5846: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Nikola, thank you for contributing this code. Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency - Key: HDFS-5846 URL: https://issues.apache.org/jira/browse/HDFS-5846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.3.0 Reporter: Nikola Vujic Assignee: Nikola Vujic Fix For: 3.0.0, 2.4.0 Attachments: hdfs-5846.patch, hdfs-5846.patch, hdfs-5846.patch Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashes).
• The script returns a wrong number of values (other than expected).
Critical handling is in the DN registration code, which is responsible for assigning proper topology paths to all registered datanodes. The existing code handles this NULL pointer in the following way ({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " +
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}
The line of code that assigns the default rack:
{code}
networkLocation = NetworkTopology.DEFAULT_RACK;
{code}
can cause a serious problem: if we somehow got NULL, the default rack will be assigned as the DN's network location and the DN's registration will finish successfully. Under these circumstances, we will be able to load data into a cluster that is working with a wrong topology. A wrong topology means that fault domains are not honored; for the end user, it means that two data replicas can end up in the same fault domain and a single failure can cause the loss of two or more replicas. The cluster would be in an inconsistent state, but it would not be aware of that, and the whole thing would work as if everything were fine. In practice, the only way to notice that something went wrong is to look in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " +
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945811#comment-13945811 ] Chris Nauroth commented on HDFS-6125: - Thanks for doing these clean-ups, Suresh. I can start code reviewing this. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945823#comment-13945823 ] Tsz Wo Nicholas Sze commented on HDFS-6150: --- In the existing code, there are two places below where it says "failed to ..." + src + " on client " + clientMachine. It seems to say that the src is on the client machine; it should be "for the client". Could you also fix these as well?
{code}
@@ -2292,8 +2292,8 @@ private void startFileInternal(FSPermissionChecker pc, String src,
       try {
         if (myFile == null) {
           if (!create) {
-            throw new FileNotFoundException("failed to overwrite non-existent file "
-                + src + " on client " + clientMachine);
+            throw new FileNotFoundException("Can't overwrite non-existent "
+                + src + " on client " + clientMachine);
           }
         } else {
           if (overwrite) {
@@ -2306,8 +2306,8 @@ private void startFileInternal(FSPermissionChecker pc, String src,
           } else {
             // If lease soft limit time is expired, recover the lease
             recoverLeaseInternal(myFile, src, holder, clientMachine, false);
-            throw new FileAlreadyExistsException("failed to create file " + src
-                + " on client " + clientMachine + " because the file exists");
+            throw new FileAlreadyExistsException(src + " on client "
+                + clientMachine + " already exists");
           }
         }
{code}
Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to the use of inode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6125: Attachment: HDFS-6125.2.patch {{FSEditLogOp}} needed a very minor rebase to apply on current trunk. Instead of going back and forth with comments, I just made the change locally, and I'm uploading the new patch. I'm +1 for patch v2, pending Jenkins. Thanks again, Suresh. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945828#comment-13945828 ] Alejandro Abdelnur commented on HDFS-6134: -- [~lmccay], great. I have done some work already in this area while prototyping; I'll create a few JIRAs later tonight and put up patches for the stuff I already have. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6125: Component/s: test Target Version/s: 3.0.0, 2.5.0 Affects Version/s: 2.4.0 Hadoop Flags: Reviewed Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945840#comment-13945840 ] Hadoop QA commented on HDFS-6150: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636419/HDFS-6150.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestLeaseRecovery2 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6478//console This message is automatically generated. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.patch Inode information and path information are missing in the logs and exceptions. Adding this will help debug multithreading issues related to the use of inode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945865#comment-13945865 ] Arpit Agarwal commented on HDFS-6124: - +1 for the patch. I will commit it shortly. Add final modifier to class members --- Key: HDFS-6124 URL: https://issues.apache.org/jira/browse/HDFS-6124 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6124.1.patch, HDFS-6124.patch Many of the member variable declarations in HDFS classes are missing the final modifier. This jira adds the final modifier where possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
Yongjun Zhang created HDFS-6152: --- Summary: distcp V2 doesn't preserve root dir's attributes when -p is specified Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
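For illustration, a minimal sketch of what preserving the root's attributes could look like after the copy completes (the class and method are hypothetical, not the actual DistCp committer code; -pu maps to owner as shown):
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: after copying, carry the source root's attributes
// onto the target root so "distcp -p source-dir target-dir" preserves them.
public class PreserveRootSketch {
  static void preserveRootAttributes(FileSystem srcFs, Path srcRoot,
      FileSystem dstFs, Path dstRoot) throws IOException {
    FileStatus s = srcFs.getFileStatus(srcRoot);
    // -pu: preserve user (owner); -pg would add group, -pp permissions, etc.
    dstFs.setOwner(dstRoot, s.getOwner(), s.getGroup());
    dstFs.setPermission(dstRoot, s.getPermission());
  }
}
{code}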
[jira] [Assigned] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-6152: --- Assignee: Yongjun Zhang distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6152: Attachment: HDFS-6152.001.patch distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6152.001.patch Two issues were observed with distcp V2.
ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command {{distcp -pu source-dir target-dir}}, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. Supposedly they should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command:
a. When target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path in the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved.
b. When target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir.
For multiple-source cases, e.g., the command {{distcp -pu source-dir1 source-dir2 target-dir}}, no matter whether target-dir exists or not, the multiple sources are copied to under the target dir (target-dir is created if it didn't exist), and their attributes are preserved.
ISSUE 2. With the command {{distcp source-dir target-dir}}, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Attachment: HDFS-5840.002.patch Updated the patch to address Suresh's comments. The patch also switches the sequence of the doUpgrade calls on the shared edits and on the local storage. Now with the patch we have the following:
# If the doPreUpgrade call on the JNs fails (i.e., not all the JNs succeed), we can restart the NN and JNs for recovery, and the NN and JNs will go back to the status before the upgrade.
# If the doUpgrade call on the JNs fails, some JNs may have both previous and current directories. Restarting the JN cannot solve the issue; it has to be fixed manually. But the probability of this kind of failure is relatively low, considering that the doPreUpgrade call succeeded on all the JNs.
Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Fix For: 3.0.0 Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
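An illustrative sketch of the ordering described in point 1 above (the JournalNodeClient type and method names are invented for this sketch; the real code goes through QuorumJournalManager):
{code}
import java.io.IOException;
import java.util.List;

// Sketch: run doPreUpgrade on every JN before any doUpgrade, so a failure in
// the first phase leaves all JNs in their pre-upgrade state and a simple
// restart recovers them.
public class TwoPhaseUpgradeSketch {
  interface JournalNodeClient {  // hypothetical stand-in for the JN RPC proxy
    void doPreUpgrade() throws IOException;
    void doUpgrade() throws IOException;
  }

  static void upgradeSharedEdits(List<JournalNodeClient> jns) throws IOException {
    for (JournalNodeClient jn : jns) {
      jn.doPreUpgrade();  // any failure here aborts before JN state is changed
    }
    for (JournalNodeClient jn : jns) {
      jn.doUpgrade();     // failures past this point may need manual fixing
    }
  }
}
{code}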
[jira] [Updated] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-5910: --- Attachment: HDFS-5910.patch Thanks [~arpitagarwal] for the comments. I am attaching a new patch based on some of your comments. I request guidance on #1 and #4.
{quote}
1. isSecureOnClient may also want to use the peer's address to make a decision. e.g. intra-cluster transfer vs. distcp to remote cluster.
{quote}
The IP address of the namenode or datanode is not available at some of the client invocations. Please let me know if there is a way to get an IP address.
{quote}
2. Related to #1, isSecureOnClient and isSecureOnServer look awkward. How about replacing both with isTrustedChannel that takes the peer's IP address? We should probably avoid overloading the term secure in this context since there is a related concept of Peer#hasSecureChannel().
{quote}
I have renamed the class to TrustedChannelResolver and the function to isTrusted().
{quote}
3. Could you please update the documentation
{quote}
Done.
{quote}
4. Is the InetAddress.getByName call in DataXceiver#getClientAddress necessary? If it were necessary it would have been a security hole since DNS resolution may yield a different IP address than the one used by the client. It turns out for the kinds of Peers we are interested in this will be an IP address, so let's just remove the call.
{quote}
I wanted to use InetAddress as the argument to TrustedChannelResolver rather than a string IP address to maintain parity with _SaslPropertiesResolver_. To convert a string IP, I use InetAddress.getByName. From the documentation of InetAddress.getByName(String host): "The host name can either be a machine name, such as java.sun.com, or a textual representation of its IP address. If a literal IP address is supplied, only the validity of the address format is checked." So basically, if the argument is an IP address, getByName doesn't do a DNS check. If there is a different way to get the InetAddress, we can definitely use that. Another option is to not care about the parity with _SaslPropertiesResolver_ and pass the string IP address. Yet another option would be to pass the Peer itself to TrustedChannelResolver so that the custom implementation can take care of getting the IP address, etc. It would be great to get your opinion on this. Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but communicate in plain text with some other clients and data nodes. A sample use case is that any data transfer inside a firewall can be in plain text, whereas any data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
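For illustration, a sketch of a custom resolver in the spirit of the patch: peers inside a configured address prefix are treated as trusted (plain text) and everyone else as untrusted (encrypted). The class name follows the comment above, while the method signatures and the prefix-based policy are assumptions for this sketch:
{code}
import java.net.InetAddress;

public class SubnetTrustedChannelResolver {
  // In a real deployment this would come from configuration.
  private final String trustedPrefix = "10.1.";

  // Client-side call where no peer address is available (see #1 above):
  // fall back to a configuration-only decision.
  public boolean isTrusted() {
    return false;
  }

  // Server-side call with the peer's address available.
  public boolean isTrusted(InetAddress peer) {
    return peer.getHostAddress().startsWith(trustedPrefix);
  }
}
{code}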
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Fix Version/s: (was: 3.0.0) Target Version/s: 2.4.0 (was: 3.0.0) Affects Version/s: 2.4.0 Status: In Progress (was: Patch Available) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-5840: --- Assignee: Jing Zhao (was: Aaron T. Myers) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Status: Patch Available (was: In Progress) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5840: Component/s: journal-node ha Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: ha, journal-node, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6124: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Target Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed the patch to trunk, branch-2 and branch-2.4. Thanks for the code cleanup, Suresh! Add final modifier to class members --- Key: HDFS-6124 URL: https://issues.apache.org/jira/browse/HDFS-6124 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: HDFS-6124.1.patch, HDFS-6124.patch Many of the member variable declarations in HDFS classes are missing the final modifier. This jira adds the final modifier where possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945899#comment-13945899 ] Hadoop QA commented on HDFS-5840: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636424/HDFS-5840.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.hdfs.qjournal.TestNNWithQJM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6479//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6479//console This message is automatically generated. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: ha, journal-node, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Aaron T. Myers Assignee: Jing Zhao Priority: Blocker Attachments: HDFS-5840.001.patch, HDFS-5840.002.patch, HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6124) Add final modifier to class members
[ https://issues.apache.org/jira/browse/HDFS-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945912#comment-13945912 ] Hudson commented on HDFS-6124: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5395 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5395/]) HDFS-6124. Add final modifier to class members. (Contributed by Suresh Srinivas) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581124) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockMissingException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/CorruptFileBlockIterator.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSHedgedReadMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DomainSocketFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/StorageType.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DomainPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockLocalPathInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeLocalInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsFileStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/RollingUpgradeStatus.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PacketReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/RequestInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java *
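For readers skimming the commit, the change pattern is simple; the snippet below is illustrative only, not taken from the HDFS-6124 patch (the class and field names are invented):
{code}
// Illustrative only. Marking a member 'final' documents that it is
// assigned exactly once (here, in the constructor) and lets the
// compiler reject any accidental reassignment later.
class ExampleClient {
  private final String clientName;   // was: private String clientName;

  ExampleClient(String clientName) {
    this.clientName = clientName;    // the single permitted assignment
  }

  String getClientName() {
    return clientName;
  }
}
{code}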
[jira] [Commented] (HDFS-5910) Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
[ https://issues.apache.org/jira/browse/HDFS-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945925#comment-13945925 ] Arpit Agarwal commented on HDFS-5910: - Thanks for making the changes, Benoy. {quote} The IP address of the namenode or datanode is not available at some of the client invocations. Please let me know if there is a way to get an IP address. {quote} Just for my understanding: lacking the peer's IP address, is it your intention to use configuration to decide the client's behavior? I looked through the usages of {{isTrusted}} and some of them already have the connected socket available, so it is fairly easy to query the remote end's socket address and pass it to {{isTrusted}}. For the usage in getDataEncryptionKey(), we can refactor to pass a functor that supplies the encryption key to, e.g., {{getFileChecksum}}. However, I am okay with doing the refactoring in a separate change. We can leave the parameter-less overload of {{isTrusted}} for now, use it only from {{getDataEncryptionKey}}, and file a separate JIRA to fix it. {quote} I wanted to use InetAddress as the argument to TrustedChannelResolver rather than a string IP address, to maintain parity with SaslPropertiesResolver. To convert a string IP, I use InetAddress.getByName. {quote} Thanks for the explanation. Will [InetAddresses#forString|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/net/InetAddresses.html#forString%28java.lang.String%29] from Guava work for you? I just checked and it's available in our build. Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text Key: HDFS-5910 URL: https://issues.apache.org/jira/browse/HDFS-5910 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-5910.patch, HDFS-5910.patch, HDFS-5910.patch It is possible to enable encryption of DataTransferProtocol. In some use cases, it is required to encrypt data transfer with some clients, but to communicate in plain text with other clients and datanodes. A sample use case: data transfer inside a firewall can be in plain text, whereas data transfer from clients outside the firewall needs to be encrypted. -- This message was sent by Atlassian JIRA (v6.2#6252)
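To make the suggestion concrete, here is a small, self-contained sketch contrasting the two parsing routes; the class name is illustrative and not part of either patch:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import com.google.common.net.InetAddresses;

public class IpParsingDemo {
  public static void main(String[] args) throws UnknownHostException {
    // JDK route: fine for an IP literal, but may fall back to a DNS
    // lookup when handed a hostname.
    InetAddress viaJdk = InetAddress.getByName("192.168.1.10");

    // Guava route: accepts only IPv4/IPv6 literals, never consults the
    // resolver, and throws IllegalArgumentException on anything else.
    InetAddress viaGuava = InetAddresses.forString("192.168.1.10");

    System.out.println(viaJdk.equals(viaGuava));  // true
  }
}
{code}
The never-touches-DNS behavior of {{InetAddresses.forString}} is usually what a policy check like {{TrustedChannelResolver}} wants, since an unexpected resolver round-trip on the hot path is both slow and a potential failure mode.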
[jira] [Commented] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945938#comment-13945938 ] Suresh Srinivas commented on HDFS-6150: --- The new patch fixes the test failures and addresses the comments. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
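As an illustration of the kind of change being discussed (a hypothetical sketch, not the actual HDFS-6150 patch; the helper name is invented):
{code}
import java.io.FileNotFoundException;

public class InodeMessageDemo {
  // Hypothetical helper: include the inode id next to the path when
  // building an exception message, so log lines written by concurrent
  // threads can be correlated to a specific inode even after a rename.
  static FileNotFoundException fileNotFound(String path, long inodeId) {
    return new FileNotFoundException(
        "File does not exist: " + path + " (inode " + inodeId + ")");
  }

  public static void main(String[] args) {
    System.out.println(
        fileNotFound("/user/foo/data.txt", 16389).getMessage());
  }
}
{code}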
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.1.patch Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-6150: -- Attachment: HDFS-6150.2.patch New patch to fix an NPE. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Attachments: HDFS-6150.1.patch, HDFS-6150.2.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945943#comment-13945943 ] Fengdong Yu commented on HDFS-6130: --- Thanks [~szetszwo]! [~wheat9], do you want only the fsimage, or both the image and the edit log? I'll reproduce it today using 1.3.0 and the latest trunk, then keep the corresponding fsimage and edit logs. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk instance. I can upgrade successfully if I don't configure HA, but with HA enabled, there is an NPE when I run 'hdfs namenode -initializeSharedEdits':
{code}
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 60 millis
14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/03/20 15:06:41 INFO util.GSet: VM type = 64-bit
14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 275.3 KB
14/03/20 15:06:41 INFO util.GSet: capacity = 2^15 = 32768 entries
14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
14/03/20 15:06:41 INFO common.Storage: Lock on /data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO common.Storage: Lock on /data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
************************************************************/
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6150) Add inode id information in the logs to make debugging easier
[ https://issues.apache.org/jira/browse/HDFS-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6150: -- Priority: Minor (was: Major) Assignee: Suresh Srinivas Hadoop Flags: Reviewed +1, HDFS-6150.2.patch looks good. Add inode id information in the logs to make debugging easier - Key: HDFS-6150 URL: https://issues.apache.org/jira/browse/HDFS-6150 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Suresh Srinivas Assignee: Suresh Srinivas Priority: Minor Attachments: HDFS-6150.1.patch, HDFS-6150.2.patch, HDFS-6150.patch Inode information and path information are missing from the logs and exceptions. Adding them will help debug multi-threading issues related to the use of INode ID information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945988#comment-13945988 ] Haohui Mai commented on HDFS-6130: -- Can you create a checkpoint so that upgrading from the checkpointed fsimage will trigger the bug? NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6130) NPE during namenode upgrade from old release
[ https://issues.apache.org/jira/browse/HDFS-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945992#comment-13945992 ] Fengdong Yu commented on HDFS-6130: --- OK, no problem. I can use rollingUpgrade -prepare to create a checkpoint. NPE during namenode upgrade from old release Key: HDFS-6130 URL: https://issues.apache.org/jira/browse/HDFS-6130 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Fengdong Yu -- This message was sent by Atlassian JIRA (v6.2#6252)
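As a rough sketch of that reproduction plan (hedged: the exact commands depend on the releases involved; these are the standard admin commands rather than anything specified in this report):
{code}
# On the old (pre-upgrade) namenode, force a checkpoint so the
# on-disk fsimage is current before the upgrade:
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop dfsadmin -safemode leave

# After installing the new software, re-run the step reported to fail:
hdfs namenode -initializeSharedEdits
{code}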
[jira] [Updated] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
[ https://issues.apache.org/jira/browse/HDFS-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6152: Description: Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path on the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved. b. when target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir. For multiple-source cases, e.g., the command distcp -pu source-dir1 source-dir2 target-dir, the sources are copied to under the target dir whether or not target-dir exists (target-dir is created if it didn't exist), and their attributes are preserved. ISSUE 2. With the following command: distcp source-dir target-dir, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. was: Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the last component of the source-dir path on the command line) at the target file system, with all contents of source-dir copied to under target-dir/source-dir. The issue in this case is that the attributes of source-dir are not preserved. b. when target-dir doesn't exist. It will result in target-dir with all contents of source-dir copied to under target-dir. The issue in this case is that the attributes of source-dir are not carried over to target-dir. For multiple-source cases, e.g., the command distcp -pu source-dir1 source-dir2 target-dir, the sources are copied to under the target dir whether or not target-dir exists (target-dir is created if it didn't exist), and their attributes are preserved. ISSUE 2. With the following command: distcp source-dir target-dir, when source-dir is an empty directory and target-dir doesn't exist, source-dir is not copied; the command behaves like a no-op. However, when source-dir is not empty, it is copied, resulting in target-dir at the target file system containing a copy of source-dir's children. To be consistent, an empty source dir should be copied too. Basically, the above distcp command should cause target-dir to be created at the target file system, with source-dir's attributes preserved at target-dir when -p is passed. distcp V2 doesn't preserve root dir's attributes when -p is specified - Key: HDFS-6152 URL: https://issues.apache.org/jira/browse/HDFS-6152 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6152.001.patch Two issues were observed with distcp V2. ISSUE 1. When copying a source dir to a target dir with the -pu option, using the command distcp -pu source-dir target-dir, the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source dir are not preserved. They should be preserved when neither -update nor -overwrite is specified. There are two scenarios with the above command: a. when target-dir already exists. Issuing the above command will result in target-dir/source-dir (source-dir here refers to the
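To make ISSUE 1 easy to reproduce, here is a minimal sketch; the paths and user name are placeholders, not from the report, and the chown step assumes superuser privileges:
{code}
# Create a source dir owned by a distinct user, then copy with -pu
# (preserve user) and inspect the root of the copy.
hdfs dfs -mkdir -p /tmp/source-dir
hdfs dfs -chown someuser /tmp/source-dir
hadoop distcp -pu /tmp/source-dir /tmp/target-dir

# Expected: the copied root is owned by 'someuser'; per the report,
# the owner of /tmp/target-dir (or /tmp/target-dir/source-dir) is not preserved.
hdfs dfs -ls -d /tmp/target-dir
{code}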
[jira] [Commented] (HDFS-6125) Cleanup unnecessary cast in HDFS code base
[ https://issues.apache.org/jira/browse/HDFS-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946015#comment-13946015 ] Hadoop QA commented on HDFS-6125: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636455/HDFS-6125.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 43 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6480//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6480//console This message is automatically generated. Cleanup unnecessary cast in HDFS code base -- Key: HDFS-6125 URL: https://issues.apache.org/jira/browse/HDFS-6125 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.4.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: HDFS-6125.2.patch, HDFS-6125.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
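For context, the kind of cleanup HDFS-6125 performs looks like the following illustrative snippet (not taken from the patch; the class and variable names are invented):
{code}
import java.util.ArrayList;
import java.util.List;

public class CastCleanupDemo {
  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    names.add("datanode-1");

    // Before: the generic type already guarantees a String, so this
    // cast is redundant and the compiler flags it as such.
    String before = (String) names.get(0);

    // After: the cast is simply dropped; behavior is unchanged.
    String after = names.get(0);

    System.out.println(before.equals(after));  // true
  }
}
{code}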