[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with symlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966272#comment-13966272 ]

Tsz Wo Nicholas Sze commented on HDFS-6233:
-------------------------------------------

Please also change HardLink.createHardLinkMult so that if the command fails, the exception message includes hardLinkCommand.

Datanode upgrade in Windows from 1.x to 2.4 fails with symlink error.
---------------------------------------------------------------------

Key: HDFS-6233
URL: https://issues.apache.org/jira/browse/HDFS-6233
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, tools
Affects Versions: 2.4.0
Environment: Windows
Reporter: Huan Huang
Assignee: Arpit Agarwal
Attachments: HDFS-6233.01.patch

I tried to upgrade Hadoop from 1.x to 2.4, but the DataNode failed to start due to a hard link exception.

Repro steps:
* Install Hadoop 1.x
* hadoop dfsadmin -safemode enter
* hadoop dfsadmin -saveNamespace
* hadoop namenode -finalize
* Stop all services
* Uninstall Hadoop 1.x
* Install Hadoop 2.4
* Start the namenode with the -upgrade option
* Try to start the datanode; the hard link exception appears in the datanode service log.

{code}
2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8010: starting
2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -56
2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost
2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade
2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: Upgrading storage directory d:\hadoop\data\hdfs\dn.
   old LV = -44; old CTime = 0.
   new LV = -55; new CTime = 1397168400373
2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current
2014-04-10 22:47:12,254 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020
java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect command line arguments.
	at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479)
	at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
	at java.lang.Thread.run(Thread.java:722)
2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020
2014-04-10 22:47:12,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
{code}
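The review suggestion above (surfacing hardLinkCommand in the exception when the command fails) could be sketched roughly as follows. This is a hedged, stand-alone illustration, not the actual {{HardLink}} internals; the method shapes and names below are assumptions.

```java
import java.io.IOException;
import java.util.StringJoiner;

// Illustrative sketch: when the hard-link shell command fails, include the
// exact command line in the exception so the failure seen in the datanode
// log is actionable. Method signatures are hypothetical.
public class HardLinkErrorSketch {

  static String describeFailure(String[] hardLinkCommand, String stderr) {
    StringJoiner cmd = new StringJoiner(" ");
    for (String arg : hardLinkCommand) {
      cmd.add(arg);
    }
    return "Hard link command failed: [" + cmd + "]: " + stderr;
  }

  // Stand-in for the suggested createHardLinkMult change: on a non-zero
  // exit code, throw with the full command line in the message.
  static void createHardLinkMult(String[] hardLinkCommand, int exitCode,
      String stderr) throws IOException {
    if (exitCode != 0) {
      throw new IOException(describeFailure(hardLinkCommand, stderr));
    }
  }

  public static void main(String[] args) {
    System.out.println(describeFailure(
        new String[] {"hardlink", "create", "link", "file"},
        "Incorrect command line arguments."));
  }
}
```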
[jira] [Created] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
Chris Nauroth created HDFS-6234:
-----------------------------------

Summary: TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
Key: HDFS-6234
URL: https://issues.apache.org/jira/browse/HDFS-6234
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial

{{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
[ https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6234:
--------------------------------

Status: Patch Available  (was: Open)

TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
------------------------------------------------------------------------------

Key: HDFS-6234
URL: https://issues.apache.org/jira/browse/HDFS-6234
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6234.1.patch

{{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}.
[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
[ https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6234:
--------------------------------

Attachment: HDFS-6234.1.patch

I'm attaching a patch that sets a valid URI in {{dfs.datanode.data.dir}}. While I was in here, I also made some minor changes to make sure every created {{DataNode}} gets shut down. I ran the test successfully on Mac and Windows with this patch.

TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
------------------------------------------------------------------------------

Key: HDFS-6234
URL: https://issues.apache.org/jira/browse/HDFS-6234
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6234.1.patch

{{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}.
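For illustration only: one way to obtain a valid {{dfs.datanode.data.dir}} value on Windows is to route the local path through {{java.io.File#toURI}}, which handles drive letters and backslashes. Whether the attached patch does exactly this is an assumption; the sketch just shows why a raw Windows path is not a well-formed storage-directory URI.

```java
import java.io.File;

// Hypothetical helper: convert a platform-specific local path into a
// well-formed file: URI suitable for dfs.datanode.data.dir. File#toURI
// produces e.g. file:/D:/hadoop/data for D:\hadoop\data on Windows.
public class DataDirUriSketch {

  static String asConfigValue(String localPath) {
    return new File(localPath).toURI().toString();
  }

  public static void main(String[] args) {
    System.out.println(asConfigValue("/tmp/dfs/data"));
  }
}
```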
[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated HDFS-6232:
--------------------------------

Attachment: HDFS-6232.patch

OfflineEditsViewer throws a NPE on edits containing ACL modifications
---------------------------------------------------------------------

Key: HDFS-6232
URL: https://issues.apache.org/jira/browse/HDFS-6232
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
Attachments: HDFS-6232.patch

The OfflineEditsViewer using the XML parser will throw an NPE when given an edit log containing a SET_ACL op.

{code}
[root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out
14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505
Encountered exception. Exiting: null
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
	at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
[root@hdfs-nfs current]#
{code}

This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK.
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966282#comment-13966282 ]

Akira AJISAKA commented on HDFS-6232:
-------------------------------------

I reproduced the error. It occurs because {{XMLUtils.addSaxString}} can't handle a null ACL entry name. The name is an optional value, so it can be null. I attached a patch that adds a null check.

OfflineEditsViewer throws a NPE on edits containing ACL modifications
---------------------------------------------------------------------

Key: HDFS-6232
URL: https://issues.apache.org/jira/browse/HDFS-6232
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
Attachments: HDFS-6232.patch

The OfflineEditsViewer using the XML parser will throw an NPE when given an edit log containing a SET_ACL op.

{code}
[root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out
14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505
Encountered exception. Exiting: null
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
	at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
[root@hdfs-nfs current]#
{code}

This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK.
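A minimal stand-alone sketch of the null guard Akira describes: the optional NAME element is emitted only when the ACL entry name is present, so a null name never reaches the string-mangling helper. Method names mirror the stack trace; the bodies are illustrative, not the actual XMLUtils/FSEditLogOp code.

```java
// Illustrative sketch of guarding an optional ACL entry name during XML
// serialization. The real fix lives in FSEditLogOp/XMLUtils; this class
// only demonstrates the shape of the null check.
public class AclXmlSketch {

  // Stand-in for XMLUtils.mangleXmlString: dereferences its argument,
  // which is where the reported NPE comes from when the name is null.
  static String mangleXmlString(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;");
  }

  // The fix: only emit the optional NAME element when a name is present.
  static String nameElement(String aclEntryName) {
    if (aclEntryName == null) {
      return "";
    }
    return "<NAME>" + mangleXmlString(aclEntryName) + "</NAME>";
  }

  public static void main(String[] args) {
    System.out.println(nameElement(null).isEmpty()); // no element for null
    System.out.println(nameElement("alice"));
  }
}
```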
[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated HDFS-6232:
--------------------------------

Status: Patch Available  (was: Open)

OfflineEditsViewer throws a NPE on edits containing ACL modifications
---------------------------------------------------------------------

Key: HDFS-6232
URL: https://issues.apache.org/jira/browse/HDFS-6232
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 2.4.0, 3.0.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
Attachments: HDFS-6232.patch

The OfflineEditsViewer using the XML parser will throw an NPE when given an edit log containing a SET_ACL op.

{code}
[root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out
14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505
Encountered exception. Exiting: null
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
	at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
[root@hdfs-nfs current]#
{code}

This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK.
[jira] [Created] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
Chris Nauroth created HDFS-6235:
-----------------------------------

Summary: TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6235:
--------------------------------

Attachment: HDFS-6235.1.patch

The easiest fix here is simply to make sure that each test in the suite uses a unique storage directory. That way, there is no chance of collision on locked files between tests in the suite. At the end of the suite, all of these file handles get released automatically during process exit. I'm attaching a patch that changes the storage directory names to match the names of the individual tests.

TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
-----------------------------------------------------------------------------------------

Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-6235.1.patch

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
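The per-test-directory idea can be sketched like this. The helper and naming scheme are illustrative assumptions, not the patch's exact code; the point is only that deriving the directory from the test method name makes lock collisions between out-of-order tests impossible.

```java
import java.io.File;

// Illustrative sketch: give each test its own storage directory, derived
// from the test method name, so a file handle left open by one test never
// blocks deletion of another test's directory.
public class UniqueStorageDirSketch {

  static File storageDirFor(String baseDir, String testMethodName) {
    // Hypothetical naming scheme: one directory per test method.
    return new File(baseDir, "filejournaltest-" + testMethodName);
  }

  public static void main(String[] args) {
    System.out.println(storageDirFor("/tmp", "testInprogressRecovery").getPath());
    System.out.println(storageDirFor("/tmp", "testFinalizeErrorReportedToNNStorage").getPath());
  }
}
```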
[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6235:
--------------------------------

Priority: Trivial  (was: Major)

TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
-----------------------------------------------------------------------------------------

Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6235.1.patch

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6235:
--------------------------------

Status: Patch Available  (was: Open)

TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
-----------------------------------------------------------------------------------------

Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-6235.1.patch

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966298#comment-13966298 ]

Jing Zhao commented on HDFS-6235:
---------------------------------

+1 pending Jenkins.

TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
-----------------------------------------------------------------------------------------

Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6235.1.patch

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
[jira] [Commented] (HDFS-6229) Race condition in failover can cause RetryCache fail to work
[ https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966308#comment-13966308 ]

Suresh Srinivas commented on HDFS-6229:
---------------------------------------

+1 for the patch. Thanks, Jing, for working on this.

Race condition in failover can cause RetryCache fail to work
------------------------------------------------------------

Key: HDFS-6229
URL: https://issues.apache.org/jira/browse/HDFS-6229
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Affects Versions: 2.1.0-beta
Reporter: Jing Zhao
Assignee: Jing Zhao
Attachments: HDFS-6229.000.patch, retrycache-race.patch

Currently, when NN failover happens, the old SBN first sets its state to active and then starts the active services (including tailing the remaining edit log and building a complete retry cache based on it). If a retry request that already succeeded on the old ANN (but whose response the client failed to receive) comes in between, the retry may still get served by the new ANN but miss the retry cache.
[jira] [Commented] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
[ https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966348#comment-13966348 ]

Hadoop QA commented on HDFS-6234:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639740/HDFS-6234.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6650//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6650//console

This message is automatically generated.

TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
------------------------------------------------------------------------------

Key: HDFS-6234
URL: https://issues.apache.org/jira/browse/HDFS-6234
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6234.1.patch

{{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}.
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966358#comment-13966358 ]

Hadoop QA commented on HDFS-6232:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639741/HDFS-6232.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1293 javac compiler warnings (more than the trunk's current 1287 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6651//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6651//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6651//console

This message is automatically generated.

OfflineEditsViewer throws a NPE on edits containing ACL modifications
---------------------------------------------------------------------

Key: HDFS-6232
URL: https://issues.apache.org/jira/browse/HDFS-6232
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 3.0.0, 2.4.0
Reporter: Stephen Chu
Assignee: Akira AJISAKA
Attachments: HDFS-6232.patch

The OfflineEditsViewer using the XML parser will throw an NPE when given an edit log containing a SET_ACL op.

{code}
[root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out
14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505
Encountered exception. Exiting: null
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
	at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
[root@hdfs-nfs current]#
{code}

This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK.
[jira] [Commented] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966361#comment-13966361 ]

Colin Patrick McCabe commented on HDFS-6110:
--------------------------------------------

{code}
+  public static final int DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT = 300;
{code}

It's odd that this is an int, given that we retrieve the threshold as a long later on. This seems likely to lead to confusion; can we just make this a long everywhere? +1 after that's addressed.

adding more slow action log in critical write path
--------------------------------------------------

Key: HDFS-6110
URL: https://issues.apache.org/jira/browse/HDFS-6110
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, HDFS-6110v4.txt

After digging into an HBase write spike caused by slow buffer IO in our cluster, I realized we should add more abnormal-latency warning logs to the write flow, so that if others hit an HLog sync spike, they can get more detail from the HDFS side at the same time. Patch will be uploaded soon.
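Colin's point can be illustrated with a self-contained sketch in which both the default constant and the lookup use {{long}}, so the two never disagree in type. The config key and the map-backed stand-in for Hadoop's {{Configuration}} are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: declare the threshold default as a long, matching
// the long-typed lookup, instead of pairing an int constant with getLong().
public class SlowIoThresholdSketch {

  // long everywhere, per the review comment.
  public static final long DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT = 300L;

  // Stand-in for Configuration#getLong(key, default).
  static long getThreshold(Map<String, String> conf, String key) {
    String v = conf.get(key);
    return v == null ? DFS_SLOW_IO_WARNING_THRESHOLD_DEFAULT : Long.parseLong(v);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Falls back to the long default when the key is unset.
    System.out.println(getThreshold(conf, "dfs.datanode.slow.io.warning.threshold.ms"));
  }
}
```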
[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966367#comment-13966367 ]

Hadoop QA commented on HDFS-6235:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639744/HDFS-6235.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.datanode.fsdataset.TestAvailableSpaceVolumeChoosingPolicy
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6652//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6652//console

This message is automatically generated.

TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
-----------------------------------------------------------------------------------------

Key: HDFS-6235
URL: https://issues.apache.org/jira/browse/HDFS-6235
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Attachments: HDFS-6235.1.patch

{{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test the behavior of an unclosed edit log. In some environments, though, tests within a suite execute out of order. In that case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory.
[jira] [Commented] (HDFS-6231) DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.
[ https://issues.apache.org/jira/browse/HDFS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966406#comment-13966406 ]

Hudson commented on HDFS-6231:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/537/])
HDFS-6231. DFSClient hangs infinitely if using hedged reads and all eligible datanodes die. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586551)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java

DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.
--------------------------------------------------------------------------------

Key: HDFS-6231
URL: https://issues.apache.org/jira/browse/HDFS-6231
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Fix For: 3.0.0, 2.4.1
Attachments: HDFS-6231.1.patch

When using hedged reads, and all eligible datanodes for the read get flagged as dead or ignored, the client is supposed to refetch block locations from the NameNode and retry the read. Instead, we've seen that the client can hang indefinitely.
[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent
[ https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966416#comment-13966416 ] Hudson commented on HDFS-6224: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/537/]) Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586515) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent. Contributed by Charles Lamb. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586490) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent --- Key: HDFS-6224 URL: https://issues.apache.org/jira/browse/HDFS-6224 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, HDFS-6224.003.patch, HDFS-6224.004.patch Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure that during a setPermission operation the permission returned is the one that was just set, not the permission before the operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5669) Storage#tryLock() should check for null before logging successfull message
[ https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966411#comment-13966411 ] Hudson commented on HDFS-5669: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #537 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/537/]) HDFS-5669. Storage#tryLock() should check for null before logging successfull message. Contributed by Vinayakumar B (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586392) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java Storage#tryLock() should check for null before logging successfull message -- Key: HDFS-5669 URL: https://issues.apache.org/jira/browse/HDFS-5669 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 3.0.0, 2.5.0 Attachments: HDFS-5669.patch, HDFS-5669.patch In the following code in Storage#tryLock(), there is a possibility that {{file.getChannel().tryLock()}} returns null if the lock is acquired by some other process. In that case, even though the return value is null, a success message is still logged, which is confusing. {code}try { res = file.getChannel().tryLock(); file.write(jvmName.getBytes(Charsets.UTF_8)); LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName); } catch(OverlappingFileLockException oe) {{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
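The null-return behavior the issue describes can be shown with a minimal, self-contained sketch (illustrative names, not the committed Storage.java change): FileChannel.tryLock() returns null, rather than throwing, when another process already holds the lock, so success must only be logged for a non-null FileLock.

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;

public class TryLockGuard {
    // Only log success when tryLock() actually returned a lock; a null
    // return means another process holds it.
    static FileLock tryLockOrReport(RandomAccessFile file, Path lockF) throws Exception {
        FileLock res = file.getChannel().tryLock();
        if (res == null) {
            System.out.println("Lock on " + lockF + " is held by another process");
        } else {
            System.out.println("Lock on " + lockF + " acquired");
        }
        return res;
    }

    public static void main(String[] args) throws Exception {
        Path lockF = Files.createTempFile("in_use", ".lock");
        try (RandomAccessFile file = new RandomAccessFile(lockF.toFile(), "rws")) {
            FileLock lock = tryLockOrReport(file, lockF);
            // First acquisition in a fresh process succeeds.
            if (lock == null) throw new AssertionError("first acquisition should succeed");
            lock.release();
        } finally {
            Files.delete(lockF);
        }
    }
}
```

Note that a second tryLock() from the *same* JVM throws OverlappingFileLockException instead, which is why the original code catches it separately.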
[jira] [Resolved] (HDFS-6203) check other namenode's state before HAadmin transitionToActive
[ https://issues.apache.org/jira/browse/HDFS-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-6203. -- Resolution: Duplicate check other namenode's state before HAadmin transitionToActive -- Key: HDFS-6203 URL: https://issues.apache.org/jira/browse/HDFS-6203 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.3.0 Reporter: patrick white Assignee: Kihwal Lee Current behavior is that the HAadmin -transitionToActive command will complete the transition to Active even if the other namenode is already in Active state. This is not an allowed condition and should be handled by fencing; however, setting both namenodes active can happen accidentally with relative ease, especially in a production environment when performing manual maintenance operations. If this situation does occur it is very serious and will likely cause data loss or, in the best case, require a difficult recovery to avoid data loss. This is a request for an enhancement to haadmin's -transitionToActive command: have HAadmin check the Active state of the other namenode before completing the transition, and if the other namenode is Active, fail the request because the other namenode is already active. Not sure if there is a scenario where both namenodes being Active is valid or desired, but to maintain functional compatibility a 'force' parameter could be added to override this check and allow the previous behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain
[ https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-2949: Assignee: Kihwal Lee HA: Add check to active state transition to prevent operator-induced split brain Key: HDFS-2949 URL: https://issues.apache.org/jira/browse/HDFS-2949 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Kihwal Lee Currently, if the administrator mistakenly calls -transitionToActive on one NN while the other one is still active, all hell will break loose. We can add a simple check by having the NN make a getServiceState() RPC to its peer with a short (~1 second?) timeout. If the RPC succeeds and indicates the other node is active, it should refuse to enter active mode. If the RPC fails or indicates standby, it can proceed. This is just meant as a preventative safety check - we still expect users to use the -failover command which has other checks plus fencing built in. -- This message was sent by Atlassian JIRA (v6.2#6252)
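The decision logic Todd proposes can be sketched in a few lines. This is a hedged illustration only: "PeerNameNode" and its method are stand-ins for this sketch, not Hadoop's real HAServiceProtocol API.

```java
public class TransitionGuard {
    interface PeerNameNode {
        // Expected to answer "active" or "standby"; throws on RPC failure or timeout.
        String getServiceState() throws Exception;
    }

    // Refuse only when the peer is reachable and reports active; a failed
    // RPC proceeds, preserving the previous behavior as the JIRA suggests.
    static boolean mayTransitionToActive(PeerNameNode peer) {
        try {
            return !"active".equals(peer.getServiceState());
        } catch (Exception rpcFailureOrTimeout) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        if (mayTransitionToActive(() -> "active"))
            throw new AssertionError("peer already active: must refuse");
        if (!mayTransitionToActive(() -> "standby"))
            throw new AssertionError("peer standby: must proceed");
        if (!mayTransitionToActive(() -> { throw new Exception("timeout"); }))
            throw new AssertionError("unreachable peer: must proceed");
        System.out.println("ok");
    }
}
```

As the JIRA notes, this is only a preventative check, not a replacement for the fencing built into the -failover command.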
[jira] [Updated] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain
[ https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2949: - Target Version/s: 2.5.0 (was: 0.24.0) HA: Add check to active state transition to prevent operator-induced split brain Key: HDFS-2949 URL: https://issues.apache.org/jira/browse/HDFS-2949 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Kihwal Lee Currently, if the administrator mistakenly calls -transitionToActive on one NN while the other one is still active, all hell will break loose. We can add a simple check by having the NN make a getServiceState() RPC to its peer with a short (~1 second?) timeout. If the RPC succeeds and indicates the other node is active, it should refuse to enter active mode. If the RPC fails or indicates standby, it can proceed. This is just meant as a preventative safety check - we still expect users to use the -failover command which has other checks plus fencing built in. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966479#comment-13966479 ] Akira AJISAKA commented on HDFS-6232: - The javac warnings in the test code are due to the use of {{com.sun.org.apache.xml.internal.serialize.XMLSerializer}} and {{com.sun.org.apache.xml.internal.serialize.OutputFormat}}. They are already used by OfflineEditsViewer. I suggest to use Xerces instead. OfflineEditsViewer throws a NPE on edits containing ACL modifications - Key: HDFS-6232 URL: https://issues.apache.org/jira/browse/HDFS-6232 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.4.0 Reporter: Stephen Chu Assignee: Akira AJISAKA Attachments: HDFS-6232.patch The OfflineEditsViewer using the XML parser will throw an NPE when parsing an edit containing a SET_ACL op. {code} [root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505 Encountered exception. 
Exiting: null java.lang.NullPointerException at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122) at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237) [root@hdfs-nfs current]# {code} This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent
[ https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966505#comment-13966505 ] Hudson commented on HDFS-6224: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1729 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1729/]) Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586515) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent. Contributed by Charles Lamb. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586490) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent --- Key: HDFS-6224 URL: https://issues.apache.org/jira/browse/HDFS-6224 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, HDFS-6224.003.patch, HDFS-6224.004.patch Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure that during a setPermission operation the permission returned is the one that was just set, not the permission before the operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966582#comment-13966582 ] Akira AJISAKA commented on HDFS-6232: - bq. I suggest to use Xerces instead. I tried to use {{org.apache.xml.serialize.XMLSerializer}} in Apache Xerces, but it was deprecated. I'm thinking we should use another library or write our own code. OfflineEditsViewer throws a NPE on edits containing ACL modifications - Key: HDFS-6232 URL: https://issues.apache.org/jira/browse/HDFS-6232 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.4.0 Reporter: Stephen Chu Assignee: Akira AJISAKA Attachments: HDFS-6232.patch The OfflineEditsViewer using the XML parser will throw an NPE when parsing an edit containing a SET_ACL op. {code} [root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505 Encountered exception. 
Exiting: null java.lang.NullPointerException at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122) at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237) [root@hdfs-nfs current]# {code} This is reproducible by setting an ACL on a file and then running the OEV on the edits_inprogress file. The stats and binary parsers run OK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5669) Storage#tryLock() should check for null before logging successfull message
[ https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966613#comment-13966613 ] Hudson commented on HDFS-5669: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/]) HDFS-5669. Storage#tryLock() should check for null before logging successfull message. Contributed by Vinayakumar B (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586392) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java Storage#tryLock() should check for null before logging successfull message -- Key: HDFS-5669 URL: https://issues.apache.org/jira/browse/HDFS-5669 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 3.0.0, 2.5.0 Attachments: HDFS-5669.patch, HDFS-5669.patch In the following code in Storage#tryLock(), there is a possibility that {{file.getChannel().tryLock()}} returns null if the lock is acquired by some other process. In that case, even though the return value is null, a success message is still logged, which is confusing. {code}try { res = file.getChannel().tryLock(); file.write(jvmName.getBytes(Charsets.UTF_8)); LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName); } catch(OverlappingFileLockException oe) {{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6224) Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent
[ https://issues.apache.org/jira/browse/HDFS-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966618#comment-13966618 ] Hudson commented on HDFS-6224: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/]) Undo accidental FSNamesystem change introduced in HDFS-6224 commit. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586515) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java HDFS-6224. Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent. Contributed by Charles Lamb. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586490) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java Add a unit test to TestAuditLogger for file permissions passed to logAuditEvent --- Key: HDFS-6224 URL: https://issues.apache.org/jira/browse/HDFS-6224 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6224.001.patch, HDFS-6224.002.patch, HDFS-6224.003.patch, HDFS-6224.004.patch Add a unit test which verifies behavior of HADOOP-9155. Specifically, ensure that during a setPermission operation the permission returned is the one that was just set, not the permission before the operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6231) DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.
[ https://issues.apache.org/jira/browse/HDFS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966608#comment-13966608 ] Hudson commented on HDFS-6231: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1754/]) HDFS-6231. DFSClient hangs infinitely if using hedged reads and all eligible datanodes die. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586551) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java DFSClient hangs infinitely if using hedged reads and all eligible datanodes die. Key: HDFS-6231 URL: https://issues.apache.org/jira/browse/HDFS-6231 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6231.1.patch When using hedged reads, and all eligible datanodes for the read get flagged as dead or ignored, then the client is supposed to refetch block locations from the NameNode to retry the read. Instead, we've seen that the client can hang indefinitely. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966638#comment-13966638 ] Daryn Sharp commented on HDFS-6143: --- bq. [...] It may not work since the seek offset is missing in open. There is no way to calculate the redirect. I have an internal 0.23 patch that I'm reworking for 2.x that will actually use the block locations for opens - HDFS-6221. The current incarnation fetches block locations for just the offset when the http connection stream is opened, but it could easily be changed to fetch all the block locations when the webhdfs open is called - which will elicit a FNF - and then use the locations for a given offset when the http connection stream is opened. bq. Short term, looking at the ByteRangeInputStream, it's inefficient in that for even a single byte forward seek (seek(getPos()+1), it closes the connection and re-opens it [...] read-ahead for short range seeks, which is a lot more efficient Yes, I've already tinkered with fixing this very problem. Internally we found that a fraction of jobs actually perform seeks after the split offset seek, and those that did seek would only do so maybe 1-2 times so it was deemed a low priority fix. WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths Key: HDFS-6143 URL: https://issues.apache.org/jira/browse/HDFS-6143 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handles non-existing paths. 
- 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive and not how the local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of code that's broken because of this. - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths -- This message was sent by Atlassian JIRA (v6.2#6252)
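The short-seek inefficiency Daryn mentions for ByteRangeInputStream can be sketched as follows. This is an illustration only: the class and field names are stand-ins, not the HDFS implementation. The idea is that a small forward seek (including seek(getPos()+1)) consumes bytes from the already-open stream instead of tearing down and re-opening the HTTP connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortSeekStream {
    static final long SHORT_SEEK_LIMIT = 64 * 1024; // assumed read-ahead threshold

    private final byte[] data; // stands in for the remote file
    private InputStream in;
    private long pos;
    int reopens; // counts simulated connection re-establishments

    ShortSeekStream(byte[] data) {
        this.data = data;
        this.in = new ByteArrayInputStream(data);
    }

    void seek(long target) throws IOException {
        long delta = target - pos;
        if (delta >= 0 && delta <= SHORT_SEEK_LIMIT) {
            // Cheap path: skip forward on the open stream.
            long remaining = delta;
            while (remaining > 0) {
                remaining -= in.skip(remaining);
            }
        } else {
            // Expensive path: in real WebHDFS this would close the HTTP
            // connection and reopen it at the new offset.
            reopens++;
            in = new ByteArrayInputStream(data);
            in.skip(target);
        }
        pos = target;
    }

    public static void main(String[] args) throws IOException {
        ShortSeekStream s = new ShortSeekStream(new byte[128 * 1024]);
        s.seek(10);   // short forward seek: no reopen
        s.seek(11);   // one-byte forward seek: no reopen either
        if (s.reopens != 0) throw new AssertionError("short seeks must not reopen");
        s.seek(0);    // backward seek: must reopen
        if (s.reopens != 1) throw new AssertionError("backward seek should reopen");
        System.out.println("ok");
    }
}
```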
[jira] [Commented] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
[ https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966655#comment-13966655 ] Hudson commented on HDFS-6234: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5500 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5500/]) HDFS-6234. TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586682) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeConfig.java TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path. -- Key: HDFS-6234 URL: https://issues.apache.org/jira/browse/HDFS-6234 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-6234.1.patch {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6214) Webhdfs has poor throughput for files >2GB
[ https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966656#comment-13966656 ] Kihwal Lee commented on HDFS-6214: -- Related discussion at http://stackoverflow.com/questions/9031311/slow-transfers-in-jetty-with-chunked-transfer-encoding-at-certain-buffer-size So, if io.file.buffer.size is small enough, like 4K (the default), it may be overall slower, but there will be no difference for files >2GB. Do you know what the response buffer size is for this type of webhdfs response from datanodes? Webhdfs has poor throughput for files >2GB -- Key: HDFS-6214 URL: https://issues.apache.org/jira/browse/HDFS-6214 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-6214.patch For the DN's open call, jetty returns a Content-Length header for files <2GB, and uses chunking for files >=2GB. A bug in jetty's buffer handling results in a ~8X reduction in throughput. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6234) TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
[ https://issues.apache.org/jira/browse/HDFS-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6234: Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Thanks for the review, Jing! TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path. -- Key: HDFS-6234 URL: https://issues.apache.org/jira/browse/HDFS-6234 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6234.1.patch {{TestDatanodeConfig#testMemlockLimit}} fails to initialize a {{DataNode}} due to an invalid URI configured in {{dfs.datanode.data.dir}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1393#comment-1393 ] Chris Nauroth commented on HDFS-6235: - The failure in {{TestAvailableSpaceVolumeChoosingPolicy}} looks unrelated, and I can't repro it. I'm going to commit this. TestFileJournalManager can fail on Windows due to file locking if tests run out of order. - Key: HDFS-6235 URL: https://issues.apache.org/jira/browse/HDFS-6235 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-6235.1.patch {{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test behavior of an unclosed edit log. On some environments though, tests within a suite execute out of order. In this case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6214) Webhdfs has poor throughput for files >2GB
[ https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1392#comment-1392 ] Kihwal Lee commented on HDFS-6214: -- Is flush() necessary for the non-chunked case? Wouldn't it hurt performance in some cases? Webhdfs has poor throughput for files >2GB -- Key: HDFS-6214 URL: https://issues.apache.org/jira/browse/HDFS-6214 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-6214.patch For the DN's open call, jetty returns a Content-Length header for files <2GB, and uses chunking for files >=2GB. A bug in jetty's buffer handling results in a ~8X reduction in throughput. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6235: Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Jing, thank you for the code review. I committed this to trunk, branch-2 and branch-2.4. TestFileJournalManager can fail on Windows due to file locking if tests run out of order. - Key: HDFS-6235 URL: https://issues.apache.org/jira/browse/HDFS-6235 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6235.1.patch {{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test behavior of an unclosed edit log. On some environments though, tests within a suite execute out of order. In this case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6235) TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
[ https://issues.apache.org/jira/browse/HDFS-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966677#comment-13966677 ] Hudson commented on HDFS-6235: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5502 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5502/]) HDFS-6235. TestFileJournalManager can fail on Windows due to file locking if tests run out of order. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586692) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java TestFileJournalManager can fail on Windows due to file locking if tests run out of order. - Key: HDFS-6235 URL: https://issues.apache.org/jira/browse/HDFS-6235 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6235.1.patch {{TestFileJournalManager}} has multiple tests that reuse the same storage directory: /filejournaltest2. The last test in the suite intentionally leaves a file open to test behavior of an unclosed edit log. On some environments though, tests within a suite execute out of order. In this case, a lock is still held on /filejournaltest2, and subsequent tests fail trying to delete the directory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6189) Multiple HDFS tests fail on Windows attempting to use a test root path containing a colon.
[ https://issues.apache.org/jira/browse/HDFS-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966682#comment-13966682 ] Colin Patrick McCabe commented on HDFS-6189: Thanks for fixing this, [~cnauroth]. Multiple HDFS tests fail on Windows attempting to use a test root path containing a colon. -- Key: HDFS-6189 URL: https://issues.apache.org/jira/browse/HDFS-6189 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0, 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.4.1 Attachments: HDFS-6189.1.patch Some HDFS tests are attempting to use a test root path based on the test.root.dir that we've defined for use on the local file system. This doesn't work on Windows because of the drive spec, i.e. C:. HDFS rejects paths containing a colon as invalid. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6168) Remove deprecated methods in DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966692#comment-13966692 ] Tsz Wo Nicholas Sze commented on HDFS-6168: --- When talking about API compatibility, it is only about public APIs but not private (or LimitedPrivate) APIs. DistributedFileSystem is similar to DFSClient. They are not public APIs. User applications should not use them directly. If they do, they should expect to change their code across releases since these classes are unstable. BTW, the methods removed in this JIRA were deprecated for a long time. Remove deprecated methods in DistributedFileSystem -- Key: HDFS-6168 URL: https://issues.apache.org/jira/browse/HDFS-6168 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Fix For: 2.5.0 Attachments: h6168_20140327.patch, h6168_20140327b.patch Some methods in DistributedFileSystem are already deprecated for a long time. They should be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966723#comment-13966723 ] Mit Desai commented on HDFS-2734: - I see that there has been no activity on this Jira for a long time. [~andreina], is this still reproducible on your side? If this is still an issue, can you provide the information [~qwertymaniac] requested? As for the analysis that Harsh did, I think this is not reproducible on his side and I have not seen anyone else raising this concern. In that case, if I do not hear back by 4/17/14, I will go ahead and close this issue as Not A Problem. -Mit Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1, 0.23.0 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6229) Race condition in failover can cause RetryCache fail to work
[ https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6229: Resolution: Fixed Fix Version/s: 2.4.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Suresh! I've committed this to trunk, branch-2, and branch-2.4. Race condition in failover can cause RetryCache fail to work Key: HDFS-6229 URL: https://issues.apache.org/jira/browse/HDFS-6229 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.4.1 Attachments: HDFS-6229.000.patch, retrycache-race.patch Currently when NN failover happens, the old SBN first sets its state to active, then starts the active services (including tailing all the remaining editlog and building a complete retry cache based on the editlog). If a retry request, which has already succeeded in the old ANN (but the client fails to receive the response), comes in between, this retry may still get served by the new ANN but miss the retry cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6229) Race condition in failover can cause RetryCache fail to work
[ https://issues.apache.org/jira/browse/HDFS-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966777#comment-13966777 ] Hudson commented on HDFS-6229: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5503 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5503/]) HDFS-6229. Race condition in failover can cause RetryCache fail to work. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1586714) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RetryCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java Race condition in failover can cause RetryCache fail to work Key: HDFS-6229 URL: https://issues.apache.org/jira/browse/HDFS-6229 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.4.1 Attachments: HDFS-6229.000.patch, retrycache-race.patch Currently when NN failover happens, the old SBN first sets its state to active, then starts the active services (including tailing all the remaining editlog and building a complete retry cache based on the editlog). If a retry request, which has already succeeded in the old ANN (but the client fails to receive the response), comes in between, this retry may still get served by the new ANN but miss the retry cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
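The race described in HDFS-6229 can be modeled with a toy retry cache keyed by (clientId, callId); the class and method names below are illustrative and are not the Hadoop RetryCache API. A retry that arrives before the new active NN has finished replaying the edit log misses the cache and is re-executed, which is the bug this patch closes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the failover race (hypothetical names, not Hadoop's API):
// a retried RPC is only recognized once its (clientId, callId) entry has
// been replayed into the cache from the edit log.
public class RetryCacheSketch {
    private final Map<String, String> cache = new HashMap<>();

    // Simulates the new active NN tailing the edit log into the cache.
    void replayFromEditLog(String clientId, int callId, String result) {
        cache.put(clientId + ":" + callId, result);
    }

    // Returns the cached result for a retry, or null on a miss, in which
    // case the operation would be (incorrectly) executed a second time.
    String handleRetry(String clientId, int callId) {
        return cache.get(clientId + ":" + callId);
    }

    public static void main(String[] args) {
        RetryCacheSketch nn = new RetryCacheSketch();
        // Retry arrives before edit-log replay completes: cache miss.
        System.out.println(nn.handleRetry("client-1", 7)); // null
        nn.replayFromEditLog("client-1", 7, "ok");
        // After replay, the same retry is served from the cache.
        System.out.println(nn.handleRetry("client-1", 7)); // ok
    }
}
```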
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966843#comment-13966843 ] Colin Patrick McCabe commented on HDFS-6232: It looks like you are trying to make {{XMLUtils#addSaxString}} treat a {{null}} value as equal to the empty string. This seems confusing to me, since when we read the XML file and create edits again, we can't distinguish between null and the empty string. While it's possible that null and empty really are interchangeable here, I would rather leave it to the caller to make this determination. Why not just do the simple thing and fix the ACL code so it doesn't pass {{null}} to this function? bq. I tried to use org.apache.xml.serialize.XMLSerializer in Apache Xerces, but it was deprecated. I'm thinking we should use another library or write our own code. Yeah. The deprecation warnings aren't relevant to this JIRA. Check out HDFS-4629 if you're interested in a solution to that. OfflineEditsViewer throws a NPE on edits containing ACL modifications - Key: HDFS-6232 URL: https://issues.apache.org/jira/browse/HDFS-6232 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.4.0 Reporter: Stephen Chu Assignee: Akira AJISAKA Attachments: HDFS-6232.patch The OfflineEditsViewer using the XML parser will through a NPE when using an edit with a SET_ACL op. {code} [root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505 Encountered exception. 
Exiting: null java.lang.NullPointerException at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122) at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237) [root@hdfs-nfs current]# {code} This is reproducible by setting an acl on a file and then running the OEV on the editsinprogress file. The stats and binary parsers run OK. -- This message was sent by Atlassian JIRA (v6.2#6252)
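Colin's objection above — that mapping null to the empty string in {{XMLUtils#addSaxString}} is lossy — is easy to see with a round-trip: once both values serialize to the same text, the reader cannot tell them apart. The snippet below is illustrative only and does not use the real XMLUtils API.

```java
// Illustrative only: if a writer collapses null to "" when emitting an
// XML element, the reader reconstructing edits from that XML can no
// longer distinguish a null field from an empty-string field.
public class NullVsEmpty {
    // Hypothetical lossy serialization step.
    static String serialize(String value) {
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        String fromNull = serialize(null);
        String fromEmpty = serialize("");
        // Both serialize identically, so the distinction is gone.
        System.out.println(fromNull.equals(fromEmpty)); // true
    }
}
```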
[jira] [Commented] (HDFS-6227) Short circuit read failed due to ClosedChannelException
[ https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966878#comment-13966878 ] Colin Patrick McCabe commented on HDFS-6227: bq. I quickly checked the code, in ShortCircuitCache#unref, we close the replica when the ref count is 0, but I did not find the corresponding code to remove the replica object. Thus is it possible that the cause of the issue is a closed ShortCircuitReplica object was still retrieved from the ShortCircuitCache and used for reading? Colin Patrick McCabe, could you provide some input? That should not be possible. See this code in {{ShortCircuitCache#unref}}: {code} int newRefCount = --replica.refCount; if (newRefCount == 0) { // Close replica, since there are no remaining references to it. Preconditions.checkArgument(replica.purged, "Replica " + replica + " reached a refCount of 0 without " + "being purged"); replica.close(); {code} Notice that we would throw a precondition exception if the replica hadn't been purged. (Purged means that it has been removed from the cache and will not be handed out to new readers.) There is no other path to calling {{ShortCircuitReplica#close}}. Can you say a little bit more about the platform and version that you saw this on? Can you reproduce it? Also, are there any other messages in the log? Short circuit read failed due to ClosedChannelException --- Key: HDFS-6227 URL: https://issues.apache.org/jira/browse/HDFS-6227 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jing Zhao While running tests in a single node cluster, where short circuit read is enabled and multiple threads may read the same file concurrently, one of the read got ClosedChannelException and failed. Full exception trace see comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3128) Unit tests should not use a test root in /tmp
[ https://issues.apache.org/jira/browse/HDFS-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966914#comment-13966914 ] Tsz Wo Nicholas Sze commented on HDFS-3128: --- However, other parts of this change altered where tests created files in HDFS MiniDFSClusters. These parts were not needed,... These changes actually are incorrect since test.build.data is a conf for the local file system. It should not be used in HDFS MiniDFSClusters anyway. Overall, I think this problem is bound to keep occurring until we get Windows build slaves. ... Sure, if there were Windows Jenkins builds, the bug would be caught earlier. Or if the contributors/reviewers could spend more time to understand the code first, the bug would also be caught. Simply searching and replacing the code without first understanding them is something that we should avoid. Unit tests should not use a test root in /tmp - Key: HDFS-3128 URL: https://issues.apache.org/jira/browse/HDFS-3128 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Eli Collins Assignee: Andrew Wang Priority: Minor Fix For: 2.4.0 Attachments: hdfs-3128-1.patch Saw this on jenkins, TestResolveHdfsSymlink#testFcResolveAfs creates /tmp/alpha which interferes with other executors on the same machine. -- This message was sent by Atlassian JIRA (v6.2#6252)
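Nicholas's point above — that test.build.data is a local-filesystem setting and must not be reused for paths inside a MiniDFSCluster — can be sketched as two separate roots. The helper names below are hypothetical, for illustration only.

```java
// Sketch of the distinction discussed above (hypothetical helper names):
// test.build.data points at a local-filesystem directory, so it is a
// valid base for local test files but not for HDFS paths, which must be
// free of drive specs and other local-FS artifacts.
public class TestRootSketch {
    // Local-FS test root, falling back to a relative default when unset.
    static String localTestRoot() {
        return System.getProperty("test.build.data", "build/test/data");
    }

    // HDFS test root, derived independently of any local-FS property.
    static String hdfsTestRoot(String testName) {
        return "/test/" + testName;
    }

    public static void main(String[] args) {
        System.out.println(localTestRoot());
        System.out.println(hdfsTestRoot("TestExample")); // /test/TestExample
    }
}
```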
[jira] [Commented] (HDFS-241) ch{mod,own,grp} -R to do recursion at the name node
[ https://issues.apache.org/jira/browse/HDFS-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966929#comment-13966929 ] Colin Patrick McCabe commented on HDFS-241: --- +1 for closing this. The more recursive / long-running RPCs we support, the harder it will be to scale the NN. ch{mod,own,grp} -R to do recursion at the name node --- Key: HDFS-241 URL: https://issues.apache.org/jira/browse/HDFS-241 Project: Hadoop HDFS Issue Type: New Feature Reporter: Robert Chansler Performance. No need to maintain {{distch}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6208) DataNode caching can leak file descriptors.
[ https://issues.apache.org/jira/browse/HDFS-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966952#comment-13966952 ] Colin Patrick McCabe commented on HDFS-6208: thanks for this fix, Chris DataNode caching can leak file descriptors. --- Key: HDFS-6208 URL: https://issues.apache.org/jira/browse/HDFS-6208 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6208.1.patch In the DataNode, management of mmap'd/mlock'd block files can leak file descriptors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6232: Attachment: HDFS-6232.2.patch OfflineEditsViewer throws a NPE on edits containing ACL modifications - Key: HDFS-6232 URL: https://issues.apache.org/jira/browse/HDFS-6232 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.4.0 Reporter: Stephen Chu Assignee: Akira AJISAKA Attachments: HDFS-6232.2.patch, HDFS-6232.patch The OfflineEditsViewer using the XML parser will through a NPE when using an edit with a SET_ACL op. {code} [root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505 Encountered exception. Exiting: null java.lang.NullPointerException at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122) at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237) [root@hdfs-nfs current]# {code} This is reproducible by setting an acl on a file and then running the OEV on the editsinprogress file. The stats and binary parsers run OK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966977#comment-13966977 ] Akira AJISAKA commented on HDFS-6232: - Thanks Colin for the comment. I updated the patch to distinguish null and the empty string. bq. Yeah. The deprecation warnings aren't relevant to this JIRA. Check out HDFS-4629 if you're interested in a solution to that. Thanks again for the linking. OfflineEditsViewer throws a NPE on edits containing ACL modifications - Key: HDFS-6232 URL: https://issues.apache.org/jira/browse/HDFS-6232 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 3.0.0, 2.4.0 Reporter: Stephen Chu Assignee: Akira AJISAKA Attachments: HDFS-6232.2.patch, HDFS-6232.patch The OfflineEditsViewer using the XML parser will through a NPE when using an edit with a SET_ACL op. {code} [root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out 14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505 Encountered exception. 
Exiting: null java.lang.NullPointerException at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122) at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528) at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237) [root@hdfs-nfs current]# {code} This is reproducible by setting an acl on a file and then running the OEV on the editsinprogress file. The stats and binary parsers run OK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6233: -- Summary: Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error. (was: Datanode upgrade in Windows from 1.x to 2.4 fails with symlink error.) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error. -- Key: HDFS-6233 URL: https://issues.apache.org/jira/browse/HDFS-6233 Project: Hadoop HDFS Issue Type: Bug Components: datanode, tools Affects Versions: 2.4.0 Environment: Windows Reporter: Huan Huang Assignee: Arpit Agarwal Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch I try to upgrade Hadoop from 1.x and 2.4, but DataNode failed to start due to hard link exception. Repro steps: *Installed Hadoop 1.x *hadoop dfsadmin -safemode enter *hadoop dfsadmin -saveNamespace *hadoop namenode -finalize *Stop all services *Uninstall Hadoop 1.x *Install Hadoop 2.4 *Start namenode with -upgrade option *Try to start datanode, begin to see Hardlink exception in datanode service log. {code} 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8010: starting 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -56 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: Upgrading storage directory d:\hadoop\data\hdfs\dn. old LV = -44; old CTime = 0. 
new LV = -55; new CTime = 1397168400373 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current 2014-04-10 22:47:12,254 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect command line arguments. at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479) at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759) at org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566) at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:722) 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020 2014-04-10 22:47:12,359 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN java.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
[jira] [Commented] (HDFS-6180) dead node count / listing is very broken in JMX and old GUI
[ https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967118#comment-13967118 ] Mohammad Kamrul Islam commented on HDFS-6180: - bq. The changes turn out to be a lot bigger than I anticipated. It might be risky to put it in at the very last moment. Moving it to a blocker of 2.5.0. What about for the release 2.4.1? It could be coming soon. dead node count / listing is very broken in JMX and old GUI --- Key: HDFS-6180 URL: https://issues.apache.org/jira/browse/HDFS-6180 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Haohui Mai Priority: Blocker Fix For: 2.5.0 Attachments: HDFS-6180.000.patch, HDFS-6180.001.patch, HDFS-6180.002.patch, HDFS-6180.003.patch, HDFS-6180.004.patch, dn.log After bringing up a 578 node cluster with 13 dead nodes, 0 were reported on the new GUI, but showed up properly in the datanodes tab. Some nodes are also being double reported in the deadnode and inservice section (22 show up dead, 565 show up alive, 9 duplicated nodes). From /jmx (confirmed that it's the same in jconsole): {noformat} { name : Hadoop:service=NameNode,name=FSNamesystemState, modelerType : org.apache.hadoop.hdfs.server.namenode.FSNamesystem, CapacityTotal : 5477748687372288, CapacityUsed : 24825720407, CapacityRemaining : 5477723861651881, TotalLoad : 565, SnapshotStats : {\SnapshottableDirectories\:0,\Snapshots\:0}, BlocksTotal : 21065, MaxObjects : 0, FilesTotal : 25454, PendingReplicationBlocks : 0, UnderReplicatedBlocks : 0, ScheduledReplicationBlocks : 0, FSState : Operational, NumLiveDataNodes : 565, NumDeadDataNodes : 0, NumDecomLiveDataNodes : 0, NumDecomDeadDataNodes : 0, NumDecommissioningDataNodes : 0, NumStaleDataNodes : 1 }, {noformat} I'm not going to include deadnode/livenodes because the list is huge, but I've confirmed there are 9 nodes showing up in both deadnodes and livenodes. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967145#comment-13967145 ] Arpit Agarwal commented on HDFS-6233: - +1 from me, however perhaps it will be appropriate for another committer to +1 it too. I've tested the updated patch on OS X and JDK and it fixes the hang. Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error. -- Key: HDFS-6233 URL: https://issues.apache.org/jira/browse/HDFS-6233 Project: Hadoop HDFS Issue Type: Bug Components: datanode, tools Affects Versions: 2.4.0 Environment: Windows Reporter: Huan Huang Assignee: Arpit Agarwal Attachments: HDFS-6233.01.patch, HDFS-6233.02.patch I try to upgrade Hadoop from 1.x and 2.4, but DataNode failed to start due to hard link exception. Repro steps: *Installed Hadoop 1.x *hadoop dfsadmin -safemode enter *hadoop dfsadmin -saveNamespace *hadoop namenode -finalize *Stop all services *Uninstall Hadoop 1.x *Install Hadoop 2.4 *Start namenode with -upgrade option *Try to start datanode, begin to see Hardlink exception in datanode service log. {code} 2014-04-10 22:47:11,655 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8010: starting 2014-04-10 22:47:11,656 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-04-10 22:47:11,999 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -56 2014-04-10 22:47:12,008 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on d:\hadoop\data\hdfs\dn\in_use.lock acquired by nodename 7268@myhost 2014-04-10 22:47:12,011 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory D:\hadoop\data\hdfs\dn from previous upgrade 2014-04-10 22:47:12,017 INFO org.apache.hadoop.hdfs.server.common.Storage: Upgrading storage directory d:\hadoop\data\hdfs\dn. old LV = -44; old CTime = 0. 
new LV = -55; new CTime = 1397168400373 2014-04-10 22:47:12,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-39008719-10.0.0.1-1397168400092 directory d:\hadoop\data\hdfs\dn\current\BP-39008719-10.0.0.1-1397168400092\current 2014-04-10 22:47:12,254 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020 java.io.IOException: Usage: hardlink create [LINKNAME] [FILENAME] |Incorrect command line arguments. at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:479) at org.apache.hadoop.fs.HardLink.createHardLinkMult(HardLink.java:416) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:816) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkAllBlocks(DataStorage.java:759) at org.apache.hadoop.hdfs.server.datanode.DataStorage.doUpgrade(DataStorage.java:566) at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:486) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:722) 2014-04-10 22:47:12,258 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool registering (Datanode Uuid unassigned) service to myhost/10.0.0.1:8020 2014-04-10 22:47:12,359 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN java.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at
[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967147#comment-13967147 ]

Hadoop QA commented on HDFS-6233:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639865/HDFS-6233.02.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6654//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6654//console

This message is automatically generated.
[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967153#comment-13967153 ]

Jing Zhao commented on HDFS-6233:
---------------------------------

+1, the patch looks good to me. Thanks for the fix [~cnauroth] and [~arpitagarwal]!
[jira] [Comment Edited] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967145#comment-13967145 ]

Arpit Agarwal edited comment on HDFS-6233 at 4/11/14 10:05 PM:
---------------------------------------------------------------

+1 from me, however perhaps it will be appropriate for another committer to +1 it too. I've tested the updated patch on OS X and Windows and it fixes the hang.

was (Author: arpitagarwal):
+1 from me, however perhaps it will be appropriate for another committer to +1 it too. I've tested the updated patch on OS X and JDK and it fixes the hang.
[jira] [Updated] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6232:
---------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.1
    Status: Resolved (was: Patch Available)

committed. Thanks, Akira.

OfflineEditsViewer throws a NPE on edits containing ACL modifications
---------------------------------------------------------------------

    Key: HDFS-6232
    URL: https://issues.apache.org/jira/browse/HDFS-6232
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: tools
    Affects Versions: 3.0.0, 2.4.0
    Reporter: Stephen Chu
    Assignee: Akira AJISAKA
    Fix For: 2.4.1
    Attachments: HDFS-6232.2.patch, HDFS-6232.patch

The OfflineEditsViewer using the XML parser will throw an NPE when it processes an edit with a SET_ACL op.

{code}
[root@hdfs-nfs current]# hdfs oev -i edits_001-007 -o fsedits.out
14/04/10 14:14:18 ERROR offlineEditsViewer.OfflineEditsBinaryLoader: Got RuntimeException at position 505
Encountered exception. Exiting: null
java.lang.NullPointerException
at org.apache.hadoop.hdfs.util.XMLUtils.mangleXmlString(XMLUtils.java:122)
at org.apache.hadoop.hdfs.util.XMLUtils.addSaxString(XMLUtils.java:193)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.appendAclEntriesToXml(FSEditLogOp.java:4085)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.access$3300(FSEditLogOp.java:132)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$SetAclOp.toXml(FSEditLogOp.java:3528)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp.outputToXml(FSEditLogOp.java:3928)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:116)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:80)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
[root@hdfs-nfs current]#
{code}

This is reproducible by setting an acl on a file and then running the OEV on the editsinprogress file. The stats and binary parsers run OK.

--
This message was sent by Atlassian JIRA (v6.2#6252)
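The stack trace above shows a null string reaching {{XMLUtils.mangleXmlString}} while a SET_ACL op is serialized to XML. As a hedged, self-contained illustration of the general fix pattern — guarding a nullable field before handing it to a serializer that assumes non-null input — consider the sketch below. The class and method names are invented stand-ins, not the actual FSEditLogOp/XMLUtils code.

```java
// Hypothetical sketch only: demonstrates the null-guard pattern, not the real patch.
public class NullSafeXml {
    // Stand-in for XMLUtils.mangleXmlString; it dereferences its argument,
    // so passing null would throw the NPE seen in the report.
    static String mangle(String s) {
        return s.replace("&", "&amp;");
    }

    // Guarded serialization: emit an empty element instead of dereferencing null.
    static String aclEntryToXml(String name) {
        String safe = (name == null) ? "" : mangle(name);
        return "<NAME>" + safe + "</NAME>";
    }

    public static void main(String[] args) {
        System.out.println(aclEntryToXml(null));   // <NAME></NAME>
        System.out.println(aclEntryToXml("a&b"));  // <NAME>a&amp;b</NAME>
    }
}
```

ACL entries can legitimately carry a null name (e.g. for default entries), which is why the guard belongs in the serializer rather than in the edit-log data.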
[jira] [Commented] (HDFS-6232) OfflineEditsViewer throws a NPE on edits containing ACL modifications
[ https://issues.apache.org/jira/browse/HDFS-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967213#comment-13967213 ]

Hudson commented on HDFS-6232:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5507 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5507/])
HDFS-6232. OfflineEditsViewer throws a NPE on edits containing ACL modifications (ajisakaa via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1586790)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
[jira] [Comment Edited] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967252#comment-13967252 ]

Jing Zhao edited comment on HDFS-4114 at 4/11/14 11:50 PM:
-----------------------------------------------------------

bq. There are a couple TODOs that require changes in the protobuf files. I'll clean them up in a subsequent jira.

Sounds good to me. There are still a couple of methods that need to be cleaned, such as endCheckpoint. But we can do it in subsequent jiras. The 001 patch looks good to me as a first step. +1.

was (Author: jingzhao):
bq. There are a couple TODOs that require changes in the protobuf files. I'll clean them up in a subsequent jira.

Sounds good to me. There are still a couple of methods that need to be cleaned, such as endCheckpoint. But we can do it in subsequence jiras. The 001 patch looks good to me as a first step. +1.

Remove the BackupNode and CheckpointNode from trunk
---------------------------------------------------

    Key: HDFS-4114
    URL: https://issues.apache.org/jira/browse/HDFS-4114
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: Eli Collins
    Assignee: Suresh Srinivas
    Attachments: HDFS-4114.000.patch, HDFS-4114.001.patch, HDFS-4114.patch

Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the BackupNode and CheckpointNode.
[jira] [Commented] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967252#comment-13967252 ]

Jing Zhao commented on HDFS-4114:
---------------------------------

bq. There are a couple TODOs that require changes in the protobuf files. I'll clean them up in a subsequent jira.

Sounds good to me. There are still a couple of methods that need to be cleaned, such as endCheckpoint. But we can do it in subsequence jiras. The 001 patch looks good to me as a first step. +1.
[jira] [Commented] (HDFS-4114) Remove the BackupNode and CheckpointNode from trunk
[ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967261#comment-13967261 ]

Haohui Mai commented on HDFS-4114:
----------------------------------

Thanks [~jingzhao] for the review. I'll commit it to trunk this weekend.
[jira] [Created] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.
Chris Nauroth created HDFS-6236:
-----------------------------------

    Summary: ImageServlet should use Time#monotonicNow to measure latency.
    Key: HDFS-6236
    URL: https://issues.apache.org/jira/browse/HDFS-6236
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: namenode
    Affects Versions: 2.4.0, 3.0.0
    Reporter: Chris Nauroth
    Assignee: Chris Nauroth
    Priority: Minor

{{ImageServlet}} currently uses {{Time#now}} to get file transfer latency measurements and pass them to the metrics system. It would be preferable to use {{Time#monotonicNow}} so that we're using the most precise available system timer, and we're not subject to odd bugs that could result in negative latency measurements, like resetting the system clock.
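The monotonic-clock reasoning above can be shown with plain JDK calls. This sketch is an illustration, not the ImageServlet code itself: Hadoop's {{Time#monotonicNow}} is backed by {{System.nanoTime()}}, which is guaranteed monotonic within a JVM, unlike {{System.currentTimeMillis()}}.

```java
// Sketch: measure elapsed time with a monotonic clock so a wall-clock reset
// (or a low-precision currentTimeMillis, as on Windows) cannot produce a
// negative latency measurement.
public class MonotonicLatency {
    public static long timedMillis(Runnable work) {
        final long start = System.nanoTime();   // monotonic start point
        work.run();
        final long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000L;            // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        long millis = timedMillis(() -> {
            try { Thread.sleep(20); } catch (InterruptedException ignored) { }
        });
        // Monotonic measurement: never negative, even if the system clock moves.
        System.out.println("elapsed >= 0: " + (millis >= 0));
    }
}
```

The same property is what makes assertions like "latency metric is greater than zero" stable in tests across platforms.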
[jira] [Updated] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.
[ https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6236:
--------------------------------

    Attachment: HDFS-6236.1.patch

I noticed this due to a failure in {{TestCheckpoint#testCheckpoint}} on Windows. There are assertions that the metrics are greater than zero, and these assertions were failing. Windows in particular has a low-precision implementation of {{System#currentTimeMillis}}. This patch fixes the issue. Tests are passing consistently for me on both Mac and Windows now.

I also discovered a problem inside {{TestCheckpoint#testReformatNNBetweenCheckpoints}}. Restarting a new {{MiniDFSCluster}} ended up trying to use the same storage directory as a {{SecondaryNameNode}} that was intentionally left running in the background. The new cluster would fail during initialization due to file locks while trying to delete the storage directory. To solve this, I've cloned the configuration and set a different storage dir for use by the 2NN.
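The "cloned the configuration and set a different storage dir" fix described above is a general clone-before-mutate pattern: copy the settings object so the override cannot leak back into the original. The self-contained sketch below uses {{java.util.Properties}} as a stand-in for Hadoop's {{Configuration}} (whose copy constructor, {{new Configuration(conf)}}, serves the same purpose); the paths are made up, while the key name is the real {{dfs.namenode.checkpoint.dir}}.

```java
import java.util.Properties;

// Sketch of the clone-before-mutate pattern from the comment above.
public class CloneConfig {
    // Return an independent copy of base with one key overridden;
    // the caller's object is left untouched.
    static Properties cloneWith(Properties base, String key, String value) {
        Properties copy = new Properties();
        copy.putAll(base);              // independent copy of the entries
        copy.setProperty(key, value);
        return copy;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("dfs.namenode.checkpoint.dir", "/data/dfs/snn");

        // Give the background 2NN its own storage dir so a restarted
        // cluster does not collide with the 2NN's file locks.
        Properties snnConf =
            cloneWith(base, "dfs.namenode.checkpoint.dir", "/data/dfs/snn-2");

        System.out.println(base.getProperty("dfs.namenode.checkpoint.dir"));    // /data/dfs/snn
        System.out.println(snnConf.getProperty("dfs.namenode.checkpoint.dir")); // /data/dfs/snn-2
    }
}
```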
[jira] [Updated] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.
[ https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-6236:
--------------------------------

    Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.
[ https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967284#comment-13967284 ]

Haohui Mai commented on HDFS-6236:
----------------------------------

+1 pending jenkins.
[jira] [Commented] (HDFS-6233) Datanode upgrade in Windows from 1.x to 2.4 fails with hardlink error.
[ https://issues.apache.org/jira/browse/HDFS-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967341#comment-13967341 ]

Tsz Wo Nicholas Sze commented on HDFS-6233:
-------------------------------------------

If there are bugs in getLinkMultArgLength but the bugs do not affect upgrade, we may fix them separately.
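For context on the {{getLinkMultArgLength}} mention above: {{HardLink.createHardLinkMult}} links many files per shell invocation, so it has to estimate the command-line length of each batch and keep it under an OS limit. The sketch below is a hedged illustration of that batching idea only; the method name, length accounting, and limit value are illustrative, not the actual HardLink code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of length-bounded argument batching (not HardLink itself).
public class ArgBatcher {
    // Split fileNames into chunks whose combined argument length (including
    // one separating space per argument) stays at or under maxCmdLen.
    static List<List<String>> batch(List<String> fileNames, int maxCmdLen) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int len = 0;
        for (String f : fileNames) {
            int added = f.length() + 1;            // +1 for the separating space
            if (!current.isEmpty() && len + added > maxCmdLen) {
                batches.add(current);              // flush the full batch
                current = new ArrayList<>();
                len = 0;
            }
            current.add(f);
            len += added;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```

A length estimate that disagrees with the arguments actually passed is exactly the kind of bug that makes the spawned {{hardlink}} command print its usage message instead of running.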
[jira] [Commented] (HDFS-6236) ImageServlet should use Time#monotonicNow to measure latency.
[ https://issues.apache.org/jira/browse/HDFS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967367#comment-13967367 ]

Hadoop QA commented on HDFS-6236:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639894/HDFS-6236.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6655//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6655//console

This message is automatically generated.