[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197810#comment-14197810 ]

Hudson commented on HDFS-7218:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6449 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6449/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
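The fix pattern described in HDFS-7218 can be sketched in a minimal, self-contained form: the operation emits an audit event on both the success and the failure path, instead of only on success. All names here (`AuditSketch`, `logAuditEvent`, `setAcl`) are illustrative stand-ins, not the actual FSNamesystem code.

```java
import java.util.ArrayList;
import java.util.List;

public class AuditSketch {
    static final List<String> AUDIT = new ArrayList<>();

    // Stand-in for FSNamesystem's real audit logger.
    static void logAuditEvent(boolean succeeded, String cmd, String src) {
        AUDIT.add((succeeded ? "allowed=true" : "allowed=false")
            + " cmd=" + cmd + " src=" + src);
    }

    static void setAcl(String src, boolean failPermissionCheck) {
        try {
            if (failPermissionCheck) {
                throw new SecurityException("Permission denied: " + src);
            }
            // ... apply the ACL change here ...
            logAuditEvent(true, "setAcl", src);
        } catch (SecurityException e) {
            // The failure-path audit call is what the patch adds; previously
            // the exception propagated without any audit record.
            logAuditEvent(false, "setAcl", src);
            throw e;
        }
    }
}
```

The key point is that the audit write sits on both exits of the method, so a denied request leaves an `allowed=false` record rather than vanishing from the audit trail.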
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7359:
--------------------------------
Attachment: HDFS-7359.1.patch

Here is a patch that fixes the bug by catching the error in {{GetJournalEditServlet}}. I considered just removing the addition of the SecondaryNameNode principal, since I've never heard of this usage in practice. However, that would arguably be a backwards-incompatible change if someone out there was running a non-HA cluster and had chosen to offload edits to the JournalNodes for consumption by the SecondaryNameNode. Catching the error is the safer change. {{TestSecureNNWithQJM}} is a new test suite that covers usage of QJM in a secured cluster. While I was working on this, I also spotted a typo in {{TestNNWithQJM}}, which I'm correcting in this patch.

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
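The defensive idea behind the HDFS-7359 fix can be illustrated with a small, self-contained helper: a malformed or missing host:port value is treated as "no address" rather than being allowed to throw and abort the caller. The class and method names below are hypothetical, not the actual {{GetJournalEditServlet}} code.

```java
public class AddressParseSketch {
    /**
     * Returns {host, port} when value looks like host:port, or null when it
     * cannot be interpreted as a network address (the case that previously
     * crashed the JournalNode check). Never throws.
     */
    static String[] tryParseAddress(String value) {
        if (value == null) return null;
        int colon = value.lastIndexOf(':');
        // Reject missing host, missing colon, or empty port.
        if (colon <= 0 || colon == value.length() - 1) return null;
        try {
            Integer.parseInt(value.substring(colon + 1));
        } catch (NumberFormatException e) {
            return null; // non-numeric port: not a network address
        }
        return new String[] { value.substring(0, colon), value.substring(colon + 1) };
    }
}
```

A caller building the set of allowed principals would simply skip the SecondaryNameNode entry when this returns null, instead of propagating an error up to NameNode startup.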
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-7359:
--------------------------------
Status: Patch Available (was: Open)

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
[jira] [Updated] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7218:
-------------------------
Resolution: Fixed
Fix Version/s: 2.6.0
Target Version/s: 2.6.0 (was: 2.7.0)
Status: Resolved (was: Patch Available)

Committed to trunk, branch-2, and branch-2.6. Thanks Charles for the contribution and Chris for the review.

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197886#comment-14197886 ]

Konstantin Boudnik commented on HDFS-7333:
------------------------------------------

+1, the patch looks good (hopefully my expertise is sufficient for approving this?)

Improve log message in Storage.tryLock()

Key: HDFS-7333
URL: https://issues.apache.org/jira/browse/HDFS-7333
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 2.5.1
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Attachments: logging.patch

The log message in Storage.tryLock() is confusing: it talks about the NameNode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock().
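The improvement HDFS-7333 asks for can be shown with a tiny sketch: the failure message names the storage directory and carries the underlying exception, instead of a generic NameNode-specific string. The helper below is illustrative only, not the actual Storage.tryLock() code.

```java
import java.io.IOException;

public class LockMessageSketch {
    // Builds a message suitable for both NameNode and DataNode storage:
    // it names the directory that could not be locked and includes the cause.
    static String lockFailureMessage(String storageDir, IOException cause) {
        return "Cannot lock storage " + storageDir
            + ". The directory is already locked: " + cause.getMessage();
    }
}
```

In real logging code the exception would also be passed to the logger as a Throwable argument so the stack trace is preserved, rather than flattened into the message string.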
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197934#comment-14197934 ]

Hadoop QA commented on HDFS-7347:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679475/HDFS-7347-20141104.patch
against trunk revision 73068f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestBlockStoragePolicy
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8654//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8654//console

This message is automatically generated.

Configurable erasure coding policy for individual files and directories

Key: HDFS-7347
URL: https://issues.apache.org/jira/browse/HDFS-7347
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Attachments: HDFS-7347-20141104.patch

HDFS users and admins should be able to turn on and off erasure coding for individual files or directories.
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197961#comment-14197961 ]

Hadoop QA commented on HDFS-7279:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679470/HDFS-7279.006.patch
against trunk revision 73068f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.security.ssl.TestReloadingX509TrustManager
org.apache.hadoop.hdfs.TestFetchImage
org.apache.hadoop.hdfs.TestRollingUpgrade
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8653//console

This message is automatically generated.

Use netty to implement DatanodeWebHdfsMethods

Key: HDFS-7279
URL: https://issues.apache.org/jira/browse/HDFS-7279
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, webhdfs
Reporter: Haohui Mai
Assignee: Haohui Mai
Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch

Currently the DN implements all related webhdfs functionality using Jetty. Because the Jetty version the DN currently uses (Jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using Netty, which can be more efficient and allows finer-grained control over webhdfs.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197962#comment-14197962 ]

Hadoop QA commented on HDFS-7359:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12679492/HDFS-7359.1.patch
against trunk revision 73e6012.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.web.TestWebHDFS
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8655//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8655//console

This message is automatically generated.

NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.

Key: HDFS-7359
URL: https://issues.apache.org/jira/browse/HDFS-7359
Project: Hadoop HDFS
Issue Type: Bug
Components: journal-node
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-7359.1.patch

In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then the JournalNode operation fails, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes; there is even a check in SecondaryNameNode that aborts if HA is enabled.
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198297#comment-14198297 ]

Hudson commented on HDFS-7334:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures

Key: HDFS-7334
URL: https://issues.apache.org/jira/browse/HDFS-7334
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch

TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout.
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198303#comment-14198303 ]

Hudson commented on HDFS-7233:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
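The idea behind the HDFS-7233 fix is that UnresolvedPathException is an expected control-flow signal for symlink resolution, so the server should log it tersely (or not at all) rather than with a full stack trace. A minimal, self-contained sketch of that pattern follows; the class and helper names are hypothetical, not the actual NameNodeRpcServer code.

```java
import java.util.ArrayList;
import java.util.List;

public class TerseLoggingSketch {
    static final List<String> LOG = new ArrayList<>();
    // Exceptions in this set are "expected": logged as one line, no trace.
    static final List<String> TERSE = List.of("UnresolvedPathException");

    static void logServerException(Exception e) {
        String name = e.getClass().getSimpleName();
        if (TERSE.contains(name)) {
            LOG.add(name + ": " + e.getMessage());  // one line, no stack trace
        } else {
            LOG.add(name + " with stack trace");    // stand-in for a full trace
        }
    }

    // Local stand-in mirroring the exception named in the report.
    static class UnresolvedPathException extends Exception {
        UnresolvedPathException(String msg) { super(msg); }
    }
}
```

Hadoop's RPC server exposes an equivalent mechanism through a "terse exceptions" list, which is the kind of configuration the committed patch uses.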
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198305#comment-14198305 ]

Hudson commented on HDFS-7218:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198307#comment-14198307 ]

Hudson commented on HDFS-7356:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244)
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Use DirectoryListing.hasMore() directly in nfs

Key: HDFS-7356
URL: https://issues.apache.org/jira/browse/HDFS-7356
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor
Fix For: 2.7.0
Attachments: HDFS-7356-110414.patch

In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false
    : (dlisting.getRemainingEntries() == 0);
{code}
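The simplification can be checked with a minimal model: since hasMore() is equivalent to getRemainingEntries() != 0, the ternary collapses to a single boolean expression. The DirectoryListing below is a tiny stand-in for the HDFS class, not the real one.

```java
public class ListingSketch {
    static class DirectoryListing {
        final int remaining;
        DirectoryListing(int remaining) { this.remaining = remaining; }
        int getRemainingEntries() { return remaining; }
        boolean hasMore() { return remaining != 0; }
    }

    // The original expression from RpcProgramNfs3, modeled verbatim.
    static boolean eofOld(int n, int length, DirectoryListing d) {
        return (n < length) ? false : (d.getRemainingEntries() == 0);
    }

    // The simplified form using hasMore() directly.
    static boolean eofNew(int n, int length, DirectoryListing d) {
        return n >= length && !d.hasMore();
    }
}
```

Both forms compute the same truth table, but the second reads as a direct statement of the condition: we are at EOF when the local batch is exhausted and the listing has no more entries.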
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198304#comment-14198304 ]

Hudson commented on HDFS-7355:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.

Key: HDFS-7355
URL: https://issues.apache.org/jira/browse/HDFS-7355
Project: Hadoop HDFS
Issue Type: Test
Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Fix For: 2.6.0
Attachments: HDFS-7355.1.patch

{{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings.
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198308#comment-14198308 ]

Hudson commented on HDFS-7340:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #734 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/734/])
HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

make rollingUpgrade start/finalize idempotent

Key: HDFS-7340
URL: https://issues.apache.org/jira/browse/HDFS-7340
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
Fix For: 2.6.0
Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch

I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number set to 1, so the first request goes through but the response is dropped. This causes the client to send another request, which fails with an error saying a request is already in progress. We should add retry cache support for this.
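Why a retry cache makes the operation idempotent can be shown with a small, self-contained sketch: when the first response is dropped in transit, the client retries with the same call id, and the server must return the cached result instead of rejecting the retry as a conflicting second request. All names here are illustrative; the real implementation lives in Hadoop's RetryCache and NameNodeRpcServer.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryCacheSketch {
    // Maps a client call id to the response already produced for it.
    final Map<Long, String> cache = new HashMap<>();
    boolean upgradeInProgress = false;

    String startRollingUpgrade(long callId) {
        String cached = cache.get(callId);
        if (cached != null) {
            return cached;  // retry of a call whose response was lost
        }
        if (upgradeInProgress) {
            // A genuinely different caller while an upgrade is running.
            throw new IllegalStateException("rolling upgrade already in progress");
        }
        upgradeInProgress = true;
        String response = "rolling upgrade started";
        cache.put(callId, response);
        return response;
    }
}
```

Without the cache lookup, the retry in the reported scenario would hit the "already in progress" branch and fail, which is exactly the bug described above.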
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198405#comment-14198405 ]

Hudson commented on HDFS-7356:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244)
* hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Use DirectoryListing.hasMore() directly in nfs

Key: HDFS-7356
URL: https://issues.apache.org/jira/browse/HDFS-7356
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor
Fix For: 2.7.0
Attachments: HDFS-7356-110414.patch

In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false
    : (dlisting.getRemainingEntries() == 0);
{code}
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198401#comment-14198401 ]

Hudson commented on HDFS-7233:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198402#comment-14198402 ]

Hudson commented on HDFS-7355:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.

Key: HDFS-7355
URL: https://issues.apache.org/jira/browse/HDFS-7355
Project: Hadoop HDFS
Issue Type: Test
Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial
Fix For: 2.6.0
Attachments: HDFS-7355.1.patch

{{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings.
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198403#comment-14198403 ]

Hudson commented on HDFS-7218:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java

FSNamesystem ACL operations should write to audit log on failure

Key: HDFS-7218
URL: https://issues.apache.org/jira/browse/HDFS-7218
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch

Various ACL methods in FSNamesystem do not write to the audit log when the operation is not successful.
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198395#comment-14198395 ]

Hudson commented on HDFS-7334:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java

Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures

Key: HDFS-7334
URL: https://issues.apache.org/jira/browse/HDFS-7334
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch

TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout.
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198406#comment-14198406 ]

Hudson commented on HDFS-7340:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1923 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1923/])
HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

make rollingUpgrade start/finalize idempotent

Key: HDFS-7340
URL: https://issues.apache.org/jira/browse/HDFS-7340
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
Fix For: 2.6.0
Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch

I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number set to 1, so the first request goes through but the response is dropped. This causes the client to send another request, which fails with an error saying a request is already in progress. We should add retry cache support for this.
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198474#comment-14198474 ]

Rushabh S Shah commented on HDFS-7233:
--------------------------------------

Thanks [~jingzhao] for committing the patch.

NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException

Key: HDFS-7233
URL: https://issues.apache.org/jira/browse/HDFS-7233
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.5.1
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
Fix For: 2.6.0
Attachments: HDFS-7233.patch

The NameNode logs the UnresolvedPathException even though the file exists in HDFS. Each time a symlink is accessed, the NN throws an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log; if left unfixed, the NN logs could grow very large, since every MR job on the cluster will access this symlink and cause a stack trace to be logged.
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198496#comment-14198496 ] Hudson commented on HDFS-7356: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7356. Use DirectoryListing.hasMore() directly in nfs. Contributed by Li Lu. (jing9: rev 27f106e2261d0dfdb04e3d08dfd84ca4fdfad244) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}: {code} boolean eof = (n < fstatus.length) ? false : (dlisting.getRemainingEntries() == 0); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
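For illustration, the before/after of this simplification can be modeled without Hadoop on the classpath. `DirectoryListingSketch` below is an invented stand-in for {{DirectoryListing}}; only `hasMore()` and `getRemainingEntries()` mirror the real methods, and `eof()` shows that the rewritten condition is equivalent to the original ternary.

```java
// Invented stand-in for org.apache.hadoop.hdfs.protocol.DirectoryListing;
// only hasMore() and getRemainingEntries() mirror the real API.
public class DirectoryListingSketch {
    private final int remainingEntries;

    public DirectoryListingSketch(int remainingEntries) {
        this.remainingEntries = remainingEntries;
    }

    public int getRemainingEntries() {
        return remainingEntries;
    }

    // hasMore() is true when the listing was truncated and entries remain.
    public boolean hasMore() {
        return remainingEntries != 0;
    }

    // Before: eof = (n < fstatusLength) ? false : (getRemainingEntries() == 0)
    // After:  the same condition, stated directly via hasMore().
    public static boolean eof(int n, int fstatusLength, DirectoryListingSketch dlisting) {
        return n >= fstatusLength && !dlisting.hasMore();
    }
}
```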
[jira] [Commented] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198494#comment-14198494 ] Hudson commented on HDFS-7218: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7218. FSNamesystem ACL operations should write to audit log on failure. (clamb via yliu) (yliu: rev 73e601259fed0646f115b09112995b51ffef3468) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogger.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt FSNamesystem ACL operations should write to audit log on failure Key: HDFS-7218 URL: https://issues.apache.org/jira/browse/HDFS-7218 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch Various Acl methods in FSNamesystem do not write to the audit log when the operation is not successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
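The shape of such a fix can be sketched independently of FSNamesystem: move the audit call into a {{finally}} block so a failed permission check is logged too. Everything below (class, method, log format) is invented for the example, not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Invented sketch of the audit-on-failure pattern; not FSNamesystem's
// actual signatures.
public class AuditSketch {
    public final List<String> auditLog = new ArrayList<>();

    private void logAuditEvent(boolean succeeded, String cmd, String src) {
        auditLog.add((succeeded ? "allowed=true" : "allowed=false")
                + " cmd=" + cmd + " src=" + src);
    }

    public void setAcl(String src, boolean failPermissionCheck) {
        boolean success = false;
        try {
            if (failPermissionCheck) {
                throw new SecurityException("Permission denied");
            }
            // ... perform the ACL modification here ...
            success = true;
        } finally {
            // The point of the fix: the audit entry is written on the
            // failure path too, not only on success.
            logAuditEvent(success, "setAcl", src);
        }
    }
}
```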
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198492#comment-14198492 ] Hudson commented on HDFS-7233: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7233. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException. Contributed by Rushabh S Shah. (jing9: rev 5bd3a569f941ffcfc425a55288bec78a37a75aa1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException --- Key: HDFS-7233 URL: https://issues.apache.org/jira/browse/HDFS-7233 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Fix For: 2.6.0 Attachments: HDFS-7233.patch Namenode logs the UnresolvedPathException even though that file exists in HDFS. Each time a symlink is accessed the NN will throw an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log, and we could end up with really large NN logs if we don't fix this, since every MR job on the cluster will access this symlink and cause a stacktrace to be logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198497#comment-14198497 ] Hudson commented on HDFS-7340: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
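As a rough sketch of the retry-cache idea (all names invented; Hadoop's real implementation is the RetryCache in the IPC layer), the server remembers each call's outcome keyed by call id, so a retried request replays the cached answer instead of failing with "already in progress":

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of retry-cache-backed idempotency; not NameNodeRpcServer code.
public class RetryCacheSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private boolean upgradeInProgress = false;

    public String startRollingUpgrade(long callId) {
        String cached = cache.get(callId);
        if (cached != null) {
            return cached;          // retried call: replay the previous answer
        }
        if (upgradeInProgress) {
            throw new IllegalStateException("a rolling upgrade is already in progress");
        }
        upgradeInProgress = true;
        String result = "upgrade started";
        cache.put(callId, result);  // remember the outcome for retries
        return result;
    }
}
```

A retry whose response was dropped arrives with the same call id and gets the cached result; only a genuinely new call with a different id hits the in-progress check.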
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198493#comment-14198493 ] Hudson commented on HDFS-7355: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7355. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Contributed by Chris Nauroth. (wheat9: rev 99d710348a20ff99044207df4b92ab3bff31bd69) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198486#comment-14198486 ] Hudson commented on HDFS-7334: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1948 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1948/]) HDFS-7334. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures. Contributed by Charles Lamb. (wheat9: rev d0449bd2fd0b03765bef78b2d7952b799f06575b) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198707#comment-14198707 ] Konstantin Shvachko commented on HDFS-7335: --- I am +1. TestBalancerWithNodeGroup failure is not related to the patch. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7347: Attachment: HDFS-7347-20141105.patch This patch extends {{TestBlockStoragePolicy}} to be aware of the new {{EC}} policy. Thanks [~vinayrpet] for reviewing. [~jingzhao] Does the patch look OK to you (in the context of this HDFS-EC branch)? Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7335: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) I just committed this. Congratulations Milan! Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198736#comment-14198736 ] Hudson commented on HDFS-7335: -- FAILURE: Integrated in Hadoop-trunk-Commit #6452 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6452/]) HDFS-7335. Redundant checkOperation() in FSN.analyzeFileState(). Contributed by Milan Desai. (shv: rev 6e8722e49c29a19dd13e161001d2464bb1f22189) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198742#comment-14198742 ] Konstantin Shvachko commented on HDFS-7357: --- I don't see this patch committed to trunk. Only to branch-2. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7333: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) I just committed this. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7335: -- Hadoop Flags: Reviewed Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Fix For: 2.7.0 Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7333: -- Hadoop Flags: Reviewed Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198755#comment-14198755 ] Colin Patrick McCabe commented on HDFS-3107: Thanks, I will take a look at HDFS-7056. I suppose this means we can mark HDFS-7341 as a duplicate. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198760#comment-14198760 ] Hudson commented on HDFS-7333: -- FAILURE: Integrated in Hadoop-trunk-Commit #6453 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6453/]) HDFS-7333. Improve logging in Storage.tryLock(). Contributed by Konstantin Shvachko. (shv: rev 203c63030f625866e220656a8efdf05109dc7627) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198759#comment-14198759 ] Konstantin Shvachko commented on HDFS-7336: --- And there is an unused import of AtomicLong. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198765#comment-14198765 ] Colin Patrick McCabe commented on HDFS-7314: HDFS-7314-2.patch just seems to rename {{abort}} to {{abortOpenFiles}}. What I was suggesting was creating a separate function, different from {{abort}}, which the {{LeaseRenewer}} would call. Actually, looking at it, I wonder if the lease renewer can just call {{closeAllFilesBeingWritten}}? I haven't looked at it in detail so maybe there's something else the lease renewer needs to do, but this at least looks like a good start. We don't need all this {{boolean removeFromFactory}} stuff. {{getInstance}} will re-add the {{DFSClient}} to the map later if needed. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in YARN nodemanger scenario. But it could happen to any long running service that use cached instance of DistrbutedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager use DFSClient for certain write operation such as log aggregator or shared cache in YARN-1492. DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... 
java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the callstack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects Aborted DFSClient and create a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient. * After DFSClient gets into Aborted state, it doesn't have to reject all requests , instead it can retry. If NN is available again it can transition to healthy state. Comments? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
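The first option above (the application replacing its cached, aborted filesystem instance) might look roughly like the sketch below; the types are toy stand-ins, not the DistributedFileSystem/DFSClient API.

```java
// Toy model of "recreate the cached client after 'Filesystem closed'";
// class names and behavior are invented for illustration.
public class FsHandleSketch {
    public static class Client {
        private boolean aborted = false;

        public void abort() {
            aborted = true;
        }

        public String getFileInfo(String path) {
            if (aborted) {
                throw new IllegalStateException("Filesystem closed");
            }
            return "status:" + path;
        }
    }

    private Client client = new Client();

    public Client current() {
        return client;
    }

    // Retry once with a fresh client when the cached one was aborted.
    public String getFileInfo(String path) {
        try {
            return client.getFileInfo(path);
        } catch (IllegalStateException e) {
            client = new Client();   // replace the dead cached instance
            return client.getFileInfo(path);
        }
    }
}
```

This corresponds to the "close and recreate on a well-defined exception" option; the other two options would put the equivalent logic inside DistributedFileSystem or DFSClient itself.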
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198766#comment-14198766 ] Haohui Mai commented on HDFS-7357: -- Thanks for the heads up -- I just pushed the missing commit to trunk. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7199: --- Summary: DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception (was: DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception - Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198775#comment-14198775 ] Hudson commented on HDFS-7357: -- FAILURE: Integrated in Hadoop-trunk-Commit #6454 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6454/]) HDFS-7357. FSNamesystem.checkFileProgress should log file path. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 18312804e9c86c0ea6a259e288994fea6fa366ef) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7325: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by only waking a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198776#comment-14198776 ] Colin Patrick McCabe commented on HDFS-7325: bq. The above should be =. One tricky thing here is that the patch moves this block after the {{numAllocated--}}. So I believe this should be correct... bq. How about simply including the change in HDFS-7358 and resolving this? OK. Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by only waking a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
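The notify-vs-notifyAll point generalizes to any bounded counter; the sketch below is a generic illustration, not ByteArrayManager's actual code. Since each release() frees exactly one slot, waking a single waiter suffices.

```java
// Generic bounded counter illustrating why notify() beats notifyAll()
// when each release admits exactly one waiter (invented example, not
// ByteArrayManager).
public class BoundedCounterSketch {
    private final int limit;
    private int numAllocated = 0;

    public BoundedCounterSketch(int limit) {
        this.limit = limit;
    }

    public synchronized void allocate() throws InterruptedException {
        while (numAllocated >= limit) {
            wait();                 // block until a slot is released
        }
        numAllocated++;
    }

    public synchronized void release() {
        numAllocated--;
        // Only one waiter can use the freed slot, so waking one thread is
        // enough; notifyAll() would wake every waiter only for all but one
        // of them to go back to sleep (the thundering herd).
        notify();
    }

    public synchronized int allocated() {
        return numAllocated;
    }
}
```

Note this relies on every waiter waiting for the same condition; if threads could wait for different conditions on the same monitor, notifyAll() would be required to avoid lost wakeups.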
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198779#comment-14198779 ] Haohui Mai commented on HDFS-7359: -- It looks to me that simply removing the checks is equivalent to the current proposed patch, correct? NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198790#comment-14198790 ] Jing Zhao commented on HDFS-7359: - Removing those lines means we no longer recognize the SNN as a valid requestor. I guess in some scenario (maybe even in the future) we can still allow the SNN to download journals from the JNs. The current patch looks good to me. +1. I will commit it shortly. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198791#comment-14198791 ] Colin Patrick McCabe commented on HDFS-7017: bq. Unfortunately we are among the 0.0001% of users who disable memory overcommit. And we also observed the std::bad_alloc in our stress test. So it is important to not let the library die in this case and give the application an opportunity to handle it. For instance, the C API of libhdfs3 returns an error flag and sets errno to ENOMEM, and the application will abort the query to free the memory. Thanks, [~wangzw]. That is an interesting data point. Turning off memory overcommit tends not to work too well on UNIX, since when an application tries to fork(), the memory required doubles briefly. The new child process may never use any of that memory reservation (and copy-on-write means the overhead may be 0), but the system can't know that at the time the {{fork}} call is made. Even if the next thing the process wants to do is exec() a tiny program, a strict no-overcommit system (like Linux with certain configurations) will deny the fork(). This happens a lot in Hadoop because our big Java processes fork and exec small utility programs like groups, id, and so forth. We have been gradually adding JNI versions for all these use cases, but some still remain. bq. You are right that we should write a log message instead of exiting the lease renewer thread quietly. Adding another try ... catch block is a good suggestion. +1. [~wheat9], did you want to look at this before it gets committed? Let me know, otherwise I'll commit in a day or two.
Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
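The fork-and-exec pattern Colin describes can be seen in miniature with a short Java sketch. This is a hypothetical illustration, not Hadoop code; the class and method names are made up, and `echo` stands in for utilities like `groups` or `id`:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ForkExecDemo {
    // Runs a tiny external utility and returns its first line of output.
    // On Linux, process launches of this kind traditionally go through
    // fork()+exec(): under strict no-overcommit the kernel must momentarily
    // account for a full copy of the parent JVM's address space, even though
    // the child execs a small program immediately and touches almost none of it.
    public static String firstLineOf(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line;
        }
    }

    public static void main(String[] args) throws Exception {
        // The same shape Hadoop hits when shelling out to small helpers.
        System.out.println(firstLineOf("echo", "hello"));
    }
}
```

A large heap makes the transient reservation large, which is why big Java daemons are the ones that trip over a strict no-overcommit setting.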
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7279: - Attachment: HDFS-7279.007.patch Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all of its webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to reimplement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Attachment: HDFS-7359.2.patch Here is patch v2. We need one more change in {{ImageServlet}} to prevent the problem from happening during bootstrapStandby. bq. It looks to me that simply removing the checks is equivalent to the current proposed patch, correct? bq. Removing those lines means we no longer recognize the SNN as a valid requestor. I guess in some scenarios (maybe even in the future) we could still allow the SNN to download journals from the JN. Thanks for reviewing, Haohui and Jing. Right, doing it this way preserves existing behavior if anyone out there is trying to use the SNN as requestor. It would be a little odd to do this, and I haven't seen it in practice, but I think it would be a backwards-incompatible change if we dropped it. Jing, are you still +1 for the v2 patch (pending a fresh Jenkins run)?
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198811#comment-14198811 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679588/HDFS-7279.007.patch against trunk revision 1831280. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8657//console This message is automatically generated.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198816#comment-14198816 ] Jing Zhao commented on HDFS-7359: - Thanks for the update, Chris! I have a question about ImageServlet. Because ImageServlet is also used by the SecondaryNameNode for checkpointing, with the change in v2 is it possible that we can no longer detect a wrong configuration for the SNN during startup?
[jira] [Assigned] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong reassigned HDFS-7329: Assignee: Byron Wong MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to include the actual exception in the log; otherwise the real reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
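The improvement requested in HDFS-7329 boils down to passing the caught exception to the logger so the stack trace survives. A minimal sketch, using `java.util.logging` rather than the commons-logging API Hadoop actually uses, and with a simulated failure in place of the real cluster startup:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogWithCause {
    private static final Logger LOG = Logger.getLogger("MiniDFSClusterSketch");

    // Hypothetical stand-in for createNameNodesAndSetConf(); returns true on success.
    static boolean startCluster() {
        try {
            throw new IllegalStateException("simulated NN startup failure");
        } catch (Exception e) {
            // Before: LOG.severe("Failed to start namenodes");  -- root cause lost.
            // After: pass the exception so the full stack trace is logged.
            LOG.log(Level.SEVERE, "Failed to start namenodes", e);
            return false;
        }
    }

    public static void main(String[] args) {
        startCluster();
    }
}
```

With commons-logging, the equivalent is the two-argument `LOG.error(msg, throwable)` overload instead of the message-only one.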
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198831#comment-14198831 ] Jing Zhao commented on HDFS-7347: - Yeah, the patch looks good to me. +1 Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198834#comment-14198834 ] Zhe Zhang commented on HDFS-7347: - [~jingzhao] Thanks for the review.
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198839#comment-14198839 ] Chris Nauroth commented on HDFS-7359: - That's a good question. I believe we'll still have debugging information in that case, thanks to this code in {{ImageServlet}}:
{code}
LOG.info("ImageServlet rejecting: " + remoteUser);
{code}
{code}
if (UserGroupInformation.isSecurityEnabled()
    && !isValidRequestor(context, request.getUserPrincipal().getName(), conf)) {
  String errorMsg = "Only Namenode, Secondary Namenode, and administrators may access "
      + "this servlet";
  response.sendError(HttpServletResponse.SC_FORBIDDEN, errorMsg);
  LOG.warn("Received non-NN/SNN/administrator request for image or edits from "
      + request.getUserPrincipal().getName() + " at " + request.getRemoteHost());
  throw new IOException(errorMsg);
}
{code}
I guess another possibility would be to change the new debug log message in the catch block to warn level and include the values of {{DFS_SECONDARY_NAMENODE_KERBEROS_PRINCIPAL_KEY}} and {{DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY}}. Let me know your thoughts, and if necessary, I can upload a v3. Thanks again!
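The shape of the change Chris is describing can be sketched as follows. This is a hypothetical illustration, not the HDFS-7359 patch: the class and method names are invented, a plain host:port split stands in for Hadoop's real `NetUtils.createSocketAddr`, and `System.err` stands in for the commons-logging warn call:

```java
import java.net.InetSocketAddress;

public class RequestorCheck {
    // If the configured secondary NameNode address cannot be parsed, log a
    // warning that names the offending configuration key and carry on treating
    // the SNN as an unrecognized requestor, instead of letting the parse
    // failure abort the JournalNode operation (and with it, NameNode startup).
    static InetSocketAddress tryParseSecondaryAddress(String configured) {
        try {
            int colon = configured.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("no port in " + configured);
            }
            return new InetSocketAddress(configured.substring(0, colon),
                    Integer.parseInt(configured.substring(colon + 1)));
        } catch (RuntimeException e) {
            System.err.println("WARN: cannot interpret "
                    + "dfs.namenode.secondary.http-address=" + configured
                    + " as a network address; skipping SecondaryNameNode principal ("
                    + e + ")");
            return null;  // caller simply omits the SNN from the valid-requestor set
        }
    }
}
```

A value like the literal string "null" (the failure mode reported in this issue) falls into the catch block and is logged rather than fatal.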
[jira] [Assigned] (HDFS-7330) Unclosed RandomAccessFile warnings in FSDatasetImpl.
[ https://issues.apache.org/jira/browse/HDFS-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7330: - Assignee: Milan Desai Unclosed RandomAccessFile warnings in FSDatasetImpl. --- Key: HDFS-7330 URL: https://issues.apache.org/jira/browse/HDFS-7330 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie The RandomAccessFile is opened as the underlying file for a FileInputStream, so it is closed when the stream is closed. To fix these two warnings (in getBlockInputStream() and getTmpInputStreams()) we just need to suppress them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7199: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Committed to trunk and 2.7. Thanks, [~shahrs87]. DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception - Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Fix For: 2.7.0 Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
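The failure mode described in HDFS-7199 (a background streamer dies from an unchecked exception, and a later close() mistakes the dead stream for an already-closed one) can be sketched with a simplified stand-in. This is not the actual DFSOutputStream code; the class and field names are illustrative:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class RecordingStream {
    // Record *any* Throwable, not just IOException: that is the essence of the
    // fix, since an unchecked exception previously shut the stream down
    // without leaving a trace for close() to report.
    private final AtomicReference<Throwable> lastException = new AtomicReference<>();
    private final Thread streamer;

    public RecordingStream(Runnable work) {
        streamer = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                lastException.compareAndSet(null, t);  // remember why we died
            }
        });
        streamer.start();
    }

    public void close() throws IOException {
        try {
            streamer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Throwable t = lastException.get();
        if (t != null) {
            // Surface the streamer's death instead of silently returning.
            throw new IOException("DataStreamer died", t);
        }
        // Only a clean shutdown is treated as an ordinary/redundant close.
    }
}
```

With lastException unset (the pre-patch behavior), the second branch would be taken even after a crash, and the caller would believe its data was safely written.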
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198855#comment-14198855 ] Haohui Mai commented on HDFS-7017: -- I'm with Colin in terms of {{std::bad_alloc}}. At this point I'm more concerned about the correctness of the code. Taking care of {{std::bad_alloc}} seems a pretty low priority to me, and it is still up for debate whether the exception itself should be used. Took a quick skim of the code. Some comments:
{code}
+class LeaseRenewer {
+public:
+    LeaseRenewer();
+    virtual ~LeaseRenewer();
+
+    virtual void StartRenew(shared_ptr<FileSystemImpl> filesystem) = 0;
+    virtual void StopRenew(shared_ptr<FileSystemImpl> filesystem) = 0;
+
+public:
+    static LeaseRenewer &GetLeaseRenewer();
+    static void CreateSingleton();
+
+private:
+    LeaseRenewer(const LeaseRenewer &other);
+    LeaseRenewer &operator=(const LeaseRenewer &other);
+
+    static once_flag once;
+    static shared_ptr<LeaseRenewer> renewer;
+};
{code}
It might be better to expose an {{instance()}} method directly in the class to reflect the fact that this is a singleton.
{code}
+LeaseRenewer::LeaseRenewer() {
+}
{code}
This is dead code.
{code}
+LeaseRenewerImpl::~LeaseRenewerImpl() {
+    stop = true;
+    cond.notify_all();
+
+    if (worker.joinable()) {
+        worker.join();
+    }
+}
{code}
It looks like the above code will never execute, as the LeaseRenewerImpl never gets freed.
{code}
+class LeaseRenewerImpl : public LeaseRenewer {
+public:
+    LeaseRenewerImpl();
+    ~LeaseRenewerImpl();
+    int getInterval() const;
+    void setInterval(int interval);
+    void StartRenew(shared_ptr<FileSystemImpl> filesystem);
+    void StopRenew(shared_ptr<FileSystemImpl> filesystem);
+
+private:
+    void renewer();
+
+private:
+    LeaseRenewerImpl(const LeaseRenewerImpl &other);
+    LeaseRenewerImpl &operator=(const LeaseRenewerImpl &other);
+
+    atomic<bool> stop;
+    condition_variable cond;
+    int interval;
+    mutex mut;
+    std::map<std::string, shared_ptr<FileSystemImpl>> maps;
+    thread worker;
+};
+}
{code}
Since {{LeaseRenewer}} is a private class / interface, it works better to combine {{LeaseRenewer}} and {{LeaseRenewerImpl}}.
{code}
+void OutputStreamImpl::append(const char *buf, int64_t size) {
{code}
Should {{size}} be unsigned? What is the maximum value of the size?
{code}
+void OutputStreamImpl::completeFile(bool throwError) {
{code}
You can return a {{Status}} object and let the caller decide whether to throw the exception.
{code}
+shared_ptr<Packet> PacketPool::getPacket(int pktSize, int chunksPerPkt,
+                                         int64_t offsetInBlock, int64_t seqno,
+                                         int checksumSize) {
+    if (packets.empty()) {
+        return shared_ptr<Packet>(new Packet(
+            pktSize, chunksPerPkt, offsetInBlock, seqno, checksumSize));
+    } else {
+        shared_ptr<Packet> retval = packets.front();
+        packets.pop_front();
+        retval->reset(pktSize, chunksPerPkt, offsetInBlock, seqno,
+                      checksumSize);
+        return retval;
+    }
+}
{code}
The pool might need to block to guard against overcommit (it can be addressed in a separate jira). And to really avoid the cost of allocation, the pool needs to be backed by untyped arenas. I suggest removing it for now to simplify the code.
Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
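The reviewer's suggestion of a single {{instance()}} accessor, in place of separate {{CreateSingleton()}} and {{GetLeaseRenewer()}} calls, is a standard lazy-singleton shape. A sketch of the idea in Java (the code under review is C++; this is only a language-neutral illustration, with invented names):

```java
public final class LeaseRenewerSketch {
    private LeaseRenewerSketch() { }  // no construction from outside

    // Initialization-on-demand holder idiom: the JVM guarantees HOLDER's
    // static initializer runs exactly once, on first use, with no explicit
    // locking -- the analogue of std::call_once / once_flag in the C++ patch.
    private static final class Holder {
        static final LeaseRenewerSketch INSTANCE = new LeaseRenewerSketch();
    }

    public static LeaseRenewerSketch instance() {
        return Holder.INSTANCE;
    }
}
```

Collapsing creation and lookup into one accessor also removes the possibility of calling the getter before the create step, which is one reason reviewers tend to prefer it.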
[jira] [Resolved] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-7199. Resolution: Fixed Fix Version/s: (was: 2.7.0) 2.6.0 Committed to 2.6
[jira] [Reopened] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reopened HDFS-7199:
[jira] [Created] (HDFS-7360) Test libhdfs3 against MiniDFSCluster
Haohui Mai created HDFS-7360: Summary: Test libhdfs3 against MiniDFSCluster Key: HDFS-7360 URL: https://issues.apache.org/jira/browse/HDFS-7360 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Priority: Critical Currently the branch has enough code to interact with HDFS servers. We should test the code against MiniDFSCluster to ensure the correctness of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198866#comment-14198866 ] Rushabh S Shah commented on HDFS-7199: -- Thanks [~cmccabe] for reviewing and committing the patch.
[jira] [Commented] (HDFS-7199) DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198869#comment-14198869 ] Hudson commented on HDFS-7199: -- FAILURE: Integrated in Hadoop-trunk-Commit #6455 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6455/]) HDFS-7199. DFSOutputStream should not silently drop data if DataStreamer crashes with an unchecked exception (rushabhs via cmccabe) (cmccabe: rev 56257fab1d5a7f66bebd9149c7df0436c0a57adb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt CHANGES.txt. Move HDFS-7199 to branch-2.6 (cmccabe: rev 7b07acb0a51d20550f62ba29bf09120684b4097b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198867#comment-14198867 ] Jing Zhao commented on HDFS-7359: - bq. I guess another possibility would be to change the new debug log message in the catch block to warn level and include the values of DFS_SECONDARY_NAMENODE_KERBEROS_PRINCIPAL_KEY and DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY. Yeah, that will be helpful for debugging the issue. +1 after this change.
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Attachment: HDFS-7359.3.patch Here is patch v3 with the improved logging. I still retained logging of the full stack trace at debug level in case we ever need to find that. Thanks again, Jing.
[jira] [Commented] (HDFS-7325) Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll
[ https://issues.apache.org/jira/browse/HDFS-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198900#comment-14198900 ] Tsz Wo Nicholas Sze commented on HDFS-7325: --- One tricky thing here is that the patch moves this block after the numAllocated--. ... Ah, you are correct. We actually do not need the if, since {{numAllocated < maxAllocated}} is always true at that point. Prevent thundering herd problem in ByteArrayManager by using notify not notifyAll - Key: HDFS-7325 URL: https://issues.apache.org/jira/browse/HDFS-7325 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7325.001.patch Currently ByteArrayManager wakes all waiting threads whenever a byte array is released and count == limit. However, only one thread can proceed. With a large number of waiters, this will cause a thundering herd problem. (See http://en.wikipedia.org/wiki/Thundering_herd_problem.) We should avoid this by waking only a single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
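The notify-versus-notifyAll trade-off discussed above can be shown with a minimal bounded allocator. This is an illustrative sketch, not the actual ByteArrayManager code; the class and field names are invented:

```java
public class BoundedCounter {
    private final int maxAllocated;
    private int numAllocated;

    public BoundedCounter(int maxAllocated) {
        this.maxAllocated = maxAllocated;
    }

    public synchronized void allocate() throws InterruptedException {
        while (numAllocated >= maxAllocated) {
            wait();  // block until a slot is released
        }
        numAllocated++;
    }

    public synchronized void release() {
        numAllocated--;
        // Exactly one released slot means exactly one waiter can make
        // progress, so notify() suffices; notifyAll() would wake the whole
        // herd only for all but one thread to re-check the condition and
        // go back to sleep.
        notify();
    }

    public synchronized int allocated() {
        return numAllocated;
    }
}
```

The single-notify optimization is safe here only because every waiter waits for the same condition (a free slot); when different threads wait for different predicates on one monitor, notifyAll() remains necessary.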
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Description: There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. (was: There is a log message in FSNamesystem.checkFileProgress for in-complete blocks. However, the log message does not include the file path.) FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.7.0 Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198928#comment-14198928 ] Jitendra Nath Pandey commented on HDFS-7359: +1
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198998#comment-14198998 ] Ming Ma commented on HDFS-7314: --- Thanks, Colin. Here are more explanations for the changes. Please let me know your thoughts. Appreciate your input. 1. {{abort}} is only used for this scenario. After we have {{LeaseRenewer}} call {{abortOpenFiles}}, {{abort}} won't be called by any functions. 2. In addition to having {{DFSClient}} call {{closeAllFilesBeingWritten}}, {{LeaseRenewer}} also needs to remove the {{DFSClient}} from its list via {{dfsclients.remove(dfsc);}} so that {{DFSClient}} doesn't renew the lease when there are no files open. This is achieved via {{LeaseRenewer}}'s {{closeClient}}. 3. Should {{LeaseRenewer}} be removed from the factory when it gets a SocketTimeoutException? Given that the {{LeaseRenewer}} thread won't exit when it gets a SocketTimeoutException as part of the fix, if the {{LeaseRenewer}} object is removed from the factory, then it could leak the {{LeaseRenewer}} thread even though the old {{LeaseRenewer}} object isn't used by other objects. In reality, {{LeaseRenewer}} won't be removed from the factory inside {{closeClient}} given that {{isRenewerExpired()}} will return false. So {{removeFromFactory}} is there mostly for the semantics, not out of necessity. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario. But it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. 
The YARN nodemanager uses DFSClient for certain write operations, such as the log aggregator or the shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got a ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant of temporary NN unavailability. Given the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. 
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places where DistributedFileSystem calls DFSClient. * After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN becomes available again, it can transition back to a healthy state. Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
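The second option listed above (DistributedFileSystem detecting an aborted DFSClient and creating a fresh one) can be sketched generically. The Client interface, FakeClient, and method names below are invented for illustration and are not the Hadoop API:

```java
import java.util.function.Supplier;

class RecreatingClientSketch {
  // Minimal stand-in for the DFSClient surface used here; invented for the sketch.
  interface Client {
    boolean isAborted();
    String getFileInfo(String path);
  }

  private final Supplier<Client> factory;
  private Client client;

  RecreatingClientSketch(Supplier<Client> factory) {
    this.factory = factory;
    this.client = factory.get();
  }

  // Before delegating, replace an aborted client with a fresh one instead of
  // letting every later call fail with "Filesystem closed".
  String getFileInfo(String path) {
    if (client.isAborted()) {
      client = factory.get();
    }
    return client.getFileInfo(path);
  }

  // Fake client for demonstration: aborted state fixed at construction.
  static class FakeClient implements Client {
    private final boolean aborted;
    FakeClient(boolean aborted) { this.aborted = aborted; }
    public boolean isAborted() { return aborted; }
    public String getFileInfo(String path) {
      if (aborted) throw new IllegalStateException("Filesystem closed");
      return "status:" + path;
    }
  }
}
```

The cost noted in the comment still applies: in the real code, every DistributedFileSystem call site that touches DFSClient would need this check.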
[jira] [Assigned] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7336: - Assignee: Milan Desai Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7336: -- Attachment: HDFS-7336.patch Removed buffersize parameter from DFSInputStream and DFSClient constructor/method signatures and fixed side effects. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7336: -- Status: Patch Available (was: In Progress) Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7336 started by Milan Desai. - Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7336) Unused member DFSInputStream.buffersize
[ https://issues.apache.org/jira/browse/HDFS-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199042#comment-14199042 ] Hadoop QA commented on HDFS-7336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679618/HDFS-7336.patch against trunk revision bc80251. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8663//console This message is automatically generated. Unused member DFSInputStream.buffersize --- Key: HDFS-7336 URL: https://issues.apache.org/jira/browse/HDFS-7336 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Attachments: HDFS-7336.patch {{DFSInputStream.buffersize}} is not used anywhere in the stream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong updated HDFS-7329: - Status: Patch Available (was: Open) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie Attachments: HDFS-7329.patch When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to add the actual exception to the log. Otherwise the actual reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7329) MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails.
[ https://issues.apache.org/jira/browse/HDFS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Byron Wong updated HDFS-7329: - Attachment: HDFS-7329.patch Added patch. MiniDFSCluster should log the exception when createNameNodesAndSetConf() fails. --- Key: HDFS-7329 URL: https://issues.apache.org/jira/browse/HDFS-7329 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Byron Wong Labels: newbie Attachments: HDFS-7329.patch When the createNameNodesAndSetConf() call fails, MiniDFSCluster logs an ERROR. It would be good to add the actual exception to the log. Otherwise the actual reason for the failure is obscured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
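A minimal sketch of the kind of change requested here (illustrative only, using java.util.logging rather than Hadoop's actual logging setup): attach the caught exception to the ERROR record so the failure reason is not obscured.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ClusterStartupSketch {
  private static final Logger LOG = Logger.getLogger("MiniDFSCluster");

  static String startCluster(Runnable createNameNodesAndSetConf) {
    try {
      createNameNodesAndSetConf.run();
      return "started";
    } catch (RuntimeException e) {
      // The fix: log the exception object itself, not just a generic
      // message, so the stack trace and cause survive in the log.
      LOG.log(Level.SEVERE, "Failed to create name nodes and set conf", e);
      return "failed: " + e.getMessage();
    }
  }
}
```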
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199072#comment-14199072 ] Hadoop QA commented on HDFS-7347: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679577/HDFS-7347-20141105.patch against trunk revision a7fbd4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestByteArrayManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8656//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8656//console This message is automatically generated. Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199077#comment-14199077 ] stack commented on HDFS-7358: - See how we are 'Waiting for ack for: 42' twice in the log snippet below even though we wrote out seq no. 43. At about the same time the allocate/recycle numbering goes 'off', because we are waiting on an ack that never arrives, so there is an outstanding allocation with a corresponding recycle that will never come (Should + one.releaseBuffer(byteArrayManager); be inside a finally block?) If I run with one thread only, I don't see this issue; it happens only with two or more. My little program has 5 threads writing and calling sync. I turned this feature off and saw that we skip ack numbers from time to time, so this problem is not brought on by this feature, but you can't use this feature until it's fixed. Looking... {code} ... 2014-11-05 11:16:47,293 DEBUG [sync.0] util.ByteArrayManager: allocate(65565): count=43, aboveThreshold, [131072: 1/10, free=1], recycled? 
true 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=41, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=31232 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32088 lastFlushOffset 31579 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: Queued packet 41 2014-11-05 11:16:47,293 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 41 2014-11-05 11:16:47,293 DEBUG [DataStreamer for file /user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083 block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DataStreamer block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940 sending packet packet seqno:41 offsetInBlock:31232 lastPacketInBlock:false lastByteOffsetInBlock: 32088 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DFSClient seqno: 40 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 487791 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] util.ByteArrayManager: recycle: array.length=131072, [131072: 2/10, free=0], freeQueue.offer, freeQueueSize=1 2014-11-05 11:16:47,294 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DFSClient seqno: 41 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 465086 2014-11-05 11:16:47,294 DEBUG [sync.1] util.ByteArrayManager: allocate(65565): count=44, aboveThreshold, [131072: 1/10, free=1], recycled? 
true 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=42, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=31744 2014-11-05 11:16:47,295 DEBUG [ResponseProcessor for block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] util.ByteArrayManager: recycle: array.length=131072, [131072: 2/10, free=0], freeQueue.offer, freeQueueSize=1 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32853 lastFlushOffset 32088 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: Queued packet 42 2014-11-05 11:16:47,295 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 42 2014-11-05 11:16:47,295 DEBUG [DataStreamer for file /user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083 block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940] hdfs.DFSClient: DataStreamer block BP-410607956-10.20.84.26-1391491814882:blk_1075488801_1099513376940 sending packet packet seqno:42 offsetInBlock:31744 lastPacketInBlock:false lastByteOffsetInBlock: 32853 2014-11-05 11:16:47,295 DEBUG [sync.0] util.ByteArrayManager: allocate(65565): count=45, aboveThreshold, [131072: 1/10, free=1], recycled? true 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=43, src=/user/stack/test-data/2256ed2b-6cc1-4144-88a5-227baf11842c/HLogPerformanceEvaluation/wals/hlog.1415215004083, packetSize=65532, chunksPerPacket=127, bytesCurBlock=32768 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: DFSClient flush() : bytesCurBlock 32853 lastFlushOffset 32853 2014-11-05 11:16:47,295 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 42 2014-11-05 11:16:47,296 DEBUG [ResponseProcessor for block
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199095#comment-14199095 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679589/HDFS-7359.2.patch against trunk revision 1831280. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.balancer.TestBalancer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8658//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8658//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. 
Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199124#comment-14199124 ] Chris Nauroth commented on HDFS-7359: - The test failures are unrelated. {{TestBalancer}} has been flaky. It's passing for me locally. The {{TestCheckpoint}} failure repros on current trunk even without this patch. We're still waiting on the Jenkins run for patch v3, which is currently in progress. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199141#comment-14199141 ] Tsz Wo Nicholas Sze commented on HDFS-7358: --- ... (Should + one.releaseBuffer(byteArrayManager); be inside a finally block?) ... You make a good point that the array may not be released when the pipeline eventually fails. We cannot call releaseBuffer(..) in a finally block since, for the usual error cases, the client will reconstruct the pipeline and retry sending the same packets. I will think about how to fix it. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199146#comment-14199146 ] Zhe Zhang commented on HDFS-7347: - {{TestByteArrayManager}} is unrelated and passes locally. Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7347-20141104.patch, HDFS-7347-20141105.patch HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199163#comment-14199163 ] stack commented on HDFS-7358: - Looking at packet sequence numbers, it seems like this is just how it works -- that a later seq number acks outstanding ones (I don't know enough to call it otherwise -- maybe you know [~szetszwo]?) -- and if so, we will have outstanding allocations and our counts will be off. Thanks. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
Chris Nauroth created HDFS-7361: --- Summary: TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199171#comment-14199171 ] Chris Nauroth commented on HDFS-7361: - Here is the output from a failed test run. {code} Running org.apache.hadoop.hdfs.server.namenode.TestCheckpoint Tests run: 38, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 40.681 sec FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestCheckpoint testStorageAlreadyLockedErrorMessage(org.apache.hadoop.hdfs.server.namenode.TestCheckpoint) Time elapsed: 0.079 sec FAILURE! java.lang.AssertionError: Log output does not contain expected log message: It appears that another namenode 28733@Chriss-MacBook-Pro.local has already locked the storage directory at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testStorageAlreadyLockedErrorMessage(TestCheckpoint.java:867) {code} TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199174#comment-14199174 ] Chris Nauroth commented on HDFS-7333: - This patch introduced a test failure in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}}. I filed HDFS-7361 to track it. [~shv], would you please take a look? Thank you. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch The log message in Storage.tryLock() is confusing: it talks about the namenode, while this code is common to both NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
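As a rough sketch of what the improved message might look like (an invented helper, not the actual Storage.tryLock() code): it avoids naming only the namenode and includes both the directory path and the underlying exception text.

```java
import java.io.IOException;

class TryLockMessageSketch {
  static String lockFailureMessage(String storageDirPath, IOException cause) {
    // Speak of a generic "node" since this code serves both NameNode and
    // DataNode storage, and carry the path plus the original error text.
    return "It appears that another node has already locked the storage directory "
        + storageDirPath + ": " + cause.getMessage();
  }
}
```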
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199176#comment-14199176 ] Chris Nauroth commented on HDFS-7359: - The {{TestCheckpoint}} failure was introduced in HDFS-7333. I filed HDFS-7361 to track fixing it. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199182#comment-14199182 ] Tsz Wo Nicholas Sze commented on HDFS-7358: --- ... that a later seq number acks outstanding ones ... The pipeline expects an ack for every packet. It won't have acks with skipped seq nos. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199189#comment-14199189 ] stack commented on HDFS-7358: - Makes sense. There is a bug in DFSOutputStream then? I can get skipping of seq nos without this feature enabled. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7362) Proxy user refresh won't modify or remove existing groups or hosts from super user list
Eric Payne created HDFS-7362: Summary: Proxy user refresh won't modify or remove existing groups or hosts from super user list Key: HDFS-7362 URL: https://issues.apache.org/jira/browse/HDFS-7362 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Eric Payne Assignee: Eric Payne 2.x added a new DefaultImpersonationProvider class for reading the superuser configuration. In this class, once the host and group properties for a proxyuser are defined, they cannot be removed or modified without bouncing the daemon. As long as the config is updated correctly the first time, this problem won't manifest itself. Once defined, these properties don't tend to change. However, if the properties are mis-entered the first time, restarting the NN/RM/JHS/etc will be necessary to correctly re-read the config. An admin refresh won't do it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
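For context, the proxy-user settings in question live in properties shaped like the following (the user name and values here are made up). The reported behavior is that once DefaultImpersonationProvider has loaded such entries, modifying or removing them and then running an admin refresh such as -refreshSuperUserGroupsConfiguration does not take effect; only a daemon restart re-reads them.

```xml
<!-- Hypothetical proxyuser entries in core-site.xml. Per this report, if
     "host1.example.com" or "group1" is mis-entered the first time, an admin
     refresh will not correct it; the NN/RM/JHS must be restarted. -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>host1.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>group1</value>
</property>
```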
[jira] [Commented] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199209#comment-14199209 ] Konstantin Shvachko commented on HDFS-7361: --- Sure will fix this. Wonder why Jenkins didn't fail for HDFS-7333. TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7363) Pluggable algorithms to form block groups in erasure coding
Zhe Zhang created HDFS-7363: --- Summary: Pluggable algorithms to form block groups in erasure coding Key: HDFS-7363 URL: https://issues.apache.org/jira/browse/HDFS-7363 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199228#comment-14199228 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679600/HDFS-7359.3.patch against trunk revision bc80251. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestExactSizeInputStream org.apache.haTests org.apache.hadoop.hdfs.web.TestAuthFilter org.apache.hadoop.hdfs.web.TestWebTests org.apache.hadoop.hdfs.TesTests org.apacheTests org.apache.hadoop.hdfs.TestFSInputChecker org.apache.hadoop.hdfs.serveTests org.apache.hadoop.hdfs.server.Tests org.apache.hadoop.hdfs.sTests org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8661//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8661//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199229#comment-14199229 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679588/HDFS-7279.007.patch against trunk revision b4c951a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.TestRollingUpgrade org.apache.hadoop.hdfs.TestParallelUnixDomainRead The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestExactSizeInputStream org.apache.haTests org.apache.hadoop.hdfs.web.TestAuthFilter org.apache.hadoop.hdfs.web.TestWebTests org.apache.hadoop.hdfs.TesTests org.apacheTests org.apache.hadoop.hdfs.TestFSInputChecker org.apache.hadoop.hdfs.serveTests org.apache.hadoop.hdfs.server.Tests org.apache.hadoop.hdfs.sTests org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8660//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8660//console This message is automatically generated. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199246#comment-14199246 ] stack commented on HDFS-7358: - Without ByteArrayManager enabled, using tip of 2.6 logging at DEBUG level grepping 'Waiting for ack' I see us skipping packet seqnos. See below. See doubled '8', '22', and '26'. {code} 2014-11-05 14:08:57,240 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 1 2014-11-05 14:08:57,243 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 2 2014-11-05 14:08:57,245 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 3 2014-11-05 14:08:57,246 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 4 2014-11-05 14:08:57,246 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 5 2014-11-05 14:08:57,249 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 6 2014-11-05 14:08:57,250 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 7 2014-11-05 14:08:57,252 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 8 2014-11-05 14:08:57,252 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 8 2014-11-05 14:08:57,253 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 10 2014-11-05 14:08:57,254 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 11 2014-11-05 14:08:57,255 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 12 2014-11-05 14:08:57,255 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 13 2014-11-05 14:08:57,257 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 14 2014-11-05 14:08:57,258 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 15 2014-11-05 14:08:57,258 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 16 2014-11-05 14:08:57,259 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 17 2014-11-05 14:08:57,261 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 18 2014-11-05 14:08:57,262 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 19 2014-11-05 14:08:57,263 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 20 2014-11-05 14:08:57,264 DEBUG [sync.0] 
hdfs.DFSClient: Waiting for ack for: 21 2014-11-05 14:08:57,265 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 22 2014-11-05 14:08:57,265 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 22 2014-11-05 14:08:57,267 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 24 2014-11-05 14:08:57,267 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 25 2014-11-05 14:08:57,268 DEBUG [sync.0] hdfs.DFSClient: Waiting for ack for: 26 2014-11-05 14:08:57,268 DEBUG [sync.1] hdfs.DFSClient: Waiting for ack for: 26 2014-11-05 14:08:57,270 DEBUG [sync.2] hdfs.DFSClient: Waiting for ack for: 28 2014-11-05 14:08:57,270 DEBUG [sync.3] hdfs.DFSClient: Waiting for ack for: 29 2014-11-05 14:08:57,271 DEBUG [sync.4] hdfs.DFSClient: Waiting for ack for: 30 ... {code} Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
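The doubled seqnos in the log above (8, 22, 26) are easy to spot mechanically. A throwaway checker along these lines (a sketch for illustration, not part of any patch here) flags duplicated or skipped values in the "Waiting for ack for: N" DEBUG lines:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan DFSClient DEBUG log lines and report seqnos that repeat or
// skip, like the doubled 8/22/26 in the excerpt above.
public class SeqnoScan {
    static List<String> scan(List<String> lines) {
        Pattern p = Pattern.compile("Waiting for ack for: (\\d+)");
        List<String> anomalies = new ArrayList<>();
        long prev = 0;  // seqnos in the excerpt start at 1
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (!m.find()) {
                continue;  // not an ack-wait line
            }
            long seq = Long.parseLong(m.group(1));
            if (seq == prev) {
                anomalies.add("duplicate " + seq);
            } else if (seq != prev + 1) {
                anomalies.add("gap before " + seq);
            }
            prev = Math.max(prev, seq);
        }
        return anomalies;
    }
}
```

Run against the excerpt above, this reports a duplicate at each doubled seqno and a gap immediately after it (9, 23, and 27 never appear as waits).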
[jira] [Updated] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7361: -- Attachment: HDFS-7361.patch Here is the patch that fixes TestCheckpoint. Also wrapped long lines in testStorageAlreadyLockedErrorMessage(). TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Priority: Minor Attachments: HDFS-7361.patch HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7361) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation.
[ https://issues.apache.org/jira/browse/HDFS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7361: -- Assignee: Konstantin Shvachko Status: Patch Available (was: Open) TestCheckpoint#testStorageAlreadyLockedErrorMessage fails after change of log message related to locking violation. --- Key: HDFS-7361 URL: https://issues.apache.org/jira/browse/HDFS-7361 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Reporter: Chris Nauroth Assignee: Konstantin Shvachko Priority: Minor Attachments: HDFS-7361.patch HDFS-7333 changed the log message related to locking violation on a storage directory. There is an assertion in {{TestCheckpoint#testStorageAlreadyLockedErrorMessage}} that has been failing since that change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7333) Improve log message in Storage.tryLock()
[ https://issues.apache.org/jira/browse/HDFS-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199261#comment-14199261 ] Konstantin Shvachko commented on HDFS-7333: --- Sounds like Jenkins build is very much broken. It is one thing when a build gives you false negatives. But false positives make it broken, imho. Improve log message in Storage.tryLock() Key: HDFS-7333 URL: https://issues.apache.org/jira/browse/HDFS-7333 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: logging.patch Confusing log message in Storage.tryLock(). It talks about namenode, while this is a common part of NameNode and DataNode storage. The log message should include the directory path and the exception. Also fix the long line in tryLock(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7359: Hadoop Flags: Reviewed I think something confused the string parsing Jenkins does to search for timed out tests. I reviewed the console output, and I didn't see any evidence that these tests had timed out. I reran locally, and they were all fine. I'll commit this later today. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpeted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7359) NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address.
[ https://issues.apache.org/jira/browse/HDFS-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199264#comment-14199264 ] Hadoop QA commented on HDFS-7359: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679600/HDFS-7359.3.patch against trunk revision bc80251. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCheckpoint {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8662//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8662//console This message is automatically generated. NameNode in secured HA cluster fails to start if dfs.namenode.secondary.http-address cannot be interpreted as a network address. Key: HDFS-7359 URL: https://issues.apache.org/jira/browse/HDFS-7359 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7359.1.patch, HDFS-7359.2.patch, HDFS-7359.3.patch In a secured cluster, the JournalNode validates that the caller is one of a valid set of principals. 
One of the principals considered is that of the SecondaryNameNode. This involves checking {{dfs.namenode.secondary.http-address}} and trying to interpret it as a network address. If a user has specified a value for this property that cannot be interpreted as a network address, such as null, then this causes the JournalNode operation to fail, and ultimately the NameNode cannot start. The JournalNode should not have a hard dependency on {{dfs.namenode.secondary.http-address}} like this. It is not typical to run a SecondaryNameNode in combination with JournalNodes. There is even a check in SecondaryNameNode that aborts if HA is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
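The defensive direction described for the patch, catching the parse failure instead of letting it abort the JournalNode request, can be sketched in isolation (the helper below is illustrative only, not the actual GetJournalEditServlet code): treat an unparseable {{dfs.namenode.secondary.http-address}} as "no SecondaryNameNode principal to add" rather than as a fatal error.

```java
// Sketch: derive an optional SNN host from a configured "host:port" value,
// returning null instead of throwing when the value (e.g. the literal string
// "null") cannot be interpreted as a network address. The caller would then
// simply omit the SecondaryNameNode from the set of valid principals.
class SnnAddress {
    static String secondaryHostOrNull(String httpAddress) {
        if (httpAddress == null) {
            return null;
        }
        int colon = httpAddress.lastIndexOf(':');
        if (colon <= 0 || colon == httpAddress.length() - 1) {
            return null;  // no host:port shape at all
        }
        try {
            int port = Integer.parseInt(httpAddress.substring(colon + 1));
            if (port < 0 || port > 65535) {
                return null;  // not a usable port
            }
        } catch (NumberFormatException e) {
            return null;  // non-numeric port: not a network address
        }
        return httpAddress.substring(0, colon);
    }
}
```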