[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203335#comment-13203335 ] Uma Maheswara Rao G commented on HDFS-2911: --- Suresh, yes, you are right. Thinking about it again, how can we do this (fail fast) in client code? The client runs inside many kinds of applications, right? Whether to fail fast on OOME would then be up to the user. We will have IPC threads and streamer threads running on the client side. Am I missing something? > Gracefully handle OutOfMemoryErrors > --- > > Key: HDFS-2911 > URL: https://issues.apache.org/jira/browse/HDFS-2911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Eli Collins > > We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. > We should catch them in a high-level handler, cleanly fail the RPC (vs > sending back the OOM stack trace) or background thread, and shut down the NN or > DN. Currently the process is left in a not well-tested state > (continuously fails RPCs and internal threads, may or may not recover and > doesn't shut down gracefully). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-654) HDFS needs to support new rename introduced for FileContext
[ https://issues.apache.org/jira/browse/HDFS-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203329#comment-13203329 ] Uma Maheswara Rao G commented on HDFS-654: -- {quote} The count changes when the destination is removed. FSDirectory.removeChild(dstInode) and FSNamesystem.removePathAndBlocks() decrement the total INode count and the number of blocks. The lease on the removed destination is also removed. {quote} Here in the new rename API, we are removing the blocks and adding them to invalidates. We do not sync the edit log before adding them to invalidates. This can lead to missing blocks, as I explained in the scenario in HDFS-2815. I have not verified this yet. I will file a separate JIRA once I confirm this is a bug. > HDFS needs to support new rename introduced for FileContext > --- > > Key: HDFS-654 > URL: https://issues.apache.org/jira/browse/HDFS-654 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 0.21.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.21.0 > > Attachments: HDFS-654.patch, hdfs-654.1.patch, hdfs-654.2.patch, > hdfs-654.3.patch, hdfs-654.5.patch, hdfs-654.5.patch, hdfs-654.7.patch, > hdfs-654.9.patch > > > New rename functionality with different semantics to overwrite the existing > destination was introduced for use in FileContext. Currently the default > implementation in FileSystem is not atomic. This change implements atomic > rename operation for use by FileContext.
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203313#comment-13203313 ] Suresh Srinivas commented on HDFS-2911: --- bq. that a reasonable application should not try to catch Nicholas, I think what this means is that an application should not try to catch it for recovery purposes. I think failing fast instead of trying to recover is a reasonable choice. @Uma bq. I too agree. You are agreeing with Eli? > Gracefully handle OutOfMemoryErrors > --- > > Key: HDFS-2911 > URL: https://issues.apache.org/jira/browse/HDFS-2911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Eli Collins > > We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. > We should catch them in a high-level handler, cleanly fail the RPC (vs > sending back the OOM stack trace) or background thread, and shut down the NN or > DN. Currently the process is left in a not well-tested state > (continuously fails RPCs and internal threads, may or may not recover and > doesn't shut down gracefully).
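The fail-fast idea discussed in this thread could be sketched roughly as follows. This is a minimal illustration, not code from HDFS: the class name `FailFastOnOom` and the injected `terminator` callback are hypothetical, and the sketch only covers errors that reach an uncaught-exception handler. Threads that catch `Throwable` themselves (RPC handlers, for example) would have to call it explicitly.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical fail-fast handler; an illustration only, not HDFS code. */
final class FailFastOnOom implements Thread.UncaughtExceptionHandler {
    // Action that ends the process; injected so callers choose the policy.
    private final Runnable terminator;
    // Ensures the terminator runs at most once, even if many threads fail.
    private final AtomicBoolean triggered = new AtomicBoolean(false);

    FailFastOnOom(Runnable terminator) {
        this.terminator = terminator;
    }

    /** Returns true if t, or one of its causes, is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        int depth = 0;
        // Bounded walk so a cyclic cause chain cannot loop forever.
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        // No recovery is attempted: on OOM the process is simply terminated.
        if (isOutOfMemory(error) && triggered.compareAndSet(false, true)) {
            terminator.run();
        }
    }

    boolean wasTriggered() {
        return triggered.get();
    }
}
```

A daemon's `main` might install it with `Thread.setDefaultUncaughtExceptionHandler(new FailFastOnOom(() -> Runtime.getRuntime().halt(1)))`. Because the handler is installed only by the process that opts in, client libraries embedded in other applications are left alone, which is the concern Uma raises about client code.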
[jira] [Commented] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203308#comment-13203308 ] Hudson commented on HDFS-2764: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1704 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1704/]) HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2764.patch, HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Commented] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203293#comment-13203293 ] Hudson commented on HDFS-2764: -- Integrated in Hadoop-Common-trunk-Commit #1693 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1693/]) HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2764.patch, HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Commented] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203292#comment-13203292 ] Hudson commented on HDFS-2764: -- Integrated in Hadoop-Hdfs-trunk-Commit #1768 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1768/]) HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2764.patch, HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Resolved] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-2764. -- Resolution: Fixed Fix Version/s: 0.24.0 Hadoop Flags: Reviewed I've just committed this to trunk. > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2764.patch, HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Updated] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Attachment: HDFS-2764.patch Thanks a lot for the review, Eli. Here's a patch which adds the comment per your suggestion. I'll commit this momentarily based on your +1 since it's just a comment change. > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch, HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Updated] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Status: Open (was: Patch Available) > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203281#comment-13203281 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Mapreduce-0.23-Commit #524 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/524/]) Merged r1241766 from trunk for HDFS-2786. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Commented] (HDFS-2579) Starting delegation token manager during safemode fails
[ https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203271#comment-13203271 ] Jitendra Nath Pandey commented on HDFS-2579: bq. The issue is that the "stopSecretManager" call is holding the FSNamesystem lock, but the secret manager thread is waiting on the same lock. Another possible approach: the secret manager acquires the namesystem write lock using tryLock with a timeout, in a loop, and checks the "running" flag before each tryLock attempt. Since this is not a deadlock situation, stopSecretManager will be able to set running to false. > Starting delegation token manager during safemode fails > --- > > Key: HDFS-2579 > URL: https://issues.apache.org/jira/browse/HDFS-2579 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node, security >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2579.txt, hdfs-2579.txt, hdfs-2579.txt > > > I noticed this on the HA branch, but it seems to actually affect non-HA > branch 0.23 if security is enabled. When the NN starts up, if security is > enabled, we start the delegation token secret manager, which then tries to > call {{logUpdateMasterKey}}. This fails because the edit logs may not be > written while in safe-mode. > It seems to me that there is no real reason to make a > new master key at startup, since you've loaded the old key when you load the > FSImage. You'd only be lacking a DT master key on a fresh cluster, in which > case we could have it generate one at format time.
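The tryLock-in-a-loop approach suggested in this comment might look something like the sketch below. It is only an illustration under assumptions: `StoppableLockWorker`, `runUnderWriteLock`, and the use of a plain `ReentrantReadWriteLock` are stand-ins chosen for the example, not the actual secret manager or FSNamesystem code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical worker that never blocks indefinitely on the write lock. */
final class StoppableLockWorker {
    private final ReentrantReadWriteLock lock;
    // Written by the stopping thread, read by the worker thread.
    private volatile boolean running = true;

    StoppableLockWorker(ReentrantReadWriteLock lock) {
        this.lock = lock;
    }

    /** Safe to call while holding the lock: it only flips a flag. */
    void stop() {
        running = false;
    }

    /**
     * Runs the action under the write lock.
     * Returns true if the action ran, false if stop() was called first.
     */
    boolean runUnderWriteLock(Runnable action, long timeoutMillis)
            throws InterruptedException {
        // Check the flag before every attempt, as the comment proposes.
        while (running) {
            if (lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    if (!running) {
                        return false; // stopped while waiting for the lock
                    }
                    action.run();
                    return true;
                } finally {
                    lock.writeLock().unlock();
                }
            }
            // Timed out: loop around and re-check the running flag.
        }
        return false;
    }
}
```

With this shape, a thread that holds the lock can call `stop()` and then wait for the worker to exit; the worker notices the flag after at most one timeout interval instead of waiting on the lock forever.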
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203270#comment-13203270 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1703 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1703/]) HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed by Kihwal Lee. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203265#comment-13203265 ] Jitendra Nath Pandey commented on HDFS-2914: The standby doesn't need to enter safe mode because it is not writing any transactions anyway. The check for available resources to write logs should be performed when it transitions to active. > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude >Assignee: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit > safe mode when shared edits dir reappears.
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203261#comment-13203261 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Common-0.23-Commit #520 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/520/]) Merged r1241766 from trunk for HDFS-2786. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203257#comment-13203257 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Hdfs-0.23-Commit #509 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/509/]) Merged r1241766 from trunk for HDFS-2786. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203251#comment-13203251 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Common-trunk-Commit #1692 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1692/]) HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed by Kihwal Lee. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203250#comment-13203250 ] Hudson commented on HDFS-2786: -- Integrated in Hadoop-Hdfs-trunk-Commit #1767 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1767/]) HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed by Kihwal Lee. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will changed to take accept a uri/host:port string > instead of an InetSocketAddress. The method will internal call > NetUtils.createSocketAddr. 
This alleviates the callers from being required to > call NetUtils.createSocketAddr and reduce the opportunity for error that will > break host-based tokens.
[jira] [Updated] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2786: --- Resolution: Fixed Fix Version/s: 0.23.1 0.24.0 Target Version/s: 0.24.0, 0.23.1 (was: 0.23.1, 0.24.0) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed. Thanks to Kihwal. > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will be changed to accept a uri/host:port string > instead of an InetSocketAddress. The method will internally call > NetUtils.createSocketAddr. This relieves callers of having to > call NetUtils.createSocketAddr and reduces the opportunity for errors that would > break host-based tokens.
[jira] [Commented] (HDFS-2887) Define a FSVolume interface
[ https://issues.apache.org/jira/browse/HDFS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203243#comment-13203243 ] Hadoop QA commented on HDFS-2887: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513748/h2887_20120207.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1854//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1854//console This message is automatically generated. > Define a FSVolume interface > --- > > Key: HDFS-2887 > URL: https://issues.apache.org/jira/browse/HDFS-2887 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h2887_20120203.patch, h2887_20120207.patch > > > FSVolume is an inner class in FSDataset. It is actually a part of the > implementation of FSDatasetInterface. It is better to define a new > interface, namely FSVolumeInterface, to capture the abstraction. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203241#comment-13203241 ] Jitendra Nath Pandey commented on HDFS-2786: +1. lgtm > Fix host-based token incompatibilities in DFSUtil > - > > Key: HDFS-2786 > URL: https://issues.apache.org/jira/browse/HDFS-2786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node, security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Daryn Sharp >Assignee: Kihwal Lee > Attachments: hdfs-2786.patch, hdfs-2786.patch > > > DFSUtil introduces new static methods that duplicate functionality in > NetUtils. These new methods lack the logic necessary for host-based tokens > to work. After speaking with Suresh, the approach being taken is: > * DFSUtil.getSocketAddress will be removed. Callers will be reverted to > using the NetUtils version. > * DFSUtil.getDFSClient will be changed to accept a uri/host:port string > instead of an InetSocketAddress. The method will internally call > NetUtils.createSocketAddr. This relieves callers of having to > call NetUtils.createSocketAddr and reduces the opportunity for errors that > break host-based tokens.
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203229#comment-13203229 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Mapreduce-0.23-Commit #522 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/522/]) HDFS-2572. Removed since it's only committed to trunk, not 0.23.0. acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203227#comment-13203227 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1701 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1701/]) HDFS-2572. Moved to trunk section from 0.23.1 acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted
[ https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203220#comment-13203220 ] Jitendra Nath Pandey commented on HDFS-2905: +1. I have committed this. Thanks to Bikas. > HA: Standby NN NPE when shared edits dir is deleted > --- > > Key: HDFS-2905 > URL: https://issues.apache.org/jira/browse/HDFS-2905 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted
[ https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HDFS-2905. Resolution: Fixed Hadoop Flags: Reviewed > HA: Standby NN NPE when shared edits dir is deleted > --- > > Key: HDFS-2905 > URL: https://issues.apache.org/jira/browse/HDFS-2905 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203217#comment-13203217 ] Todd Lipcon commented on HDFS-2912: --- I think the issue is this -- previously the abort logic was to only do Runtime.exit(1) when a _sync_ fails. We figured this was sufficient since it guards against data loss. But, as you've pointed out in the JIRAs today, there are some other cases where we should abort to avoid getting into an inconsistent state. The old code (which is verified by the tests Aaron mentioned above -- look for mock(Runtime.class) ) does the abort by catching the IOException thrown by mapJournalsAndReportErrors and aborting at that point. The particular call site is logSync() in FSEditLog. So we either need to do as you did (and abort from mapJournalsAndReportErrors itself) or change _all_ of the call sites to do the abort in case an exception is thrown. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2909.HDFS-1623.patch > > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
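The first option Todd describes (abort from inside mapJournalsAndReportErrors so that no call site can forget to) can be sketched as follows. This is a simplified model, not the real JournalSet: Exiter, CountingExiter, SketchJournal and JournalSetSketch are hypothetical names, the "required" flag is an assumption standing in for "a journal whose loss must abort the NN", and Exiter stands in for the Runtime object that the existing tests mock.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Runtime object that the existing tests mock.
interface Exiter {
  void exit(int status);
}

// Test double: records exit requests instead of stopping the JVM.
class CountingExiter implements Exiter {
  int calls = 0;
  public void exit(int status) { calls++; }
}

// Hypothetical journal. "required" is an assumption modelling a journal
// (for example the shared edits dir) whose failure must abort the NN.
class SketchJournal {
  final boolean required;
  final boolean failing;
  SketchJournal(boolean required, boolean failing) {
    this.required = required;
    this.failing = failing;
  }
  void write() throws IOException {
    if (failing) throw new IOException("journal unavailable");
  }
}

class JournalSetSketch {
  private final List<SketchJournal> journals = new ArrayList<>();
  private final Exiter exiter;

  JournalSetSketch(Exiter exiter) { this.exiter = exiter; }

  void add(SketchJournal j) { journals.add(j); }

  int activeJournals() { return journals.size(); }

  // Loosely modelled on the role of mapJournalsAndReportErrors: apply the
  // operation to every journal, drop the ones that fail, and abort here,
  // centrally, if any failed journal was required.
  void mapJournalsAndReportErrors() {
    List<SketchJournal> bad = new ArrayList<>();
    for (SketchJournal j : journals) {
      try {
        j.write();
      } catch (IOException e) {
        bad.add(j);
      }
    }
    journals.removeAll(bad);
    for (SketchJournal j : bad) {
      if (j.required) {
        exiter.exit(1); // single abort point; call sites need no extra handling
        return;
      }
    }
  }
}
```

The alternative is to leave this method throwing and have every caller catch and abort, which is what logSync() does today according to the comment; the trade-off is that each new call site then has to remember to do the same.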
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203210#comment-13203210 ] Bikas Saha commented on HDFS-2912: -- Could you please point me to the test that verifies the LOG.Fatal section that was added to JournalSet.mapJournalsAndReportErrors()? I should ideally be modifying that test to verify the new change to that piece of code. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2909.HDFS-1623.patch > > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2579) Starting delegation token manager during safemode fails
[ https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2579: -- Attachment: hdfs-2579.txt The solution to the above problem turned out to be a little more complicated. The issue is that, once I just made it use lockInterruptibly, I ran into another race where the thread would get interrupted just before logSync() was called. If you interrupt a thread while it's in this critical edit log code, it can actually abort the whole NN. So, I had to add some locking around the interrupt to ensure that the DTSM thread doesn't get interrupted during logsync, etc. > Starting delegation token manager during safemode fails > --- > > Key: HDFS-2579 > URL: https://issues.apache.org/jira/browse/HDFS-2579 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node, security >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2579.txt, hdfs-2579.txt, hdfs-2579.txt > > > I noticed this on the HA branch, but it seems to actually affect non-HA > branch 0.23 if security is enabled. When the NN starts up, if security is > enabled, we start the delegation token secret manager, which then tries to > call {{logUpdateMasterKey}}. This fails because the edit logs may not be > written while in safe-mode. > It seems to me that there's not any necessary reason that you have to make a > new master key at startup, since you've loaded the old key when you load the > FSImage. You'd only be lacking a DT master key on a fresh cluster, in which > case we could have it generate one at format time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
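The "locking around the interrupt" idea in the comment above can be sketched like this. It is not the attached patch: GuardedInterruptSketch, noInterruptsLock and criticalWork are hypothetical names, and criticalWork merely stands in for the edit-log code (logSync and friends) that must not observe an interrupt. The sketch covers only the interrupt guard, not the lockInterruptibly change.

```java
// Hypothetical sketch: the stopper may only interrupt the worker while it
// holds the same lock the worker holds during its critical section.
class GuardedInterruptSketch {
  private final Object noInterruptsLock = new Object();
  private volatile boolean interruptedInCriticalSection = false;
  private volatile int iterations = 0;
  private final Thread worker = new Thread(this::runLoop, "sketch-worker");

  void start() { worker.start(); }

  /** Interrupts the worker, but never while it is inside criticalWork(). */
  void stop() {
    synchronized (noInterruptsLock) {
      worker.interrupt();
    }
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  private void runLoop() {
    try {
      while (true) {
        synchronized (noInterruptsLock) {
          // An interrupt may have arrived while we were waiting for the lock;
          // bail out before starting the critical work.
          if (Thread.currentThread().isInterrupted()) {
            return;
          }
          criticalWork();
          // stop() needs the lock we are holding, so this should never be true.
          if (Thread.currentThread().isInterrupted()) {
            interruptedInCriticalSection = true;
          }
        }
        Thread.sleep(1); // idle period; being interrupted here is harmless
      }
    } catch (InterruptedException e) {
      // Interrupted while idle: exit quietly.
    }
  }

  // Stand-in for the edit-log work that must not be interrupted.
  private void criticalWork() { iterations++; }

  boolean wasInterruptedInCriticalSection() { return interruptedInCriticalSection; }

  boolean isStopped() { return !worker.isAlive(); }
}
```

The cost of this design is that stop() can block for as long as one critical section takes, which seems acceptable when the alternative is an interrupt landing in code that may abort the whole NN.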
[jira] [Commented] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203208#comment-13203208 ] Eli Collins commented on HDFS-2764: --- +1 nice find. I'd add a comment like the following:
{code}
// The checkpoint is not done until the nn has received it from the bn
thisCheckpointTxId = cluster.getNameNode().getFSImage().getStorage()
{code}
> TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race.
[jira] [Updated] (HDFS-2614) hadoop dist tarball is missing hdfs headers
[ https://issues.apache.org/jira/browse/HDFS-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2614: -- Affects Version/s: (was: 0.24.0) > hadoop dist tarball is missing hdfs headers > --- > > Key: HDFS-2614 > URL: https://issues.apache.org/jira/browse/HDFS-2614 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 0.23.1 >Reporter: Bruno Mahé >Assignee: Alejandro Abdelnur > Labels: bigtop > Fix For: 0.23.1 > > Attachments: HDFS-2614.patch > > > It would be nice to provide hdfs header so one could easily write programs to > be linked against that library and access HDFS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203203#comment-13203203 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Hdfs-0.23-Commit #507 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/507/]) HDFS-2572. Removed since it's only committed to trunk, not 0.23.0. acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2893) The start/stop scripts don't start/stop the 2NN when using the default configuration
[ https://issues.apache.org/jira/browse/HDFS-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2893: Fix Version/s: 0.23.1 > The start/stop scripts don't start/stop the 2NN when using the default > configuration > > > Key: HDFS-2893 > URL: https://issues.apache.org/jira/browse/HDFS-2893 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.1 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Fix For: 0.23.1 > > Attachments: hdfs-2893.txt > > > HDFS-1703 changed the behavior of the start/stop scripts so that the masters > file is no longer used to indicate which hosts to start the 2NN on. The 2NN > is now started, when using start-dfs.sh, on hosts only when > dfs.namenode.secondary.http-address is configured with a non-wildcard IP. > This means you cannot start a 2NN using an http-address specified using a > wildcard IP. We should allow a 2NN to be started with the default config, i.e. > start-dfs.sh should start a NN, 2NN and DN. The packaging already works this > way (it doesn't use start-dfs.sh, it uses hadoop-daemon.sh directly w/o first > checking getconf) so let's bring start-dfs.sh in line with this behavior.
[jira] [Updated] (HDFS-2886) CreateEditLogs should generate a realistic edit log.
[ https://issues.apache.org/jira/browse/HDFS-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2886: Target Version/s: 0.24.0, 0.23.1, 0.22.1 (was: 0.22.1, 0.23.1, 0.24.0) Fix Version/s: (was: 0.23.1) (was: 0.24.0) > CreateEditLogs should generate a realistic edit log. > > > Key: HDFS-2886 > URL: https://issues.apache.org/jira/browse/HDFS-2886 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 0.24.0, 0.23.1, 0.22.1 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 0.22.1 > > Attachments: createLog-0.22.patch, createLog-trunk.patch > > > CreateEditsLog generates non-standard transactions. In real life, the first > transaction that creates a file does not contain blocks, while CreateEditsLog > adds blocks to this transaction. Change CreateEditsLog to produce real-life > transactions. > Also clean up unused parameters for {{FSDirectory.updateFile()}}.
[jira] [Updated] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit
[ https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2877: Target Version/s: 0.23.1, 1.1.0, 0.22.1 (was: 0.22.1, 1.1.0, 0.23.1) Fix Version/s: (was: 0.23.1) (was: 0.24.0) > If locking of a storage dir fails, it will remove the other NN's lock file on > exit > -- > > Key: HDFS-2877 > URL: https://issues.apache.org/jira/browse/HDFS-2877 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0, 0.24.0, 1.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 1.1.0, 0.22.1 > > Attachments: hdfs-2877.txt > > > In {{Storage.tryLock()}}, we call {{lockF.deleteOnExit()}} regardless of > whether we successfully lock the directory. So, if another NN has the > directory locked, then we'll fail to lock it the first time we start another > NN. But our failed start attempt will still remove the other NN's lockfile, > and a second attempt will erroneously start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
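The fix implied by the HDFS-2877 description is to register deleteOnExit() only after the lock has actually been acquired. A minimal sketch of that ordering, using only java.io and java.nio, is below. StorageLockSketch is a hypothetical name and this is not the body of Storage.tryLock(); it wraps IOException in UncheckedIOException purely to keep the example short.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch of "only schedule deletion of a lock file we own".
class StorageLockSketch {
  /** Returns the lock, or null if another holder already has it. */
  static FileLock tryLock(File lockF) {
    RandomAccessFile file = null;
    try {
      file = new RandomAccessFile(lockF, "rws");
      FileLock lock;
      try {
        lock = file.getChannel().tryLock();
      } catch (OverlappingFileLockException e) {
        lock = null; // already locked from inside this JVM
      }
      if (lock == null) {
        // Someone else owns the lock file: close our handle and, crucially,
        // do NOT call deleteOnExit(), or our exit would remove their file.
        file.close();
        return null;
      }
      // We hold the lock, so the file is ours to clean up at exit.
      lockF.deleteOnExit();
      return lock;
    } catch (IOException e) {
      closeQuietly(file);
      throw new UncheckedIOException(e);
    }
  }

  private static void closeQuietly(RandomAccessFile file) {
    if (file != null) {
      try {
        file.close();
      } catch (IOException ignored) {
        // Nothing useful to do; the original failure is already being reported.
      }
    }
  }
}
```

With the deleteOnExit() call placed before the null check, as the description says the current code effectively does, a failed first start would delete the other NN's lock file and a second start would then succeed erroneously.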
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203194#comment-13203194 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Hdfs-trunk-Commit #1765 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1765/]) HDFS-2572. Moved to trunk section from 0.23.1 acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2718) Optimize OP_ADD in edits loading
[ https://issues.apache.org/jira/browse/HDFS-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2718: Target Version/s: 0.24.0, 0.23.1, 0.22.1 (was: 0.22.1, 0.23.1, 0.24.0) Fix Version/s: (was: 0.23.1) (was: 0.24.0) > Optimize OP_ADD in edits loading > > > Key: HDFS-2718 > URL: https://issues.apache.org/jira/browse/HDFS-2718 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.22.0, 0.24.0, 1.0.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 0.22.1 > > Attachments: editsLoader-0.22.patch, editsLoader-0.22.patch, > editsLoader-0.22.patch, editsLoader-trunk.patch, editsLoader-trunk.patch, > editsLoader-trunk.patch, editsLoader-trunk.patch > > > While loading the edits journal, FSEditLog.loadEditRecords() processes OP_ADD > inefficiently. It first removes the existing INodeFile from the directory > tree, then adds it back as a regular INodeFile, and then replaces it with > INodeFileUnderConstruction if the file is not closed. This slows down edits > loading. OP_ADD should be done in one shot and retain previously existing > data.
[jira] [Updated] (HDFS-2887) Define a FSVolume interface
[ https://issues.apache.org/jira/browse/HDFS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2887: - Attachment: h2887_20120207.patch h2887_20120207.patch: - moves the methods in BlockPoolSliceInterface to FSVolumeInterface so that BlockPoolSliceInterface becomes unnecessary; - moves the static utility methods from FSDataset to DatanodeUtil; > Define a FSVolume interface > --- > > Key: HDFS-2887 > URL: https://issues.apache.org/jira/browse/HDFS-2887 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h2887_20120203.patch, h2887_20120207.patch > > > FSVolume is an inner class in FSDataset. It is actually a part of the > implementation of FSDatasetInterface. It is better to define a new > interface, namely FSVolumeInterface, to capture the abstraction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2707) HttpFS should read the hadoop-auth secret from a file instead inline from the configuration
[ https://issues.apache.org/jira/browse/HDFS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2707: Fix Version/s: (was: 0.24.0) 0.23.1 > HttpFS should read the hadoop-auth secret from a file instead inline from the > configuration > --- > > Key: HDFS-2707 > URL: https://issues.apache.org/jira/browse/HDFS-2707 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 0.24.0, 0.23.1 >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur > Fix For: 0.23.1 > > Attachments: HDFS-2707.patch, HDFS-2707.patch > > > Similar to HADOOP-7621, the secret should be in a file other than the > configuration file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2857) Cleanup BlockInfo class
[ https://issues.apache.org/jira/browse/HDFS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203189#comment-13203189 ] Suresh Srinivas commented on HDFS-2857: --- Given that this patch is not a straightforward port, I will not commit this to 0.23. > Cleanup BlockInfo class > --- > > Key: HDFS-2857 > URL: https://issues.apache.org/jira/browse/HDFS-2857 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.24.0 > > Attachments: HDFS-2857.23.txt, HDFS-2857.txt > > > The following cleanup is required: > # Remove unnecessary methods > # Add interface annotation > # Make some of the methods private
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203188#comment-13203188 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Common-0.23-Commit #518 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/518/]) HDFS-2572. Removed since it's only committed to trunk, not 0.23.0. acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203183#comment-13203183 ] Aaron T. Myers commented on HDFS-2912: -- bq. Since the patch calls Runtime.exit(1) I don't know of any way to test it other than the manual test. There are several tests around which stub in mock Runtime objects so the Runtime.exit(...) doesn't actually cause a JVM exit. These tests then verify that Runtime.exit(...) was called the appropriate number of times. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2909.HDFS-1623.patch > > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shut down. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode.
[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203185#comment-13203185 ] Hudson commented on HDFS-2572: -- Integrated in Hadoop-Common-trunk-Commit #1690 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1690/]) HDFS-2572. Moved to trunk section from 0.23.1 acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2676) Remove Avro RPC
[ https://issues.apache.org/jira/browse/HDFS-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2676: Fix Version/s: (was: 0.23.1) > Remove Avro RPC > --- > > Key: HDFS-2676 > URL: https://issues.apache.org/jira/browse/HDFS-2676 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.1 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.24.0 > > Attachments: HDFS-2676.txt, HDFS-2676.txt, HDFS-2676.txt > > > Please see the discussion in HDFS-2660 for more details. I have created a > branch HADOOP-6659 to save the Avro work, if in the future some one wants to > use the work that existed to add support for Avro RPC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2788) HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
[ https://issues.apache.org/jira/browse/HDFS-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2788: -- Fix Version/s: 0.23.1 > HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code > - > > Key: HDFS-2788 > URL: https://issues.apache.org/jira/browse/HDFS-2788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.23.1 > > Attachments: hdfs-2788.txt > > > HDFS-941 introduced HdfsServerConstants#DN_KEEPALIVE_TIMEOUT but its never > used. Perhaps was renamed to > DFSConfigKeys#DFS_DATANODE_SOCKET_REUSE_KEEPALIVE_DEFAULT while the patch was > written and the old one wasn't deleted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203179#comment-13203179 ] Hadoop QA commented on HDFS-2764: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513738/HDFS-2764.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1853//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1853//console This message is automatically generated. > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2596) TestDirectoryScanner doesn't test parallel scans
[ https://issues.apache.org/jira/browse/HDFS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2596: Fix Version/s: 0.23.1 > TestDirectoryScanner doesn't test parallel scans > > > Key: HDFS-2596 > URL: https://issues.apache.org/jira/browse/HDFS-2596 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, test >Affects Versions: 0.22.0, 0.23.0 >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.23.1 > > Attachments: hdfs-2596-1.patch > > > The code from HDFS-854 below doesn't run the test with parallel scanning. > They probably intended "parallelism < 3". > {code} > + public void testDirectoryScanner() throws Exception { > +// Run the test with and without parallel scanning > +for (int parallelism = 1; parallelism < 2; parallelism++) { > + runTest(parallelism); > +} > + } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
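To make the off-by-one in the quoted loop concrete, here is a small standalone sketch. `ParallelismLoop` and `visited` are names made up for this illustration; only the loop header mirrors the quoted test code.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelismLoop {
    /** Returns the parallelism values the loop body sees for an exclusive upper bound. */
    static List<Integer> visited(int upperBoundExclusive) {
        List<Integer> seen = new ArrayList<>();
        for (int parallelism = 1; parallelism < upperBoundExclusive; parallelism++) {
            seen.add(parallelism);
        }
        return seen;
    }
}
```

With the bound `2` the body runs once, with `parallelism = 1`; with the bound `3` it runs for `1` and `2`, which appears to be what the comment "with and without parallel scanning" intended.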
[jira] [Assigned] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Mankude reassigned HDFS-2914: -- Assignee: Hari Mankude > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude >Assignee: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2572) Unnecessary double-check in DN#getHostName
[ https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2572: -- Fix Version/s: 0.23.1 > Unnecessary double-check in DN#getHostName > -- > > Key: HDFS-2572 > URL: https://issues.apache.org/jira/browse/HDFS-2572 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2572.patch, HDFS-2572.patch > > > We do a double config.get unnecessarily inside DN#getHostName(...). Can be > removed by this patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll
[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203173#comment-13203173 ] Bikas Saha commented on HDFS-2910: -- Sure. Perhaps that work would resolve this JIRA too. > HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir > is inaccessible during log roll > --- > > Key: HDFS-2910 > URL: https://issues.apache.org/jira/browse/HDFS-2910 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll
[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203171#comment-13203171 ] Todd Lipcon commented on HDFS-2910: --- In order to make the NN ride over a hiccup, it seems the solution is to add a more resilient JournalSet implementation -- ie either one that operates over a quorum of shared dirs, or one which has a more stubborn retry policy. Given that NFS itself already has built in retries and can be configured to arbitrary timeouts, it doesn't seem like we should worry about short hiccups -- any outage that makes it past the configured NFS retry/timeouts is likely to be worth causing a failover IMO. > HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir > is inaccessible during log roll > --- > > Key: HDFS-2910 > URL: https://issues.apache.org/jira/browse/HDFS-2910 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
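For readers wondering what a "more stubborn retry policy" might look like, here is a minimal sketch. It assumes a hypothetical `JournalOp` interface and `StubbornRetry` helper; neither is the real JournalSet API, and the comment above argues that NFS-level retries may make this unnecessary in practice.

```java
import java.io.IOException;

/** Hypothetical journal operation; not the real JournalSet API. */
interface JournalOp {
    void run() throws IOException;
}

class StubbornRetry {
    /**
     * Runs op up to maxAttempts times, sleeping between failed attempts.
     * Returns the number of attempts used, or rethrows the last failure.
     */
    static int runWithRetries(JournalOp op, int maxAttempts, long sleepMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and give up rather than keep retrying.
                        Thread.currentThread().interrupt();
                        throw new IOException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

A caller that exhausts its attempts still sees the original IOException, so the existing abort-and-failover path would remain the fallback for long outages.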
[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
[ https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2654: -- Target Version/s: 0.23.1, 1.1.0 (was: 1.1.0, 0.23.1) Fix Version/s: 0.23.1 0.24.0 > Make BlockReaderLocal not extend RemoteBlockReader2 > --- > > Key: HDFS-2654 > URL: https://issues.apache.org/jira/browse/HDFS-2654 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.23.1, 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, > hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, > hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, > hdfs-2654-b1-4.patch > > > The BlockReaderLocal code paths are easier to understand (especially true on > branch-1 where BlockReaderLocal inherits code from BlockerReader and > FSInputChecker) if the local and remote block reader implementations are > independent, and they're not really sharing much code anyway. If for some > reason they start to share significant code we can make the BlockReader > interface an abstract class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2539: Fix Version/s: (was: 0.23.1) (was: 0.24.0) 0.23.0 > Support doAs and GETHOMEDIRECTORY in webhdfs > > > Key: HDFS-2539 > URL: https://issues.apache.org/jira/browse/HDFS-2539 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.23.0, 1.0.0 > > Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, > h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, > h2539_2009b.patch, h2539_2009b_0.20s.patch, h2539_2009c.patch, > h2539_2009c_0.20s.patch, h2539_2010.patch, > h2539_2010_0.20s.patch, h2539_2010b.patch, h2539_2010b_0.20s.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2540) Change WebHdfsFileSystem to two-step create/append
[ https://issues.apache.org/jira/browse/HDFS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2540: Fix Version/s: (was: 0.23.1) (was: 0.24.0) 0.23.0 > Change WebHdfsFileSystem to two-step create/append > -- > > Key: HDFS-2540 > URL: https://issues.apache.org/jira/browse/HDFS-2540 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.23.0, 1.0.0 > > Attachments: h2540_2007.patch, h2540_2007_0.20s.patch, > h2540_2008.patch, h2540_2008_0.20s.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203169#comment-13203169 ] Uma Maheswara Rao G commented on HDFS-2914: --- {quote} I could probably be persuaded that the NN should leave SM automatically once resources become available again, as long as the implementation includes some measure(s) to prevent the NN from flapping in/out of SM if the free space is hovering near the threshold. Something like "leave SM automatically only if free space is now well above what is required, and only if it's been like that for several minutes." {quote} Yes, this sounds good. Since NameNodeResourceChecker moved the system into safemode on some condition, it should be its responsibility to take the system out of safemode once that condition no longer holds. > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears.
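The anti-flapping rule quoted above (leave safe mode only when free space is well above the requirement and has stayed there for a while) could be sketched along these lines. `SafeModeExitGate` and its parameters are hypothetical and chosen for illustration; this is not the NameNodeResourceChecker implementation.

```java
/** Hypothetical helper; not the real NameNodeResourceChecker. */
class SafeModeExitGate {
    private final long requiredBytes; // minimum free space the NN needs
    private final long marginBytes;   // extra headroom that counts as "well above"
    private final long stableMillis;  // how long the headroom must persist
    private boolean healthy = false;
    private long healthySinceMillis = 0L;

    SafeModeExitGate(long requiredBytes, long marginBytes, long stableMillis) {
        this.requiredBytes = requiredBytes;
        this.marginBytes = marginBytes;
        this.stableMillis = stableMillis;
    }

    /** Feeds one observation; returns true once leaving safe mode looks reasonable. */
    boolean observe(long freeBytes, long nowMillis) {
        boolean wellAbove = freeBytes >= requiredBytes + marginBytes;
        if (!wellAbove) {
            // Any dip below the margin restarts the stability clock.
            healthy = false;
            return false;
        }
        if (!healthy) {
            healthy = true;
            healthySinceMillis = nowMillis;
        }
        return nowMillis - healthySinceMillis >= stableMillis;
    }
}
```

The margin keeps a value hovering near the threshold from counting as healthy, and the stability window keeps a brief recovery from triggering an exit.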
[jira] [Updated] (HDFS-2528) webhdfs rest call to a secure dn fails when a token is sent
[ https://issues.apache.org/jira/browse/HDFS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2528: Fix Version/s: (was: 0.23.1) (was: 0.24.0) 0.23.0 > webhdfs rest call to a secure dn fails when a token is sent > --- > > Key: HDFS-2528 > URL: https://issues.apache.org/jira/browse/HDFS-2528 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 0.20.205.0 >Reporter: Arpit Gupta >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.23.0, 1.0.0 > > Attachments: h2528_2001.patch, h2528_2001_0.20s.patch, > h2528_2001b.patch, h2528_2001b_0.20s.patch, h2528_2002.patch, > h2528_2002_0.20s.patch, h2528_2003.patch, h2528_2003_0.20s.patch, > h2528_2003_0.20s.patch > > > curl -L -u : --negotiate -i > "http://NN:50070/webhdfs/v1/tmp/webhdfs_data/file_small_data.txt?op=OPEN" > the following exception is thrown by the datanode when the redirect happens. > {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Call > to failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]"}} > Interestingly when using ./bin/hadoop with a webhdfs path we are able to cat > or tail a file successfully.
[jira] [Updated] (HDFS-2527) Remove the use of Range header from webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2527: Fix Version/s: (was: 0.23.1) (was: 0.24.0) 0.23.0 > Remove the use of Range header from webhdfs > --- > > Key: HDFS-2527 > URL: https://issues.apache.org/jira/browse/HDFS-2527 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.23.0, 1.0.0 > > Attachments: h2527_2001b_0.20s.patch, h2527_2002.patch, > h2527_2002_0.20s.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated HDFS-2912: - Attachment: HDFS-2909.HDFS-1623.patch Attached patch that implements the changes proposed in the previous comment. Since the patch calls Runtime.exit(1) I don't know of any way to test it other than the manual test. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2909.HDFS-1623.patch > > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode.
[jira] [Updated] (HDFS-2416) distcp with a webhdfs uri on a secure cluster fails
[ https://issues.apache.org/jira/browse/HDFS-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2416: Fix Version/s: (was: 0.23.1) (was: 0.24.0) 0.23.0 > distcp with a webhdfs uri on a secure cluster fails > --- > > Key: HDFS-2416 > URL: https://issues.apache.org/jira/browse/HDFS-2416 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 0.20.205.0 >Reporter: Arpit Gupta >Assignee: Jitendra Nath Pandey > Fix For: 0.23.0, 1.0.0 > > Attachments: HDFS-2416-branch-0.20-security.6.patch, > HDFS-2416-branch-0.20-security.7.patch, > HDFS-2416-branch-0.20-security.8.patch, HDFS-2416-branch-0.20-security.patch, > HDFS-2416-trunk.patch, HDFS-2416-trunk.patch, > HDFS-2419-branch-0.20-security.patch, HDFS-2419-branch-0.20-security.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2397) Undeprecate SecondaryNameNode
[ https://issues.apache.org/jira/browse/HDFS-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-2397: Fix Version/s: 0.23.1 > Undeprecate SecondaryNameNode > - > > Key: HDFS-2397 > URL: https://issues.apache.org/jira/browse/HDFS-2397 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Todd Lipcon >Assignee: Eli Collins > Fix For: 0.23.1 > > Attachments: hdfs-2397.txt, hdfs-2397.txt, hdfs-2397.txt, > hdfs-2397.txt > > > I would like to consider un-deprecating the SecondaryNameNode for 0.23, and > amending the documentation to indicate that it is still the most trust-worthy > way to run checkpoints, and while CN/BN may have some advantages, they're not > battle hardened as of yet. The test coverage for the 2NN is far superior to > the CheckpointNode or BackupNode, and people have a lot more production > experience. Indicating that it is deprecated before we have expanded test > coverage of the CN/BN won't send the right message to our users. (For > comparison, look at what a mess we got into by prematurely deprecating the > "old" MR API before the "new" API had feature parity and a few versions of > bug fixes). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2570) Add descriptions for dfs.*.https.address in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2570: -- Fix Version/s: 0.24.0 > Add descriptions for dfs.*.https.address in hdfs-default.xml > > > Key: HDFS-2570 > URL: https://issues.apache.org/jira/browse/HDFS-2570 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 0.23.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: hdfs-2570-1.patch, hdfs-2570-2.patch > > > Let's add descriptions for dfs.*.https.address in hdfs-default.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll
[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203160#comment-13203160 ] Bikas Saha commented on HDFS-2910: -- That is for the current policy of shutting down the NN on such errors. But if the NN continues to be active for short transient shared dir hiccups then this needs to be fixed. So I will let this JIRA remain active. > HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir > is inaccessible during log roll > --- > > Key: HDFS-2910 > URL: https://issues.apache.org/jira/browse/HDFS-2910 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203157#comment-13203157 ] Bikas Saha commented on HDFS-2912: -- From what I read of the code, for some of the cases (such as a flush of logs) where the NN actually dies on shared dir hiccups the runtime.exit() call was not added in the HA context. It was added when JournalSet was added by Jitendra long ago. In any case, I would ideally like to have a cleaner shutdown mechanism to make sure that exit(1) calls do not proliferate in hard to find ways. Will let [HDFS-2913|https://issues.apache.org/jira/browse/HDFS-2913] track that. For now, I will add an exit(1) after the LOG.FATAL in JournalSet.mapJournalsAndReportErrors(). This is the common code path through which all journal operations go (roll edit logs, flush etc). So putting one here should hopefully catch all journal related cases. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode.
[jira] [Updated] (HDFS-2568) Use a set to manage child sockets in XceiverServer
[ https://issues.apache.org/jira/browse/HDFS-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2568: -- Fix Version/s: 0.24.0 > Use a set to manage child sockets in XceiverServer > -- > > Key: HDFS-2568 > URL: https://issues.apache.org/jira/browse/HDFS-2568 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.24.0 >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2568.patch, HDFS-2568.patch > > > Found while reading up for HDFS-2454, currently we maintain childSockets in a > DataXceiverServer as a Map. This can very well be a > Set data structure -- since the goal is easy removals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
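The Set-based bookkeeping suggested in the description above might look roughly like this. `ChildSockets` is a made-up wrapper for illustration, and the synchronized wrapper is an assumption about concurrent use; the actual patch may differ.

```java
import java.net.Socket;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical holder for a server's child sockets; not the DataXceiverServer code. */
class ChildSockets {
    // Assumption: sockets are added and removed from different threads,
    // so the set is wrapped for thread safety.
    private final Set<Socket> sockets =
            Collections.synchronizedSet(new HashSet<Socket>());

    void add(Socket s) {
        sockets.add(s);
    }

    /** Removal is a single call keyed by the socket itself. */
    void remove(Socket s) {
        sockets.remove(s);
    }

    int size() {
        return sockets.size();
    }
}
```

Because `Socket` does not override `equals` or `hashCode`, membership is by object identity, which matches the goal of removing one specific connection without a separate map value.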
[jira] [Updated] (HDFS-2543) HADOOP_PREFIX cannot be overriden
[ https://issues.apache.org/jira/browse/HDFS-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2543: -- Fix Version/s: 0.24.0 > HADOOP_PREFIX cannot be overriden > - > > Key: HDFS-2543 > URL: https://issues.apache.org/jira/browse/HDFS-2543 > Project: Hadoop HDFS > Issue Type: Bug > Components: scripts >Affects Versions: 0.23.0 >Reporter: Bruno Mahé >Assignee: Bruno Mahé > Labels: bigtop > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2543.patch > > > hadoop-config.sh forces HADOOP_PREFIX to a specific value: > export HADOOP_PREFIX=`dirname "$this"`/.. > It would be nice to make this overridable.
[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203135#comment-13203135 ] Aaron T. Myers commented on HDFS-2914: -- bq. The issue I see is that even if this standby is made active later on, it will not exit out of the safemode unless user does the safemode leave. Do we want this behaviour? I think we probably do. If the NFS mount is flaky, we've got bigger problems than just the NN being moved into SM. bq. The other problem with this approach is that if nfs dir bounces even once, standby will go into safemode and this will happen silently without alerts. I guess the admin should configure some alerts for the NN being in SM, then. :) But regardless, I could probably be persuaded that the NN should leave SM automatically once resources become available again, as long as the implementation includes some measure(s) to prevent the NN from flapping in/out of SM if the free space is hovering near the threshold. Something like "leave SM automatically only if free space is now well above what is required, and only if it's been like that for several minutes." Such a change would not be specific to the HA branch, however, and should probably be done on trunk. > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears.
[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability
[ https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203133#comment-13203133 ] Uma Maheswara Rao G commented on HDFS-2362: --- Ok, Thanks Eli. > More Improvements on NameNode Scalability > - > > Key: HDFS-2362 > URL: https://issues.apache.org/jira/browse/HDFS-2362 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Hairong Kuang > > This jira acts as an umbrella jira to track all the improvements we've done > recently to improve Namenode's performance, responsiveness, and hence > scalability. Those improvements include: > 1. Incremental block reports (HDFS-395) > 2. BlockManager.reportDiff optimization for processing block reports > (HDFS-2477) > 3. Upgradable lock to allow simultaneous read operation while reportDiff is in > progress in processing block reports (HDFS-2490) > 4. More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks (HDFS-2476) > 5. Increase granularity of write operations in ReplicationMonitor thus > reducing contention for write lock (HDFS-2495) > 6. Support variable block sizes > 7. Release RPC handlers while waiting for the edit log to be synced to disk > 8. Reduce network traffic pressure to the master rack where NN is located by > lowering read priority of the replicas on the rack > 9. A standalone KeepAlive heartbeat thread > 10. Reduce multiple traversals of path directory to one for most namespace > manipulations > 11. Move logging out of write lock section.
[jira] [Updated] (HDFS-2594) webhdfs HTTP API should implement getDelegationTokens() instead getDelegationToken()
[ https://issues.apache.org/jira/browse/HDFS-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2594: -- Fix Version/s: 0.24.0 > webhdfs HTTP API should implement getDelegationTokens() instead > getDelegationToken() > > > Key: HDFS-2594 > URL: https://issues.apache.org/jira/browse/HDFS-2594 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0, 0.23.1 >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Critical > Fix For: 0.24.0, 0.23.1 > > Attachments: h2594_2030.patch, h2594_2030_no_apt.patch, > h2594_20111201.patch > > > The current API returns a single delegation token; that method from the > FileSystem API is deprecated in favor of the one that returns a list of > tokens. The HTTP API should implement the new/undeprecated signature > getDelegationTokens().
[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203124#comment-13203124 ] Hari Mankude commented on HDFS-2914: Hi Aaron, the issue I see is that even if this standby is made active later on, it will not exit safemode unless the user explicitly runs safemode leave. Do we want this behaviour? The other problem with this approach is that if the NFS dir bounces even once, the standby will go into safemode, and this will happen silently, without any alerts. > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears.
[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability
[ https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203121#comment-13203121 ] Eli Collins commented on HDFS-2362: --- Not for 23.1, which is getting cut soon. We'll merge the PB changes (Jitendra has a branch for this) and BR scalability changes when 23.1 has branched. > More Improvements on NameNode Scalability > - > > Key: HDFS-2362 > URL: https://issues.apache.org/jira/browse/HDFS-2362 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Hairong Kuang > > This jira acts as an umbrella jira to track all the improvements we've done > recently to improve Namenode's performance, responsiveness, and hence > scalability. Those improvements include: > 1. Incremental block reports (HDFS-395) > 2. BlockManager.reportDiff optimization for processing block reports > (HDFS-2477) > 3. Upgradable lock to allow simultaneous read operations while reportDiff is in > progress in processing block reports (HDFS-2490) > 4. More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks (HDFS-2476) > 5. Increase granularity of write operations in ReplicationMonitor thus > reducing contention for write lock (HDFS-2495) > 6. Support variable block sizes > 7. Release RPC handlers while waiting for the edit log to be synced to disk > 8. Reduce network traffic pressure to the master rack where NN is located by > lowering read priority of the replicas on the rack > 9. A standalone KeepAlive heartbeat thread > 10. Reduce multiple traversals of path directory to one for most namespace > manipulations > 11. Move logging out of write lock section.
[jira] [Updated] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2914: - Summary: HA: Standby stuck in safemode when shared edits directory is bounced (was: HA Standby stuck in safemode when shared edits directory is bounced) > HA: Standby stuck in safemode when shared edits directory is bounced > > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2914: - Issue Type: Sub-task (was: Bug) Parent: HDFS-1623 > HA Standby stuck in safemode when shared edits directory is bounced > --- > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203109#comment-13203109 ] Aaron T. Myers commented on HDFS-2914: -- Hey Hari, per the discussion on HDFS-1594, it is by design that the NN does not automatically leave SM even after resources become available again. In order to leave SM, the admin can run `hdfs dfsadmin -safemode leave', even while the NN is in the standby state. > HA Standby stuck in safemode when shared edits directory is bounced > --- > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203106#comment-13203106 ] Todd Lipcon commented on HDFS-2912: --- In log4j, LOG.fatal doesn't actually terminate the NN, but there should be a Runtime.exit() call following. Did we lose it somewhere along the line? > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced
[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203099#comment-13203099 ] Hari Mankude commented on HDFS-2914: When the shared edits dir is bounced, df reports zero available space. Since the shared dir is a required dir, the standby NN enters safe mode:
2012-02-08 01:08:19,850 WARN namenode.NameNodeResourceChecker (NameNodeResourceChecker.java:isResourceAvailable(89)) - Space available on volume 'nfs directory' is 0, which is below the configured reserved amount 104857600
2012-02-08 01:08:19,853 WARN namenode.FSNamesystem (FSNamesystem.java:run(3095)) - NameNode low on available disk space. Entering safe mode.
The fix could be as simple as exiting safe mode when the shared resources become available again for the standby NN. > HA Standby stuck in safemode when shared edits directory is bounced > --- > > Key: HDFS-2914 > URL: https://issues.apache.org/jira/browse/HDFS-2914 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude > > When shared edits dir is bounced, standby NN is put into safemode by the > NameNodeResourceMonitor(). However, there is no path for it to exit out of > safe mode when shared edits dir reappears.
[jira] [Updated] (HDFS-2764) TestBackupNode is failing
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Target Version/s: 0.24.0 > TestBackupNode is failing > - > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > Looks like it has been for a few days. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2764) TestBackupNode is racy
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Description: TestBackupNode#waitCheckpointDone can spuriously fail because of a race. (was: Looks like it has been for a few days.) Summary: TestBackupNode is racy (was: TestBackupNode is failing) > TestBackupNode is racy > -- > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > TestBackupNode#waitCheckpointDone can spuriously fail because of a race. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2764) TestBackupNode is failing
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Component/s: test > TestBackupNode is failing > - > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > Looks like it has been for a few days. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2764) TestBackupNode is failing
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Status: Patch Available (was: Reopened) > TestBackupNode is failing > - > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node, test >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > Looks like it has been for a few days. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2764) TestBackupNode is failing
[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2764: - Attachment: HDFS-2764.patch Here's a patch which addresses the issue. The trouble was that a helper method used by both failing tests had a race condition. In waitCheckpointDone, the test would just wait for the BN to get a particular fsimage snapshot, and then assert that the NN also had that fsimage snapshot, even though the BN might not have uploaded it back to the NN yet. While I was in this test class I also took the liberty of updating it to a JUnit 4-style test. I guess it was failing consistently on my box because it's an SSD, and things just move too damn fast. > TestBackupNode is failing > - > > Key: HDFS-2764 > URL: https://issues.apache.org/jira/browse/HDFS-2764 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-2764.patch > > > Looks like it has been for a few days. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
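The race described in the comment above comes from asserting on the NN right after only the BN has been observed to hold the image. A minimal sketch of the usual remedy, polling until a combined condition holds before asserting, is given below. It is not taken from HDFS-2764.patch; the class name `CheckpointWaitSketch` and the helper `waitFor` are hypothetical, and the caller would supply its own probes for "BN has the image" and "NN has the image".

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper, not part of HDFS: poll a condition until it holds
// or a timeout expires, instead of asserting immediately.
class CheckpointWaitSketch {

    /**
     * Polls the condition every pollMs milliseconds.
     * Returns true as soon as the condition holds, or false once timeoutMs
     * has elapsed (or the calling thread is interrupted).
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would then wait on both sides at once, for example `waitFor(() -> bnHasImage(txid) && nnHasImage(txid), 30_000, 100)`, where both probes are assumed names, and only assert after the wait returns true.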
[jira] [Created] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced
HA Standby stuck in safemode when shared edits directory is bounced --- Key: HDFS-2914 URL: https://issues.apache.org/jira/browse/HDFS-2914 Project: Hadoop HDFS Issue Type: Bug Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Hari Mankude When shared edits dir is bounced, standby NN is put into safemode by the NameNodeResourceMonitor(). However, there is no path for it to exit out of safe mode when shared edits dir reappears. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203092#comment-13203092 ] Bikas Saha commented on HDFS-2912: -- For some reason the LOG.fatal statement is not terminating the NN in my case. I will look into it further. > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shut down. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode.
[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability
[ https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203089#comment-13203089 ] Uma Maheswara Rao G commented on HDFS-2362: --- I remember Eli's and Dhruba's recent discussion on the mailing list about merging these NN scalability improvements into 0.23. Are we planning that for the 0.23.1 release? > More Improvements on NameNode Scalability > - > > Key: HDFS-2362 > URL: https://issues.apache.org/jira/browse/HDFS-2362 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Hairong Kuang > > This jira acts as an umbrella jira to track all the improvements we've done > recently to improve Namenode's performance, responsiveness, and hence > scalability. Those improvements include: > 1. Incremental block reports (HDFS-395) > 2. BlockManager.reportDiff optimization for processing block reports > (HDFS-2477) > 3. Upgradable lock to allow simultaneous read operations while reportDiff is in > progress in processing block reports (HDFS-2490) > 4. More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks (HDFS-2476) > 5. Increase granularity of write operations in ReplicationMonitor thus > reducing contention for write lock (HDFS-2495) > 6. Support variable block sizes > 7. Release RPC handlers while waiting for the edit log to be synced to disk > 8. Reduce network traffic pressure to the master rack where NN is located by > lowering read priority of the replicas on the rack > 9. A standalone KeepAlive heartbeat thread > 10. Reduce multiple traversals of path directory to one for most namespace > manipulations > 11. Move logging out of write lock section.
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203081#comment-13203081 ] Uma Maheswara Rao G commented on HDFS-2911: --- I agree too. I have recently debugged many issues caused by OOMEs in my clusters, for example HADOOP-7916 and HDFS-2850. > Gracefully handle OutOfMemoryErrors > --- > > Key: HDFS-2911 > URL: https://issues.apache.org/jira/browse/HDFS-2911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Eli Collins > > We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. > We should catch them in a high-level handler, cleanly fail the RPC (vs > sending back the OOM stack trace) or background thread, and shut down the NN or > DN. Currently the process is left in a not well-tested state > (continuously fails RPCs and internal threads, may or may not recover and > doesn't shut down gracefully).
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203064#comment-13203064 ] Eli Collins commented on HDFS-2911: --- HDFS isn't really an application. If we labor on, subsequent failures can result in data loss. IMO it's better to fail fast. > Gracefully handle OutOfMemoryErrors > --- > > Key: HDFS-2911 > URL: https://issues.apache.org/jira/browse/HDFS-2911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Eli Collins > > We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. > We should catch them in a high-level handler, cleanly fail the RPC (vs > sending back the OOM stack trace) or background thread, and shut down the NN or > DN. Currently the process is left in a not well-tested state > (continuously fails RPCs and internal threads, may or may not recover and > doesn't shut down gracefully).
[jira] [Updated] (HDFS-2819) Document new HA-related configs in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2819: -- Attachment: hdfs-2819-ammend.txt Thanks for the review, Suresh. Comments below and updated patch attached (hdfs-2819-ammend.txt).
#1 Because it is the prefix for a key rather than a key itself (i.e. you can't use it by itself to look up anything). This prefix plus a suffix (namespace ID) will result in a key that refers to a set of namenodes. The naming is consistent with other variables that use _PREFIX.
#2 "dfs.ha.namenodes" is the prefix for a given nameservice, e.g. "dfs.ha.namenodes.EXAMPLENAMESERVICE". This description already says "contains a comma-separated list of namenodes"; maybe you were thinking of another key?
#3 Yes, empty values are parsed as null. Note that a value with whitespace is not, i.e. " " here would not be kosher.
#4 I added them per Todd's request above; do you disagree with his thinking?
#5 These values are used to set "ipc.client.connect.max.retries" and "ipc.client.connect.max.retries.on.timeouts" respectively for the failover RPC proxy. I updated the description with the rationale for the 0 default (failover effectively means the clients do retry). These are marked "Expert only" because we don't expect most users to modify them or need to understand them.
#6 The base time is 500ms and we don't wait on the first retry, so the sequence is 0, 1s, 2s, 4s, 8s, .. (up to 15 retries; the last base value caps at 8s, though note that the 5th to 15th values, like the others, will vary by +/- 50% each time, so could delay up to 12s). Make sense?
#7 Not sure I follow, do you have a specific suggestion? I marked these as "Expert only" because we don't expect most users to modify or need to understand them.
> Document new HA-related configs in hdfs-default.xml > --- > > Key: HDFS-2819 > URL: https://issues.apache.org/jira/browse/HDFS-2819 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation, ha >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Eli Collins > Attachments: hdfs-2819-ammend.txt, hdfs-2819.txt, hdfs-2819.txt, > hdfs-2819.txt > > > We've added a few configs, like shared edits dir, dfs.ha.namenodes, etc - we > should probably add these to hdfs-default.xml so they get documented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
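The retry timing described in point #6 of the comment above can be sketched as follows. This is an illustration built only from the numbers quoted there (500ms base, no wait on the first retry, doubling up to an 8s cap, +/- 50% variation); it is not the Hadoop retry-policy code, and the class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of the backoff sequence described in the comment,
// not the actual Hadoop implementation.
class FailoverBackoffSketch {
    static final long BASE_MS = 500;   // base time quoted in the comment
    static final long CAP_MS = 8_000;  // cap on the un-jittered delay

    /** Un-jittered delay: 0 for the first retry, then 1s, 2s, 4s, 8s, 8s, ... */
    static long nominalDelayMs(int attempt) {
        if (attempt <= 0) {
            return 0;
        }
        // Clamp the shift so a large attempt number cannot overflow the long.
        int shift = Math.min(attempt, 20);
        return Math.min(BASE_MS << shift, CAP_MS);
    }

    /** Delay with +/- 50% variation: between 0.5x and 1.5x the nominal value. */
    static long jitteredDelayMs(int attempt, Random rng) {
        long nominal = nominalDelayMs(attempt);
        return (long) (nominal * (0.5 + rng.nextDouble()));
    }
}
```

With the 8s cap, a jittered delay stays below 12s, which matches the upper bound mentioned in the comment.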
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203037#comment-13203037 ] Tsz Wo (Nicholas), SZE commented on HDFS-2911: -- OutOfMemoryError is a subclass of Error which indicates serious problems that a reasonable application *should not try to catch* according to the [javadoc|http://docs.oracle.com/javase/6/docs/api/java/lang/Error.html]. It is hard to handle OutOfMemoryError. One problem is that there could be more OutOfMemoryErrors being thrown when handling the first OutOfMemoryError. > Gracefully handle OutOfMemoryErrors > --- > > Key: HDFS-2911 > URL: https://issues.apache.org/jira/browse/HDFS-2911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 1.0.0 >Reporter: Eli Collins > > We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. > We should catch them in a high-level handler, cleanly fail the RPC (vs > sending back the OOM stack trace) or background thread, and shut down the NN or > DN. Currently the process is left in a not well-tested state > (continuously fails RPCs and internal threads, may or may not recover and > doesn't shut down gracefully).
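One way the fail-fast position in this thread could be illustrated, while keeping the concern above in mind, is a last-resort handler that does as little work as possible on OutOfMemoryError and simply terminates the process. The sketch below is an assumption for discussion, not code from any HDFS patch: `OomFailFastSketch`, `failFastHandler` and `install` are hypothetical names, and the terminate action is injected so the handler can be exercised without killing the JVM.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch, not HDFS code: fail fast on OutOfMemoryError from a
// default uncaught-exception handler.
class OomFailFastSketch {

    /**
     * Builds a handler that calls the terminator with status 1 when a thread
     * dies with OutOfMemoryError. The handler avoids logging and string
     * building, since allocating while already out of memory may throw again.
     * Other throwables are ignored here and left to existing handling.
     */
    static Thread.UncaughtExceptionHandler failFastHandler(IntConsumer terminator) {
        return (thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                terminator.accept(1);
            }
        };
    }

    /** Installs the handler process-wide, halting the JVM without running shutdown hooks. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                failFastHandler(status -> Runtime.getRuntime().halt(status)));
    }
}
```

`Runtime.halt` is used rather than `Runtime.exit` in this sketch because it skips shutdown hooks, which might themselves need memory. A handler like this only sees errors that escape a thread; an OutOfMemoryError that some code catches on its own would never reach it.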
[jira] [Commented] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted
[ https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203036#comment-13203036 ] Hari Mankude commented on HDFS-2905: Looks good. +1 from my side. > HA: Standby NN NPE when shared edits dir is deleted > --- > > Key: HDFS-2905 > URL: https://issues.apache.org/jira/browse/HDFS-2905 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2913) HA: Need a way to shutdown the Name Node
[ https://issues.apache.org/jira/browse/HDFS-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2913: - Component/s: ha > HA: Need a way to shutdown the Name Node > > > Key: HDFS-2913 > URL: https://issues.apache.org/jira/browse/HDFS-2913 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > > Ideally, NameNode.stop() needs to be called because it will change the HA > state and shutdown all services. NameNode reference is not available > anywhere. Hence it is not possible to shutdown the name node gracefully. > A possible solution could be to have a Service interface that gets passed > down to components like FSNameSystem, via which they can inform the NameNode > about irrecoverable errors. NameNode could then decide to shutdown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
[ https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2912: - Component/s: ha > HA: Namenode not shutting down when shared edits dir is inaccessible > > > Key: HDFS-2912 > URL: https://issues.apache.org/jira/browse/HDFS-2912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Bikas Saha >Assignee: Bikas Saha > > When there is an error in shared edits dir then current policy requires the > active name node to abort and shutdown. > Currently there is no way to shut down the name node and hence this does not > happen even after all journals have been aborted on error. In fact the name > node stays Active and also is not in safe mode. Ideally it should shut down, > or at least go into safe mode or standby mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2579) Starting delegation token manager during safemode fails
[ https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203006#comment-13203006 ] Todd Lipcon commented on HDFS-2579: --- We've found one bug during stress testing - there's a super-rare race here if the secret manager happens to be calling logUpdateMasterKey exactly when the NN wants to stop the secret manager. The issue is that the "stopSecretManager" call is holding the FSNamesystem lock, but the secret manager thread is waiting on the same lock. The solution is to have the secret manager use lockInterruptibly instead. > Starting delegation token manager during safemode fails > --- > > Key: HDFS-2579 > URL: https://issues.apache.org/jira/browse/HDFS-2579 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node, security >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2579.txt, hdfs-2579.txt > > > I noticed this on the HA branch, but it seems to actually affect non-HA > branch 0.23 if security is enabled. When the NN starts up, if security is > enabled, we start the delegation token secret manager, which then tries to > call {{logUpdateMasterKey}}. This fails because the edit logs may not be > written while in safe-mode. > It seems to me that there's not any necessary reason that you have to make a > new master key at startup, since you've loaded the old key when you load the > FSImage. You'd only be lacking a DT master key on a fresh cluster, in which > case we could have it generate one at format time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
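The deadlock described in the comment above, and why `lockInterruptibly` resolves it, can be shown with a small stand-alone sketch. It uses a plain `ReentrantLock` as a stand-in for the FSNamesystem lock and an anonymous worker thread as a stand-in for the secret manager thread; none of the names below come from HDFS.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HDFS code: a stopper that holds a lock can still
// stop a worker waiting on that lock if the worker uses lockInterruptibly().
class InterruptibleLockSketch {

    /** Returns true if the worker exited while the stopper still held the lock. */
    static boolean stopWhileHoldingLock() {
        ReentrantLock namesystemLock = new ReentrantLock(); // stand-in for the FSNamesystem lock
        CountDownLatch workerStarted = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            workerStarted.countDown();
            try {
                // With plain lock() the worker would block here until the
                // stopper released the lock, while the stopper waits in join().
                namesystemLock.lockInterruptibly();
                try {
                    // The key update would be logged here.
                } finally {
                    namesystemLock.unlock();
                }
            } catch (InterruptedException e) {
                // Asked to stop while waiting for the lock: just exit.
            }
        });

        namesystemLock.lock();
        try {
            worker.start();
            workerStarted.await();
            worker.interrupt();   // the stop request
            worker.join(5_000);   // bounded wait, so the sketch itself cannot hang
            return !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            namesystemLock.unlock();
        }
    }
}
```

If the interrupt arrives before the worker reaches `lockInterruptibly()`, the call still throws `InterruptedException` on entry, so the worker exits in either ordering.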
[jira] [Commented] (HDFS-2902) HA: Allow new shared edit logs dir to be configured while NN is running
[ https://issues.apache.org/jira/browse/HDFS-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203002#comment-13203002 ]

Bikas Saha commented on HDFS-2902:
----------------------------------

Reading the code shows a possible inconsistency issue.
FSImage.storage (an NNStorage object) manages the info about all storage dirs and records their health state. This includes edits and name dirs.
FSEditLogs.journalSet manages the info about all the journals and each journal maintains its own reference to the StorageDirectory it is writing to. This storage directory is managed by FSImage.storage above.
However, both these work independently. So marking a directory as bad in FSImage.storage does not really stop it from being written via a journal. And vice versa.

> HA: Allow new shared edit logs dir to be configured while NN is running
> -----------------------------------------------------------------------
>
> Key: HDFS-2902
> URL: https://issues.apache.org/jira/browse/HDFS-2902
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
[jira] [Updated] (HDFS-2510) Add HA-related metrics
[ https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-2510:
---------------------------------

    Attachment: HDFS-2510-HDFS-1623.patch

Thanks a lot for the review, Todd. Here's a patch which addresses your feedback.

> Add HA-related metrics
> ----------------------
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-2510-HDFS-1623.patch, HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.
[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll
[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203000#comment-13203000 ]

Todd Lipcon commented on HDFS-2910:
-----------------------------------

We should just do a hard exit here -- upon restart or failover, the new active NN will recover the logs.

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir
> is inaccessible during log roll
> ---------------------------------------------------------------------------
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
[jira] [Commented] (HDFS-2913) HA: Need a way to shutdown the Name Node
[ https://issues.apache.org/jira/browse/HDFS-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202997#comment-13202997 ]

Todd Lipcon commented on HDFS-2913:
-----------------------------------

Currently it is meant to do a "fail fast" shutdown -- i.e., System.exit(1) after logging a FATAL message. A graceful shutdown would be a nice optimization, but HDFS-2912 should be treated as a bug: the expected fail-fast behavior isn't being triggered. Doing a graceful shutdown after hitting an unknown state is likely to be non-trivial.

> HA: Need a way to shutdown the Name Node
> ----------------------------------------
>
> Key: HDFS-2913
> URL: https://issues.apache.org/jira/browse/HDFS-2913
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
>
> Ideally, NameNode.stop() needs to be called because it will change the HA
> state and shutdown all services. NameNode reference is not available
> anywhere. Hence it is not possible to shutdown the name node gracefully.
> A possible solution could be to have a Service interface that gets passed
> down to components like FSNameSystem, via which they can inform the NameNode
> about irrecoverable errors. NameNode could then decide to shutdown.
[jira] [Commented] (HDFS-2909) HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error
[ https://issues.apache.org/jira/browse/HDFS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202990#comment-13202990 ]

Bikas Saha commented on HDFS-2909:
----------------------------------

Aside from all the above, I see some other issues.
Say everything is healthy and FSImage.rollEditLogs() is called. It first calls FSEditLogs.rollLogs, which actually rolls the logs. It then calls storage.writeTransactionIdFileToStorage(), which records this in all storage dirs so that the information about the rolled edits is not lost.
However, the NN could crash after FSEditLogs.rollLogs() has completed and before storage.writeTransactionIdFileToStorage() is called. That might leave the data in an inconsistent state.

> HA: Inaccessible shared edits dir not getting removed from FSImage storage
> dirs upon error
> --------------------------------------------------------------------------
>
> Key: HDFS-2909
> URL: https://issues.apache.org/jira/browse/HDFS-2909
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll
[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202991#comment-13202991 ]

Bikas Saha commented on HDFS-2910:
----------------------------------

I think FSEditLog should not be starting a new segment when ending the last one failed. Specifically in this case, the failure should abortAllJournals and shut down the HA NN.
Even if we fix the NN shutdown case, this bug still needs to be fixed or else the edit logs will be left behind in an inconsistent state.

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir
> is inaccessible during log roll
> ---------------------------------------------------------------------------
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
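The behavior argued for in the HDFS-2910 comment above (do not start a new segment when ending the previous one failed; abort the journals instead) can be sketched roughly as follows. All names here (`LogRollSketch`, `RecordingJournal`, `roll`) are hypothetical and do not mirror the real FSEditLog API. The sketch also assumes, for simplicity, that any single journal failure is fatal, which is stricter than a policy that tolerates the loss of non-required journals.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class LogRollSketch {
    enum State { IN_SEGMENT, BETWEEN_LOG_SEGMENTS, ABORTED }

    /** Hypothetical journal that records what was asked of it. */
    static final class RecordingJournal {
        private final boolean failOnEnd;
        int segmentsStarted;
        boolean aborted;

        RecordingJournal(boolean failOnEnd) { this.failOnEnd = failOnEnd; }

        void endSegment() throws IOException {
            if (failOnEnd) {
                throw new IOException("shared edits dir is inaccessible");
            }
        }
        void startSegment() { segmentsStarted++; }
        void abort() { aborted = true; }
    }

    private final List<RecordingJournal> journals;
    private State state = State.IN_SEGMENT;

    LogRollSketch(RecordingJournal... journals) {
        this.journals = Arrays.asList(journals);
    }

    State getState() { return state; }

    /** Rolls the log; returns false when the caller should shut the NN down. */
    boolean roll() {
        try {
            for (RecordingJournal j : journals) {
                j.endSegment();
            }
        } catch (IOException e) {
            // Ending the segment failed: abort everything, start nothing new.
            for (RecordingJournal j : journals) {
                j.abort();
            }
            state = State.ABORTED;
            return false;
        }
        state = State.BETWEEN_LOG_SEGMENTS;
        for (RecordingJournal j : journals) {
            j.startSegment();
        }
        state = State.IN_SEGMENT;
        return true;
    }

    public static void main(String[] args) {
        LogRollSketch log = new LogRollSketch(
            new RecordingJournal(false), new RecordingJournal(true));
        System.out.println("rolled=" + log.roll() + " state=" + log.getState());
    }
}
```

The point of the early return is that the log never lingers in BETWEEN_LOG_SEGMENTS after a failure; the caller sees a definite ABORTED state and can exit.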
[jira] [Updated] (HDFS-2907) Make FSDataset in Datanode Pluggable
[ https://issues.apache.org/jira/browse/HDFS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-2907:
----------------------------------

    Target Version/s: 0.24.0
       Fix Version/s:     (was: 0.24.0)

> Make FSDataset in Datanode Pluggable
> ------------------------------------
>
> Key: HDFS-2907
> URL: https://issues.apache.org/jira/browse/HDFS-2907
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Priority: Minor
[jira] [Created] (HDFS-2913) HA: Need a way to shutdown the Name Node
HA: Need a way to shutdown the Name Node
----------------------------------------

Key: HDFS-2913
URL: https://issues.apache.org/jira/browse/HDFS-2913
Project: Hadoop HDFS
Issue Type: Sub-task
Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Bikas Saha
Assignee: Bikas Saha

Ideally, NameNode.stop() needs to be called because it will change the HA state and shutdown all services. NameNode reference is not available anywhere. Hence it is not possible to shutdown the name node gracefully.
A possible solution could be to have a Service interface that gets passed down to components like FSNameSystem, via which they can inform the NameNode about irrecoverable errors. NameNode could then decide to shutdown.
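The "Service interface" idea in the HDFS-2913 description above could look roughly like the sketch below. It is only an illustration of the callback pattern: `FatalErrorListener`, `Component` and `Node` are hypothetical names, not existing Hadoop types, and the "shutdown" is reduced to a flag so that the example stays self-contained.

```java
import java.io.IOException;

final class FatalErrorSketch {
    /** Hypothetical callback handed down to components at construction time. */
    interface FatalErrorListener {
        void onIrrecoverableError(String source, Throwable cause);
    }

    /** Stand-in for a component such as the namesystem; it holds no node reference. */
    static final class Component {
        private final FatalErrorListener listener;

        Component(FatalErrorListener listener) { this.listener = listener; }

        // Called when, for example, every journal has been aborted.
        void allJournalsAborted() {
            listener.onIrrecoverableError(
                "edit log", new IOException("all journals aborted"));
        }
    }

    /** Stand-in for the NameNode: it alone decides how to shut down. */
    static final class Node implements FatalErrorListener {
        private boolean stopped;
        private String reason;

        @Override
        public synchronized void onIrrecoverableError(String source, Throwable cause) {
            reason = source + ": " + cause.getMessage();
            stop();
        }

        synchronized void stop() { stopped = true; }
        synchronized boolean isStopped() { return stopped; }
        synchronized String getReason() { return reason; }
    }

    public static void main(String[] args) {
        Node node = new Node();
        new Component(node).allJournalsAborted();
        System.out.println("stopped=" + node.isStopped() + " reason=" + node.getReason());
    }
}
```

Passing only the narrow listener interface keeps the component decoupled from the node while still giving it a way to report errors it cannot recover from.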
[jira] [Created] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible
HA: Namenode not shutting down when shared edits dir is inaccessible
--------------------------------------------------------------------

Key: HDFS-2912
URL: https://issues.apache.org/jira/browse/HDFS-2912
Project: Hadoop HDFS
Issue Type: Sub-task
Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Bikas Saha
Assignee: Bikas Saha

When there is an error in shared edits dir then current policy requires the active name node to abort and shutdown.
Currently there is no way to shut down the name node and hence this does not happen even after all journals have been aborted on error. In fact the name node stays Active and also is not in safe mode. Ideally it should shut down, or at least go into safe mode or standby mode.
[jira] [Commented] (HDFS-2510) Add HA-related metrics
[ https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202950#comment-13202950 ]

Todd Lipcon commented on HDFS-2510:
-----------------------------------

{code}
+  public long getMillisSinceLastLoadedEdits() {
+    if (haContext.getState().getServiceState() == HAServiceState.STANDBY) {
{code}
Does this code possibly get called early during start-up before the HA context state has been set? (i.e., before the first start*Service)

- in EditLogTailer, the new javadoc is redundant - just keep the @return bit

> Add HA-related metrics
> ----------------------
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.
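The start-up concern raised in the HDFS-2510 review comment above is that the metric getter may run before any HA state exists. One defensive shape for such a getter is sketched below. The class, its fields and the explicit `nowMillis` parameter are hypothetical and chosen to keep the example testable; returning 0 when the node is not a standby is an assumption for illustration, not something the thread confirms.

```java
final class HaMetricsSketch {
    enum ServiceState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for the HA state object held by the HA context. */
    static final class HaState {
        private final ServiceState serviceState;
        HaState(ServiceState serviceState) { this.serviceState = serviceState; }
        ServiceState getServiceState() { return serviceState; }
    }

    // Stays null until the first state transition, i.e. during early start-up.
    private volatile HaState haState;
    private volatile long lastLoadMillis;

    void enterState(ServiceState newState) { haState = new HaState(newState); }
    void recordEditsLoaded(long atMillis) { lastLoadMillis = atMillis; }

    /** Milliseconds since the standby last loaded edits, or 0 if not a standby yet. */
    long getMillisSinceLastLoadedEdits(long nowMillis) {
        HaState current = haState;  // read once so the null check cannot race
        if (current == null || current.getServiceState() != ServiceState.STANDBY) {
            return 0L;
        }
        return nowMillis - lastLoadMillis;
    }

    public static void main(String[] args) {
        HaMetricsSketch metrics = new HaMetricsSketch();
        System.out.println(metrics.getMillisSinceLastLoadedEdits(1500L)); // state not set yet
        metrics.recordEditsLoaded(1000L);
        metrics.enterState(ServiceState.STANDBY);
        System.out.println(metrics.getMillisSinceLastLoadedEdits(1500L));
    }
}
```

Without the null check, a metrics poll that arrives before the first state transition would fail with a NullPointerException instead of reporting a harmless default.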
[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202933#comment-13202933 ]

Shaneal Manek commented on HDFS-2911:
-------------------------------------

Incidentally, I worked with a jvmti agent a while ago that did a thread/heap dump on OOM. It was really useful for debugging. The license is compatible, so it may be worth scavenging some of that code/functionality - check it out if curious: https://github.com/Greplin/polarbear

> Gracefully handle OutOfMemoryErrors
> -----------------------------------
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node, name-node
> Affects Versions: 0.23.0, 1.0.0
> Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN.
> We should catch them in a high-level handler, cleanly fail the RPC (vs
> sending back the OOM stack trace) or background thread, and shutdown the NN or
> DN. Currently the process is left in a not well-tested state
> (continuously fails RPCs and internal threads, may or may not recover and
> doesn't shutdown gracefully).