[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203335#comment-13203335 ]

Uma Maheswara Rao G commented on HDFS-2911:
---

Suresh, yes, you are right.
Thinking about it again, how could we do this (fail fast) in client code? The 
client runs inside many different kinds of applications, so whether to fail 
fast on an OOME should be up to each application. The client side also has IPC 
threads and streamer threads running. Am I missing something?
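To illustrate the client-side concern, here is a minimal Java sketch. It assumes a hypothetical `ClientWorker` class standing in for a client background thread such as a streamer (it is not an HDFS class). The library records the error and hands it to a callback chosen by the embedding application, instead of deciding on its own to exit the JVM.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: a client-side background worker standing in for a
// streamer thread. It never calls System.exit(); it records the fatal error
// and passes it to a callback supplied by the embedding application.
final class ClientWorker {
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final Consumer<Throwable> onFatal;

    ClientWorker(Consumer<Throwable> onFatal) {
        this.onFatal = onFatal;
    }

    Thread start(Runnable task) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable e) {   // includes OutOfMemoryError
                failure.set(e);       // remembered for the caller's next operation
                onFatal.accept(e);    // application policy: log, halt, or ignore
            }
        }, "client-worker");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Called on the application's own thread, e.g. at the start of the next write.
    void checkHealthy() throws IOException {
        Throwable e = failure.get();
        if (e != null) {
            throw new IOException("background worker failed", e);
        }
    }
}
```

An application that does want to fail fast could pass `e -> Runtime.getRuntime().halt(1)` as the callback; one that prefers to keep running could simply log the error and let the next client call fail.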


> Gracefully handle OutOfMemoryErrors
> ---
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
> We should catch them in a high-level handler, cleanly fail the RPC (vs 
> sending back the OOM stacktrace) or background thread, and shut down the NN or 
> DN. Currently the process is left in a not-well-tested state 
> (it continuously fails RPCs and internal threads, may or may not recover, and 
> doesn't shut down gracefully).
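A minimal Java sketch of the proposed high-level handler, assuming a hypothetical `RpcGuard` wrapper around each RPC call (Hadoop's actual IPC server is structured differently). On an OutOfMemoryError it fails the call with a short IOException rather than the OOM stack trace, rejects later calls, and signals the daemon's main thread to shut down.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of a high-level handler wrapped around each RPC call.
// Names are illustrative; this is not Hadoop's ipc.Server.
final class RpcGuard {
    private final CountDownLatch shutdownRequested = new CountDownLatch(1);

    <T> T handle(Callable<T> call) throws IOException {
        if (isShutdownRequested()) {
            throw new IOException("server is shutting down");
        }
        try {
            return call.call();
        } catch (OutOfMemoryError oom) {
            shutdownRequested.countDown();
            // Fail the RPC with a short message; the OOM stack trace is not sent back.
            throw new IOException("server is shutting down");
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    boolean isShutdownRequested() {
        return shutdownRequested.getCount() == 0;
    }

    // The daemon's main thread blocks here, then stops services and exits.
    void awaitShutdownRequest() throws InterruptedException {
        shutdownRequested.await();
    }
}
```

In this sketch the shutdown itself happens on a separate thread blocked in `awaitShutdownRequest()`, since the thread that hit the OOME may not have enough memory left to stop services cleanly.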

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-654) HDFS needs to support new rename introduced for FileContext

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203329#comment-13203329 ]

Uma Maheswara Rao G commented on HDFS-654:
--

{quote}
The count changes when the destination is removed. 
FSDirectory.removeChild(dstInode) and FSNamesystem.removePathAndBlocks() 
decrement the total INode count and the number of blocks. The lease on the 
removed destination is also removed.
{quote}

In the new rename API, we remove the overwritten destination's blocks and add 
them to invalidates, but we do not sync the edit log before adding them to 
invalidates. This can lead to missing blocks, as I explained in the scenario in 
HDFS-2815.
I have not verified this yet. I will file a separate JIRA once I confirm it is a 
bug.
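The ordering concern can be shown with a toy model in Java. Every name here is hypothetical and none of it is FSNamesystem code: `pending` holds edits that a crash would lose, and `durable` holds edits that survived `logSync()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the ordering concern; none of these names are
// FSNamesystem's. "pending" is what a crash would lose, "durable" what survives.
final class RenameOverwriteSketch {
    final List<String> pending = new ArrayList<>();
    final List<String> durable = new ArrayList<>();
    final List<Long> invalidates = new ArrayList<>();

    void logSync() {
        durable.addAll(pending);
        pending.clear();
    }

    void renameOverwrite(String src, String dst, List<Long> dstBlocks) {
        pending.add("RENAME " + src + " -> " + dst + " OVERWRITE");
        // 1. Make the rename durable first ...
        logSync();
        // 2. ... and only then queue the old destination blocks for deletion.
        //    In the opposite order, a crash in between could replay a namespace
        //    that still references blocks the DataNodes have already deleted.
        invalidates.addAll(dstBlocks);
    }
}
```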

> HDFS needs to support new rename introduced for FileContext
> ---
>
> Key: HDFS-654
> URL: https://issues.apache.org/jira/browse/HDFS-654
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.21.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.21.0
>
> Attachments: HDFS-654.patch, hdfs-654.1.patch, hdfs-654.2.patch, 
> hdfs-654.3.patch, hdfs-654.5.patch, hdfs-654.5.patch, hdfs-654.7.patch, 
> hdfs-654.9.patch
>
>
> New rename functionality with different semantics to overwrite the existing 
> destination was introduced for use in FileContext. Currently the default 
> implementation in FileSystem is not atomic. This change implements atomic 
> rename operation for use by FileContext.





[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Suresh Srinivas (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203313#comment-13203313 ]

Suresh Srinivas commented on HDFS-2911:
---

bq. that a reasonable application should not try to catch
Nicholas, I think what this means is that an application should not try to 
catch it for recovery purposes. Failing fast instead of trying to recover seems 
like a reasonable choice.

@Uma
bq. I too agree. 
Are you agreeing with Eli?





[jira] [Commented] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203308#comment-13203308 ]

Hudson commented on HDFS-2764:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1704 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1704/])
HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers.

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java


> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 0.24.0
>
> Attachments: HDFS-2764.patch, HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Commented] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203293#comment-13203293 ]

Hudson commented on HDFS-2764:
--

Integrated in Hadoop-Common-trunk-Commit #1693 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1693/])
HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers.

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java






[jira] [Commented] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203292#comment-13203292 ]

Hudson commented on HDFS-2764:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1768 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1768/])
HDFS-2764. TestBackupNode is racy. Contributed by Aaron T. Myers.

atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241780
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java






[jira] [Resolved] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Aaron T. Myers (Resolved) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers resolved HDFS-2764.
--

   Resolution: Fixed
Fix Version/s: 0.24.0
 Hadoop Flags: Reviewed

I've just committed this to trunk.





[jira] [Updated] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-2764:
-

Attachment: HDFS-2764.patch

Thanks a lot for the review, Eli. Here's a patch which adds the comment per 
your suggestion.

I'll commit this momentarily based on your +1 since it's just a comment change.

> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch, HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Updated] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-2764:
-

Status: Open  (was: Patch Available)

> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203281#comment-13203281 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Mapreduce-0.23-Commit #524 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/524/])
Merged r1241766 from trunk for HDFS-2786.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java


> Fix host-based token incompatibilities in DFSUtil
> -
>
> Key: HDFS-2786
> URL: https://issues.apache.org/jira/browse/HDFS-2786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: 0.24.0, 0.23.1
>Reporter: Daryn Sharp
>Assignee: Kihwal Lee
> Fix For: 0.24.0, 0.23.1
>
> Attachments: hdfs-2786.patch, hdfs-2786.patch
>
>
> DFSUtil introduces new static methods that duplicate functionality in 
> NetUtils.  These new methods lack the logic necessary for host-based tokens 
> to work.  After speaking with Suresh, the approach being taken is:
> * DFSUtil.getSocketAddress will be removed.  Callers will be reverted to 
> using the NetUtils version.
> * DFSUtil.getDFSClient will be changed to accept a uri/host:port string 
> instead of an InetSocketAddress.  The method will internally call 
> NetUtils.createSocketAddr. This frees callers from having to call 
> NetUtils.createSocketAddr themselves and reduces the opportunity for errors 
> that would break host-based tokens.
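The calling pattern in the second bullet can be sketched in Java as follows. `AddrUtil.createSocketAddr` is a hypothetical stand-in written with plain `java.net` classes; it is not `NetUtils.createSocketAddr` and omits whatever token-related handling Hadoop's version performs. It keeps the hostname exactly as the caller supplied it, which is presumably what host-based tokens depend on.

```java
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical stand-in for NetUtils.createSocketAddr, NOT Hadoop's code. It only
// shows the calling pattern: the method takes the string and builds the address
// itself, so callers cannot hand in a pre-resolved InetSocketAddress.
final class AddrUtil {
    static InetSocketAddress createSocketAddr(String target, int defaultPort) {
        // Accept both "hdfs://host:port/path" and a bare "host:port".
        String withScheme = target.contains("://") ? target : "dummy://" + target;
        URI uri = URI.create(withScheme);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + target);
        }
        int port = uri.getPort() == -1 ? defaultPort : uri.getPort();
        // createUnresolved skips DNS and keeps the hostname exactly as given.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

A `getDFSClient(String target)` along the lines described above would call a helper like this internally, so every caller goes through the same parsing path.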





[jira] [Commented] (HDFS-2579) Starting delegation token manager during safemode fails

2012-02-07 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203271#comment-13203271 ]

Jitendra Nath Pandey commented on HDFS-2579:


bq. The issue is that the "stopSecretManager" call is holding the FSNamesystem 
lock, but the secret manager thread is waiting on the same lock.

Another possible approach: the secret manager thread acquires the namesystem 
write lock using tryLock with a timeout, in a loop, and checks the "running" 
flag before each tryLock attempt. Since this is not a true deadlock, 
stopSecretManager will still be able to set running to false, and the loop will 
then exit.
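A minimal Java sketch of the tryLock loop, assuming a `ReentrantReadWriteLock` as a stand-in for the namesystem lock; `SecretManagerLoopSketch` and `updateMasterKey` are hypothetical names, not the actual secret manager code. Because the thread waits in bounded tryLock attempts instead of a blocking lock() call, it re-reads the `running` flag at least every 100 ms, so `stop()` takes effect even while the caller holds the lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; the lock stands in for the namesystem lock and the
// class is not DelegationTokenSecretManager.
final class SecretManagerLoopSketch {
    private final ReentrantReadWriteLock nsLock;
    private volatile boolean running = true;

    SecretManagerLoopSketch(ReentrantReadWriteLock nsLock) {
        this.nsLock = nsLock;
    }

    // Safe to call while the caller itself holds nsLock: it only flips a flag.
    void stop() {
        running = false;
    }

    // Returns true if the key update ran under the lock, false if stopped first.
    boolean updateMasterKey() throws InterruptedException {
        while (running) {                        // check the flag before each attempt
            if (nsLock.writeLock().tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    if (!running) {
                        return false;            // stopped while we were waiting
                    }
                    // ... roll the key and log it to the edit log here ...
                    return true;
                } finally {
                    nsLock.writeLock().unlock();
                }
            }
        }
        return false;
    }
}
```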

> Starting delegation token manager during safemode fails
> ---
>
> Key: HDFS-2579
> URL: https://issues.apache.org/jira/browse/HDFS-2579
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node, security
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2579.txt, hdfs-2579.txt, hdfs-2579.txt
>
>
> I noticed this on the HA branch, but it seems to actually affect non-HA 
> branch 0.23 if security is enabled. When the NN starts up with security 
> enabled, we start the delegation token secret manager, which then tries to 
> call {{logUpdateMasterKey}}. This fails because the edit logs may not be 
> written while in safe mode.
> It seems to me that there's no inherent reason to make a new master key at 
> startup, since you've loaded the old key when you load the FSImage. You'd only 
> be lacking a DT master key on a fresh cluster, in which case we could have it 
> generate one at format time.
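The proposal in the last paragraph can be sketched in Java, assuming a hypothetical `MasterKeyStore` and a `List<String>` standing in for the edit log. A key is generated and logged only at format time; at startup the keys come from the image, so nothing needs to be logged while the NN is in safe mode.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not DelegationTokenSecretManager. The List<String>
// stands in for the edit log.
final class MasterKeyStore {
    private final List<byte[]> keys = new ArrayList<>();
    private final List<String> editLog;
    private final SecureRandom random = new SecureRandom();

    MasterKeyStore(List<String> editLog) {
        this.editLog = editLog;
    }

    // Format time: the namespace is new and writable, so create and log a key.
    void onFormat() {
        byte[] key = new byte[20];
        random.nextBytes(key);
        keys.add(key);
        editLog.add("UPDATE_MASTER_KEY");
    }

    // Startup: keys come from the FSImage, so nothing is written in safe mode.
    void onStartup(List<byte[]> keysFromImage) {
        keys.addAll(keysFromImage);
        if (keys.isEmpty()) {
            throw new IllegalStateException("no master key in image; format first");
        }
    }

    List<byte[]> currentKeys() {
        return new ArrayList<>(keys);
    }
}
```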





[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203270#comment-13203270 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1703 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1703/])
HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed 
by Kihwal Lee.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java






[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203265#comment-13203265 ]

Jitendra Nath Pandey commented on HDFS-2914:


The standby doesn't need to enter safe mode, because it is not writing any 
transactions anyway. The check for available resources to write edit logs 
should be performed when it transitions to active.
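A minimal Java sketch of this suggestion, assuming a hypothetical `ResourceCheckSketch` class and a `BooleanSupplier` standing in for the edits-directory resource check. The periodic monitor only puts an active NameNode into safe mode, and the standby runs the check when it is asked to become active.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; names are illustrative, not NameNode or
// NameNodeResourceMonitor code.
final class ResourceCheckSketch {
    enum HaState { ACTIVE, STANDBY }

    private final BooleanSupplier resourcesAvailable;
    private HaState state = HaState.STANDBY;
    private boolean safeMode = false;

    ResourceCheckSketch(BooleanSupplier resourcesAvailable) {
        this.resourcesAvailable = resourcesAvailable;
    }

    // Periodic monitor tick: only the NameNode that writes edits (the active) reacts.
    synchronized void monitorTick() {
        if (state == HaState.ACTIVE && !resourcesAvailable.getAsBoolean()) {
            safeMode = true;
        }
    }

    // The resource check happens at the moment the standby would start writing.
    synchronized void transitionToActive() {
        if (!resourcesAvailable.getAsBoolean()) {
            throw new IllegalStateException("not enough resources to write edit logs");
        }
        state = HaState.ACTIVE;
    }

    synchronized boolean inSafeMode() {
        return safeMode;
    }

    synchronized HaState state() {
        return state;
    }
}
```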

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>Assignee: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by 
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe 
> mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203261#comment-13203261 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Common-0.23-Commit #520 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/520/])
Merged r1241766 from trunk for HDFS-2786.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java






[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203257#comment-13203257 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Hdfs-0.23-Commit #509 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/509/])
Merged r1241766 from trunk for HDFS-2786.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241768
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java






[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203251#comment-13203251 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Common-trunk-Commit #1692 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1692/])
HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed 
by Kihwal Lee.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java






[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203250#comment-13203250 ]

Hudson commented on HDFS-2786:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1767 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1767/])
HDFS-2786. Fix host-based token incompatibilities in DFSUtil. Contributed 
by Kihwal Lee.

jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241766
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java






[jira] [Updated] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Jitendra Nath Pandey (Updated) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-2786:
---

      Resolution: Fixed
   Fix Version/s: 0.23.1
                  0.24.0
Target Version/s: 0.24.0, 0.23.1  (was: 0.23.1, 0.24.0)
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed. Thanks to Kihwal.

> Fix host-based token incompatibilities in DFSUtil
> -
>
> Key: HDFS-2786
> URL: https://issues.apache.org/jira/browse/HDFS-2786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: 0.24.0, 0.23.1
>Reporter: Daryn Sharp
>Assignee: Kihwal Lee
> Fix For: 0.24.0, 0.23.1
>
> Attachments: hdfs-2786.patch, hdfs-2786.patch
>
>
> DFSUtil introduces new static methods that duplicate functionality in 
> NetUtils.  These new methods lack the logic necessary for host-based tokens 
> to work.  After speaking with Suresh, the approach being taken is:
> * DFSUtil.getSocketAddress will be removed.  Callers will be reverted to 
> using the NetUtils version.
> * DFSUtil.getDFSClient will be changed to accept a uri/host:port string 
> instead of an InetSocketAddress.  The method will internally call 
> NetUtils.createSocketAddr. This relieves callers of having to call 
> NetUtils.createSocketAddr themselves and reduces the opportunity for errors 
> that would break host-based tokens.





[jira] [Commented] (HDFS-2887) Define a FSVolume interface

2012-02-07 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203243#comment-13203243
 ] 

Hadoop QA commented on HDFS-2887:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12513748/h2887_20120207.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1854//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1854//console

This message is automatically generated.

> Define a FSVolume interface
> ---
>
> Key: HDFS-2887
> URL: https://issues.apache.org/jira/browse/HDFS-2887
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h2887_20120203.patch, h2887_20120207.patch
>
>
> FSVolume is an inner class in FSDataset.  It is actually a part of the 
> implementation of FSDatasetInterface.  It is better to define a new 
> interface, namely FSVolumeInterface, to capture the abstraction.
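The refactoring described here, which promotes an inner implementation class to a named interface, can be sketched as follows. The interface name `VolumeSketch` and its two methods are hypothetical and do not come from the attached patch. The point is that code such as `VolumeTotals` depends only on the interface, so it accepts any implementation, including a test stub.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch: the interface and method names are illustrative and are
// not taken from the HDFS-2887 patch.
interface VolumeSketch {
  File getBaseDir();
  long getAvailable();
}

// A concrete volume becomes a top-level class implementing the interface,
// instead of an inner class of the dataset implementation.
final class LocalVolumeSketch implements VolumeSketch {
  private final File baseDir;

  LocalVolumeSketch(File baseDir) {
    this.baseDir = baseDir;
  }

  @Override
  public File getBaseDir() {
    return baseDir;
  }

  @Override
  public long getAvailable() {
    return baseDir.getUsableSpace();
  }
}

// Code written against the interface works with any volume implementation,
// including test stubs.
final class VolumeTotals {
  private VolumeTotals() {}

  static long totalAvailable(List<? extends VolumeSketch> volumes) {
    long total = 0;
    for (VolumeSketch v : volumes) {
      total += v.getAvailable();
    }
    return total;
  }
}
```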





[jira] [Commented] (HDFS-2786) Fix host-based token incompatibilities in DFSUtil

2012-02-07 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203241#comment-13203241
 ] 

Jitendra Nath Pandey commented on HDFS-2786:


+1. lgtm

> Fix host-based token incompatibilities in DFSUtil
> -
>
> Key: HDFS-2786
> URL: https://issues.apache.org/jira/browse/HDFS-2786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node, security
>Affects Versions: 0.24.0, 0.23.1
>Reporter: Daryn Sharp
>Assignee: Kihwal Lee
> Attachments: hdfs-2786.patch, hdfs-2786.patch
>
>
> DFSUtil introduces new static methods that duplicate functionality in 
> NetUtils.  These new methods lack the logic necessary for host-based tokens 
> to work.  After speaking with Suresh, the approach being taken is:
> * DFSUtil.getSocketAddress will be removed.  Callers will be reverted to 
> using the NetUtils version.
> * DFSUtil.getDFSClient will be changed to accept a uri/host:port string 
> instead of an InetSocketAddress.  The method will internally call 
> NetUtils.createSocketAddr. This relieves callers of having to call 
> NetUtils.createSocketAddr themselves and reduces the opportunity for errors 
> that would break host-based tokens.





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203229#comment-13203229
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Mapreduce-0.23-Commit #522 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/522/])
HDFS-2572. Removed since it's only committed to trunk, not 0.23.0.

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.
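The "double config.get" pattern is a lookup performed once for a null check and a second time to read the value. A minimal sketch of the single-lookup form follows. A plain Map stands in for the Hadoop Configuration object, and the key name and fallback value are hypothetical, not the actual body of DN#getHostName.

```java
import java.util.Map;

// Hypothetical sketch: the key name and the fallback value are illustrative;
// this is not the body of DataNode#getHostName.
final class HostNameSketch {
  static final String HOSTNAME_KEY = "example.datanode.hostname"; // hypothetical key

  private HostNameSketch() {}

  static String getHostName(Map<String, String> conf, String fallback) {
    // One lookup whose result is both null-checked and returned, rather than
    // one get() for the check and a second get() for the value.
    String name = conf.get(HOSTNAME_KEY);
    return (name != null) ? name : fallback;
  }
}
```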





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203227#comment-13203227
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1701 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1701/])
HDFS-2572. Moved to trunk section from 0.23.1

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Commented] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted

2012-02-07 Thread Jitendra Nath Pandey (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203220#comment-13203220
 ] 

Jitendra Nath Pandey commented on HDFS-2905:


+1. I have committed this. Thanks to Bikas.

> HA: Standby NN NPE when shared edits dir is deleted
> ---
>
> Key: HDFS-2905
> URL: https://issues.apache.org/jira/browse/HDFS-2905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch
>
>






[jira] [Resolved] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted

2012-02-07 Thread Jitendra Nath Pandey (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey resolved HDFS-2905.


  Resolution: Fixed
Hadoop Flags: Reviewed

> HA: Standby NN NPE when shared edits dir is deleted
> ---
>
> Key: HDFS-2905
> URL: https://issues.apache.org/jira/browse/HDFS-2905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch
>
>






[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203217#comment-13203217
 ] 

Todd Lipcon commented on HDFS-2912:
---

I think the issue is this -- previously the abort logic was to only do 
Runtime.exit(1) when a _sync_ fails. We figured this was sufficient since it 
guards against data loss. But, as you've pointed out in the JIRAs today, there 
are some other cases where we should abort to avoid getting into an 
inconsistent state.

The old code (which is verified by the tests Aaron mentioned above -- look for 
mock(Runtime.class) ) does the abort by catching the IOException thrown by 
mapJournalsAndReportErrors and aborting at that point. The particular call site 
is logSync() in FSEditLog. So we either need to do as you did (and abort from 
mapJournalsAndReportErrors itself) or change _all_ of the call sites to do the 
abort in case an exception is thrown.
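The first option Todd describes, aborting inside the journal-mapping method rather than at every call site, might look roughly like this. The sketch makes several assumptions. The names are hypothetical and are not the FSEditLog/JournalSet API. The abort condition ("no journal succeeded") is also an assumption. The exit is routed through an injectable hook so that a test can observe it without killing the JVM.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of "abort from mapJournalsAndReportErrors itself":
// names and the abort condition are illustrative, not the FSEditLog/JournalSet API.
final class JournalMapperSketch {
  interface Journal { void apply() throws IOException; }  // one journal operation
  interface Terminator { void terminate(int status); }    // injectable exit hook

  private final Terminator terminator;

  JournalMapperSketch(Terminator terminator) {
    this.terminator = terminator;
  }

  /** Applies the operation to every journal and aborts if none succeeded. */
  void mapJournals(List<Journal> journals) {
    int failures = 0;
    for (Journal j : journals) {
      try {
        j.apply();
      } catch (IOException e) {
        failures++; // a real implementation would also log and disable the journal
      }
    }
    if (failures == journals.size()) {
      // Aborting here covers every caller, instead of each call site
      // having to catch an exception and abort on its own.
      terminator.terminate(1);
    }
  }
}
```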

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2909.HDFS-1623.patch
>
>
> When there is an error in the shared edits dir, the current policy requires 
> the active name node to abort and shut down.
> Currently there is no way to shut down the name node, so this does not happen 
> even after all journals have been aborted on error. In fact, the name node 
> stays Active and is not in safe mode either. Ideally it should shut down, 
> or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203210#comment-13203210
 ] 

Bikas Saha commented on HDFS-2912:
--

Could you please point me to the test that verifies the LOG.Fatal section that 
was added to JournalSet.mapJournalsAndReportErrors()?
I should ideally be modifying that test to verify the new change to that piece 
of code.


> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2909.HDFS-1623.patch
>
>
> When there is an error in the shared edits dir, the current policy requires 
> the active name node to abort and shut down.
> Currently there is no way to shut down the name node, so this does not happen 
> even after all journals have been aborted on error. In fact, the name node 
> stays Active and is not in safe mode either. Ideally it should shut down, 
> or at least go into safe mode or standby mode.





[jira] [Updated] (HDFS-2579) Starting delegation token manager during safemode fails

2012-02-07 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2579:
--

Attachment: hdfs-2579.txt

The solution to the above problem turned out to be a little more complicated.
The issue is that, once I just made it use lockInterruptibly, I ran into 
another race where the thread would get interrupted just before logSync() was 
called. If you interrupt a thread while it's in this critical edit log code, it 
can actually abort the whole NN.

So, I had to add some locking around the interrupt to ensure that the DTSM 
thread doesn't get interrupted during logSync(), etc.
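The idea of locking around the interrupt can be sketched with a single lock. The worker holds it while inside the critical section, which stands in for logSync(). The stopper takes the same lock before calling interrupt(), so an interrupt can only land outside that section. The sketch is not the attached patch, and the class and method names are hypothetical. It also skips the critical work if a stop was already requested before the worker got the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: names are illustrative, not taken from hdfs-2579.txt.
final class GuardedInterruptSketch {
  private final ReentrantLock noInterruptLock = new ReentrantLock();

  /**
   * Runs the critical work while holding the lock. lock() is not interruptible,
   * so the worker always gets here; if it was interrupted earlier, the work is
   * skipped instead of being started with the interrupt flag already set.
   */
  boolean runCritical(Runnable criticalWork) {
    noInterruptLock.lock();
    try {
      if (Thread.currentThread().isInterrupted()) {
        return false; // stop was requested before we started
      }
      criticalWork.run();
      return true;
    } finally {
      noInterruptLock.unlock();
    }
  }

  /** Interrupts the worker only while it is not inside runCritical. */
  void interruptSafely(Thread worker) {
    noInterruptLock.lock();
    try {
      worker.interrupt();
    } finally {
      noInterruptLock.unlock();
    }
  }
}
```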

> Starting delegation token manager during safemode fails
> ---
>
> Key: HDFS-2579
> URL: https://issues.apache.org/jira/browse/HDFS-2579
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node, security
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2579.txt, hdfs-2579.txt, hdfs-2579.txt
>
>
> I noticed this on the HA branch, but it seems to actually affect non-HA 
> branch 0.23 if security is enabled. When the NN starts up, if security is 
> enabled, we start the delegation token secret manager, which then tries to 
> call {{logUpdateMasterKey}}. This fails because the edit logs may not be 
> written while in safe-mode.
> It seems to me that there's no real need to make a new master key at 
> startup, since you've loaded the old key when you load the FSImage. You'd 
> only be lacking a DT master key on a fresh cluster, in which case we could 
> have it generate one at format time.





[jira] [Commented] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203208#comment-13203208
 ] 

Eli Collins commented on HDFS-2764:
---

+1  nice find.

I'd add a comment like the following:
{code}
// The checkpoint is not done until the nn has received it from the bn
thisCheckpointTxId = cluster.getNameNode().getFSImage().getStorage()
{code}
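Eli's point is that a checkpoint is only complete once the NN has received it from the BN. A race-free wait therefore polls the NN-side value until it reaches the expected txid, instead of trusting the BN's view. The following is a generic polling sketch under that assumption. The LongSupplier would wrap whatever NN-side accessor the test uses. That accessor is not spelled out here because the snippet above is truncated, and `CheckpointWaitSketch` itself is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch: polls a txid supplier until it reaches the expected
// value or the timeout elapses. Not the TestBackupNode code.
final class CheckpointWaitSketch {
  private CheckpointWaitSketch() {}

  static boolean waitForTxId(LongSupplier currentTxId, long expected,
                             long timeoutMillis, long pollMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (System.nanoTime() - deadline < 0) {
      if (currentTxId.getAsLong() >= expected) {
        return true;
      }
      Thread.sleep(pollMillis);
    }
    // Final check so a value that arrived during the last sleep is not missed.
    return currentTxId.getAsLong() >= expected;
  }
}
```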


> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Updated] (HDFS-2614) hadoop dist tarball is missing hdfs headers

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2614:
--

Affects Version/s: (was: 0.24.0)

> hadoop dist tarball is missing hdfs headers
> ---
>
> Key: HDFS-2614
> URL: https://issues.apache.org/jira/browse/HDFS-2614
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.23.1
>Reporter: Bruno Mahé
>Assignee: Alejandro Abdelnur
>  Labels: bigtop
> Fix For: 0.23.1
>
> Attachments: HDFS-2614.patch
>
>
> It would be nice to provide the hdfs headers so one could easily write programs 
> that link against that library and access HDFS.





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203203#comment-13203203
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Hdfs-0.23-Commit #507 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/507/])
HDFS-2572. Removed since it's only committed to trunk, not 0.23.0.

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Updated] (HDFS-2893) The start/stop scripts don't start/stop the 2NN when using the default configuration

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2893:


Fix Version/s: 0.23.1

> The start/stop scripts don't start/stop the 2NN when using the default 
> configuration
> 
>
> Key: HDFS-2893
> URL: https://issues.apache.org/jira/browse/HDFS-2893
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.1
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Fix For: 0.23.1
>
> Attachments: hdfs-2893.txt
>
>
> HDFS-1703 changed the behavior of the start/stop scripts so that the masters 
> file is no longer used to indicate which hosts to start the 2NN on. The 2NN 
> is now started by start-dfs.sh only on hosts where 
> dfs.namenode.secondary.http-address is configured with a non-wildcard IP. 
> This means you cannot start a 2NN using an http-address specified with a 
> wildcard IP. We should allow a 2NN to be started with the default config, i.e. 
> start-dfs.sh should start a NN, 2NN and DN. The packaging already works this 
> way (it doesn't use start-dfs.sh; it uses hadoop-daemon.sh directly without 
> first checking getconf), so let's bring start-dfs.sh in line with this behavior.
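The distinction the scripts have to make is whether the configured 2NN address is the wildcard (for example 0.0.0.0). Java's InetAddress.isAnyLocalAddress() performs that check. The rule below substitutes the local host name for a wildcard address so that the default configuration still yields a host to start on. That rule is an assumption for illustration. The real logic lives in start-dfs.sh and `hdfs getconf`, not in a Java class, and `SecondaryHostSketch` is hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical sketch of a host-selection rule; not the start-dfs.sh logic.
final class SecondaryHostSketch {
  private SecondaryHostSketch() {}

  /**
   * Returns the host on which to start the 2NN: the configured host when it is
   * a concrete address, or the local host name when it is the wildcard address.
   */
  static String hostFor(InetSocketAddress configured, String localHostName) {
    InetAddress addr = configured.getAddress();
    if (addr != null && addr.isAnyLocalAddress()) {
      return localHostName;
    }
    return configured.getHostString();
  }
}
```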





[jira] [Updated] (HDFS-2886) CreateEditLogs should generate a realistic edit log.

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2886:


Target Version/s: 0.24.0, 0.23.1, 0.22.1  (was: 0.22.1, 0.23.1, 0.24.0)
   Fix Version/s: (was: 0.23.1)
  (was: 0.24.0)

> CreateEditLogs should generate a realistic edit log.
> 
>
> Key: HDFS-2886
> URL: https://issues.apache.org/jira/browse/HDFS-2886
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: 0.24.0, 0.23.1, 0.22.1
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.1
>
> Attachments: createLog-0.22.patch, createLog-trunk.patch
>
>
> CreateEditsLog generates non-standard transactions. In real life, the first 
> transaction that creates a file does not contain blocks, whereas CreateEditsLog 
> adds blocks to this transaction. Change CreateEditsLog to produce realistic 
> transactions. 
> Also clean up unused parameters for {{FSDirectory.updateFile()}}.
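The "realistic" shape described above can be expressed as a tiny generator. The record that creates a file carries an empty block list, and the blocks appear in a later record. The op names ("CREATE", "CLOSE") and the `Op` class are hypothetical placeholders, not the FSEditLog op classes. The sketch only encodes the ordering stated in the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: op names and the record shape are illustrative, not the
// FSEditLog op classes used by CreateEditsLog.
final class EditSequenceSketch {
  private EditSequenceSketch() {}

  static final class Op {
    final String type;
    final String path;
    final List<Long> blockIds;

    Op(String type, String path, List<Long> blockIds) {
      this.type = type;
      this.path = path;
      this.blockIds = Collections.unmodifiableList(new ArrayList<>(blockIds));
    }
  }

  /** Emits a creation record with no blocks, then a later record with the blocks. */
  static List<Op> opsForFile(String path, List<Long> blockIds) {
    List<Op> ops = new ArrayList<>();
    ops.add(new Op("CREATE", path, Collections.<Long>emptyList())); // no blocks at creation
    ops.add(new Op("CLOSE", path, blockIds));                        // blocks recorded later
    return ops;
  }
}
```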





[jira] [Updated] (HDFS-2877) If locking of a storage dir fails, it will remove the other NN's lock file on exit

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2877:


Target Version/s: 0.23.1, 1.1.0, 0.22.1  (was: 0.22.1, 1.1.0, 0.23.1)
   Fix Version/s: (was: 0.23.1)
  (was: 0.24.0)

> If locking of a storage dir fails, it will remove the other NN's lock file on 
> exit
> --
>
> Key: HDFS-2877
> URL: https://issues.apache.org/jira/browse/HDFS-2877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0, 0.24.0, 1.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 1.1.0, 0.22.1
>
> Attachments: hdfs-2877.txt
>
>
> In {{Storage.tryLock()}}, we call {{lockF.deleteOnExit()}} regardless of 
> whether we successfully lock the directory. So, if another NN has the 
> directory locked, then we'll fail to lock it the first time we start another 
> NN. But our failed start attempt will still remove the other NN's lockfile, 
> and a second attempt will erroneously start.
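The fix amounts to an ordering change: register the lock file for deletion only after the lock has actually been acquired. A sketch using java.nio's FileChannel.tryLock follows. The class name is hypothetical and this is not the code in Storage.tryLock(). As with any in-process test of file locks, note that POSIX locks are per-process, so a second tryLock inside the same JVM is reported through OverlappingFileLockException rather than a null return.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch of the corrected ordering; not Storage.tryLock().
final class StorageLockSketch {
  private StorageLockSketch() {}

  /** Returns the acquired lock, or null if someone else holds it. */
  static FileLock tryLock(File lockFile) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
    FileLock lock = null;
    try {
      lock = raf.getChannel().tryLock(); // null if another process holds the lock
    } catch (OverlappingFileLockException e) {
      // Already locked from within this JVM; treat as "not acquired".
    } finally {
      if (lock == null) {
        raf.close(); // release our handle; the other owner's file is left alone
      }
    }
    if (lock == null) {
      return null;
    }
    // Only the owner of the lock schedules the file for removal on exit.
    lockFile.deleteOnExit();
    return lock;
  }
}
```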





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203194#comment-13203194
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1765 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1765/])
HDFS-2572. Moved to trunk section from 0.23.1

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Updated] (HDFS-2718) Optimize OP_ADD in edits loading

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2718:


Target Version/s: 0.24.0, 0.23.1, 0.22.1  (was: 0.22.1, 0.23.1, 0.24.0)
   Fix Version/s: (was: 0.23.1)
  (was: 0.24.0)

> Optimize OP_ADD in edits loading
> 
>
> Key: HDFS-2718
> URL: https://issues.apache.org/jira/browse/HDFS-2718
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0, 0.24.0, 1.0.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.22.1
>
> Attachments: editsLoader-0.22.patch, editsLoader-0.22.patch, 
> editsLoader-0.22.patch, editsLoader-trunk.patch, editsLoader-trunk.patch, 
> editsLoader-trunk.patch, editsLoader-trunk.patch
>
>
> While loading the edits journal, FSEditLog.loadEditRecords() processes OP_ADD 
> inefficiently. It first removes the existing INodeFile from the directory 
> tree, then adds it back as a regular INodeFile, and then replaces it with 
> INodeFileUnderConstruction if the file is not closed. This slows down edits 
> loading. OP_ADD should be done in one shot and retain previously existing 
> data.
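"In one shot" can be illustrated with a lookup followed by either an insert or an in-place update, with no remove/re-add/replace cycle. In this sketch a HashMap stands in for the directory tree, and a small mutable class stands in for INodeFile and INodeFileUnderConstruction. All names are hypothetical, and the fields are reduced to length and an under-construction flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map stands in for the directory tree and FileNode
// for INodeFile / INodeFileUnderConstruction. Not the FSEditLog loader code.
final class OpAddSketch {
  static final class FileNode {
    long length;
    boolean underConstruction;

    FileNode(long length, boolean underConstruction) {
      this.length = length;
      this.underConstruction = underConstruction;
    }
  }

  private final Map<String, FileNode> tree = new HashMap<>();

  /** Applies an add record in one step, updating an existing entry in place. */
  void applyAdd(String path, long length, boolean closed) {
    FileNode existing = tree.get(path);
    if (existing == null) {
      tree.put(path, new FileNode(length, !closed));
    } else {
      // Keep the existing node (and any data it holds) and update its fields.
      existing.length = length;
      existing.underConstruction = !closed;
    }
  }

  FileNode get(String path) {
    return tree.get(path);
  }
}
```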





[jira] [Updated] (HDFS-2887) Define a FSVolume interface

2012-02-07 Thread Tsz Wo (Nicholas), SZE (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2887:
-

Attachment: h2887_20120207.patch

h2887_20120207.patch:
- moves the methods in BlockPoolSliceInterface to FSVolumeInterface so that 
BlockPoolSliceInterface becomes unnecessary;
- moves the static utility methods from FSDataset to DatanodeUtil;

> Define a FSVolume interface
> ---
>
> Key: HDFS-2887
> URL: https://issues.apache.org/jira/browse/HDFS-2887
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h2887_20120203.patch, h2887_20120207.patch
>
>
> FSVolume is an inner class in FSDataset.  It is actually a part of the 
> implementation of FSDatasetInterface.  It is better to define a new 
> interface, namely FSVolumeInterface, to capture the abstraction.





[jira] [Updated] (HDFS-2707) HttpFS should read the hadoop-auth secret from a file instead inline from the configuration

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2707:


Fix Version/s: (was: 0.24.0)
   0.23.1

> HttpFS should read the hadoop-auth secret from a file instead inline from the 
> configuration
> ---
>
> Key: HDFS-2707
> URL: https://issues.apache.org/jira/browse/HDFS-2707
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.24.0, 0.23.1
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 0.23.1
>
> Attachments: HDFS-2707.patch, HDFS-2707.patch
>
>
> Similar to HADOOP-7621, the secret should be in a file other than the 
> configuration file.
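Reading the secret from a separate file, rather than inline from the configuration, could look like the minimal sketch below. It assumes the file holds the secret as UTF-8 text, possibly followed by a trailing newline that is trimmed. How HttpFS locates the file, and its exact trimming rule, are not stated in the issue, so `SecretFileSketch` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the file format (UTF-8, trimmed) is an assumption,
// not HttpFS behavior.
final class SecretFileSketch {
  private SecretFileSketch() {}

  static String readSecret(Path secretFile) throws IOException {
    String secret = new String(Files.readAllBytes(secretFile), StandardCharsets.UTF_8).trim();
    if (secret.isEmpty()) {
      throw new IOException("Secret file is empty: " + secretFile);
    }
    return secret;
  }
}
```

Keeping the secret out of the configuration file means the configuration can be shared or logged without exposing it, while the secret file gets tighter permissions.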





[jira] [Commented] (HDFS-2857) Cleanup BlockInfo class

2012-02-07 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203189#comment-13203189
 ] 

Suresh Srinivas commented on HDFS-2857:
---

Given that this patch is not a straightforward port, I will not commit this to 
0.23.

> Cleanup BlockInfo class
> ---
>
> Key: HDFS-2857
> URL: https://issues.apache.org/jira/browse/HDFS-2857
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.24.0
>
> Attachments: HDFS-2857.23.txt, HDFS-2857.txt
>
>
> Following are some of the cleanup required:
> # Remove unnecessary methods
> # Add interface annotation
> # Make some of the methods private





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203188#comment-13203188
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Common-0.23-Commit #518 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/518/])
HDFS-2572. Removed since it's only committed to trunk, not 0.23.0.

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241747
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203183#comment-13203183
 ] 

Aaron T. Myers commented on HDFS-2912:
--

bq. Since the patch calls Runtime.exit(1), I don't know of any way to test it 
other than the manual test.

There are several existing tests that stub in mock Runtime objects so that 
Runtime.exit(...) doesn't actually cause a JVM exit. These tests then verify 
that Runtime.exit(...) was called the appropriate number of times.
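A minimal sketch of this testing technique. It assumes the code under test gets its exit behaviour through a small injectable interface instead of calling `Runtime.getRuntime().exit(...)` directly. `ExitHandler`, `RuntimeExitHandler`, `RecordingExitHandler` and `FatalErrorReporter` are hypothetical names, not Hadoop classes. The existing HDFS tests may inject a mocked `Runtime` object instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam: production code calls this instead of Runtime.exit directly.
interface ExitHandler {
    void exit(int status);
}

// Production implementation: really terminates the JVM.
final class RuntimeExitHandler implements ExitHandler {
    @Override
    public void exit(int status) {
        Runtime.getRuntime().exit(status);
    }
}

// Test double: records each exit request instead of terminating the JVM.
final class RecordingExitHandler implements ExitHandler {
    private final List<Integer> statuses = new ArrayList<>();

    @Override
    public void exit(int status) {
        statuses.add(status);
    }

    int calls() {
        return statuses.size();
    }

    List<Integer> statuses() {
        return statuses;
    }
}

// Hypothetical component that must abort on a fatal error.
final class FatalErrorReporter {
    private final ExitHandler exitHandler;

    FatalErrorReporter(ExitHandler exitHandler) {
        this.exitHandler = exitHandler;
    }

    void reportFatal(String message) {
        System.err.println("FATAL: " + message);
        exitHandler.exit(1);
    }
}
```

In production you would pass a `RuntimeExitHandler`. In a test you would pass a `RecordingExitHandler` and then assert on `calls()` and on the recorded status codes.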

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2909.HDFS-1623.patch
>
>
> When there is an error in the shared edits dir, the current policy requires the 
> active name node to abort and shut down.
> Currently there is no way to shut down the name node, and hence this does not 
> happen even after all journals have been aborted on error. In fact, the name 
> node stays Active and is not in safe mode either. Ideally it should shut down, 
> or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203185#comment-13203185
 ] 

Hudson commented on HDFS-2572:
--

Integrated in Hadoop-Common-trunk-Commit #1690 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1690/])
HDFS-2572. Moved to trunk section from 0.23.1

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241746
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Updated] (HDFS-2676) Remove Avro RPC

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2676:


Fix Version/s: (was: 0.23.1)

> Remove Avro RPC
> ---
>
> Key: HDFS-2676
> URL: https://issues.apache.org/jira/browse/HDFS-2676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.1
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.24.0
>
> Attachments: HDFS-2676.txt, HDFS-2676.txt, HDFS-2676.txt
>
>
> Please see the discussion in HDFS-2660 for more details. I have created a 
> branch HADOOP-6659 to save the Avro work, in case someone in the future wants 
> to reuse it to add support for Avro RPC.





[jira] [Updated] (HDFS-2788) HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2788:
--

Fix Version/s: 0.23.1

> HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
> -
>
> Key: HDFS-2788
> URL: https://issues.apache.org/jira/browse/HDFS-2788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.23.1
>
> Attachments: hdfs-2788.txt
>
>
> HDFS-941 introduced HdfsServerConstants#DN_KEEPALIVE_TIMEOUT but it's never 
> used. Perhaps it was renamed to 
> DFSConfigKeys#DFS_DATANODE_SOCKET_REUSE_KEEPALIVE_DEFAULT while the patch was 
> being written, and the old one wasn't deleted.





[jira] [Commented] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203179#comment-13203179
 ] 

Hadoop QA commented on HDFS-2764:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12513738/HDFS-2764.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1853//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1853//console

This message is automatically generated.

> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Updated] (HDFS-2596) TestDirectoryScanner doesn't test parallel scans

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2596:


Fix Version/s: 0.23.1

> TestDirectoryScanner doesn't test parallel scans
> 
>
> Key: HDFS-2596
> URL: https://issues.apache.org/jira/browse/HDFS-2596
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, test
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.23.1
>
> Attachments: hdfs-2596-1.patch
>
>
> The code from HDFS-854 below doesn't run the test with parallel scanning. 
> They probably intended "parallelism < 3".
> {code}
> +  public void testDirectoryScanner() throws Exception {
> +// Run the test with and without parallel scanning
> +for (int parallelism = 1; parallelism < 2; parallelism++) {
> +  runTest(parallelism);
> +}
> +  }
> {code}
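The sketch below makes the off-by-one concrete. It collects the `parallelism` values that each loop bound would pass to `runTest`. It is a standalone illustration, not the `TestDirectoryScanner` code, and `ParallelismBounds` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class ParallelismBounds {
    // Returns the values a loop of the form
    // "for (int parallelism = 1; parallelism < upperExclusive; parallelism++)"
    // would visit.
    static List<Integer> valuesFor(int upperExclusive) {
        List<Integer> values = new ArrayList<>();
        for (int parallelism = 1; parallelism < upperExclusive; parallelism++) {
            values.add(parallelism);
        }
        return values;
    }
}
```

`valuesFor(2)` yields only `[1]`, so only the serial case runs. `valuesFor(3)` yields `[1, 2]`, which also covers parallel scanning.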





[jira] [Assigned] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Hari Mankude (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Mankude reassigned HDFS-2914:
--

Assignee: Hari Mankude

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>Assignee: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by 
> NameNodeResourceMonitor(). However, there is no path for it to exit safe 
> mode when the shared edits dir reappears.





[jira] [Updated] (HDFS-2572) Unnecessary double-check in DN#getHostName

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2572:
--

Fix Version/s: 0.23.1

> Unnecessary double-check in DN#getHostName
> --
>
> Key: HDFS-2572
> URL: https://issues.apache.org/jira/browse/HDFS-2572
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2572.patch, HDFS-2572.patch
>
>
> We do a double config.get unnecessarily inside DN#getHostName(...). Can be 
> removed by this patch.





[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203173#comment-13203173
 ] 

Bikas Saha commented on HDFS-2910:
--

Sure. Perhaps that work would resolve this JIRA too.

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir 
> is inaccessible during log roll
> ---
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203171#comment-13203171
 ] 

Todd Lipcon commented on HDFS-2910:
---

In order to make the NN ride out a hiccup, it seems the solution is to add a 
more resilient JournalSet implementation: either one that operates over a 
quorum of shared dirs, or one with a more stubborn retry policy. Given 
that NFS itself already has built-in retries and can be configured with arbitrary 
timeouts, it doesn't seem like we should worry about short hiccups; any 
outage that makes it past the configured NFS retries/timeouts is likely to be 
worth causing a failover, IMO.
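One reading of the "more stubborn retry policy" option is a wrapper with these properties:

- It retries a journal operation a bounded number of times.
- It pauses between attempts.
- After the last attempt it rethrows, so the caller can decide whether to fail over.

The sketch below assumes that design. `JournalOperation` and `RetryingJournalCall` are hypothetical names. The attempt count and linear backoff are placeholders, not values taken from HDFS.

```java
import java.io.IOException;

// Hypothetical unit of journal work (e.g. a flush or a log roll).
interface JournalOperation {
    void apply() throws IOException;
}

final class RetryingJournalCall {
    private final int maxAttempts;
    private final long backoffMillis;

    RetryingJournalCall(int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    // Runs op, retrying on IOException with a linearly growing pause.
    // Rethrows the last failure once all attempts are exhausted, so the
    // caller can treat it as fatal (e.g. trigger a failover).
    void run(JournalOperation op) throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.apply();
                return;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

As the comment above argues, if the underlying NFS mount already retries, the total time spent here should probably stay small compared with the NFS timeouts.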

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir 
> is inaccessible during log roll
> ---
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2654:
--

Target Version/s: 0.23.1, 1.1.0  (was: 1.1.0, 0.23.1)
   Fix Version/s: 0.23.1
  0.24.0

> Make BlockReaderLocal not extend RemoteBlockReader2
> ---
>
> Key: HDFS-2654
> URL: https://issues.apache.org/jira/browse/HDFS-2654
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.1, 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.24.0, 0.23.1
>
> Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, 
> hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, 
> hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, 
> hdfs-2654-b1-4.patch
>
>
> The BlockReaderLocal code paths are easier to understand (especially on 
> branch-1, where BlockReaderLocal inherits code from BlockReader and 
> FSInputChecker) if the local and remote block reader implementations are 
> independent, and they're not really sharing much code anyway. If for some 
> reason they start to share significant code, we can make the BlockReader 
> interface an abstract class.
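A minimal sketch of the structure the description argues for: both readers implement a shared interface independently, instead of one inheriting the other's code. The interface shape and the names `SimpleBlockReader`, `LocalReader` and `RemoteReader` are simplified, hypothetical stand-ins, not the actual HDFS `BlockReader` API. The local reader reads from an in-memory array to keep the example self-contained.

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified, hypothetical contract shared by both implementations.
interface SimpleBlockReader {
    // Reads up to len bytes into buf at off; returns -1 at end of block.
    int read(byte[] buf, int off, int len) throws IOException;
}

// "Local" reader: reads directly from block data it already has access to.
final class LocalReader implements SimpleBlockReader {
    private final byte[] block;
    private int pos;

    LocalReader(byte[] block) {
        this.block = block;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (pos >= block.length) {
            return -1;
        }
        int n = Math.min(len, block.length - pos);
        System.arraycopy(block, pos, buf, off, n);
        pos += n;
        return n;
    }
}

// "Remote" reader: wraps a stream (a socket stream in a real client).
final class RemoteReader implements SimpleBlockReader {
    private final InputStream in;

    RemoteReader(InputStream in) {
        this.in = in;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return in.read(buf, off, len);
    }
}
```

Neither class depends on the other. If shared logic ever grows, it could move into an abstract base class, as the description suggests.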





[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2539:


Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)
   0.23.0

> Support doAs and GETHOMEDIRECTORY in webhdfs
> 
>
> Key: HDFS-2539
> URL: https://issues.apache.org/jira/browse/HDFS-2539
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0, 1.0.0
>
> Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, 
> h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, 
> h2539_2009b.patch, h2539_2009b_0.20s.patch, h2539_2009c.patch, 
> h2539_2009c_0.20s.patch, h2539_2010.patch, 
> h2539_2010_0.20s.patch, h2539_2010b.patch, h2539_2010b_0.20s.patch
>
>






[jira] [Updated] (HDFS-2540) Change WebHdfsFileSystem to two-step create/append

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2540:


Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)
   0.23.0

> Change WebHdfsFileSystem to two-step create/append
> --
>
> Key: HDFS-2540
> URL: https://issues.apache.org/jira/browse/HDFS-2540
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0, 1.0.0
>
> Attachments: h2540_2007.patch, h2540_2007_0.20s.patch, 
> h2540_2008.patch, h2540_2008_0.20s.patch
>
>






[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203169#comment-13203169
 ] 

Uma Maheswara Rao G commented on HDFS-2914:
---

{quote}
I could probably be persuaded that the NN should leave SM automatically once 
resources become available again, as long as the implementation includes some 
measure(s) to prevent the NN from flapping in/out of SM if the free space is 
hovering near the threshold. Something like "leave SM automatically only if 
free space is now well above what is required, and only if it's been like that 
for several minutes."
{quote}
Yes, this sounds good. Since NameNodeResourceChecker moved the system into 
safe mode under some condition, it should be its responsibility to take the 
system out of safe mode once that condition no longer holds.
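A minimal sketch of the hysteresis described in the quoted comment:

- Enter safe mode as soon as free space drops below the required amount.
- Leave automatically only after free space has stayed at or above a higher threshold for a sustained period.

`ResourceSafeModePolicy` and its parameters are hypothetical, and the margin and hold time are placeholders. A real implementation would hook into the NameNode's resource checker rather than take raw byte counts.

```java
final class ResourceSafeModePolicy {
    private final long requiredBytes; // below this, enter safe mode
    private final long exitBytes;     // must stay at or above this to leave
    private final long holdMillis;    // how long free space must stay high

    private boolean inSafeMode = false;
    private long healthySinceMillis = -1; // -1 means "not currently above exitBytes"

    ResourceSafeModePolicy(long requiredBytes, long exitMarginBytes, long holdMillis) {
        this.requiredBytes = requiredBytes;
        this.exitBytes = requiredBytes + exitMarginBytes;
        this.holdMillis = holdMillis;
    }

    // Feeds one free-space sample; returns whether the NN should be in safe mode.
    boolean update(long freeBytes, long nowMillis) {
        if (freeBytes < requiredBytes) {
            inSafeMode = true;
            healthySinceMillis = -1;
        } else if (inSafeMode) {
            if (freeBytes >= exitBytes) {
                if (healthySinceMillis < 0) {
                    healthySinceMillis = nowMillis;
                }
                if (nowMillis - healthySinceMillis >= holdMillis) {
                    inSafeMode = false;
                    healthySinceMillis = -1;
                }
            } else {
                // Between the two thresholds: stay in safe mode and restart the clock.
                healthySinceMillis = -1;
            }
        }
        return inSafeMode;
    }
}
```

The gap between `requiredBytes` and `exitBytes`, together with the hold time, is what keeps the NN from flapping in and out of safe mode when free space hovers near the threshold.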

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by 
> NameNodeResourceMonitor(). However, there is no path for it to exit safe 
> mode when the shared edits dir reappears.





[jira] [Updated] (HDFS-2528) webhdfs rest call to a secure dn fails when a token is sent

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2528:


Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)
   0.23.0

> webhdfs rest call to a secure dn fails when a token is sent
> ---
>
> Key: HDFS-2528
> URL: https://issues.apache.org/jira/browse/HDFS-2528
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 0.20.205.0
>Reporter: Arpit Gupta
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0, 1.0.0
>
> Attachments: h2528_2001.patch, h2528_2001_0.20s.patch, 
> h2528_2001b.patch, h2528_2001b_0.20s.patch, h2528_2002.patch, 
> h2528_2002_0.20s.patch, h2528_2003.patch, h2528_2003_0.20s.patch, 
> h2528_2003_0.20s.patch
>
>
> curl -L -u : --negotiate -i 
> "http://NN:50070/webhdfs/v1/tmp/webhdfs_data/file_small_data.txt?op=OPEN";
> the following exception is thrown by the datanode when the redirect happens.
> {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Call
>  to  failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]"}}
> Interestingly, when using ./bin/hadoop with a webhdfs path, we are able to cat 
> or tail a file successfully.





[jira] [Updated] (HDFS-2527) Remove the use of Range header from webhdfs

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2527:


Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)
   0.23.0

> Remove the use of Range header from webhdfs
> ---
>
> Key: HDFS-2527
> URL: https://issues.apache.org/jira/browse/HDFS-2527
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0, 1.0.0
>
> Attachments: h2527_2001b_0.20s.patch, h2527_2002.patch, 
> h2527_2002_0.20s.patch
>
>






[jira] [Updated] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Bikas Saha (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated HDFS-2912:
-

Attachment: HDFS-2909.HDFS-1623.patch

Attached a patch that implements the change proposed in the previous comment.
Since the patch calls Runtime.exit(1), I don't know of any way to test it other 
than the manual test.

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2909.HDFS-1623.patch
>
>
> When there is an error in the shared edits dir, the current policy requires the 
> active name node to abort and shut down.
> Currently there is no way to shut down the name node, and hence this does not 
> happen even after all journals have been aborted on error. In fact, the name 
> node stays Active and is not in safe mode either. Ideally it should shut down, 
> or at least go into safe mode or standby mode.





[jira] [Updated] (HDFS-2416) distcp with a webhdfs uri on a secure cluster fails

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2416:


Fix Version/s: (was: 0.23.1)
   (was: 0.24.0)
   0.23.0

> distcp with a webhdfs uri on a secure cluster fails
> ---
>
> Key: HDFS-2416
> URL: https://issues.apache.org/jira/browse/HDFS-2416
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 0.20.205.0
>Reporter: Arpit Gupta
>Assignee: Jitendra Nath Pandey
> Fix For: 0.23.0, 1.0.0
>
> Attachments: HDFS-2416-branch-0.20-security.6.patch, 
> HDFS-2416-branch-0.20-security.7.patch, 
> HDFS-2416-branch-0.20-security.8.patch, HDFS-2416-branch-0.20-security.patch, 
> HDFS-2416-trunk.patch, HDFS-2416-trunk.patch, 
> HDFS-2419-branch-0.20-security.patch, HDFS-2419-branch-0.20-security.patch
>
>






[jira] [Updated] (HDFS-2397) Undeprecate SecondaryNameNode

2012-02-07 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-2397:


Fix Version/s: 0.23.1

> Undeprecate SecondaryNameNode
> -
>
> Key: HDFS-2397
> URL: https://issues.apache.org/jira/browse/HDFS-2397
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Eli Collins
> Fix For: 0.23.1
>
> Attachments: hdfs-2397.txt, hdfs-2397.txt, hdfs-2397.txt, 
> hdfs-2397.txt
>
>
> I would like to consider un-deprecating the SecondaryNameNode for 0.23, and 
> amending the documentation to indicate that it is still the most trustworthy 
> way to run checkpoints, and while CN/BN may have some advantages, they're not 
> battle-hardened as of yet. The test coverage for the 2NN is far superior to 
> the CheckpointNode or BackupNode, and people have a lot more production 
> experience. Indicating that it is deprecated before we have expanded test 
> coverage of the CN/BN won't send the right message to our users. (For 
> comparison, look at what a mess we got into by prematurely deprecating the 
> "old" MR API before the "new" API had feature parity and a few versions of 
> bug fixes).





[jira] [Updated] (HDFS-2570) Add descriptions for dfs.*.https.address in hdfs-default.xml

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2570:
--

Fix Version/s: 0.24.0

> Add descriptions for dfs.*.https.address in hdfs-default.xml
> 
>
> Key: HDFS-2570
> URL: https://issues.apache.org/jira/browse/HDFS-2570
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.23.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: hdfs-2570-1.patch, hdfs-2570-2.patch
>
>
> Let's add descriptions for dfs.*.https.address in hdfs-default.xml.





[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203160#comment-13203160
 ] 

Bikas Saha commented on HDFS-2910:
--

That covers the current policy of shutting down the NN on such errors. But if 
the NN is to stay active through short, transient shared dir hiccups, then this 
needs to be fixed. So I will leave this JIRA open.

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir 
> is inaccessible during log roll
> ---
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203157#comment-13203157
 ] 

Bikas Saha commented on HDFS-2912:
--

From what I read of the code, for some of the cases (such as a flush of logs) 
where the NN actually dies on shared dir hiccups, the runtime.exit() call was 
not added in the HA context. It was added when JournalSet was added by 
Jitendra long ago.

In any case, I would ideally like to have a cleaner shutdown mechanism to make 
sure that exit(1) calls do not proliferate in hard-to-find places. Will let 
[HDFS-2913|https://issues.apache.org/jira/browse/HDFS-2913] track that.

For now, I will add an exit(1) after the LOG.FATAL in 
JournalSet.mapJournalsAndReportErrors(). This is the common code path that all 
journal operations go through (roll edit logs, flush, etc.), so putting one 
here should hopefully catch all journal-related cases.
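A minimal sketch of the single-choke-point idea described above. Every journal operation goes through one helper, which works as follows:

- It applies the operation to each journal.
- It disables journals that fail.
- It terminates through a single injectable exit hook if a required journal fails or no journal succeeds.

This is loosely modelled on the role described for `JournalSet.mapJournalsAndReportErrors()`, not on its actual code. `Journal`, `JournalCall` and `SimpleJournalSet` are hypothetical. The "required journal" rule is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.logging.Logger;

// Hypothetical journal handle; "required" models a shared edits dir.
final class Journal {
    final String name;
    final boolean required;
    boolean failed;

    Journal(String name, boolean required) {
        this.name = name;
        this.required = required;
    }
}

// Hypothetical operation applied to a single journal (flush, roll, ...).
interface JournalCall {
    void apply(Journal j) throws IOException;
}

final class SimpleJournalSet {
    private static final Logger LOG = Logger.getLogger(SimpleJournalSet.class.getName());
    private final List<Journal> journals;
    private final IntConsumer exit; // e.g. Runtime.getRuntime()::exit in production

    SimpleJournalSet(List<Journal> journals, IntConsumer exit) {
        this.journals = journals;
        this.exit = exit;
    }

    // Single choke point: every journal operation is routed through here.
    void mapJournalsAndReportErrors(JournalCall call, String status) {
        List<Journal> badRequired = new ArrayList<>();
        int healthy = 0;
        for (Journal j : journals) {
            if (j.failed) {
                continue; // already disabled by an earlier error
            }
            try {
                call.apply(j);
                healthy++;
            } catch (IOException e) {
                j.failed = true;
                LOG.warning(status + " failed for " + j.name + ": " + e);
                if (j.required) {
                    badRequired.add(j);
                }
            }
        }
        if (!badRequired.isEmpty() || healthy == 0) {
            LOG.severe(status + " failed for required or all journals; aborting");
            exit.accept(1);
        }
    }
}
```

Because the exit is injected, a test can pass a counting hook and verify how many times it fired, without the JVM actually exiting.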

 

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> When there is an error in the shared edits dir, the current policy requires the 
> active name node to abort and shut down.
> Currently there is no way to shut down the name node, and hence this does not 
> happen even after all journals have been aborted on error. In fact, the name 
> node stays Active and is not in safe mode either. Ideally it should shut down, 
> or at least go into safe mode or standby mode.





[jira] [Updated] (HDFS-2568) Use a set to manage child sockets in XceiverServer

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2568:
--

Fix Version/s: 0.24.0

> Use a set to manage child sockets in XceiverServer
> --
>
> Key: HDFS-2568
> URL: https://issues.apache.org/jira/browse/HDFS-2568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.24.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2568.patch, HDFS-2568.patch
>
>
> While reading up for HDFS-2454, I found that we currently maintain childSockets 
> in a DataXceiverServer as a Map. This could just as well be a 
> Set data structure, since the goal is easy removals.
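A minimal sketch of tracking child sockets in a concurrent `Set` instead of a `Map`. It assumes the map's values carried no information beyond the key. `ChildSocketRegistry` is a hypothetical name. `ConcurrentHashMap.newKeySet()` (Java 8+) gives a thread-safe set with constant-time removal, which fits the stated goal of easy removals.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ChildSocketRegistry {
    // Thread-safe set: the acceptor thread adds, worker threads remove.
    private final Set<Socket> childSockets = ConcurrentHashMap.newKeySet();

    void add(Socket s) {
        childSockets.add(s);
    }

    // Called when a worker finishes; no dummy map value is needed.
    void remove(Socket s) {
        childSockets.remove(s);
    }

    int size() {
        return childSockets.size();
    }

    // Closes every tracked socket, e.g. on shutdown.
    void closeAll() {
        for (Socket s : childSockets) {
            try {
                s.close();
            } catch (IOException ignored) {
                // best effort during shutdown
            }
        }
        childSockets.clear();
    }
}
```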





[jira] [Updated] (HDFS-2543) HADOOP_PREFIX cannot be overriden

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2543:
--

Fix Version/s: 0.24.0

> HADOOP_PREFIX cannot be overriden
> -
>
> Key: HDFS-2543
> URL: https://issues.apache.org/jira/browse/HDFS-2543
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Bruno Mahé
>Assignee: Bruno Mahé
>  Labels: bigtop
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2543.patch
>
>
> hadoop-config.sh forces HADOOP_PREFIX to a specific value:
> export HADOOP_PREFIX=`dirname "$this"`/..
> It would be nice to make this overridable.





[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203135#comment-13203135
 ] 

Aaron T. Myers commented on HDFS-2914:
--

bq. The issue I see is that even if this standby is made active later on, it
will not exit safe mode unless the user runs 'safemode leave'. Do we
want this behaviour?

I think we probably do. If the NFS mount is flaky, we've got bigger problems 
than just the NN being moved into SM.

bq. The other problem with this approach is that if the NFS dir bounces even once,
the standby will go into safe mode, and this will happen silently without alerts.

I guess the admin should configure some alerts for the NN being in SM, then. :)

But regardless, I could probably be persuaded that the NN should leave SM 
automatically once resources become available again, as long as the implementation
includes some measure(s) to prevent the NN from flapping in/out of SM if the 
free space is hovering near the threshold. Something like "leave SM 
automatically only if free space is now well above what is required, and only 
if it's been like that for several minutes." Such a change would not be 
specific to the HA branch, however, and should probably be done on trunk.
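The hysteresis suggested above could look roughly like the following Java sketch. It is a hypothetical illustration, not NameNodeResourceChecker code: the class name, the leave margin and the hold time are assumptions, and the caller is assumed to feed it periodic free-space readings.

```java
// Hypothetical sketch: enter safe mode as soon as free space drops below the
// reserved amount, but leave it only after free space has stayed well above
// that amount (reserve + margin) for a sustained period, to avoid flapping.
class SafeModeHysteresis {
  private final long reservedBytes;     // below this: enter safe mode
  private final long leaveMarginBytes;  // extra headroom required to leave
  private final long leaveAfterMillis;  // how long the headroom must persist
  private boolean inSafeMode = false;
  private long healthySinceMillis = -1; // -1 means "not currently healthy"

  SafeModeHysteresis(long reservedBytes, long leaveMarginBytes, long leaveAfterMillis) {
    this.reservedBytes = reservedBytes;
    this.leaveMarginBytes = leaveMarginBytes;
    this.leaveAfterMillis = leaveAfterMillis;
  }

  // Feed one resource check; returns whether we are in safe mode afterwards.
  boolean onCheck(long freeBytes, long nowMillis) {
    if (freeBytes < reservedBytes) {
      inSafeMode = true;
      healthySinceMillis = -1;
    } else if (inSafeMode) {
      if (freeBytes >= reservedBytes + leaveMarginBytes) {
        if (healthySinceMillis < 0) {
          healthySinceMillis = nowMillis; // start the hold-time clock
        }
        if (nowMillis - healthySinceMillis >= leaveAfterMillis) {
          inSafeMode = false;
          healthySinceMillis = -1;
        }
      } else {
        // Above the reserve but not well above it: restart the clock.
        healthySinceMillis = -1;
      }
    }
    return inSafeMode;
  }
}
```

With a 100-byte reserve, a 50-byte margin and a 1000 ms hold time, a reading of 120 bytes keeps the node in safe mode, and a reading of 200 bytes only takes it out once that reading has persisted for 1000 ms.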

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203133#comment-13203133
 ] 

Uma Maheswara Rao G commented on HDFS-2362:
---

OK, thanks Eli.

> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simultaneous read operations while reportDiff is in
> progress in processing block reports (HDFS-2490)
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock (HDFS-2495)
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for the edit log to be synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce multiple traversals of the path directory to one for most namespace
> manipulations
> 11. Move logging out of write lock section.
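Item 11 (moving logging out of the write lock section) is a general locking pattern. A hypothetical Java sketch, assuming a ReentrantReadWriteLock and a pluggable logger rather than the actual FSNamesystem code: the message is built while the lock is held, but written only after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical sketch: do only cheap string work under the write lock and
// hand the message to the (possibly slow) logger after unlocking, so log I/O
// does not lengthen the time other threads wait for the lock.
class DeferredLogNamespace {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> files = new ArrayList<>();
  private final Consumer<String> logger;

  DeferredLogNamespace(Consumer<String> logger) {
    this.logger = logger;
  }

  void create(String path) {
    String msg;
    lock.writeLock().lock();
    try {
      files.add(path);
      msg = "created " + path; // build the message, but do not log yet
    } finally {
      lock.writeLock().unlock();
    }
    logger.accept(msg); // logging happens outside the critical section
  }

  boolean writeLockHeldByCurrentThread() {
    return lock.isWriteLockedByCurrentThread();
  }

  int size() {
    lock.readLock().lock();
    try {
      return files.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```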





[jira] [Updated] (HDFS-2594) webhdfs HTTP API should implement getDelegationTokens() instead getDelegationToken()

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2594:
--

Fix Version/s: 0.24.0

> webhdfs HTTP API should implement getDelegationTokens() instead 
> getDelegationToken()
> 
>
> Key: HDFS-2594
> URL: https://issues.apache.org/jira/browse/HDFS-2594
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.23.1
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Critical
> Fix For: 0.24.0, 0.23.1
>
> Attachments: h2594_2030.patch, h2594_2030_no_apt.patch, 
> h2594_20111201.patch
>
>
> The current API returns a single delegation token; that method in the
> FileSystem API is deprecated in favor of the one that returns a list of
> tokens. The HTTP API should implement the new/undeprecated signature 
> getDelegationTokens().





[jira] [Commented] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Hari Mankude (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203124#comment-13203124
 ] 

Hari Mankude commented on HDFS-2914:


Hi Aaron, 
The issue I see is that even if this standby is made active later on, it will
not exit safe mode unless the user runs 'safemode leave'. Do we want this
behaviour? The other problem with this approach is that if the NFS dir bounces
even once, the standby will go into safe mode, and this will happen silently
without alerts.

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability

2012-02-07 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203121#comment-13203121
 ] 

Eli Collins commented on HDFS-2362:
---

Not for 23.1, which is getting cut soon. We'll merge the PB changes (Jitendra 
has a branch for this) and BR scalability changes when 23.1 has branched. 

> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simultaneous read operations while reportDiff is in
> progress in processing block reports (HDFS-2490)
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock (HDFS-2495)
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for the edit log to be synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce multiple traversals of the path directory to one for most namespace
> manipulations
> 11. Move logging out of write lock section.





[jira] [Updated] (HDFS-2914) HA: Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2914:
-

Summary: HA: Standby stuck in safemode when shared edits directory is 
bounced  (was: HA Standby stuck in safemode when shared edits directory is 
bounced)

> HA: Standby stuck in safemode when shared edits directory is bounced
> 
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Updated] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2914:
-

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-1623

> HA Standby stuck in safemode when shared edits directory is bounced
> ---
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203109#comment-13203109
 ] 

Aaron T. Myers commented on HDFS-2914:
--

Hey Hari, per the discussion on HDFS-1594, it is by design that the NN does not 
automatically leave SM even after resources become available again. In order to 
leave SM, the admin can run 'hdfs dfsadmin -safemode leave', even while the NN
is in the standby state.

> HA Standby stuck in safemode when shared edits directory is bounced
> ---
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203106#comment-13203106
 ] 

Todd Lipcon commented on HDFS-2912:
---

In log4j, LOG.fatal doesn't actually terminate the NN, but there should be a 
Runtime.exit() call following. Did we lose it somewhere along the line?
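Todd's point is that logging at FATAL level is only a log call; something must still stop the process. A hypothetical Java sketch of that abort path, using java.util.logging's SEVERE level in place of log4j's fatal() and an injectable terminator (an assumption made so the path can be tested without killing the JVM):

```java
import java.util.function.IntConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: the abort path logs the fatal condition and then
// explicitly terminates, because logging alone never stops the process.
class FatalAbort {
  private static final Logger LOG = Logger.getLogger(FatalAbort.class.getName());
  private final IntConsumer terminator;

  FatalAbort(IntConsumer terminator) {
    this.terminator = terminator;
  }

  // Production wiring: really exit the JVM.
  static FatalAbort forProduction() {
    return new FatalAbort(System::exit);
  }

  void abort(String reason, Throwable cause) {
    LOG.log(Level.SEVERE, "FATAL: " + reason, cause); // just a log record
    terminator.accept(1);                             // the actual termination
  }
}
```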

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> When there is an error in the shared edits dir, the current policy requires
> the active name node to abort and shut down.
> Currently there is no way to shut down the name node, so this does not
> happen even after all journals have been aborted on error. In fact, the name
> node stays Active and is not in safe mode either. Ideally it should shut down,
> or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Hari Mankude (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203099#comment-13203099
 ] 

Hari Mankude commented on HDFS-2914:


When the shared edits dir is bounced, df returns zero available space. Since the
shared dir is a required dir, the standby NN enters safe mode.

2012-02-08 01:08:19,850 WARN  namenode.NameNodeResourceChecker 
(NameNodeResourceChecker.java:isResourceAvailable(89)) - Space available on 
volume 'nfs directory' is 0, which is below the configured reserved amount 
104857600
2012-02-08 01:08:19,853 WARN  namenode.FSNamesystem 
(FSNamesystem.java:run(3095)) - NameNode low on available disk space. Entering 
safe mode.

The fix could be as simple as exiting safe mode on the standby NN once the
shared resources become available again.


> HA Standby stuck in safemode when shared edits directory is bounced
> ---
>
> Key: HDFS-2914
> URL: https://issues.apache.org/jira/browse/HDFS-2914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Hari Mankude
>
> When the shared edits dir is bounced, the standby NN is put into safe mode by
> the NameNodeResourceMonitor(). However, there is no path for it to exit safe
> mode when the shared edits dir reappears.





[jira] [Updated] (HDFS-2764) TestBackupNode is failing

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2764:
-

Target Version/s: 0.24.0

> TestBackupNode is failing
> -
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> Looks like it has been for a few days.





[jira] [Updated] (HDFS-2764) TestBackupNode is racy

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2764:
-

Description: TestBackupNode#waitCheckpointDone can spuriously fail because 
of a race.  (was: Looks like it has been for a few days.)
Summary: TestBackupNode is racy  (was: TestBackupNode is failing)

> TestBackupNode is racy
> --
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> TestBackupNode#waitCheckpointDone can spuriously fail because of a race.





[jira] [Updated] (HDFS-2764) TestBackupNode is failing

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2764:
-

Component/s: test

> TestBackupNode is failing
> -
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> Looks like it has been for a few days.





[jira] [Updated] (HDFS-2764) TestBackupNode is failing

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2764:
-

Status: Patch Available  (was: Reopened)

> TestBackupNode is failing
> -
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node, test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> Looks like it has been for a few days.





[jira] [Updated] (HDFS-2764) TestBackupNode is failing

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2764:
-

Attachment: HDFS-2764.patch

Here's a patch which addresses the issue.

The trouble was that a helper method used by both failing tests had a race 
condition. In waitCheckpointDone, the test would just wait for the BN to get a 
particular fsimage snapshot, and then assert that the NN also had that fsimage 
snapshot, even though the BN might not have uploaded it back to the NN yet.
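The fix described above amounts to waiting for the condition that is actually asserted. A hypothetical polling helper in Java, not the code in HDFS-2764.patch; the names, poll interval and timeout are assumptions:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll a condition until it holds or a deadline passes,
// instead of waiting on a proxy event and then asserting a related state.
final class WaitUtil {
  private WaitUtil() {}

  static void waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      // Compare via subtraction so nanoTime overflow is handled correctly.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }
}
```

In the test, the condition would check that the NN has the expected fsimage, rather than only that the BN does.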

While I was in this test class I also took the liberty of updating it to a 
JUnit 4-style test.

I guess it was failing consistently on my box because it's an SSD, and things 
just move too damn fast.

> TestBackupNode is failing
> -
>
> Key: HDFS-2764
> URL: https://issues.apache.org/jira/browse/HDFS-2764
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2764.patch
>
>
> Looks like it has been for a few days.





[jira] [Created] (HDFS-2914) HA Standby stuck in safemode when shared edits directory is bounced

2012-02-07 Thread Hari Mankude (Created) (JIRA)
HA Standby stuck in safemode when shared edits directory is bounced
---

 Key: HDFS-2914
 URL: https://issues.apache.org/jira/browse/HDFS-2914
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Hari Mankude


When the shared edits dir is bounced, the standby NN is put into safe mode by the
NameNodeResourceMonitor(). However, there is no path for it to exit safe
mode when the shared edits dir reappears.





[jira] [Commented] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203092#comment-13203092
 ] 

Bikas Saha commented on HDFS-2912:
--

For some reason the LOG.fatal statement is not terminating the NN in my case.
Will look into it further.

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> When there is an error in the shared edits dir, the current policy requires
> the active name node to abort and shut down.
> Currently there is no way to shut down the name node, so this does not
> happen even after all journals have been aborted on error. In fact, the name
> node stays Active and is not in safe mode either. Ideally it should shut down,
> or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203089#comment-13203089
 ] 

Uma Maheswara Rao G commented on HDFS-2362:
---

I recall a recent discussion between Eli and Dhruba on the mailing list about
merging these NN scalability improvements into 0.23.
Are we planning this for the 0.23.1 release?

> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simultaneous read operations while reportDiff is in
> progress in processing block reports (HDFS-2490)
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock (HDFS-2495)
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for the edit log to be synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce multiple traversals of the path directory to one for most namespace
> manipulations
> 11. Move logging out of write lock section.





[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203081#comment-13203081
 ] 

Uma Maheswara Rao G commented on HDFS-2911:
---

I agree too. Recently I have debugged many issues caused by OOMEs in my
clusters, for example HADOOP-7916 and HDFS-2850.

> Gracefully handle OutOfMemoryErrors
> ---
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
> We should catch them in a high-level handler, cleanly fail the RPC (vs 
> sending back the OOM stack trace) or background thread, and shut down the NN or
> DN. Currently the process is left in a poorly tested state
> (continuously fails RPCs and internal threads, may or may not recover and
> doesn't shut down gracefully).





[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203064#comment-13203064
 ] 

Eli Collins commented on HDFS-2911:
---

HDFS isn't really an application. If we labor on, subsequent failures can
result in data loss. IMO it's better to fail fast.
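One way to fail fast, sketched in Java as a hypothetical handler rather than anything in HDFS: install a default uncaught-exception handler that halts the JVM on OutOfMemoryError, doing as little work as possible since, as noted elsewhere in this thread, handling an OOME can itself throw more OOMEs. The exit code and the injectable halter are assumptions. Such a handler only sees errors that escape a thread's run method; code that catches Throwable would need to call the same halt path itself.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch: halt the process as soon as an OutOfMemoryError
// escapes any thread, avoiding allocation-heavy work in the handler.
final class OomeFailFast implements Thread.UncaughtExceptionHandler {
  static final int OOM_EXIT_CODE = 1; // assumed exit code
  private final IntConsumer halter;

  OomeFailFast(IntConsumer halter) {
    this.halter = halter;
  }

  // Install JVM-wide. Runtime.halt skips shutdown hooks, which may allocate.
  static void install() {
    Thread.setDefaultUncaughtExceptionHandler(
        new OomeFailFast(code -> Runtime.getRuntime().halt(code)));
  }

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    if (e instanceof OutOfMemoryError) {
      halter.accept(OOM_EXIT_CODE);
      return; // only reached when a non-halting halter is injected (tests)
    }
    // Other failures: report roughly the way the default handler does.
    System.err.print("Exception in thread \"" + t.getName() + "\" ");
    e.printStackTrace(System.err);
  }
}
```

Newer HotSpot JVMs also offer the -XX:+ExitOnOutOfMemoryError flag, which gives similar behavior without code.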

> Gracefully handle OutOfMemoryErrors
> ---
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
> We should catch them in a high-level handler, cleanly fail the RPC (vs 
> sending back the OOM stack trace) or background thread, and shut down the NN or
> DN. Currently the process is left in a poorly tested state
> (continuously fails RPCs and internal threads, may or may not recover and
> doesn't shut down gracefully).





[jira] [Updated] (HDFS-2819) Document new HA-related configs in hdfs-default.xml

2012-02-07 Thread Eli Collins (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2819:
--

Attachment: hdfs-2819-ammend.txt

Thanks for the review, Suresh. Comments below and updated patch attached
(hdfs-2819-ammend.txt).

#1 Because it is the prefix for a key rather than a key itself (i.e. you can't
use it by itself to look up anything). This prefix plus a suffix (namespace ID)
will result in a key that refers to a set of namenodes. The naming is
consistent with other variables that use _PREFIX.
#2 "dfs.ha.namenodes" is the prefix for a given nameservice, e.g.
"dfs.ha.namenodes.EXAMPLENAMESERVICE". This description already says "contains
a comma-separated list of namenodes"; maybe you were thinking of another key?
#3 Yes, empty values are parsed as null. Note that a value with whitespace is
not, i.e. "  " here would not be kosher.
#4 I added them per Todd's request above; do you disagree with his thinking?
#5 These values are used to set "ipc.client.connect.max.retries" and
"ipc.client.connect.max.retries.on.timeouts" respectively for the failover RPC
proxy. I updated the description with the rationale for the 0 default (failover
effectively means the clients do retry). These are marked "Expert only" because
we don't expect most users to modify them or need to understand them.
#6 The base time is 500ms and we don't wait on the first retry, so the sequence
is 0, 1s, 2s, 4s, 8s, ... (up to 15 retries; the base value caps at 8s,
though note that the 5th to 15th values, like the others, will vary by +/- 50%
each time, so the delay could be up to 12s). Make sense?
#7 Not sure I follow, do you have a specific suggestion? I marked these as 
"Expert only" because we don't expect most users to modify or need to 
understand them.
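The schedule in #6 can be reconstructed as a small calculation. This Java sketch is an assumption based only on the numbers quoted above (500 ms base, no wait before the first retry, doubling, 8 s cap, +/- 50% jitter); the actual failover proxy code may compute it differently, and the class name is hypothetical.

```java
import java.util.Random;

// Hypothetical reconstruction of the retry schedule described in #6:
// no wait before the first retry, then BASE * 2^n, capped at 8 s,
// with each value randomized by roughly +/- 50%.
final class FailoverBackoff {
  static final long BASE_MILLIS = 500;
  static final long CAP_MILLIS = 8000;

  private FailoverBackoff() {}

  // Un-randomized delay before retry number `retry` (0-based).
  static long baseDelayMillis(int retry) {
    if (retry <= 0) {
      return 0; // first retry: no wait
    }
    int shift = Math.min(retry, 4); // 500 << 4 == 8000 (the cap); avoids overflow
    return Math.min(BASE_MILLIS << shift, CAP_MILLIS);
  }

  // Randomized delay: the base delay scaled by a factor between 0.5 and 1.5.
  static long delayMillis(int retry, Random random) {
    long base = baseDelayMillis(retry);
    double factor = 0.5 + random.nextDouble();
    return (long) (base * factor);
  }
}
```

For example, the un-randomized delays for retries 0 through 4 are 0, 1000, 2000, 4000 and 8000 ms, and any later retry stays at 8000 ms, which the jitter can stretch to about 12 s.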

> Document new HA-related configs in hdfs-default.xml
> ---
>
> Key: HDFS-2819
> URL: https://issues.apache.org/jira/browse/HDFS-2819
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation, ha
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Todd Lipcon
>Assignee: Eli Collins
> Attachments: hdfs-2819-ammend.txt, hdfs-2819.txt, hdfs-2819.txt, 
> hdfs-2819.txt
>
>
> We've added a few configs, like shared edits dir, dfs.ha.namenodes, etc - we 
> should probably add these to hdfs-default.xml so they get documented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203037#comment-13203037
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2911:
--

OutOfMemoryError is a subclass of Error which indicates serious problems that a 
reasonable application *should not try to catch* according to the 
[javadoc|http://docs.oracle.com/javase/6/docs/api/java/lang/Error.html].

It is hard to handle OutOfMemoryError. One problem is that handling the first
OutOfMemoryError can itself cause more OutOfMemoryErrors to be thrown.
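One way to keep the problem described above small is to catch the error once, at the top of a worker thread, and go straight to a pre-built exit action instead of logging or building responses that may allocate. This is a sketch under assumptions, not HDFS code: the names are hypothetical, and in a real daemon the exit action might be `status -> Runtime.getRuntime().halt(status)`, which, unlike `System.exit`, does not run shutdown hooks.

```java
import java.util.function.IntConsumer;

class OomGuardSketch {
    /**
     * Wraps a worker's body so that an OutOfMemoryError escaping it triggers a
     * fail-fast exit instead of leaving the thread (and process) half-alive.
     */
    static Runnable failFastOnOom(Runnable task, IntConsumer exitAction) {
        return () -> {
            try {
                task.run();
            } catch (OutOfMemoryError oom) {
                // Keep this path minimal: formatting messages or stack traces
                // may allocate and throw yet another OutOfMemoryError.
                exitAction.accept(1);
            }
        };
    }

    public static void main(String[] args) {
        int[] requestedStatus = {-1};
        Runnable guarded = failFastOnOom(
                () -> { throw new OutOfMemoryError("simulated"); },
                status -> requestedStatus[0] = status);
        guarded.run();
        System.out.println("exit status requested: " + requestedStatus[0]);
    }
}
```

The exit action is passed in (rather than hard-coded) so the behaviour can be exercised without killing the JVM; `IntConsumer` also avoids boxing the status on the error path.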

> Gracefully handle OutOfMemoryErrors
> ---
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN.
> We should catch them in a high-level handler, cleanly fail the RPC (vs
> sending back the OOM stack trace) or background thread, and shut down the NN
> or DN. Currently the process is left in a not-well-tested state
> (continuously fails RPCs and internal threads, may or may not recover, and
> doesn't shut down gracefully).





[jira] [Commented] (HDFS-2905) HA: Standby NN NPE when shared edits dir is deleted

2012-02-07 Thread Hari Mankude (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203036#comment-13203036
 ] 

Hari Mankude commented on HDFS-2905:


Looks good.
+1 from my side.

> HA: Standby NN NPE when shared edits dir is deleted
> ---
>
> Key: HDFS-2905
> URL: https://issues.apache.org/jira/browse/HDFS-2905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: HDFS-2905.HDFS-1623.patch, HDFS-2905.HDFS-1623.patch
>
>






[jira] [Updated] (HDFS-2913) HA: Need a way to shutdown the Name Node

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2913:
-

Component/s: ha

> HA: Need a way to shutdown the Name Node
> 
>
> Key: HDFS-2913
> URL: https://issues.apache.org/jira/browse/HDFS-2913
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> Ideally, NameNode.stop() should be called, because it will change the HA
> state and shut down all services. However, a NameNode reference is not
> available anywhere, so it is not possible to shut down the name node
> gracefully.
> A possible solution could be to have a Service interface that gets passed
> down to components like FSNameSystem, via which they can inform the NameNode
> about irrecoverable errors. The NameNode could then decide to shut down.





[jira] [Updated] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2912:
-

Component/s: ha

> HA: Namenode not shutting down when shared edits dir is inaccessible
> 
>
> Key: HDFS-2912
> URL: https://issues.apache.org/jira/browse/HDFS-2912
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> When there is an error in the shared edits dir, the current policy requires
> the active name node to abort and shut down.
> Currently there is no way to shut down the name node, and hence this does not
> happen even after all journals have been aborted on error. In fact, the name
> node stays Active and is not in safe mode either. Ideally it should shut
> down, or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2579) Starting delegation token manager during safemode fails

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203006#comment-13203006
 ] 

Todd Lipcon commented on HDFS-2579:
---

We've found one bug during stress testing: there's a super-rare race here if
the secret manager happens to be calling logUpdateMasterKey exactly when the NN
wants to stop the secret manager. The issue is that the "stopSecretManager"
call holds the FSNamesystem lock while the secret manager thread is blocked
waiting on that same lock, so stopping the secret manager can hang.

The solution is to have the secret manager use lockInterruptibly instead, so
that the blocked thread can be interrupted out of its wait.
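To illustrate why lockInterruptibly helps, the sketch below uses a plain `java.util.concurrent.locks.ReentrantLock` as a stand-in for the FSNamesystem lock (an assumption for illustration; the class and method names are hypothetical). The stopping thread holds the lock, interrupts the worker blocked in `lockInterruptibly()`, and the worker gives up and exits, so `join()` can return. With `lock()` the worker would ignore the interrupt and stay blocked.

```java
import java.util.concurrent.locks.ReentrantLock;

class InterruptibleLockSketch {
    // Stand-in for the FSNamesystem lock (assumed to be a ReentrantLock here).
    static final ReentrantLock fsLock = new ReentrantLock();

    /** Returns true if the blocked worker exited after being interrupted. */
    static boolean runDemo() throws InterruptedException {
        boolean[] gaveUp = {false};
        Thread worker = new Thread(() -> {
            try {
                // Like lock(), but an interrupt while waiting throws instead of being ignored.
                fsLock.lockInterruptibly();
                try {
                    // The master-key update would be logged here.
                } finally {
                    fsLock.unlock();
                }
            } catch (InterruptedException e) {
                gaveUp[0] = true; // stop was requested; exit without the lock
            }
        });

        fsLock.lock(); // the "stopSecretManager" path holds the lock...
        try {
            worker.start();
            while (!fsLock.hasQueuedThread(worker)) {
                Thread.sleep(1); // ...until the worker is blocked waiting for it
            }
            worker.interrupt();
            // With plain lock() the worker would stay blocked, and an unbounded
            // join() here would never return.
            worker.join(5000);
        } finally {
            fsLock.unlock();
        }
        return gaveUp[0] && !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker exited after interrupt: " + runDemo());
    }
}
```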

> Starting delegation token manager during safemode fails
> ---
>
> Key: HDFS-2579
> URL: https://issues.apache.org/jira/browse/HDFS-2579
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node, security
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2579.txt, hdfs-2579.txt
>
>
> I noticed this on the HA branch, but it seems to actually affect non-HA 
> branch 0.23 if security is enabled. When the NN starts up, if security is 
> enabled, we start the delegation token secret manager, which then tries to 
> call {{logUpdateMasterKey}}. This fails because the edit logs may not be 
> written while in safe-mode.
> It seems to me that there's not any necessary reason that you have to make a 
> new master key at startup, since you've loaded the old key when you load the 
> FSImage. You'd only be lacking a DT master key on a fresh cluster, in which 
> case we could have it generate one at format time.





[jira] [Commented] (HDFS-2902) HA: Allow new shared edit logs dir to be configured while NN is running

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203002#comment-13203002
 ] 

Bikas Saha commented on HDFS-2902:
--

Reading the code reveals a possible inconsistency.

FSImage.storage (an NNStorage object) manages the info about all storage dirs
and records their health state. This includes both edits and name dirs.
FSEditLogs.journalSet manages the info about all the journals, and each journal
maintains its own reference to the StorageDirectory it is writing to. This
storage directory is managed by FSImage.storage above.

However, the two work independently, so marking a directory as bad in
FSImage.storage does not actually stop it from being written via a journal,
and vice versa.



> HA: Allow new shared edit logs dir to be configured while NN is running
> ---
>
> Key: HDFS-2902
> URL: https://issues.apache.org/jira/browse/HDFS-2902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Updated] (HDFS-2510) Add HA-related metrics

2012-02-07 Thread Aaron T. Myers (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2510:
-

Attachment: HDFS-2510-HDFS-1623.patch

Thanks a lot for the review, Todd. Here's a patch which addresses your feedback.

> Add HA-related metrics
> --
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2510-HDFS-1623.patch, HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit 
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes 
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.





[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203000#comment-13203000
 ] 

Todd Lipcon commented on HDFS-2910:
---

We should just do a hard exit here -- upon restart or failover, the new active 
NN will recover the logs.

> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir 
> is inaccessible during log roll
> ---
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Commented] (HDFS-2913) HA: Need a way to shutdown the Name Node

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202997#comment-13202997
 ] 

Todd Lipcon commented on HDFS-2913:
---

Currently it is meant to do a "fail fast" shutdown, i.e., System.exit(1) after
logging a FATAL message. A graceful shutdown would be a nice optimization, but
HDFS-2912 should be treated as a bug: the expected fail-fast behavior isn't
being triggered. Doing a graceful shutdown after hitting an unknown state is
likely to be non-trivial.
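A minimal sketch of the "log a FATAL message, then System.exit(1)" behavior described above. `java.util.logging` has no FATAL level, so SEVERE stands in for it; the class, the `failFast` method, and the swappable `exitAction` field are hypothetical, added so the exit can be observed in a test rather than terminating the JVM.

```java
import java.util.function.IntConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class FailFastSketch {
    private static final Logger LOG = Logger.getLogger(FailFastSketch.class.getName());

    // Swappable so a test can observe the exit instead of terminating the JVM.
    static IntConsumer exitAction = System::exit;

    /** Logs at the most severe level, then exits with status 1. */
    static void failFast(String reason, Throwable cause) {
        LOG.log(Level.SEVERE, "FATAL: " + reason, cause);
        exitAction.accept(1);
    }

    public static void main(String[] args) {
        // Demonstration only: print the status instead of exiting.
        exitAction = status -> System.out.println("would exit with status " + status);
        failFast("shared edits dir inaccessible", new java.io.IOException("simulated"));
    }
}
```

The point of the design is that no attempt is made to restore a consistent in-memory state; on restart or failover, recovery starts from durable state instead.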

> HA: Need a way to shutdown the Name Node
> 
>
> Key: HDFS-2913
> URL: https://issues.apache.org/jira/browse/HDFS-2913
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> Ideally, NameNode.stop() should be called, because it will change the HA
> state and shut down all services. However, a NameNode reference is not
> available anywhere, so it is not possible to shut down the name node
> gracefully.
> A possible solution could be to have a Service interface that gets passed
> down to components like FSNameSystem, via which they can inform the NameNode
> about irrecoverable errors. The NameNode could then decide to shut down.





[jira] [Commented] (HDFS-2909) HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202990#comment-13202990
 ] 

Bikas Saha commented on HDFS-2909:
--

Aside from all the above I see some other issues.
Say everything is healthy and FSImage.rollEditLogs() is called.
It first calls FSEditLogs.rollLogs that actually rolls the logs.
It then calls storage.writeTransactionIdFileToStorage() which records this in 
all storage dirs so that the information about the rolled edits is not lost.
However, NN could crash in after FSEditLogs.rollLogs() has completed and before 
storage.writeTransactionIdFileToStorage() is called. That might leave the data 
in an inconsistent state.

> HA: Inaccessible shared edits dir not getting removed from FSImage storage 
> dirs upon error
> --
>
> Key: HDFS-2909
> URL: https://issues.apache.org/jira/browse/HDFS-2909
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Commented] (HDFS-2910) HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir is inaccessible during log roll

2012-02-07 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202991#comment-13202991
 ] 

Bikas Saha commented on HDFS-2910:
--

I think FSEditLog should not start a new segment when ending the previous one
has failed. Specifically, in this case the failure should abortAllJournals and
shut down the HA NN.
Even if we fix the NN shutdown case, this bug still needs to be fixed, or else
the edit logs will be left in an inconsistent state.
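A minimal sketch of the roll behavior proposed above: if ending the current segment fails, the log is marked aborted and no new segment is started. It is loosely modeled on the BETWEEN_LOG_SEGMENTS state named in the issue title; the `Journal` interface, the ABORTED state, and all other names are hypothetical and not FSEditLog's actual code.

```java
import java.io.IOException;

class EditLogRollSketch {
    enum State { IN_SEGMENT, BETWEEN_LOG_SEGMENTS, ABORTED }

    /** Hypothetical journal; either call may fail if the shared dir is gone. */
    interface Journal {
        void endSegment() throws IOException;
        void startSegment(long firstTxId) throws IOException;
    }

    private final Journal journal;
    private State state = State.IN_SEGMENT;

    EditLogRollSketch(Journal journal) {
        this.journal = journal;
    }

    State getState() {
        return state;
    }

    /** Ends the current segment and starts a new one, or aborts on failure. */
    void roll(long nextTxId) throws IOException {
        if (state != State.IN_SEGMENT) {
            throw new IllegalStateException("Bad state: " + state);
        }
        try {
            journal.endSegment();
            state = State.BETWEEN_LOG_SEGMENTS;
            journal.startSegment(nextTxId); // only reached if endSegment succeeded
            state = State.IN_SEGMENT;
        } catch (IOException e) {
            // Do not carry on as if the roll worked: mark the log unusable and
            // let the caller fail fast (e.g. shut the NN down).
            state = State.ABORTED;
            throw e;
        }
    }

    public static void main(String[] args) {
        Journal broken = new Journal() {
            public void endSegment() throws IOException {
                throw new IOException("shared edits dir inaccessible");
            }
            public void startSegment(long firstTxId) {
                System.out.println("should not be called");
            }
        };
        EditLogRollSketch log = new EditLogRollSketch(broken);
        try {
            log.roll(101);
        } catch (IOException e) {
            System.out.println("roll failed: " + e.getMessage() + ", state = " + log.getState());
        }
    }
}
```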



> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir 
> is inaccessible during log roll
> ---
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>






[jira] [Updated] (HDFS-2907) Make FSDataset in Datanode Pluggable

2012-02-07 Thread Suresh Srinivas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-2907:
--

Target Version/s: 0.24.0
   Fix Version/s: (was: 0.24.0)

> Make FSDataset in Datanode Pluggable
> 
>
> Key: HDFS-2907
> URL: https://issues.apache.org/jira/browse/HDFS-2907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Sanjay Radia
>Assignee: Sanjay Radia
>Priority: Minor
>






[jira] [Created] (HDFS-2913) HA: Need a way to shutdown the Name Node

2012-02-07 Thread Bikas Saha (Created) (JIRA)
HA: Need a way to shutdown the Name Node


 Key: HDFS-2913
 URL: https://issues.apache.org/jira/browse/HDFS-2913
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Bikas Saha
Assignee: Bikas Saha


Ideally, NameNode.stop() should be called, because it will change the HA state
and shut down all services. However, a NameNode reference is not available
anywhere, so it is not possible to shut down the name node gracefully.

A possible solution could be to have a Service interface that gets passed down
to components like FSNameSystem, via which they can inform the NameNode about
irrecoverable errors. The NameNode could then decide to shut down.
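The Service-interface idea above can be sketched as a small callback that is handed to components in place of a NameNode reference. All names here (`FatalErrorHandler`, `JournalComponent`, `NameNodeStub`) are hypothetical; the stub's `stop()` only records an event where the real `NameNode.stop()` would change the HA state and shut down services.

```java
import java.util.ArrayList;
import java.util.List;

class ServiceCallbackSketch {
    /** Hypothetical callback handed to components instead of a NameNode reference. */
    interface FatalErrorHandler {
        void fatalError(String component, Throwable cause);
    }

    /** Stand-in for a component such as FSNamesystem; it never sees the NameNode. */
    static class JournalComponent {
        private final FatalErrorHandler handler;

        JournalComponent(FatalErrorHandler handler) {
            this.handler = handler;
        }

        void onAllJournalsFailed(Exception e) {
            handler.fatalError("journal", e);
        }
    }

    /** Stand-in for the NameNode, which owns stop() and decides how to react. */
    static class NameNodeStub implements FatalErrorHandler {
        final List<String> events = new ArrayList<>();
        private boolean stopped;

        @Override
        public void fatalError(String component, Throwable cause) {
            events.add("fatal error from " + component + ": " + cause.getMessage());
            stop();
        }

        void stop() {
            if (!stopped) { // idempotent: repeated reports only stop once
                stopped = true;
                events.add("stopped"); // real code would change HA state and shut down services
            }
        }

        boolean isStopped() {
            return stopped;
        }
    }

    public static void main(String[] args) {
        NameNodeStub nn = new NameNodeStub();
        JournalComponent journal = new JournalComponent(nn);
        journal.onAllJournalsFailed(new java.io.IOException("shared edits dir inaccessible"));
        System.out.println(nn.events);
    }
}
```

Keeping the decision in the handler means components only report the error; whether to stop gracefully or fail fast stays a NameNode-level policy.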






[jira] [Created] (HDFS-2912) HA: Namenode not shutting down when shared edits dir is inaccessible

2012-02-07 Thread Bikas Saha (Created) (JIRA)
HA: Namenode not shutting down when shared edits dir is inaccessible


 Key: HDFS-2912
 URL: https://issues.apache.org/jira/browse/HDFS-2912
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Bikas Saha
Assignee: Bikas Saha


When there is an error in the shared edits dir, the current policy requires the
active name node to abort and shut down.
Currently there is no way to shut down the name node, and hence this does not
happen even after all journals have been aborted on error. In fact, the name
node stays Active and is not in safe mode either. Ideally it should shut down,
or at least go into safe mode or standby mode.





[jira] [Commented] (HDFS-2510) Add HA-related metrics

2012-02-07 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202950#comment-13202950
 ] 

Todd Lipcon commented on HDFS-2510:
---

{code}
+  public long getMillisSinceLastLoadedEdits() {
+if (haContext.getState().getServiceState() == HAServiceState.STANDBY) {
{code}

Does this code possibly get called early during start-up, before the HA context
state has been set (i.e., before the first start*Service)?

- in EditLogTailer, the new javadoc is redundant; just keep the @return bit
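If the answer to the question above is yes, one defensive option is a metric getter that tolerates a missing state. This is a minimal sketch assuming the state reference is simply null until the first start*Services() call; `HaMetricsSketch`, its enum, and the injected clock are hypothetical names, not the patch's code.

```java
import java.util.function.LongSupplier;

class HaMetricsSketch {
    enum ServiceState { ACTIVE, STANDBY }

    private final LongSupplier clock;
    // Assumption: stays null until the first start*Services() call sets it.
    private volatile ServiceState state;
    private volatile long lastLoadedEditsMillis;

    HaMetricsSketch(LongSupplier clock) {
        this.clock = clock;
        this.lastLoadedEditsMillis = clock.getAsLong();
    }

    void setState(ServiceState s) {
        state = s;
    }

    void editsLoaded() {
        lastLoadedEditsMillis = clock.getAsLong();
    }

    /** Returns 0 unless we are a standby; safe to call before the state is set. */
    long getMillisSinceLastLoadedEdits() {
        ServiceState s = state; // read the volatile field once
        if (s != ServiceState.STANDBY) { // covers null (start-up) and ACTIVE
            return 0;
        }
        return clock.getAsLong() - lastLoadedEditsMillis;
    }

    public static void main(String[] args) {
        HaMetricsSketch m = new HaMetricsSketch(System::currentTimeMillis);
        System.out.println("before state is set: " + m.getMillisSinceLastLoadedEdits());
        m.setState(ServiceState.STANDBY);
        m.editsLoaded();
        System.out.println("standby: " + m.getMillisSinceLastLoadedEdits());
    }
}
```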


> Add HA-related metrics
> --
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, ha, name-node
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit 
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes 
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.





[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Shaneal Manek (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202933#comment-13202933
 ] 

Shaneal Manek commented on HDFS-2911:
-

Incidentally, I worked with a JVMTI agent a while ago that did a thread/heap
dump on OOM. It was really useful for debugging.

The license is compatible, so it may be worth scavenging some of that
code/functionality. Check it out if curious:
https://github.com/Greplin/polarbear

> Gracefully handle OutOfMemoryErrors
> ---
>
> Key: HDFS-2911
> URL: https://issues.apache.org/jira/browse/HDFS-2911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Eli Collins
>
> We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN.
> We should catch them in a high-level handler, cleanly fail the RPC (vs
> sending back the OOM stack trace) or background thread, and shut down the NN
> or DN. Currently the process is left in a not-well-tested state
> (continuously fails RPCs and internal threads, may or may not recover, and
> doesn't shut down gracefully).




