[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-18 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188507#comment-13188507 ]

Hudson commented on HDFS-2795:
--

Integrated in Hadoop-Hdfs-HAbranch-build #51 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/51/])
Amend HDFS-2795. Fix PersistBlocks failure due to an NPE in isPopulatingReplQueues()

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1232510
Files :
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> HA: Standby NN takes a long time to recover from a dead DN starting up
> --
>
> Key: HDFS-2795
> URL: https://issues.apache.org/jira/browse/HDFS-2795
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2795.txt
>
>
> To reproduce:
> # Start an HA cluster with a DN.
> # Write several blocks to the FS with replication 1.
> # Shut down the DN.
> # Wait for the NNs to declare the DN dead. All blocks will be 
> under-replicated.
> # Restart the DN.
> Note that upon restarting the DN, the active NN will immediately get all 
> block locations from the initial BR. The standby NN will not, and instead 
> will slowly add block locations for a subset of the previously-missing blocks 
> on every DN heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-17 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187633#comment-13187633 ]

Hudson commented on HDFS-2795:
--

Integrated in Hadoop-Hdfs-HAbranch-build #50 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/50/])
HDFS-2795. Standby NN takes a long time to recover from a dead DN starting up. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1232285
Files :
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java






[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Aaron T. Myers (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187513#comment-13187513 ]

Aaron T. Myers commented on HDFS-2795:
--

+1, amendment looks good to me.





[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Todd Lipcon (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187467#comment-13187467 ]

Todd Lipcon commented on HDFS-2795:
---

Whoops, this broke one of the TestPersistBlocks tests -- when it replays an 
append, it hits an NPE because {{haContext}} isn't set yet during startup. Does 
the following look OK?
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/ap
index 5e8377e..c57d152 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -3718,7 +3718,8 @@ public class FSNamesystem implements Namesystem, FSClusterStats,
 
   @Override
   public boolean isPopulatingReplQueues() {
-    if (!haContext.getState().shouldPopulateReplQueues()) {
+    if (haContext != null && // null during startup!
+        !haContext.getState().shouldPopulateReplQueues()) {
       return false;
     }
     // safeMode is volatile, and may be set to null at any time
{code}
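
For anyone reading along, here is a minimal standalone sketch of the guard in the diff above. It is not the real FSNamesystem code: {{NullGuardSketch}}, its nested {{HaState}}, {{HaContext}} and {{Namesystem}} types, and the {{restOfMethod}} flag are hypothetical stand-ins, and only the shape of the null check is taken from the patch. It shows that the unguarded condition throws while the context is still null, and that the guarded one falls through to the rest of the method.

```java
public class NullGuardSketch {

    /** Hypothetical stand-in for the HA state; not the real Hadoop type. */
    public interface HaState {
        boolean shouldPopulateReplQueues();
    }

    /** Hypothetical stand-in for the HA context; not the real Hadoop type. */
    public static final class HaContext {
        private final HaState state;

        public HaContext(HaState state) {
            this.state = state;
        }

        public HaState getState() {
            return state;
        }
    }

    /** Simplified namesystem that keeps only what the guard touches. */
    public static final class Namesystem {
        // Assumption for this sketch: the context is assigned late, so it is
        // still null while startup work is running.
        private HaContext haContext;

        // Placeholder for whatever the real method returns after the HA check.
        private final boolean restOfMethod;

        public Namesystem(boolean restOfMethod) {
            this.restOfMethod = restOfMethod;
        }

        public void setHaContext(HaContext haContext) {
            this.haContext = haContext;
        }

        /** Pre-amendment shape: dereferences haContext unconditionally. */
        public boolean isPopulatingReplQueuesUnguarded() {
            if (!haContext.getState().shouldPopulateReplQueues()) {
                return false;
            }
            return restOfMethod;
        }

        /** Amended shape: && short-circuits, so a null context is skipped. */
        public boolean isPopulatingReplQueues() {
            if (haContext != null
                    && !haContext.getState().shouldPopulateReplQueues()) {
                return false;
            }
            return restOfMethod;
        }
    }

    public static void main(String[] args) {
        Namesystem ns = new Namesystem(true);

        // Context not set yet: the unguarded variant throws.
        try {
            ns.isPopulatingReplQueuesUnguarded();
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        // Context not set yet: the guarded variant skips the HA check.
        System.out.println("guarded, no context: " + ns.isPopulatingReplQueues());

        // Context set and the state says no: the guard is inert, result is false.
        ns.setHaContext(new HaContext(() -> false));
        System.out.println("guarded, state says no: " + ns.isPopulatingReplQueues());
    }
}
```

One consequence of this shape is that a null context is treated as if the HA state allowed populating the queues, so the outcome during startup is decided entirely by the remainder of the method.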





[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Aaron T. Myers (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187306#comment-13187306 ]

Aaron T. Myers commented on HDFS-2795:
--

{code}// Create 20 blocks.{code}

Unless I'm missing something, this test will actually create 5 blocks, not 20. 

Patch looks good otherwise. +1.
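
As a rough illustration of how such a count can be checked: the number of blocks in a file is its length divided by the block size, rounded up. The sketch below uses made-up sizes (a 1 KB block size, files of 5 and 20 blocks' worth of data); the actual values used by the test are not quoted in this thread, and {{BlockCountSketch}} and {{blocksFor}} are hypothetical names.

```java
public class BlockCountSketch {

    /**
     * Number of blocks needed to hold fileLength bytes when each block
     * holds at most blockSize bytes (ceiling division).
     */
    public static long blocksFor(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must be non-negative");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 1024; // hypothetical block size, not taken from the test

        // Writing 5 blocks' worth of data yields 5 blocks.
        System.out.println(blocksFor(5 * blockSize, blockSize));

        // A comment saying "20 blocks" would need 20 blocks' worth of data.
        System.out.println(blocksFor(20 * blockSize, blockSize));
    }
}
```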
