[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-18 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188507#comment-13188507
 ] 

Hudson commented on HDFS-2795:
--

Integrated in Hadoop-Hdfs-HAbranch-build #51 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/51/])
Amend HDFS-2795. Fix PersistBlocks failure due to an NPE in isPopulatingReplQueues()

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1232510
Files : 
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 HA: Standby NN takes a long time to recover from a dead DN starting up
 --

 Key: HDFS-2795
 URL: https://issues.apache.org/jira/browse/HDFS-2795
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Aaron T. Myers
Assignee: Todd Lipcon
Priority: Critical
 Fix For: HA branch (HDFS-1623)

 Attachments: hdfs-2795.txt


 To reproduce:
 # Start an HA cluster with a DN.
 # Write several blocks to the FS with replication 1.
 # Shut down the DN.
 # Wait for the NNs to declare the DN dead. All blocks will be 
 under-replicated.
 # Restart the DN.
 Note that upon restarting the DN, the active NN will immediately get all 
 block locations from the initial block report (BR). The standby NN will not, 
 and instead will slowly add block locations for a subset of the 
 previously-missing blocks on every DN heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187633#comment-13187633
 ] 

Hudson commented on HDFS-2795:
--

Integrated in Hadoop-Hdfs-HAbranch-build #50 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/50/])
HDFS-2795. Standby NN takes a long time to recover from a dead DN starting up. Contributed by Todd Lipcon.

todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1232285
Files : 
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java






[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187306#comment-13187306
 ] 

Aaron T. Myers commented on HDFS-2795:
--

{code}// Create 20 blocks.{code}

Unless I'm missing something, this test will actually create 5 blocks, not 20. 

Patch looks good otherwise. +1.





[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187467#comment-13187467
 ] 

Todd Lipcon commented on HDFS-2795:
---

Whoops, this broke one of the TestPersistBlocks tests -- when it replays an 
append, it hits an NPE because {{haContext}} isn't set yet during startup. Does 
the following look OK?
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/ap
index 5e8377e..c57d152 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -3718,7 +3718,8 @@ public class FSNamesystem implements Namesystem, FSClusterStats,
 
   @Override
   public boolean isPopulatingReplQueues() {
-    if (!haContext.getState().shouldPopulateReplQueues()) {
+    if (haContext != null &&  // null during startup!
+        !haContext.getState().shouldPopulateReplQueues()) {
       return false;
     }
     // safeMode is volatile, and may be set to null at any time
{code}
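
For readers who want to try the guard in isolation, here is a minimal, self-contained sketch of the same null-check pattern. It is not the real FSNamesystem code: {{HaState}}, {{HaContext}}, {{NamesystemSketch}}, and the {{fallback}} flag are hypothetical stand-ins, and the rest of the real method (the safe-mode handling) is reduced to that single boolean as an assumption.

```java
/** Hypothetical stand-in for the HA state; not the real Hadoop type. */
interface HaState {
    boolean shouldPopulateReplQueues();
}

/** Hypothetical stand-in for the HA context, which is assigned late in startup. */
final class HaContext {
    private final HaState state;

    HaContext(HaState state) {
        this.state = state;
    }

    HaState getState() {
        return state;
    }
}

/** Sketch of a namesystem whose HA context may still be null during startup. */
final class NamesystemSketch {
    // Stays null until setHaContext() is called, as happens during edit-log replay at startup.
    private volatile HaContext haContext;

    // Assumption: stands in for whatever the rest of the real method decides.
    private final boolean fallback;

    NamesystemSketch(boolean fallback) {
        this.fallback = fallback;
    }

    void setHaContext(HaContext ctx) {
        this.haContext = ctx;
    }

    boolean isPopulatingReplQueues() {
        // Read the volatile field once so the null check and the call see the same value.
        HaContext ctx = haContext;
        if (ctx != null && !ctx.getState().shouldPopulateReplQueues()) {
            return false;
        }
        // With no context yet, or a state that allows it, defer to the remaining logic.
        return fallback;
    }
}
```

Before the context is set, the method falls through to the remaining logic instead of throwing; once a context is installed, a state that declines to populate the replication queues short-circuits to false.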





[jira] [Commented] (HDFS-2795) HA: Standby NN takes a long time to recover from a dead DN starting up

2012-01-16 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187513#comment-13187513
 ] 

Aaron T. Myers commented on HDFS-2795:
--

+1, amendment looks good to me.
