Re: Review Request: Populate needed replication queues before leaving safe mode.

Dhruba Borthakur Tue, 16 Nov 2010 17:37:01 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/#review39
-----------------------------------------------------------




http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
<https://reviews.apache.org/r/105/#comment27>

    Please change the default to 1, so that it is backward compatible.



http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
<https://reviews.apache.org/r/105/#comment29>

    We can first check canInitializeReplQueue to optimize on CPU.



http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
<https://reviews.apache.org/r/105/#comment28>

    This can move to after the SafeMode daemon is created.


- Dhruba


On 2010-11-16 16:39:18, Patrick Kling wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/105/
> -----------------------------------------------------------
> 
> (Updated 2010-11-16 16:39:18)
> 
> 
> Review request for hadoop-hdfs.
> 
> 
> Summary
> -------
> 
> This patch introduces a new configuration variable 
> dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks 
> for which block reports have to be received before the NameNode will start 
> initializing the needed replication queues. Once a sufficient number of block 
> reports have been received, the queues are initialized while the NameNode is 
> still in safe mode. After the queues are initialized, subsequent block 
> reports are handled by updating the queues incrementally.
> 
> The benefit of this is twofold:
> - It allows us to compute the replication queues while we are waiting for the 
> last few block reports (when the NameNode is mostly idle). Once these block 
> reports have been received, we can then immediately leave safe mode without 
> having to wait for the computation of the needed replication queues (which 
> requires a full traversal of the blocks map).
> - With Raid, it may not be necessary to stay in safe mode until all blocks 
> have been reported. Using this change, we could monitor whether all of the 
> missing blocks can be recreated using parity information and, if so, leave 
> safe mode early. For this monitoring to work, we need access to the needed 
> replication queues while the NameNode is still in safe mode.
> 
> 
> This addresses bug HDFS-1476.
>     https://issues.apache.org/jira/browse/HDFS-1476
> 
> 
> Diffs
> -----
> 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
>  1035545 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
>  1035545 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
>  1035545 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
>  1035545 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
>  1035545 
> 
> Diff: https://reviews.apache.org/r/105/diff
> 
> 
> Testing
> -------
> 
> new test case in TestListCorruptFileBlocks
> 
> 
> Thanks,
> 
> Patrick
> 
>
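
For readers of this thread, the threshold behaviour described in the quoted summary can be sketched roughly as below. This is not the code from the patch. The class, the fields, and the counters are hypothetical and exist only for illustration. The method name canInitializeReplQueues echoes the name mentioned in the review comment, but its body here is an assumption. The sketch does a single full initialization once the configured fraction of blocks has been reported, and counts every later report as an incremental update. Checking the "already initialized" flag before evaluating the threshold is one possible reading of the suggestion to order the checks so as to save CPU.

```java
/** Illustrative sketch only; all names here are hypothetical, not the patch's code. */
public class ReplQueueThresholdSketch {
    private final long totalBlocks;
    // Stands in for dfs.namenode.replqueue.threshold-pct, a fraction in [0, 1].
    private final double replQueueThresholdPct;

    private long reportedBlocks = 0;
    private boolean replQueuesInitialized = false;

    // Counters that make the behaviour observable in this sketch.
    private int fullTraversals = 0;
    private int incrementalUpdates = 0;

    public ReplQueueThresholdSketch(long totalBlocks, double replQueueThresholdPct) {
        this.totalBlocks = totalBlocks;
        this.replQueueThresholdPct = replQueueThresholdPct;
    }

    /** Assumed meaning: enough blocks have been reported to start building the queues. */
    boolean canInitializeReplQueues() {
        long needed = (long) (totalBlocks * replQueueThresholdPct);
        return reportedBlocks >= needed;
    }

    /** Called once per newly reported block while still in safe mode. */
    public void blockReported() {
        reportedBlocks++;
        if (replQueuesInitialized) {
            // Queues already exist: update them incrementally.
            incrementalUpdates++;
            return;
        }
        if (canInitializeReplQueues()) {
            // One-time full traversal of the blocks map (simulated by a counter).
            fullTraversals++;
            replQueuesInitialized = true;
        }
    }

    public boolean isReplQueuesInitialized() { return replQueuesInitialized; }
    public int getFullTraversals() { return fullTraversals; }
    public int getIncrementalUpdates() { return incrementalUpdates; }
}
```

With 4 blocks and a threshold of 0.5, the second reported block triggers the one full initialization, and the third and fourth are handled incrementally.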
