[jira] Commented: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode

dhruba borthakur (JIRA) Tue, 16 Nov 2010 12:08:36 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932632#action_12932632
 ]


dhruba borthakur commented on HDFS-1476:
----------------------------------------

Thinking more about this one, we can exit safemode faster if we compute 
misReplicatedBlocks even before at least one replica of every block has been 
reported.

Step 1: The namenode waits until at least one replica of every known block has 
been reported.
Step 2: It then invokes processMisReplicatedBlocks to update neededReplication.
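
As a rough illustration of the two steps above, here is a minimal Java model. It 
is not the actual namenode code: only processMisReplicatedBlocks and 
neededReplication come from this comment, while the class name, the per-block 
replica maps and reportReplica are hypothetical. It also ignores any other 
safe-mode exit conditions the real namenode may apply and simply leaves safe 
mode right after Step 2.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the current two-step startup; not the real namenode code.
// Only processMisReplicatedBlocks and neededReplication come from the comment.
class TwoStepSafeMode {
    private final Map<Long, Integer> expectedReplication = new HashMap<>(); // blockId -> target replicas
    private final Map<Long, Integer> liveReplicas = new HashMap<>();        // blockId -> replicas reported so far
    final Set<Long> neededReplication = new HashSet<>();
    private int blocksWithoutReplica;
    boolean inSafeMode = true;

    TwoStepSafeMode(Map<Long, Integer> expected) {
        expectedReplication.putAll(expected);
        for (Long id : expected.keySet()) {
            liveReplicas.put(id, 0);
        }
        blocksWithoutReplica = expected.size();
    }

    // Called once for each replica listed in a datanode block report.
    void reportReplica(long blockId) {
        Integer prev = liveReplicas.get(blockId);
        if (prev == null) {
            return; // ignore blocks the namespace does not know about
        }
        liveReplicas.put(blockId, prev + 1);
        if (prev == 0) {
            blocksWithoutReplica--;
        }
        // Step 1 ends only when the last block without a replica gets one.
        if (inSafeMode && blocksWithoutReplica == 0) {
            processMisReplicatedBlocks(); // Step 2
            inSafeMode = false;
        }
    }

    // Step 2: one full pass over all blocks to fill neededReplication.
    void processMisReplicatedBlocks() {
        neededReplication.clear();
        for (Map.Entry<Long, Integer> e : expectedReplication.entrySet()) {
            if (liveReplicas.get(e.getKey()) < e.getValue()) {
                neededReplication.add(e.getKey());
            }
        }
    }
}
```

In this model a single straggler holding the only replica of one block keeps the 
whole namenode in Step 1, and the full Step 2 pass cannot start until that 
report arrives.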

When the cluster restarts, the namenode enters Step 1 and begins processing 
a storm of block reports from all datanodes. But a few datanodes are somewhat 
slow, and the block reports from these straggler datanodes delay the transition 
from Step 1 to Step 2. The CPU usage on the NN decreases exponentially as Step 
1 progresses and becomes almost negligible when Step 1 is about to end.

This jira could change the code so that processMisReplicatedBlocks is invoked 
before Step 1 finishes, while the NN's CPU is mostly idle. This would make the 
NN exit safemode earlier.
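
The comment does not say exactly when processMisReplicatedBlocks should run. One 
possible reading, sketched below in Java under stated assumptions: run the full 
pass once a fraction of blocks (the hypothetical scanThreshold knob) has at 
least one replica, then keep neededReplication current with a cheap per-block 
update as straggler reports arrive, so no final full scan is needed when the 
last missing block is reported. All names other than processMisReplicatedBlocks 
and neededReplication are invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one way to start Step 2 before Step 1 finishes.
// scanThreshold and all class/method names except processMisReplicatedBlocks
// and neededReplication are invented for illustration.
class EarlyScanSafeMode {
    private final Map<Long, Integer> expected = new HashMap<>(); // blockId -> target replicas
    private final Map<Long, Integer> live = new HashMap<>();     // blockId -> replicas reported so far
    final Set<Long> neededReplication = new HashSet<>();
    private final double scanThreshold; // fraction of blocks that must have >= 1 replica
    private int blocksWithoutReplica;
    boolean scanned = false;
    boolean inSafeMode = true;

    EarlyScanSafeMode(Map<Long, Integer> expectedReplication, double scanThreshold) {
        expected.putAll(expectedReplication);
        for (Long id : expected.keySet()) {
            live.put(id, 0);
        }
        this.blocksWithoutReplica = expected.size();
        this.scanThreshold = scanThreshold;
    }

    void reportReplica(long blockId) {
        Integer prev = live.get(blockId);
        if (prev == null) {
            return; // ignore unknown blocks
        }
        live.put(blockId, prev + 1);
        if (prev == 0) {
            blocksWithoutReplica--;
        }
        if (scanned) {
            updateOne(blockId); // cheap per-block update for straggler reports
        } else if (fractionWithReplica() >= scanThreshold) {
            processMisReplicatedBlocks(); // full pass, run while stragglers are still reporting
            scanned = true;
        }
        // neededReplication is already current, so leaving safe mode needs no extra scan.
        if (scanned && blocksWithoutReplica == 0) {
            inSafeMode = false;
        }
    }

    void processMisReplicatedBlocks() {
        for (Long id : expected.keySet()) {
            updateOne(id);
        }
    }

    private void updateOne(long blockId) {
        if (live.get(blockId) < expected.get(blockId)) {
            neededReplication.add(blockId);
        } else {
            neededReplication.remove(blockId);
        }
    }

    private double fractionWithReplica() {
        if (expected.isEmpty()) {
            return 1.0;
        }
        return (double) (expected.size() - blocksWithoutReplica) / expected.size();
    }
}
```

The incremental updateOne call is what makes an early scan safe in this sketch: 
without it, replicas reported after the full pass would leave neededReplication 
stale.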

> listCorruptFileBlocks should be functional while the name node is still in 
> safe mode
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-1476
>                 URL: https://issues.apache.org/jira/browse/HDFS-1476
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Patrick Kling
>
> This would allow us to detect whether missing blocks can be fixed using Raid 
> and, if so, exit safe mode earlier.
> One way to make listCorruptFileBlocks available before the name node has 
> exited from safe mode would be to perform a scan of the blocks map on each 
> call to listCorruptFileBlocks to determine if there are any blocks with no 
> replicas. This scan could be parallelized by dividing the space of block IDs 
> into multiple intervals that can be scanned independently.
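
The parallel scan suggested in the description can be sketched in Java as 
follows. Assumptions: the blocks map is modeled as a sorted map from block ID to 
the number of live replicas (the real namenode structure differs), the interval 
count is chosen by the caller, and the class and method names are hypothetical. 
The sketch also assumes the map is not modified while the scan runs; a real 
namenode would need to hold the appropriate lock or otherwise tolerate 
concurrent block reports.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not HDFS code: blocksMap is modeled as a sorted map
// from block ID to the number of live replicas reported so far.
class ParallelMissingBlockScan {

    // Returns the IDs of blocks with zero replicas, in ascending order, by
    // scanning `intervals` (>= 1) disjoint block-ID ranges concurrently.
    static List<Long> blocksWithNoReplicas(NavigableMap<Long, Integer> blocksMap, int intervals) {
        List<Long> missing = new ArrayList<>();
        if (blocksMap.isEmpty()) {
            return missing;
        }
        // BigInteger avoids overflow when block IDs span most of the long range.
        BigInteger lo = BigInteger.valueOf(blocksMap.firstKey());
        BigInteger span = BigInteger.valueOf(blocksMap.lastKey()).subtract(lo).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(intervals);

        ExecutorService pool = Executors.newFixedThreadPool(intervals);
        try {
            List<Future<List<Long>>> parts = new ArrayList<>();
            for (int i = 0; i < intervals; i++) {
                long from = lo.add(span.multiply(BigInteger.valueOf(i)).divide(n)).longValueExact();
                // The last interval is open-ended so it always includes lastKey().
                NavigableMap<Long, Integer> slice = (i == intervals - 1)
                        ? blocksMap.tailMap(from, true)
                        : blocksMap.subMap(from, true,
                                lo.add(span.multiply(BigInteger.valueOf(i + 1)).divide(n)).longValueExact(),
                                false);
                parts.add(pool.submit(() -> {
                    List<Long> found = new ArrayList<>();
                    for (Map.Entry<Long, Integer> e : slice.entrySet()) {
                        if (e.getValue() == 0) {
                            found.add(e.getKey());
                        }
                    }
                    return found;
                }));
            }
            // Intervals are submitted in ascending order, so concatenation keeps IDs sorted.
            for (Future<List<Long>> f : parts) {
                missing.addAll(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scan task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return missing;
    }
}
```

Splitting by block-ID range rather than by hash lets each task take a subMap 
view of the sorted map, so no task has to touch entries outside its own 
interval.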

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
