[ 
https://issues.apache.org/jira/browse/HADOOP-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645628#action_12645628
 ] 

Konstantin Shvachko commented on HADOOP-4597:
---------------------------------------------

I did manual testing, which confirms the change works as expected.
# Create a new file system containing a few files by starting a name-node and 2 
data-nodes and loading a couple of files into it. Then stop the cluster.
# Start the name-node with {{dfs.safemode.threshold.pct = 1.1}}
# Start one data-node, which contains exactly one copy of each block.
# Call {{dfsadmin -metasave tmp.txt}}. File tmp.txt will show 
"Blocks waiting for replication: 0".
# Call {{dfsadmin -safemode leave}}. The name-node will leave safe-mode.
# Call {{dfsadmin -metasave tmp.txt}} again. File tmp.txt will now show a 
non-zero count for "Blocks waiting for replication:" 
and will list all blocks of the file system, because they are all 
under-replicated.

Without the patch the last step would still show "Blocks waiting for 
replication: 0".
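
Step 2 relies on a threshold above 1 keeping the name-node in safe mode until someone forces it out. The sketch below is a simplified model of that check, not the actual name-node code: the class and method names are hypothetical, and it assumes the name-node compares the fraction of reported blocks against the configured threshold.

```java
public class SafeModeThresholdSketch {

    /** Fraction of blocks for which at least one replica has been reported. */
    static double safeBlockRatio(int safeBlocks, int totalBlocks) {
        return totalBlocks == 0 ? 1.0 : (double) safeBlocks / totalBlocks;
    }

    /** True while the (modelled) name-node has to stay in safe mode. */
    static boolean mustStayInSafeMode(int safeBlocks, int totalBlocks, double thresholdPct) {
        return safeBlockRatio(safeBlocks, totalBlocks) < thresholdPct;
    }

    public static void main(String[] args) {
        int totalBlocks = 10;
        // Every block has been reported, so the ratio is exactly 1.0.
        // With a threshold below 1 the name-node may leave safe mode.
        System.out.println(mustStayInSafeMode(10, totalBlocks, 0.999)); // prints false
        // The ratio can never exceed 1.0, so a threshold of 1.1 is never met.
        System.out.println(mustStayInSafeMode(10, totalBlocks, 1.1));   // prints true
    }
}
```

With the threshold at 1.1 the only way out of safe mode is {{dfsadmin -safemode leave}}, which is the code path the patch changes.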

> Under-replicated blocks are not calculated if the name-node is forced out of 
> safe-mode.
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4597
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4597
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: NeededRepl-18.patch, NeededRepl.patch
>
>
> Currently during name-node startup under-replicated blocks are not added to 
> the neededReplications queue until the name-node leaves safe mode. This is an 
> optimization since otherwise all blocks will first go into the 
> under-replicated queue and then most of them will be removed from it.
> When the name-node leaves safe-mode automatically it checks that all blocks 
> have the correct number of replicas ({{processMisReplicatedBlocks()}}). 
> When the name-node leaves safe-mode manually it does not perform this check.
> In the latter case all under-replicated blocks remain under-replicated 
> indefinitely because there is no alternative mechanism to trigger replications.
> The proposal is to call {{processMisReplicatedBlocks()}} any time the 
> name-node leaves safe mode - automatically or manually.
> In addition to solving that problem this could be an alternative mechanism 
> for refreshing {{neededReplications}} and {{excessReplicateMap}} sets.
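
The behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the attached patch: {{ToyNameNode}}, its fields, and its {{patched}} flag are hypothetical. Only the names {{processMisReplicatedBlocks()}} and {{neededReplications}} come from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyNameNode {
    private final int expectedReplicas;
    private final boolean patched; // hypothetical switch: true models the proposed fix
    private final Map<String, Integer> liveReplicas = new LinkedHashMap<>();
    private final Set<String> neededReplications = new TreeSet<>();
    private boolean inSafeMode = true;

    public ToyNameNode(int expectedReplicas, boolean patched) {
        this.expectedReplicas = expectedReplicas;
        this.patched = patched;
    }

    /** A data-node reports one replica of a block. */
    public void reportReplica(String blockId) {
        int count = liveReplicas.merge(blockId, 1, Integer::sum);
        // Startup optimization: the queue is not touched while in safe mode.
        if (!inSafeMode) {
            updateQueue(blockId, count);
        }
    }

    private void updateQueue(String blockId, int count) {
        if (count < expectedReplicas) {
            neededReplications.add(blockId);
        } else {
            neededReplications.remove(blockId);
        }
    }

    /** Re-check every known block against its expected replica count. */
    private void processMisReplicatedBlocks() {
        for (Map.Entry<String, Integer> e : liveReplicas.entrySet()) {
            updateQueue(e.getKey(), e.getValue());
        }
    }

    public void leaveSafeModeAutomatically() {
        inSafeMode = false;
        processMisReplicatedBlocks();
    }

    public void leaveSafeModeManually() {
        inSafeMode = false;
        if (patched) {
            processMisReplicatedBlocks(); // the proposal: check on manual exit too
        }
    }

    public int blocksWaitingForReplication() {
        return neededReplications.size();
    }

    public static void main(String[] args) {
        for (boolean withPatch : new boolean[] {false, true}) {
            ToyNameNode nn = new ToyNameNode(2, withPatch);
            nn.reportReplica("blk_1"); // one data-node, one copy of each block
            nn.reportReplica("blk_2");
            nn.leaveSafeModeManually();
            // prints "patched=false waiting=0" and then "patched=true waiting=2"
            System.out.println("patched=" + withPatch
                + " waiting=" + nn.blocksWaitingForReplication());
        }
    }
}
```

In this model the unpatched name-node reports zero blocks waiting after a manual exit even though every block has a single replica, while the patched one queues both blocks.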

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
