[ https://issues.apache.org/jira/browse/HADOOP-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645628#action_12645628 ]
Konstantin Shvachko commented on HADOOP-4597:
---------------------------------------------
I did some manual testing, which confirms that the change works as expected.
# Create a new file system containing a few files: start a name-node and 2 data-nodes, load a couple of files into it, then stop the cluster.
# Start the name-node with {{dfs.safemode.threshold.pct = 1.1}}, so that it cannot leave safe mode automatically.
# Start one data-node, which contains exactly one copy of each block.
# Call {{dfsadmin -metasave tmp.txt}}. File tmp.txt will show "Blocks waiting for replication: 0".
# Call {{dfsadmin -safemode leave}}. The name-node will leave safe-mode.
# Call {{dfsadmin -metasave tmp.txt}} again. File tmp.txt will now show a positive number of "Blocks waiting for replication:" and will list all blocks of the file system, because they are all under-replicated.
Without the patch, the last step would still show "Blocks waiting for replication: 0".
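Step 2 works because of how the safe-mode threshold is applied: with {{dfs.safemode.threshold.pct}} above 1, the name-node can never accumulate enough safe blocks to leave safe mode on its own, so the only way out is the manual {{dfsadmin -safemode leave}} being tested. Below is a minimal Java sketch of that rule. The class and method names are hypothetical (this is not Hadoop's code), and it assumes a block counts as "safe" once it has at least the minimum number of replicas, so the single data-node from step 3 makes every block safe.

```java
/**
 * Hypothetical sketch of the automatic safe-mode exit rule; not Hadoop code.
 * Assumption: the name-node may leave safe mode on its own only once the
 * number of safe blocks reaches thresholdPct * totalBlocks.
 */
public class SafeModeThreshold {
    static boolean canLeaveAutomatically(long safeBlocks, long totalBlocks, double thresholdPct) {
        // Safe blocks can never exceed the total, so any threshold above 1.0
        // keeps this condition false forever.
        return safeBlocks >= thresholdPct * totalBlocks;
    }

    public static void main(String[] args) {
        long total = 10;
        // One data-node with one copy of every block: all blocks are safe
        // (assuming a minimum replication of 1).
        long safe = 10;
        System.out.println(canLeaveAutomatically(safe, total, 0.999)); // prints "true"
        System.out.println(canLeaveAutomatically(safe, total, 1.1));   // prints "false"
    }
}
```

Even with 100% of the blocks safe, the 1.1 threshold is unreachable, which is what forces the test down the manual-exit path.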
> Under-replicated blocks are not calculated if the name-node is forced out of safe-mode.
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-4597
> URL: https://issues.apache.org/jira/browse/HADOOP-4597
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: NeededRepl-18.patch, NeededRepl.patch
>
>
> Currently, during name-node startup, under-replicated blocks are not added to the {{neededReplications}} queue until the name-node leaves safe mode. This is an optimization, since otherwise all blocks would first go into the under-replicated queue and then most of them would be removed from it.
> When the name-node leaves safe mode automatically, it checks that every block has the correct number of replicas ({{processMisReplicatedBlocks()}}).
> When the name-node is forced out of safe mode manually, it does not perform this check. In that case all under-replicated blocks remain under-replicated indefinitely, because there is no other mechanism to trigger their replication.
> The proposal is to call {{processMisReplicatedBlocks()}} whenever the name-node leaves safe mode, whether automatically or manually.
> In addition to solving that problem, this could serve as an alternative mechanism for refreshing the {{neededReplications}} and {{excessReplicateMap}} sets.
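The proposed behavior can be illustrated with a deliberately simplified Java model. Only the names {{processMisReplicatedBlocks}} and {{neededReplications}} come from the issue; everything else ({{SafeModeModel}}, {{addBlock}}, {{excessReplicas}} standing in for {{excessReplicateMap}}, the per-block replica counts) is hypothetical and is not Hadoop's {{FSNamesystem}} implementation. It assumes, as the description says, that nothing is queued while in safe mode, and shows the rescan running on every exit regardless of how the exit was triggered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of the name-node's replication bookkeeping. */
public class SafeModeModel {
    // block id -> live replicas reported by data-nodes
    private final Map<String, Integer> liveReplicas = new HashMap<>();
    // block id -> replication factor expected for the block's file
    private final Map<String, Integer> expectedReplicas = new HashMap<>();
    final Set<String> neededReplications = new TreeSet<>();
    final Set<String> excessReplicas = new TreeSet<>();
    private boolean inSafeMode = true;

    void addBlock(String blockId, int live, int expected) {
        liveReplicas.put(blockId, live);
        expectedReplicas.put(blockId, expected);
        // Startup optimization: nothing is queued while in safe mode.
    }

    // Rescan every block and rebuild both sets from scratch.
    void processMisReplicatedBlocks() {
        neededReplications.clear();
        excessReplicas.clear();
        for (Map.Entry<String, Integer> e : liveReplicas.entrySet()) {
            int expected = expectedReplicas.get(e.getKey());
            if (e.getValue() < expected) {
                neededReplications.add(e.getKey());
            } else if (e.getValue() > expected) {
                excessReplicas.add(e.getKey());
            }
        }
    }

    // Proposed fix: the rescan runs on every exit; 'manual' is deliberately
    // ignored, because the trigger should not affect the check.
    void leaveSafeMode(boolean manual) {
        if (!inSafeMode) {
            return;
        }
        inSafeMode = false;
        processMisReplicatedBlocks();
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    public static void main(String[] args) {
        SafeModeModel nn = new SafeModeModel();
        // One data-node holding a single copy of each block; files expect 3.
        nn.addBlock("blk_1", 1, 3);
        nn.addBlock("blk_2", 1, 3);
        nn.addBlock("blk_3", 1, 3);
        System.out.println("before: " + nn.neededReplications.size()); // prints "before: 0"
        nn.leaveSafeMode(true); // forced exit, as with dfsadmin -safemode leave
        System.out.println("after: " + nn.neededReplications.size());  // prints "after: 3"
    }
}
```

Rebuilding both sets in one pass is also what would let the same call double as the refresh mechanism mentioned in the last paragraph.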