[ 
https://issues.apache.org/jira/browse/HADOOP-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-1855:
----------------------------------------

    Attachment: FsckBlockPlacement.patch

This patch verifies the replica placement policy. Currently the policy ensures 
that replicas are placed on at least two racks if the cluster has multiple racks.
There is a reasonable argument that we should improve block placement by 
distributing replicas across at least replication-1 racks.
This would benefit map-reduce jar and config files, because it increases the 
likelihood that a task finds these initial files on its local rack.
The patch contains a method that compares the number of racks a block is 
actually replicated to against any required number of racks.
The method can be used in fsck once the improved replication policy is 
implemented. Until then we should report only the blocks that
are replicated on fewer than 2 racks, to avoid confusing users and 
system administrators.
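
To illustrate the kind of check described above, here is a minimal sketch. It
is not the code from FsckBlockPlacement.patch: the class name, the method
names, and the use of topology path strings as input are all assumptions made
for the example. It also assumes a block cannot be asked to span more racks
than the cluster has, or more racks than it has replicas.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not taken from the patch. */
public class RackPlacementCheck {

    /** Returns the rack part of a topology path such as /rack/data-node. */
    static String rackOf(String topologyPath) {
        int lastSlash = topologyPath.lastIndexOf('/');
        // A path with no rack component is treated as living on the root rack.
        return lastSlash <= 0 ? "/" : topologyPath.substring(0, lastSlash);
    }

    /**
     * Returns how many more racks the block would need to reach minRacks.
     * Zero means the placement requirement is met.
     */
    static int missingRacks(List<String> replicaPaths, int minRacks, int racksInCluster) {
        Set<String> racks = new HashSet<>();
        for (String path : replicaPaths) {
            racks.add(rackOf(path));
        }
        // Assumption: the requirement is capped by the number of racks in the
        // cluster and by the number of replicas the block actually has.
        int required = Math.min(minRacks, Math.min(racksInCluster, replicaPaths.size()));
        return Math.max(0, required - racks.size());
    }

    public static void main(String[] args) {
        List<String> sameRack = Arrays.asList("/r1/dn1", "/r1/dn2", "/r1/dn3");
        List<String> twoRacks = Arrays.asList("/r1/dn1", "/r2/dn2", "/r2/dn3");
        System.out.println(missingRacks(sameRack, 2, 3)); // all replicas on one rack
        System.out.println(missingRacks(twoRacks, 2, 3)); // replicas span two racks
    }
}
```

Passing 2 as minRacks corresponds to the current "at least two racks" rule;
passing replication-1 would correspond to the proposed stricter policy.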

Features:
- fsck reports mis-placed blocks as soon as it detects them.
- There is a new "-rack" option, which can be used together with or instead of 
"-location". If -rack is specified, fsck prints data-node locations
prefixed with a string that defines the data-node's placement in the cluster 
topology hierarchy, for example /rack/data-node or
/data-center/rack/data-node.
- fsck also prints the total number of mis-placed blocks.
- some trivial bugs were fixed: for example, instead of printing the number of 
blocks for each file, the old version printed the total block count;
  also, the average block replication and the percentage of over-replicated 
blocks were calculated incorrectly.
- I included more statistics in the report:
-- number of minimally replicated blocks, which is useful for checking 
safe-mode condition.
-- total number of missing replicas
-- number of data-nodes and
-- number of racks.
- the fsck help message is updated to reflect the new option and the actual 
option dependencies.
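
As a rough sketch of how the summary statistics listed above might be
accumulated, consider the following. The class, its fields, and the formulas
are assumptions for illustration only and are not taken from the patch:
average replication is assumed to be total replicas divided by total blocks,
and the over-replicated percentage is assumed to be relative to total blocks.

```java
/** Hypothetical accumulator for illustration; not taken from the patch. */
public class FsckSummarySketch {
    long totalBlocks;
    long totalReplicas;
    long overReplicatedBlocks;
    long misPlacedBlocks;

    /** Records one block given its actual and target replica counts. */
    void addBlock(int actualReplicas, int targetReplication, boolean misPlaced) {
        totalBlocks++;
        totalReplicas += actualReplicas;
        if (actualReplicas > targetReplication) {
            overReplicatedBlocks++;
        }
        if (misPlaced) {
            misPlacedBlocks++;
        }
    }

    /** Average number of replicas per block (assumed definition). */
    double averageReplication() {
        return totalBlocks == 0 ? 0.0 : (double) totalReplicas / totalBlocks;
    }

    /** Share of blocks with more replicas than requested, in percent. */
    double percentOverReplicated() {
        return totalBlocks == 0 ? 0.0 : 100.0 * overReplicatedBlocks / totalBlocks;
    }

    public static void main(String[] args) {
        FsckSummarySketch summary = new FsckSummarySketch();
        summary.addBlock(3, 3, false);
        summary.addBlock(4, 3, false); // over-replicated
        summary.addBlock(2, 3, true);  // under-replicated and mis-placed
        summary.addBlock(3, 3, false);
        System.out.println("Average replication: " + summary.averageReplication());
        System.out.println("Over-replicated (%): " + summary.percentOverReplicated());
        System.out.println("Mis-placed blocks:   " + summary.misPlacedBlocks);
    }
}
```

Guarding the divisions against a zero block count keeps the report sensible
for an empty file system.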


> fsck should verify block placement
> ----------------------------------
>
>                 Key: HADOOP-1855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1855
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: FsckBlockPlacement.patch
>
>
> fsck currently detects missing and under-replicated blocks. It would be 
> helpful if it can also detect blocks that do not conform to the block 
> placement policy. An administrator can use this tool to verify that blocks 
> are distributed across racks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
