DFS should place one replica per rack
-------------------------------------

                 Key: HADOOP-2559
                 URL: https://issues.apache.org/jira/browse/HADOOP-2559
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
            Reporter: Runping Qi



Currently, when writing out a block, DFS places one copy on the local data node, 
one copy on another node in the same rack, and a third copy on a node in a 
remote rack. This leads to a number of undesirable properties:

1. The block will be rack-local to only two racks instead of three, reducing the 
benefit of rack-locality-based scheduling by one third.

2. The blocks of a file (especially a large file) are unevenly distributed over 
the nodes: one third of the replicas will land on the local node and two thirds 
within the local rack. This may fill some nodes much faster than others, 
increasing the need for rebalancing. It also turns such nodes into "hot spots" 
when the big files they hold are popular and accessed by many applications.
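
To make the difference concrete, here is a toy Java sketch (not the actual DFS 
target-chooser code; the rack names, node counts, and Policy interface are made 
up for illustration). It places blocks under the current policy and under the 
proposed one-replica-per-rack policy, then measures, over many blocks written by 
one client, what fraction of the replicas land on the writer's node and on the 
writer's rack.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Toy model of the two block-placement policies discussed in this issue.
 * This is NOT the real DFS target chooser; racks, node counts, and the
 * Policy interface are invented purely for illustration.
 */
public class PlacementSketch {

    static final int NODES_PER_RACK = 10;

    interface Policy {
        List<String> choose(String writer, String writerRack,
                            List<String> otherRacks, Random rng);
    }

    // Current behaviour: local node, another node on the same rack, one remote node.
    static List<String> currentPolicy(String writer, String writerRack,
                                      List<String> otherRacks, Random rng) {
        List<String> targets = new ArrayList<>();
        targets.add(writer);                          // replica 1: the writer's node
        String rackLocal;
        do {                                          // replica 2: a different node, same rack
            rackLocal = node(writerRack, rng);
        } while (rackLocal.equals(writer));
        targets.add(rackLocal);
        String remoteRack = otherRacks.get(rng.nextInt(otherRacks.size()));
        targets.add(node(remoteRack, rng));           // replica 3: a node on a remote rack
        return targets;
    }

    // Proposed behaviour: one replica per rack, i.e. three distinct racks per block.
    static List<String> oneReplicaPerRack(String writer, String writerRack,
                                          List<String> otherRacks, Random rng) {
        List<String> racks = new ArrayList<>(otherRacks);
        Collections.shuffle(racks, rng);
        List<String> targets = new ArrayList<>();
        targets.add(writer);                          // replica 1: the writer's rack
        targets.add(node(racks.get(0), rng));         // replica 2: a second rack
        targets.add(node(racks.get(1), rng));         // replica 3: a third rack
        return targets;
    }

    static String node(String rack, Random rng) {
        return rack + "/node" + rng.nextInt(NODES_PER_RACK);
    }

    // Write many blocks from one client and see where the replicas land.
    static void measure(String name, Policy policy) {
        Random rng = new Random(42);
        String writerRack = "rackA";
        String writer = writerRack + "/node0";
        List<String> otherRacks = List.of("rackB", "rackC", "rackD");
        int blocks = 100_000;
        long onWriterNode = 0, onWriterRack = 0;
        for (int i = 0; i < blocks; i++) {
            for (String target : policy.choose(writer, writerRack, otherRacks, rng)) {
                if (target.equals(writer)) onWriterNode++;
                if (target.startsWith(writerRack + "/")) onWriterRack++;
            }
        }
        System.out.printf("%s: %.2f of replicas on writer's node, %.2f on writer's rack%n",
                name, onWriterNode / (3.0 * blocks), onWriterRack / (3.0 * blocks));
    }

    public static void main(String[] args) {
        measure("current ", PlacementSketch::currentPolicy);
        measure("proposed", PlacementSketch::oneReplicaPerRack);
    }
}
{code}

With the current placement, roughly one third of all replicas sit on the 
writer's node and two thirds inside the writer's rack; with one replica per 
rack, the local replica is kept but the writer's rack holds only about one 
third of the replicas, with the rest spread over two other racks.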




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
