[ https://issues.apache.org/jira/browse/HDFS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917284#action_12917284 ]
Sriram Rao commented on HDFS-1432:
----------------------------------

> Performance: Map-reduce jobs could have a performance impact if the number of
> replicas is reduced from 3 to 2. So, the tradeoff is reducing the total
> amount of storage while possibly increasing job latencies.

With 2 copies in 2 racks, you are still preserving rack locality. That may be sufficient.

> HDFS across data centers: HighTide
> ----------------------------------
>
>                 Key: HDFS-1432
>                 URL: https://issues.apache.org/jira/browse/HDFS-1432
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> There are many instances when the same piece of data resides on multiple HDFS
> clusters in different data centers. The primary reason is that the physical
> capacity of one data center is insufficient to host the entire data set. In
> that case, the administrator(s) typically partition that data into two (or
> more) HDFS clusters in two different data centers and then duplicate some
> subset of that data into both HDFS clusters.
> In such a situation, there will be six physical copies of the duplicated
> data: three copies in one data center and another three copies in the other
> data center. It would be nice if we could keep fewer than 3 replicas in each
> of the data centers and have the ability to fix a replica in the local data
> center by copying data from the copy in the remote data center.
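For a rough sense of the storage side of the tradeoff discussed above, the copy counts can be worked out in a few lines. This is only a back-of-the-envelope sketch: the function names `physical_copies` and `storage_saving` are hypothetical and not part of HDFS or HighTide, and it assumes the duplicated data lives in exactly two data centers with the same replication factor in each.

```python
def physical_copies(replicas_per_dc: int, data_centers: int = 2) -> int:
    """Total physical copies of a block duplicated across data centers."""
    return replicas_per_dc * data_centers


def storage_saving(before: int, after: int, data_centers: int = 2) -> float:
    """Fraction of raw storage saved by lowering the per-data-center replica count."""
    old = physical_copies(before, data_centers)
    new = physical_copies(after, data_centers)
    return (old - new) / old


if __name__ == "__main__":
    # Today: 3 replicas in each of 2 data centers.
    print(physical_copies(3))        # 6
    # Proposed: 2 replicas in each of 2 data centers.
    print(physical_copies(2))        # 4
    # Relative saving in raw storage for the duplicated subset.
    print(f"{storage_saving(3, 2):.0%}")
```

The sketch says nothing about job latency, which is the other half of the tradeoff and depends on the workload.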