[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989244#comment-13989244 ]
yeqi commented on HDFS-5442:
----------------------------

Hi folks,

This JIRA seems to consider only BDR across the DCs, not MR job execution efficiency. With multiple DCs, a job should be submitted to the DC where most of its source files can be found locally, so that there is no huge cross-DC traffic. I think a "smart" MR client would be good for routing jobs.

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>            Assignee: Dian Fu
>         Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for nor deployed in configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again with a fully operational Hadoop cluster in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
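The "smart" MR client routing idea from the comment could be sketched as below. This is a minimal, hypothetical illustration, not part of any Hadoop API: it assumes the client can already compute, per datacenter, how many bytes of the job's input files are stored locally there, and simply submits to the DC holding the largest share. All class and method names are invented for this sketch.

```java
import java.util.Map;

// Hypothetical sketch of a "smart" MR client: given how many bytes of a
// job's input live in each datacenter, pick the DC holding the most data
// locally, so the job incurs the least cross-DC read traffic.
public class DcRouter {

    /** Returns the datacenter holding the largest share of the job's input bytes. */
    static String pickDatacenter(Map<String, Long> localBytesPerDc) {
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> e : localBytesPerDc.entrySet()) {
            if (e.getValue() > bestBytes) {
                bestBytes = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative layout: 90% of the input sits in dc-east.
        Map<String, Long> layout = Map.of("dc-east", 900L, "dc-west", 100L);
        System.out.println(pickDatacenter(layout)); // prints "dc-east"
    }
}
```

A real implementation would obtain the per-DC byte counts from block locations (e.g. via the NameNode) rather than taking them as a precomputed map, and might also weigh current DC load alongside data locality.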