[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848966#comment-13848966
 ] 

LiuLei commented on HDFS-5442:
------------------------------

Hi Jerry, the design document is very good.


There are two clusters in your design document: the primary cluster and the 
secondary cluster. I think we only need one cluster.

For example, with two datacenters, we can do the following:
1. The DataNodes of the cluster can be deployed across the two datacenters.
2. The Active NameNode is deployed in the first datacenter, and the Standby 
NameNode is deployed in the second datacenter.
3. The JournalNodes can be deployed across three datacenters (d1:2, d2:2, d3:1), 
so even if one datacenter fails, the QJM is still available.
4. When a client creates a file, it can specify the number of replicas in each 
datacenter. For example, if the client specifies three replicas in the first 
datacenter and one replica in the second, the Active NameNode chooses three 
DataNodes from the first datacenter and one DataNode from the second, and the 
client builds a pipeline with the four DataNodes and writes the data.
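The 2/2/1 JournalNode layout in point 3 works because QJM only needs a majority 
of JournalNodes to stay writable. A quick arithmetic check, using the d1:2, 
d2:2, d3:1 counts from the example above:

```shell
# Availability check for 5 JournalNodes placed 2/2/1 across three datacenters.
# QJM needs a majority (3 of 5) of JournalNodes to accept edits.
total=5
quorum=$(( total / 2 + 1 ))            # majority of 5 = 3

for lost in 2 2 1; do                  # JNs lost if d1, d2, or d3 fails
  remaining=$(( total - lost ))
  if [ "$remaining" -ge "$quorum" ]; then
    echo "lost $lost JNs: quorum survives ($remaining >= $quorum)"
  else
    echo "lost $lost JNs: quorum LOST"
  fi
done
```

Every single-datacenter failure leaves at least 3 of the 5 JournalNodes, so the 
QJM quorum survives; note that losing any two datacenters would break it.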

Failover:
When one datacenter fails, we only need to switch the Standby NameNode to 
Active.
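The switch could be driven with the standard HDFS HA admin commands; a minimal 
sketch, assuming the two NameNodes are registered under the illustrative HA 
service IDs nn1 (first datacenter) and nn2 (second datacenter):

```shell
# nn1/nn2 are assumed service IDs from dfs.ha.namenodes.<nameservice>.
# Graceful failover while nn1 is still reachable:
hdfs haadmin -failover nn1 nn2

# If the first datacenter is down and nn1 cannot be fenced,
# force the standby to become active (use with care):
hdfs haadmin -transitionToActive --forcemanual nn2
```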





> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>         Attachments: Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
