[ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851219#comment-13851219 ]
Jerry Chen commented on HDFS-5442:
----------------------------------

{quote}It might be good to break up the work into two major features.{quote}

Logically, yes. And as you mentioned, users will have the flexibility to choose sync or async replication based on their needs. On the other hand, from a design perspective, the two features share some common concepts and facilities, and serve the common requirement of cross-datacenter replication. We also see the need for sync replication and async replication to be used at the same time, complementing each other for data with different characteristics.

{quote}There seems to be assumption of replication of entire namespace at few places. This might not be desirable in many cases. Enabling this feature per directory or list of directories would be very useful.{quote}

Since namespace replication is based on namespace journaling, replicating the entire namespace is conceptually straightforward and simple. Namespace replication for a list of directories could be done by filtering, but that would complicate the whole design, because the edit log entries for a directory do not form a closure in namespace journaling. On the other hand, the data plays the critical role in cross-datacenter replication: a user can configure a list of directories whose data is replicated synchronously, while data in other directories is replicated asynchronously. We will target entire-namespace replication in the phase-1 work and can consider partial namespace replication in phase-2, once we understand its exact impact.

{quote}There seems to be assumption of primary cluster and secondary cluster. Can this be chained to having something A->B and B->C. Or even the use case of A->B or B->A. Calling out those with configuration options would be very useful for cluster admins.{quote}

In the design, the primary and secondary clusters operate differently.
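The per-directory sync/async split described above could be sketched as a simple path-prefix check. This is a minimal illustration, not actual HDFS code: the class name, the `Mode` enum, and the idea of holding the sync directory list in memory are all my assumptions for the sketch.

```java
import java.util.List;

/**
 * Hypothetical sketch of the per-directory policy discussed above:
 * paths under a configured list of directories are replicated
 * synchronously, everything else asynchronously. Not actual HDFS code.
 */
public class ReplicationPolicySketch {

    public enum Mode { SYNC, ASYNC }

    private final List<String> syncDirs;

    public ReplicationPolicySketch(List<String> syncDirs) {
        this.syncDirs = syncDirs;
    }

    /** Return SYNC when the path falls under one of the configured directories. */
    public Mode modeFor(String path) {
        for (String dir : syncDirs) {
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            if (path.equals(dir) || path.startsWith(prefix)) {
                return Mode.SYNC;
            }
        }
        return Mode.ASYNC;
    }
}
```

Note that this only selects the data replication mode per directory; the namespace journal itself would still cover the entire namespace, as discussed above.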
To support a chain like A->B and B->C, a secondary cluster (B) would have to act as a primary cluster for C. That needs extra, chaining-specific work; I would suggest considering it as a future improvement. When we talk about chained clusters, I tend to think of asynchronous replication, which simplifies things a little. Reversing/switching the primary and secondary cluster roles is supported, but that does not mean two-way replication at the same time.

{quote}Another place which would need more information is about primary cluster NN tracking datanode information from secondary cluster (via secondary cluster NN). This needs to be thought to see if this is really scalable.{quote}

The datanode information tracked by the primary cluster is kept to a minimum, and it is updated in batches via the secondary cluster NN. For network communication, our goal is to send secondary-cluster details only when there is a real change in DN state, and to send them batch-wise. For example, when a DN expires on the secondary cluster, or a DN's space is completely filled so that no new data can be written to it, we report those DNs. We skip reporting DNs that are already registered and still qualify for writes. Let's work out the other "how to" details through patches.

{quote}How would ReplicationManager or changing replication of files work in general with this policy?{quote}

At a high level, the original replication in each local cluster keeps working as it was: the configured replication factor applies to local blocks only. The added part is remote block replication, which is triggered by the secondary cluster NameNode.
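The batched DN-state reporting idea above could look roughly like the following. This is a hypothetical sketch under my own assumptions (class and method names, the two event kinds, and the flush-as-batch shape are all invented for illustration), not the actual protocol:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of batched DN-state reporting: the secondary NN
 * accumulates only meaningful state changes (DN expired, DN out of space)
 * and ships them to the primary NN in one batch, skipping DNs that are
 * still registered and writable. Not actual HDFS code.
 */
public class DnStateBatcher {

    public enum DnEvent { EXPIRED, OUT_OF_SPACE }

    public static class Change {
        public final String dnId;
        public final DnEvent event;
        public Change(String dnId, DnEvent event) {
            this.dnId = dnId;
            this.event = event;
        }
    }

    private final List<Change> pending = new ArrayList<>();

    /** Record a change only when the DN is no longer usable for writes. */
    public void onStateChange(String dnId, boolean expired, long remainingBytes) {
        if (expired) {
            pending.add(new Change(dnId, DnEvent.EXPIRED));
        } else if (remainingBytes <= 0) {
            pending.add(new Change(dnId, DnEvent.OUT_OF_SPACE));
        }
        // DNs that remain registered and writable are deliberately not reported.
    }

    /** Drain the accumulated changes as one batch destined for the primary NN. */
    public List<Change> flushBatch() {
        List<Change> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

The point of the sketch is the filtering: healthy, writable DNs generate no traffic to the primary cluster, which is what keeps the tracked information minimal.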
> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>         Attachments: Disaster Recovery Solution for Hadoop.pdf
>
> Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for nor deployed in configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again with a fully operational Hadoop cluster in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setup. Design and code for Phase-1 to follow soon.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)