[ https://issues.apache.org/jira/browse/HDFS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916458#action_12916458 ]

dhruba borthakur edited comment on HDFS-1432 at 9/30/10 8:53 AM:
-----------------------------------------------------------------

*Goal*: The goal of the HighTideNode is to keep only one physical replica per 
data center. This is mostly for older files that change very infrequently. The 
HighTide server watches over two HDFS namespaces served by two different 
NameNodes in two different data centers. These two equivalent namespaces are 
populated by means external to HighTide. The HighTide server verifies (via 
checksums of the crc files) that two directories in the two HDFS clusters 
contain identical data, and if so, reduces the replication factor to 2 on both 
clusters. (One or both clusters could be using HDFS-RAID too.) The HighTideNode 
monitors for missing replicas on both NameNodes, and if it finds any, it fixes 
them by copying data from the other NameNode in the remote data center.
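
Here is a rough sketch of the verify-and-reduce step, not actual HighTideNode 
code: the cluster URIs and the directory path are placeholders, and the public 
FileSystem.getFileChecksum API is used as a stand-in for comparing the crc 
files directly (the checksums are only comparable when both clusters use the 
same block size and bytes-per-checksum).

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HighTideVerifySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode URIs for the two data centers.
    FileSystem fs1 = FileSystem.get(URI.create("hdfs://nn1.dc1:8020"), conf);
    FileSystem fs2 = FileSystem.get(URI.create("hdfs://nn2.dc2:8020"), conf);
    Path dir = new Path("/warehouse/old-partitions");   // hypothetical directory

    // Step 1: verify that every file in the directory has identical contents
    // on both clusters, by comparing HDFS file checksums.
    FileStatus[] files = fs1.listStatus(dir);
    boolean identical = true;
    for (FileStatus st : files) {
      if (st.isDirectory()) continue;
      Path f = new Path(dir, st.getPath().getName());
      FileChecksum c1 = fs1.getFileChecksum(f);
      FileChecksum c2 = fs2.getFileChecksum(f);
      if (c1 == null || !c1.equals(c2)) {
        identical = false;
        break;
      }
    }

    // Step 2: if the copies match, each cluster can drop to 2 physical replicas.
    if (identical) {
      for (FileStatus st : files) {
        if (st.isDirectory()) continue;
        Path f = new Path(dir, st.getPath().getName());
        fs1.setReplication(f, (short) 2);
        fs2.setReplication(f, (short) 2);
      }
    }
  }
}
{code}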

In short, replication within an HDFS cluster will occur via the NameNode as 
usual, and each NameNode will maintain fewer than 3 copies of the data. 
Replication across HDFS clusters will be coordinated by the HighTideNode: it 
invokes the -list-corruptFiles RPC on each NameNode periodically (every minute) 
to detect missing replicas.
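
A rough sketch of this polling loop follows; DistributedFileSystem's 
listCorruptFileBlocks call is used here as a stand-in for the 
-list-corruptFiles RPC, and the cluster URIs are placeholders.

{code:java}
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CorruptFilePollerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs1 =
        (DistributedFileSystem) FileSystem.get(URI.create("hdfs://nn1.dc1:8020"), conf);
    DistributedFileSystem dfs2 =
        (DistributedFileSystem) FileSystem.get(URI.create("hdfs://nn2.dc2:8020"), conf);

    // Poll both NameNodes once a minute for files with missing/corrupt blocks.
    ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
    poller.scheduleAtFixedRate(() -> {
      try {
        for (DistributedFileSystem dfs : new DistributedFileSystem[] {dfs1, dfs2}) {
          RemoteIterator<Path> corrupt = dfs.listCorruptFileBlocks(new Path("/"));
          while (corrupt.hasNext()) {
            Path bad = corrupt.next();
            // A real HighTideNode would queue 'bad' for repair from the peer cluster.
            System.out.println("corrupt file on " + dfs.getUri() + ": " + bad);
          }
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 1, TimeUnit.MINUTES);
  }
}
{code}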

*DataNodeGateway*: I envision a single HighTideNode coordinating replication 
between multiple HDFS clusters. An alternative would be a gateway approach: a 
specialized DataNode that exports the DataNode protocol and appears to an HDFS 
cluster as one very large DataNode, but instead of storing blocks on local 
disks, the GatewayDataNode would store data in a remote HDFS cluster. This is 
similar to existing NFS gateways, e.g. NFS-CIFS interaction. The downside is 
that this design is more complex and intrusive to HDFS, rather than being a 
layer on top of it.

*Mean-Time-To-Recover (MTR)*: Will this approach of having remote replicas 
increase the probability of data loss? My claim is that we should try to keep 
the MTR practically the same as it is today. If all the replicas of a block on 
HDFS1 go missing, the HighTideNode will first increase the replication factor 
of the equivalent file in HDFS2. This ensures that we get back to 3 overall 
copies as soon as possible, keeping the MTR the same as it is now. The 
HighTideNode will then copy this block from HDFS2 to HDFS1, wait for HDFS1 to 
attain a replica count of 2, and only then decrease the replica count on HDFS2 
from 3 back to 2.
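
A rough sketch of this recovery sequence follows, at file granularity rather 
than the block granularity described above; the file path and cluster URIs are 
placeholders.

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HighTideRepairSketch {

  /** Returns true once every block of 'file' has at least 'want' live replicas. */
  static boolean hasReplicas(FileSystem fs, Path file, int want) throws Exception {
    FileStatus st = fs.getFileStatus(file);
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      if (loc.getHosts().length < want) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs1 = FileSystem.get(URI.create("hdfs://nn1.dc1:8020"), conf);
    FileSystem hdfs2 = FileSystem.get(URI.create("hdfs://nn2.dc2:8020"), conf);
    Path file = new Path("/warehouse/old-partitions/part-00000"); // hypothetical

    // 1. Re-establish 3 overall copies quickly: raise the healthy cluster to 3.
    hdfs2.setReplication(file, (short) 3);

    // 2. Copy the damaged file back from HDFS2 to HDFS1 (a file-level stand-in
    //    for the per-block copy described in the comment).
    Path tmp = new Path(file + ".hightide.tmp");
    FileUtil.copy(hdfs2, file, hdfs1, tmp, false, conf);
    hdfs1.delete(file, false);
    hdfs1.rename(tmp, file);
    hdfs1.setReplication(file, (short) 2);

    // 3. Wait until HDFS1 actually holds 2 live replicas of every block ...
    while (!hasReplicas(hdfs1, file, 2)) {
      Thread.sleep(10_000);
    }

    // 4. ... then let HDFS2 fall back to 2 replicas.
    hdfs2.setReplication(file, (short) 2);
  }
}
{code}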

*HDFS-RAID*: HighTide can co-exist with HDFS-RAID. HDFS-RAID allows us to keep 
fewer physical copies of data + parity. The MTR from RAID is smaller compared 
to HighTide, but the savings from HighTide are far greater because the savings 
percentage does not depend on RAID stripe size or on file lengths. One can use 
RAID to achieve a replication factor of 1.2 in each HDFS cluster and then use 
HighTide to keep an additional 1.2 replicas on the remote HDFS cluster, for 
roughly 2.4 overall copies instead of the 6 (3 + 3) kept today.

*Performance*: Map-reduce jobs could see a performance impact if the number of 
replicas is reduced from 3 to 2. So, the tradeoff is reducing the total amount 
of storage while possibly increasing job latencies.

*HBase*: With the current HBase design it is difficult to use HighTide to 
replicate across data centers. This is something that we need to look into 
further.



> HDFS across data centers: HighTide
> ----------------------------------
>
>                 Key: HDFS-1432
>                 URL: https://issues.apache.org/jira/browse/HDFS-1432
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> There are many instances when the same piece of data resides on multiple HDFS 
> clusters in different data centers, primarily because the physical capacity 
> of one data center is insufficient to host the entire data set. In that case, 
> the administrator(s) typically partition that data into two (or more) HDFS 
> clusters in two different data centers and then duplicate some subset of that 
> data into both HDFS clusters.
> In such a situation, there will be six physical copies of the duplicated 
> data: three copies in one data center and another three copies in the other 
> data center. It would be nice if we could keep fewer than 3 replicas in each 
> data center and have the ability to fix a replica in the local data center by 
> copying data from the remote copy in the remote data center.
