[ https://issues.apache.org/jira/browse/HDFS-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916520#action_12916520 ]

Joydeep Sen Sarma commented on HDFS-1432:
-----------------------------------------

who manages the replication in both data centers? it seems simpler to put 
the logic for asserting equivalence at the same layer that manages this 
replication (at the time of replicating the data, the replication agent knows 
that the two sides are now equivalent).

the follow-up question then is: why does the hightidenode itself not take care of 
cross-data-center file replication?

the other problem with choosing one of the two file systems and then passing 
that down to the client is that the client cannot recover from bad/missing 
block problems that have yet to be healed by the system. it seems better to 
have a hightide client side file system layer that can switch to the survivor 
copy in the other data center (just like HDFS-RAID). 
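to make the idea concrete - here is a rough sketch of such a client side layer 
(the class and method names are made up for illustration, not an existing 
HighTide API): it wraps a local and a remote FileSystem and switches to the 
remote copy when the local read fails.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // hypothetical client-side layer: read locally, fall back to the surviving
    // copy in the other data center when the local copy is bad/missing.
    public class HighTideClientFs {
      private final FileSystem localFs;   // HDFS in the local data center
      private final FileSystem remoteFs;  // HDFS in the remote data center

      public HighTideClientFs(FileSystem localFs, FileSystem remoteFs) {
        this.localFs = localFs;
        this.remoteFs = remoteFs;
      }

      // open the local copy; on failure, transparently open the remote copy.
      // (a real implementation would also have to catch read-time block errors,
      // not just open-time failures.)
      public FSDataInputStream open(Path path) throws IOException {
        try {
          return localFs.open(path);
        } catch (IOException localFailure) {
          return remoteFs.open(path);
        }
      }
    }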

in short - i think things seem more consistent if hightide (node/client) takes 
care of discovery/replication/recovery of data transparently across the two 
data centers (with some policy on which parts of the namespace need replication 
to which data centers). 

---

in certain cases - we do not want complete location transparency. for example - 
a client may want to execute a job only when all required data sets have been 
replicated to the data center where the job will run. that can be done easily 
by having new APIs (provided by the hightidenode) that report the available data 
centers for a given part of the namespace. then the caller can wait for 
availability at the desired data center.
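something along these lines (the interface and method names are purely 
illustrative - nothing like this exists yet): the hightidenode exposes which 
data centers hold a complete copy of a subtree, and the caller polls until the 
desired data center shows up.

    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.fs.Path;

    // hypothetical availability API exposed by the hightidenode.
    interface HighTideAvailability {
      // data centers that currently hold a complete copy of this namespace subtree.
      Set<String> getAvailableDataCenters(Path namespaceRoot);
    }

    class JobGate {
      // block until the subtree has been replicated to the desired data center.
      static void waitForDataCenter(HighTideAvailability ht, Path subtree, String dc)
          throws InterruptedException {
        while (!ht.getAvailableDataCenters(subtree).contains(dc)) {
          TimeUnit.SECONDS.sleep(30);   // poll until replication catches up
        }
      }
    }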

---

the other aspect that is not addressed here is concurrent modification 
(regardless of who manages the replication, there is replication lag and we 
have to account for it). i will assume for now that hightide manages 
replication and gives the appearance of a single file system. a crc checksum 
can distinguish a modified file tree from the original - but it cannot figure 
out which one is the later of the two.

at least for our use case - a simple 'latest wins' policy is good enough:
- the client application generates a file id by means external to HighTide
- it creates files/directories stamped with this id
- when hightide replicates data to a remote data center, it checks - before 
overwriting any existing data - whether the incoming id is bigger than the 
existing one, and only then performs the overwrite.

for us - an id encoding the timestamp will be sufficient (since we have a 
central entity doling out timestamps). but in general - we would want to keep 
this abstract (more generally - at least a <Data-Center, time> tuple would be 
required).
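a rough sketch of that comparison (names and stamp format are assumptions on my 
part, not part of HighTide): order by time first, use the data center only to 
break ties, and overwrite only when the incoming stamp is strictly newer.

    // illustrative 'latest wins' stamp: a <data-center, time> tuple attached to
    // each replicated file/directory by means external to HighTide.
    public class FileStamp implements Comparable<FileStamp> {
      final long timestamp;     // externally generated, monotonically increasing
      final String dataCenter;  // origin data center, breaks ties between equal timestamps

      public FileStamp(long timestamp, String dataCenter) {
        this.timestamp = timestamp;
        this.dataCenter = dataCenter;
      }

      @Override
      public int compareTo(FileStamp other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : dataCenter.compareTo(other.dataCenter);
      }

      // overwrite the existing copy only if the incoming stamp is strictly newer.
      public static boolean shouldOverwrite(FileStamp incoming, FileStamp existing) {
        return existing == null || incoming.compareTo(existing) > 0;
      }
    }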

another option that may be feasible in certain scenarios is for hightide to 
prevent concurrent updates to any part of the namespace that is required to be 
replicated but has not been replicated so far. however - this makes the 
system very tightly coupled and it will not have good availability characteristics.

> HDFS across data centers: HighTide
> ----------------------------------
>
>                 Key: HDFS-1432
>                 URL: https://issues.apache.org/jira/browse/HDFS-1432
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> There are many instances when the same piece of data resides on multiple HDFS 
> clusters in different data centers. The primary reason is that the physical 
> capacity of one data center is insufficient to host the entire data set. In 
> that case, the administrator(s) typically partition that data into two (or 
> more) HDFS clusters in two different data centers and then duplicate some 
> subset of that data into both HDFS clusters.
> In such a situation, there will be six physical copies of the duplicated 
> data: three copies in one data center and another three copies in the other 
> data center. It would be nice if we could keep fewer than 3 replicas in each 
> of the data centers and have the ability to fix a replica in the local data 
> center by copying data from the remote copy in the remote data center.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
