[ 
https://issues.apache.org/jira/browse/HDFS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986852#action_12986852
 ] 

Sanjay Radia commented on HDFS-1593:
------------------------------------

When the NN issues a copy operation, the data does not go via the NN but is 
transferred directly between the DNs. The two DNs still have to ensure that the 
other peer is indeed a DN in the cluster. I need to look at the code more 
closely, but I believe the DN generates an access token, since it shares the 
access token secret with the NN.

Dhruba, you are asserting that the two clusters have the same principals; while 
this may be true in many cases, it may not be true in all environments. 
Further, the secret that is used to generate access tokens is not the same in 
two different clusters (even if they use the same principals). 
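
To illustrate why a shared principal is not enough, here is a minimal sketch. It 
assumes an HMAC-style token signed with a per-cluster secret; the function names, 
the token layout (block id plus access mode), and the secrets are hypothetical 
and are not taken from the HDFS code. A token minted with cluster A's secret 
verifies on cluster A but not on cluster B.

```python
import hashlib
import hmac


def make_token(secret: bytes, block_id: int, mode: str) -> bytes:
    """Sign (block_id, mode) with a cluster's shared secret (hypothetical layout)."""
    message = f"{block_id}:{mode}".encode()
    return hmac.new(secret, message, hashlib.sha1).digest()


def verify_token(secret: bytes, block_id: int, mode: str, token: bytes) -> bool:
    """Recompute the signature with the verifier's own secret and compare."""
    expected = make_token(secret, block_id, mode)
    return hmac.compare_digest(expected, token)


# Assumed, made-up secrets: each cluster keeps its own.
cluster_a_secret = b"secret-of-cluster-A"
cluster_b_secret = b"secret-of-cluster-B"

token = make_token(cluster_a_secret, 1234, "COPY")
same_cluster_ok = verify_token(cluster_a_secret, 1234, "COPY", token)
foreign_cluster_ok = verify_token(cluster_b_secret, 1234, "COPY", token)
```

Under these assumptions a DN in the foreign cluster has no way to validate the 
token, which is why the cross-cluster case needs some other arrangement.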

BTW we have been looking at the same problem here at Yahoo and are trying to 
figure out the best secure solution.
There is another issue you are missing: the block sizes on the two clusters may 
not be the same, since their default block sizes may differ. Hence I am not 
sure one can simply copy a block across. 
Q. Are you trying to push or pull the data? In order to handle different block 
sizes it seems easier to pull the data. One choice is to access a byte range in 
a file via the DFSClient; the other choice is to get the block's bytes from 
multiple DNs in the remote src cluster. 
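
A rough sketch of the pull side, assuming fixed-size blocks on both clusters 
(only the last block of a file is short). The helper name and parameters are 
hypothetical. It computes which byte ranges of which source blocks make up one 
destination block, which shows why a destination block may need bytes from more 
than one source block.

```python
def source_ranges(dst_block_index, dst_block_size, src_block_size, file_length):
    """Return (src_block_index, offset_in_src_block, length) tuples that
    together cover one destination block of the file."""
    start = dst_block_index * dst_block_size
    end = min(start + dst_block_size, file_length)
    ranges = []
    offset = start
    while offset < end:
        src_index = offset // src_block_size
        src_block_end = (src_index + 1) * src_block_size
        # Stop at whichever comes first: the end of the destination block
        # or the end of the current source block.
        chunk_end = min(end, src_block_end)
        ranges.append((src_index, offset - src_index * src_block_size,
                       chunk_end - offset))
        offset = chunk_end
    return ranges


# Made-up sizes: 300-byte file, 100-byte src blocks, 128-byte dst blocks.
second_dst_block = source_ranges(1, 128, 100, 300)
```

With these made-up sizes the second destination block spans source blocks 1 and 
2, so the pulling DN would have to read from the DNs holding both of them.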

I do however agree that transferring data directly from one or more data nodes 
to another is desirable.



> Allow a datanode to copy a block to a datanode on a foreign HDFS cluster.
> -------------------------------------------------------------------------
>
>                 Key: HDFS-1593
>                 URL: https://issues.apache.org/jira/browse/HDFS-1593
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: copyBlockTrunk1.txt
>
>
> This patch introduces an RPC to the datanode to allow it to copy a block to a 
> datanode on a remote HDFS cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
