Re: Cross-data centre DFS communication?
On Feb 28, 2008, at 2:43 AM, Miles Osborne wrote:

> Currently, we have the following setup:
>
> -- cluster A, running Nutch: small RAM per node
> -- cluster B, just running Hadoop: lots of RAM per node
>
> At some point in the future we will want cluster B to talk to cluster A,
> and ideally this should be DFS-to-DFS. Is this possible? Or do we need to
> do something like:
>
> Cluster A -- Unix filesystem -- Cluster B
>
> via hadoop dfs -cat / -put operations, etc.?

To copy between clusters, there is a tool called distcp. Look at
bin/hadoop distcp. It runs a map/reduce job that copies a group of files.
It can also be used to copy between versions of Hadoop, if the source
file system is hftp, which uses XML to read HDFS.

-- Owen
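As a rough sketch of what those distcp invocations could look like (the
host names, ports, and paths below are placeholders, not taken from the
thread; 8020 and 50070 are only the usual namenode RPC and HTTP defaults),
run from the destination cluster:

  # copy a directory tree from cluster A's HDFS into cluster B's HDFS
  bin/hadoop distcp hdfs://clusterA-nn:8020/user/nutch/crawl \
                    hdfs://clusterB-nn:8020/user/nutch/crawl

  # the same copy, reading the source over hftp instead, which also works
  # when the two clusters run different Hadoop versions
  bin/hadoop distcp hftp://clusterA-nn:50070/user/nutch/crawl \
                    hdfs://clusterB-nn:8020/user/nutch/crawl

Since hftp is read-only, the hftp URI goes on the source side and the job
is launched on the destination cluster.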
Re: Cross-data centre DFS communication?
Owen O'Malley wrote:

> To copy between clusters, there is a tool called distcp. Look at
> bin/hadoop distcp. It runs a map/reduce job that copies a group of
> files. It can also be used to copy between versions of hadoop, if the
> source file system is hftp, which uses xml to read hdfs.

Can you further explain the hftp part of this? I'm not familiar with
that. We have a similar need to go cross-data center. In an earlier post
it was suggested that there was no map/reduce model for that, so this
sounds more like what we're looking for.

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]
Re: Cross-data centre DFS communication?
On Feb 28, 2008, at 8:20 AM, Steve Sapovits wrote:

> Can you further explain the hftp part of this? I'm not familiar with
> that. We have a similar need to go cross-data center.

Sure, the info server on the name node of HDFS has a read-only interface
that lists directories in xml and allows the client to read files over
http. There is a FileSystem implementation that provides the client side
interface to the xml/http access. To use it, you need a path with hftp as
the protocol:

  hadoop distcp hftp://namenode1:50070/foo/bar hdfs://namenode2:8020/foo

> In an earlier post it was suggested that there was no map/reduce model
> for that so this sounds more like what we're looking for.

It isn't a good idea to run map/reduce jobs across clusters, so you
usually need to copy the data locally.

-- Owen
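Because hftp is just another FileSystem scheme, the ordinary shell
commands can also be pointed at it, which is a quick way to check that the
interface is reachable before running distcp. A small sketch, assuming the
same placeholder namenode and path as above and a hypothetical file name:

  # list a directory through the name node's read-only HTTP interface
  bin/hadoop dfs -ls hftp://namenode1:50070/foo/bar

  # read a file the same way (read-only; puts/deletes over hftp won't work)
  bin/hadoop dfs -cat hftp://namenode1:50070/foo/bar/part-00000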
Re: Cross-data centre DFS communication?
Owen O'Malley wrote:

> Sure, the info server on the name node of HDFS has a read-only interface
> that lists directories in xml and allows the client to read files over
> http. There is a FileSystem implementation that provides the client side
> interface to the xml/http access. To use it, you need a path with hftp
> as the protocol:
>
>   hadoop distcp hftp://namenode1:50070/foo/bar hdfs://namenode2:8020/foo

Very useful. Thanks.

--
Steve Sapovits
Invite Media - http://www.invitemedia.com
[EMAIL PROTECTED]