On Feb 28, 2008, at 8:20 AM, Steve Sapovits wrote:

Can you further explain the hftp part of this? I'm not familiar with that. We have a similar need to go cross-data-center.

Sure, the info server on the name node of HDFS has a read-only interface that lists directories in XML and allows the client to read files over HTTP. There is a FileSystem implementation that provides the client-side interface to the XML/HTTP access.

To use it, you need a path with hftp as the protocol:
hadoop distcp hftp://namenode1:50070/foo/bar hdfs://namenode2:8020/foo
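Note that because the hftp interface is read-only, it can only appear on the source side of the copy, so you'd typically run the distcp on the destination cluster. If you want the same access programmatically, here is a minimal sketch using the generic FileSystem API (the host names and paths are placeholders, and the exact method names may vary a bit between releases):

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HftpExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // "hftp" selects the read-only HTTP-backed FileSystem; the port is
    // the name node's info server port (50070 by default), not the RPC port.
    // namenode1 is a placeholder host.
    FileSystem fs = FileSystem.get(URI.create("hftp://namenode1:50070/"), conf);

    // Directory listings come back from the info server as XML and are
    // parsed client-side into FileStatus objects.
    for (FileStatus stat : fs.listStatus(new Path("/foo"))) {
      System.out.println(stat.getPath());
    }

    // File reads go over plain HTTP; copy one file to stdout.
    InputStream in = fs.open(new Path("/foo/bar"));
    IOUtils.copyBytes(in, System.out, conf, true);
  }
}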


In an earlier post it was suggested that there was no map/reduce model for that, so this sounds more like what we're looking for.

It isn't a good idea to run map/reduce jobs across clusters, so you usually need to copy the data to the local cluster first.
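In other words, the usual pattern is a two-step job: stage the remote data with distcp, then run the map/reduce job entirely within the local cluster. Something like the following, where the host names, paths, and the job jar are all placeholders:

hadoop distcp hftp://remote-namenode:50070/logs/2008-02-28 hdfs://local-namenode:8020/staging/logs
hadoop jar my-job.jar MyJobDriver /staging/logs /output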

-- Owen
