I forgot to mention that the methods in my previous reply are for copying files from a machine outside any hadoop cluster into a hadoop cluster. Copying between two hadoop clusters can instead be handled by hadoop distcp, see here: http://hadoop.apache.org/common/docs/current/distcp.html
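To make the two cases concrete, here is a rough shell sketch, not a tested recipe. The function names and the DRY_RUN switch are made up for illustration, and the host names, port 8020 and paths in the usage note are placeholders rather than values from this thread. It assumes passwordless ssh to the hadoop master and that the hadoop command is on the PATH where it is invoked.

```shell
#!/bin/sh
# Sketch only: function names and DRY_RUN are hypothetical helpers.
# Set DRY_RUN=1 to print each command instead of running it.

# Case 1: the data box is NOT part of a hadoop cluster.
# Stream a local file over ssh straight into HDFS. "hadoop fs -put -"
# reads from stdin, so the file is never staged on the master's local disk.
# Same effect as: cat file | ssh host 'hadoop fs -put - dst'
stream_to_hdfs() {
    src=$1
    host=$2
    dst=$3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $host \"hadoop fs -put - '$dst'\" < $src"
    else
        ssh "$host" "hadoop fs -put - '$dst'" < "$src"
    fi
}

# Case 2: source and destination are both hadoop clusters.
# distcp copies between them with a MapReduce job.
cluster_copy() {
    src_uri=$1
    dst_uri=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "hadoop distcp $src_uri $dst_uri"
    else
        hadoop distcp "$src_uri" "$dst_uri"
    fi
}
```

For example, "stream_to_hdfs /data/part-0001 hadoopmaster /incoming/part-0001" covers the non-hadoop source, and "cluster_copy hdfs://nn1:8020/src hdfs://nn2:8020/dst" covers the inter-cluster case. Running with DRY_RUN=1 first shows what would be executed.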
Thanks,
--
Michael

--- On Fri, 3/5/10, jiang licht <licht_ji...@yahoo.com> wrote:

From: jiang licht <licht_ji...@yahoo.com>
Subject: Re: Copying files between two remote hadoop clusters
To: common-user@hadoop.apache.org
Date: Friday, March 5, 2010, 4:37 PM

This is something that I asked recently :) Here's a list of what I can think of:

1. On the remote data box, run:
   cat filetobesent | ssh hadoopmaster 'hadoop fs -put - dstinhdfs'

2. On the remote data box, configure core-site.xml to set fs.default.name to hdfs://namenode:port, and then run "hadoop fs -copyFromLocal" or "hadoop fs -put" as usual. This works if your namenode is reachable from the data box, either directly or through a VPN.

3. Use an HDFS-aware gridftp. This was mentioned in Brian Bockelman's reply; you can read more detail about it here:
   http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201003.mbox/%3c2506096c-c00d-40ec-8751-4abd8f040...@cse.unl.edu%3e

4. Write a data transfer tool that is HDFS-aware and runs on the data box: it reads data on the data box and sends it over the network to its partner on the namenode, which writes it directly into the hadoop cluster.

5. Any other ideas?

Thanks,
Michael

--- On Fri, 3/5/10, zenMonkey <numan.sal...@gmail.com> wrote:

From: zenMonkey <numan.sal...@gmail.com>
Subject: Copying files between two remote hadoop clusters
To: hadoop-u...@lucene.apache.org
Date: Friday, March 5, 2010, 4:25 PM

I want to write a script that pulls data (flat files) from a remote machine and pushes it into its hadoop cluster.

At the moment, this is done in two steps:
1 - Secure copy the remote files
2 - Put the files into HDFS

I was wondering whether it is possible to optimize this by not copying to the local fs before pushing to hdfs, and instead writing directly to hdfs. I am not sure if this is something that hadoop tools already provide.

Thanks for any help.
--
View this message in context: http://old.nabble.com/Copying-files-between-two-remote-hadoop-clusters-tp27799963p27799963.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.