You might find this useful as well:

"What are the ways of importing data to HDFS from remote locations? I need
this process to be well-managed and automated.

Here are just some of the options. First you should look at available HDFS
shell commands. For large inter/intra-cluster copying distcp might work best
for you. For moving data from RDBMS system you should check Sqoop. To
automate moving (constantly produced) data from many different locations
refer to Flume. You might also want to look at Chukwa (data collection
system for monitoring large distributed systems) and Scribe (server for
aggregating log data streamed in real time from a large number of servers)."

(see http://blog.sematext.com/2010/08/02/hadoop-digest-july-2010/ with
better formatting and links ;))

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/

On Tue, Sep 7, 2010 at 12:14 AM, Mark <static.void....@gmail.com> wrote:

>  Thanks... I'll give that a try
>
>
> On 9/6/10 2:02 PM, Harsh J wrote:
>
>> Java: You can use a DFSClient instance with a proper config object
>> (Configuration) from just about anywhere - basically, all that matters is
>> the right fs.default.name value, which is your namenode's communication
>> point (a rough sketch is at the end of this mail).
>>
>> You can even use a Hadoop installation's 'bin/hadoop dfs' on a remote node
>> (without it acting as a proper node, i.e. not in the slaves or masters
>> list) if you want to use the scripts.
>>
>> On 7 Sep 2010 01:43, "Mark"<static.void....@gmail.com>  wrote:
>>
>>  How do I go about uploading content from a remote machine to the Hadoop
>> cluster? Do I have to first move the data to one of the nodes and then do a
>> fs -put, or is there some client I can use to just access an existing
>> cluster?
>>
>> Thanks
>>
>>
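
For the programmatic route Harsh describes above, something along these lines
should work from any machine that can reach the namenode. This is only a
rough, untested sketch: the namenode host/port and the paths are made-up
placeholders, and it uses the FileSystem API (which wraps DFSClient) rather
than DFSClient directly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemotePut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote namenode (placeholder host/port).
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");

        FileSystem fs = FileSystem.get(conf);
        // Programmatic equivalent of 'hadoop fs -put local.log /user/mark/'.
        fs.copyFromLocalFile(new Path("/tmp/local.log"),
                             new Path("/user/mark/local.log"));
        fs.close();
    }
}

Run it with the Hadoop jars on the classpath; alternatively, you can drop the
cluster's core-site.xml onto the classpath instead of setting fs.default.name
by hand.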
