On Jun 14, 2009, at 11:01 PM, Sugandha Naolekar wrote:
Hello!
I have a 4 node cluster of Hadoop running. Now, there is a 5th machine which
is acting as a client of Hadoop. It's not a part of the Hadoop cluster
(master/slave config files). Now I have to write Java code that gets executed
on this client, which will simply put the client system's data into HDFS
(and get it replicated over 2 datanodes) and, as per my requirement, I can
simply fetch it back on the client machine itself.
For this, I have done the following things as of now:
***********************************
-> Among the 4 nodes, 2 are datanodes and the other 2 are the namenode and
jobtracker respectively.
***********************************
***********************************
-> Now, to make that code work on the client machine, I have designed a UI.
Here on the client machine, do I need to install Hadoop?
***********************************
You will need to have the same version of Hadoop installed on any
client that needs to communicate with the Hadoop cluster.
***********************************
-> I have installed Hadoop on it, and in its config file, I have specified
only 2 tags:
1) fs.default.name -> value = namenode's address
2) dfs.http.address (namenode's address)
***********************************
I'm assuming you mean that you have Hadoop installed on the client
with a hadoop-site.xml (or core-site.xml) with the correct
fs.default.name. Correct?
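If you want to double-check what the client actually picks up from that
config, a quick sketch like the following (assuming the Hadoop jars and your
conf directory are on the classpath; the class name is just illustrative)
will print the filesystem URI and implementation the client will use:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class CheckConf {
      public static void main(String[] args) throws Exception {
        // Loads hadoop-site.xml / core-site.xml from the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        // Should be DistributedFileSystem when fs.default.name points at your NameNode
        FileSystem fs = FileSystem.get(conf);
        System.out.println("FileSystem implementation: " + fs.getClass().getName());
      }
    }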
***********************************
Thus, if there is a file at /home/hadoop/test.java on the client machine, I
will have to get an instance of the HDFS filesystem by FileSystem.get, right??
***********************************
Before you begin writing special FileSystem Java code, I would do a
quick sanity check of the client configuration.
Can you run the command...
% bin/hadoop fs -ls
...without error?
Can you -put files onto HDFS from the client...
% bin/hadoop fs -put <src> <dst>
...without error?
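For example, using the file you mentioned (the destination path here is just
an illustration):
% bin/hadoop fs -put /home/hadoop/test.java /user/hadoop/test.java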
* You should also check your firewall rules between the client and
NameNode.
* Make sure that the TCP port you specified in fs.default.name is open
for connection from the client.
* Run "netstat -t -l" to make sure that the NameNode is running and
listening on the TCP port you specified.
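For example, if your fs.default.name were hdfs://namenode-host:9000 (host and
port here are only placeholders; use whatever you configured), you could check
reachability from the client with:
% telnet namenode-host 9000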
Only when you've ensured that the hadoop commandline works would I
begin writing custom client code based on the FileSystem class.
***********************************
Then, by using Filesystem.util, I will have to simply specify the local fs as
the src and HDFS as the destination, with the src path as
/home/hadoop/test.java and the destination as /user/hadoop/, right??
So it should work...!
***********************************
-> But it gives me an error: "not able to find src path
/home/hadoop/test.java"
-> Will I have to use RPC classes and methods under the Hadoop API to do
this??
***********************************
You should be able to just use the FileSystem class, without needing to
use any RPC classes.
FileSystem documentation:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html
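As a rough sketch of what that client code could look like (paths taken from
your example, error handling kept minimal, class name only illustrative; it
assumes the client's fs.default.name points at your NameNode):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connects to the filesystem named by fs.default.name (your NameNode)
        FileSystem fs = FileSystem.get(conf);
        // Local source on the client machine, destination directory in HDFS
        Path src = new Path("/home/hadoop/test.java");
        Path dst = new Path("/user/hadoop/");
        // Reads the local file and writes it into HDFS
        fs.copyFromLocalFile(src, dst);
        fs.close();
      }
    }

Note that the source path is resolved on the local filesystem of the machine
running this code, so a "not able to find src path" error usually means the
file does not exist (or is not readable) on that machine.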
***********************************
Things don't seem to be working in any of these ways. Please help me
out.
***********************************
Thanks!