Hi Ben,

I've been down this same path recently and I think I understand your
issues:

1) Yes, you need the hadoop folder to be in the same location on each
node. Only the master node actually uses the slaves file; the start
scripts ssh into each node listed there to start the DataNode and
TaskTracker daemons on those nodes.
2) If you did not list any slave nodes on your master node, then
start-all.sh did not start those daemons anywhere other than the
master. Other machines can still connect to that node and write to the
DFS, as you have done, but there is no replication because there is
only one DataNode. See the sketch after this list for an example
slaves file.
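
For example (a rough sketch -- the hostnames here are placeholders for
your own slave machines), conf/slaves on the master is just one host
per line, and the cluster is bounced with the stop/start scripts:

  # conf/slaves on the master -- one slave host or IP per line
  192.168.1.3
  192.168.1.4

  # then bounce the cluster from the master
  bin/stop-all.sh
  bin/start-all.sh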

Try running jps on your other nodes to verify this, and check the
NameNode web page to see which slaves are actually running. Once you
add your slave nodes to the slaves file on the master and bounce
hadoop, you should see a big difference in the size of your cluster.
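
Something like the following is a quick check (assuming the jps tool
that ships with the JDK is on the path of the account running the
daemons):

  # on each slave node -- expect a DataNode and a TaskTracker
  jps
  # on the master -- expect the NameNode and JobTracker (and, with the
  # default localhost entry in conf/slaves, possibly a DataNode and
  # TaskTracker of its own)
  jps

The NameNode web UI (typically http://<master>:50070/) also lists the
live DataNodes.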

Good luck, it's an adventure,
Jeff

-----Original Message-----
From: Ben Kucinich [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 07, 2008 10:52 AM
To: core-user@hadoop.apache.org
Subject: Starting up a larger cluster

In the Hadoop wiki, I was reading this
http://wiki.apache.org/hadoop/GettingStartedWithHadoop

I have problems understanding this section:

== Starting up a larger cluster ==

 Ensure that the Hadoop package is accessible from the same path on
all nodes that are to be included in the cluster. If you have
separated configuration from the install then ensure that the config
directory is also accessible the same way.
 Populate the slaves file with the nodes to be included in the
cluster. One node per line.

1) Does the first line mean that I have to place the hadoop folder in
exactly the same location on every slave node? For example, if I put
the hadoop home directory in /usr/local/ on the master node, should it
also be present in /usr/local/ on all the slave nodes?

2) I ran start-all.sh on one node (192.168.1.2) with fs.default.name
set to 192.168.1.2:9000 and mapred.job.tracker set to 192.168.1.2:9001,
so I believe this node plays the role of the master. I did not populate
the slaves file with any slave nodes. On many other systems
(192.168.1.3, 192.168.1.4, etc.) I made the same settings in
hadoop-site.xml, so I believe these are slave nodes. From those nodes I
ran commands like bin/hadoop dfs -put dir newdir, and newdir was
created in the DFS. I wonder how the master node allowed them to put
the files even though I did not populate the slaves file.
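
For reference, the relevant part of my hadoop-site.xml is roughly the
same on every machine (a sketch using the values mentioned above):

  <property>
    <name>fs.default.name</name>
    <value>192.168.1.2:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.2:9001</value>
  </property>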

Please help me with these queries since I am new to Hadoop.
