Hi Ben, I've been down this same path recently and I think I understand your issues:
1) Yes, you need the hadoop folder to be in the same location on each node. Only the master node actually uses the slaves file, to start up the DataNode and TaskTracker daemons on those nodes.

2) Since you did not specify any slave nodes on your master node, start-all did not create these processes on any node other than the master. This node can be accessed, and the DFS written to, from other machines, as you have done, but there is no replication since there is only one DataNode.

Try running jps on your other nodes to verify this, and check the NameNode web page to see which slaves are actually running. By adding your slave nodes to the slaves file on your master and bouncing Hadoop, you should see a big difference in the size of your cluster.

Good luck, it's an adventure,
Jeff

-----Original Message-----
From: Ben Kucinich [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 10:52 AM
To: core-user@hadoop.apache.org
Subject: Starting up a larger cluster

In the Nutch wiki, I was reading this:
http://wiki.apache.org/hadoop/GettingStartedWithHadoop

I have problems understanding this section:

== Starting up a larger cluster ==

Ensure that the Hadoop package is accessible from the same path on all nodes that are to be included in the cluster. If you have separated configuration from the install then ensure that the config directory is also accessible the same way. Populate the slaves file with the nodes to be included in the cluster. One node per line.

1) Does the first line mean that I have to place the hadoop folder in exactly the same location on every slave node? For example, if I put the hadoop home directory in /usr/local/ on the master node, should it also be present in /usr/local/ on all the slave nodes?

2) I ran start-all.sh on one node (192.168.1.2) with fs.default.name set to 192.168.1.2:9000 and mapred.job.tracker set to 192.168.1.2:9001. So I believe this node will play the role of master. I did not populate the slaves file with any slave nodes.
But on many other systems (192.168.1.3, 192.168.1.4, etc.) I made the same settings in hadoop-site.xml, so I believe these are slave nodes. Now from the slave nodes I ran commands like bin/hadoop dfs -put dir newdir, and newdir was created in the DFS. I wonder how the master node allowed the slave nodes to put files even though I did not populate the slaves file. Please help me with these queries, since I am new to Hadoop.
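For reference, Jeff's suggestion above can be sketched as the following shell steps, run from the Hadoop home directory on the master node. The slave IPs are taken from Ben's example and the /usr/local/hadoop path is an assumption; adjust both for your own cluster.

```shell
# Sketch of the fix Jeff describes, run from the Hadoop home directory
# on the master (e.g. /usr/local/hadoop -- an assumed install path).
# List one slave host per line in conf/slaves; the IPs below are the
# example addresses from Ben's message.
mkdir -p conf
cat > conf/slaves <<'EOF'
192.168.1.3
192.168.1.4
EOF

# Bounce the cluster so the master starts DataNode and TaskTracker
# daemons on each host listed in conf/slaves (uncomment to run):
#   bin/stop-all.sh
#   bin/start-all.sh

# Afterwards, verify on each slave that the daemons came up:
#   jps    # should list DataNode and TaskTracker
# and check the NameNode web UI (port 50070 by default) to see how many
# live DataNodes the cluster reports.
```

With the daemons actually running on the slaves, newly written DFS blocks can be replicated instead of all landing on the single master DataNode.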