Hi there,

I'm a new user of Hadoop and Nutch, and I am trying to run the *Nutch* crawler on a distributed system powered by *Hadoop*. As it turns out, however, the cluster does not recognise any of the slave nodes. I've been stuck at this point for months and am desperate for a solution. I would appreciate it if anyone would be kind enough to spend 10 minutes of their valuable time to help.
Thank you so much!! This is what I am currently encountering:

==================================

To set up the Hadoop cluster, I followed the instructions described in both of:
http://wiki.apache.org/nutch/NutchHadoopTutorial
http://hadoop.apache.org/common/docs/current/cluster_setup.html

The problem is this: with a distributed file system (*HDFS* in Hadoop), the data should be stored across both computers. Instead, all data in HDFS, which is supposed to be replicated or distributed to every computer in the cluster, is found only on the master node. Nothing is replicated to the other slave nodes, which causes subsequent tasks such as the *jobtracker* to fail. I've attached a jobtracker log file.

Everything worked fine while there was only one computer (the master node) in the cluster and all data was stored on it. The problem arises when the program tries to write files onto another computer (a slave node). The weird part is that HDFS can create folders on the slave nodes but not files, so the HDFS folders on the slave nodes are all empty.

The web interfaces (http://masterNode:50070 and http://masterNode:50030), which show the status of HDFS and the jobtracker, indicate that there is only one active node (the master node); none of the slave nodes are recognised. (A command-line sketch of the same check is in the P.P.S. at the end of this message.)

I use Nutch 1.2 and Hadoop 0.20 in this experiment. Here is what I have done so far:

- I followed the instructions in the documentation mentioned above.
- I created users with an identical username on multiple computers on the same local network, all running Ubuntu 10.10.
- I set up passphrase-less ssh keys for all computers, and every node in the cluster can *ssh* to every other node without being asked for a password (the commands are sketched in the P.P.S.).
- I shut down the firewall with "*sudo ufw disable*".
- I have searched for solutions on the Internet, but with no luck so far.

I appreciate any help. The Hadoop configuration files (*core-site.xml* <http://db.tt/co0q25s>, *hdfs-site.xml* <http://db.tt/TSK7jA6>, *mapred-site.xml* <http://db.tt/8dJoUrp>, and *hadoop-env.sh* <http://db.tt/FztxTEw>) and the log file with the error message (*hadoop-rui-jobtracker-ss2.log* <http://db.tt/PPGhEaa>) are linked; an outline of the relevant settings, with placeholder hostnames, is in the P.P.S. as well.

p.s.: Re: Harsh J: Thank you so much for your time and reply; I've uploaded the configuration and log files as links. The *HADOOP_HOME* directory (i.e., */home/rui/workspace/nutch/search/*) is where *bin/*, *conf/*, *lib/*, etc. are located, and *start-all.sh* is at *${HADOOP_HOME}/bin/start-all.sh*. There is no separate directory for Hadoop; I believe it is integrated into Nutch.

==================================

Regards,
Andy
The University of Melbourne
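
P.P.S.: In case the links above ever go stale, here is a rough outline of what the relevant configuration boils down to. The hostnames and port numbers are placeholders and the snippet is reconstructed from the tutorials rather than copied from my machine, so the linked files remain the authoritative versions:

# Run from HADOOP_HOME, i.e. /home/rui/workspace/nutch/search/
# "masterNode" and "slaveNode" are placeholder hostnames; the ports are the
# usual examples from the tutorials.

# conf/masters: the host that runs the secondary namenode.
echo "masterNode" > conf/masters

# conf/slaves: every host that should run a datanode and tasktracker,
# one hostname per line.
printf "masterNode\nslaveNode\n" > conf/slaves

# conf/core-site.xml: fs.default.name must point at the master's real
# hostname (not localhost).
cat > conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masterNode:9000</value>
  </property>
</configuration>
EOF

# The other two files follow the same pattern:
#   conf/mapred-site.xml  ->  mapred.job.tracker = masterNode:9001
#   conf/hdfs-site.xml    ->  dfs.replication    = 2

The reason fs.default.name and mapred.job.tracker are set to the master's hostname is that the slave daemons use exactly these values to find the namenode and jobtracker.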
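The passphrase-less SSH setup mentioned in the list above was, as far as I remember, the standard recipe below; "slaveNode" again stands for the real hostname of a slave:

# Generate a key with an empty passphrase (done as user "rui").
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorise the key for localhost logins on this machine ...
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# ... and copy it to each of the other machines in the cluster.
ssh-copy-id rui@slaveNode

# Sanity check: neither of these should ask for a password.
ssh localhost true
ssh slaveNode true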
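Finally, in case it helps, the same information that the web interface shows can also be checked from the command line; this is only a sketch, again with placeholder hostnames:

cd /home/rui/workspace/nutch/search    # HADOOP_HOME

bin/start-all.sh              # starts the HDFS and MapReduce daemons

jps                           # on the master: should list at least NameNode
                              # and JobTracker (jps ships with the JDK)
ssh slaveNode jps             # on a slave: should list DataNode and TaskTracker

bin/hadoop dfsadmin -report   # prints the number of live datanodes and the
                              # capacity each of them reports

# The datanode log on the slave is the first place to look when a slave
# does not show up in the report:
ssh slaveNode 'tail -n 50 /home/rui/workspace/nutch/search/logs/hadoop-rui-datanode-*.log'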