Hello Jaya,

It would be useful to check the namenode and datanode log files. From my past experience, I would sometimes get java.io... kinds of exceptions; I think that may have been related to my running on VMware. At other times the namenode/master simply could not connect to the datanode/slaves (it showed up as some IPC error message). I did the same thing you did, restarted the namenode/master, and the cluster came up healthy again. Unfortunately the problem is not consistent, so I could not pinpoint how or why it happened. Would you let me know if you find the cause? Thanks.
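For example, something along these lines usually shows what went wrong (just a rough sketch, assuming the default HADOOP_LOG_DIR=${HADOOP_HOME}/logs from your hadoop-env.sh and the daemon names from your start-all.sh output; use whatever file names actually appear in that directory):

# On the master (10.229.62.6): namenode and jobtracker logs
tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-namenode-*
tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-jobtracker-*

# On the slave (10.229.62.56): datanode and tasktracker logs
tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-*
tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-tasktracker-*

# Look for stack traces or connection problems in everything at once
grep -i -A 3 -E 'exception|refused' /opt/hadoop-0.11.0/logs/*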
Best Regards,
Richard Yang
[EMAIL PROTECTED]
[EMAIL PROTECTED]

-----Original Message-----
From: jaylac [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 22, 2007 8:43 PM
To: [email protected]
Subject: System is hanging while executing bin/start-all.sh and bin/stop-all.sh

Hi,

Whenever I execute the bin/start-all.sh command, the slave node hangs. Sometimes the master node hangs. If I restart the machine and do the job again, then I get the proper output. The same problem occurs while stopping all the daemons. Has anyone faced this problem? Please, someone tell me the solution for this.

I'm using two Red Hat Linux machines: one master (10.229.62.6) and one slave (10.229.62.56).
In the master node, the user name is jaya.
In the slave node, the user name is jaya.

The steps which I follow are:

Edit the /home/jaya/.bashrc file. Here I set the HADOOP_CONF_DIR environment variable.

MASTER NODE

1. Edit the conf/slaves file.
Contents
====================
localhost
[EMAIL PROTECTED]
====================

2. Edit the conf/hadoop-env.sh file.

# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200
export HADOOP_HOME=/opt/hadoop-0.11.0
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS=-server
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

That's it; no other changes in this file.

3. Edit the conf/hadoop-site.xml file.
Contents
===========================================
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>10.229.62.6:50010</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.229.62.6:50011</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
====================================

SLAVE NODE

1. Edit the conf/masters file.
Contents
====================
[EMAIL PROTECTED]
====================

2. Edit the conf/hadoop-env.sh file.

# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200
export HADOOP_HOME=/opt/hadoop-0.11.0
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS=-server
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

That's it; no other changes in this file.

3. Edit the conf/hadoop-site.xml file.
Contents
===========================================
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>10.229.62.6:50010</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.229.62.6:50011</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
====================================

I have already done the steps for passwordless login. That is all.
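As a quick sanity check of the passwordless setup (a minimal test, using the jaya account and the IPs listed above), each of these should print the remote hostname without asking for a password:

From the master:
ssh jaya@10.229.62.56 hostname
ssh localhost hostname

From the slave:
ssh jaya@10.229.62.6 hostname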
Then I perform the following operations.

In the HADOOP_HOME directory:

[EMAIL PROTECTED] hadoop-0.11.0]$ bin/hadoop namenode -format
Re-format filesystem in /tmp/hadoop-146736/dfs/name ? (Y or N) Y
Formatted /tmp/hadoop-146736/dfs/name
[EMAIL PROTECTED] hadoop-0.11.0]$

Then:

[EMAIL PROTECTED] hadoop-0.11.0]$ bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-namenode-localhost.localdomain.out
10.229.62.109: starting datanode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-auriga.out
localhost: starting datanode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-jobtracker-localhost.localdomain.out
10.229.62.109: starting tasktracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-tasktracker-auriga.out
localhost: starting tasktracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-tasktracker-localhost.localdomain.out

At this point the slave node hangs, so I restart the slave node. Then I get the proper output when I execute "bin/hadoop jar hadoop-0.11.0-examples.jar input output".

Similarly, when I stop the daemons, the slave node hangs. Sometimes the master node also hangs.

Please help me as soon as possible. Thanks in advance.

Jaya
--
View this message in context:
http://www.nabble.com/System-is-hanging-while-executing-bin-start-all.sh-and-bin-stop-all.sh-tf3451821.html#a9628679
Sent from the Hadoop Users mailing list archive at Nabble.com.
