A Df,

Setting up a proper cluster in a sane network environment is as easy
as setting up a pseudo-distributed one.

Some questions:
- What OS are you deploying Hadoop on here?
- Do you have bash? What version of bash is available?
- What user/group are you running Hadoop as? Is it consistent across
all slaves and the master?

What I usually do to bring up a fresh cluster is:

- Ensure that I can ssh from the master to every slave without a
password, as Hadoop's scripts require (see the first sketch after
this list).
- Place Hadoop at a common path across all the machines (be wary of
NFS mounts here; you don't want the datanodes' dfs.data.dir
directories on NFS mounts, for example).
- Write out one configuration set and push it to all the nodes.
- Format the namenode, then run start-all.sh from the master (second
sketch below).
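
For the passwordless SSH step, something like this works (a rough
sketch only, assuming OpenSSH; the user and hostname below are just
taken from your mails as examples):

  # On the master, generate a key pair with an empty passphrase:
  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
  # Copy the public key to each slave (repeat per host):
  ssh-copy-id -i ~/.ssh/id_rsa.pub w1153435@privn51
  # Verify that it logs in without a password prompt:
  ssh privn51 hostname

If your home directory sits on a shared filesystem (as it seems to in
your case), appending the master's public key to your own
~/.ssh/authorized_keys once should cover all the nodes.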
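And for the last two steps, roughly (again only a sketch; it assumes
HADOOP_HOME is exported and identical on every node, and the rsync is
unnecessary if the install directory is already shared):

  # Push the same conf/ directory to every host in conf/slaves:
  for host in $(cat ${HADOOP_HOME}/conf/slaves); do
    rsync -av ${HADOOP_HOME}/conf/ ${host}:${HADOOP_HOME}/conf/
  done
  # One-time format of the namenode (this wipes dfs.name.dir):
  ${HADOOP_HOME}/bin/hadoop namenode -format
  # Start HDFS and MapReduce from the master:
  ${HADOOP_HOME}/bin/start-all.sh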

If your environment supports it, you can also ease things with a free
tool such as SCM Express [1]. These tools have a wizard-like interface
and point out common issues as you go about setting up and running
your cluster.

[1] - http://www.cloudera.com/products-services/scm-express/

On Wed, Aug 17, 2011 at 5:12 PM, A Df <[email protected]> wrote:
> Hello Everyone:
>
> I am adding the contents of my config file in the hopes that someone will be 
> able to help. See inline for the discussions. I really don't understand why 
> it works in pseudo-mode but gives so much problems in cluster. I have tried 
> the instructions from the Apache cluster setup, Yahoo Development Network and 
> from Michael Noll's tutorial.
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat core-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
>
> <configuration>
>      <property>
>          <name>fs.default.name</name>
>          <value>hdfs://ngs.uni.ac.uk:3000</value>
>      </property>
>      <property>
> <name>HADOOP_LOG_DIR</name>
>          <value>/home/w1153435/hadoop-0.20.2_cluster/var/log/hadoop</value>
>      </property>
>  <property>
> <name>hadoop.tmp.dir</name>
>          <value>/home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop</value>
>      </property>
> </configuration>
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat hdfs-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>      <property>
>          <name>dfs.replication</name>
>          <value>3</value>
>      </property>
>          <property>
>          <name>dfs.http.address</name>
>          <value>0.0.0.0:3500</value>
>      </property>
> <property>
>     <name>dfs.data.dir</name>
>     <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/data</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/name</value>
>     <final>true</final>
>   </property>
> </configuration>
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat mapred-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>      <property>
>          <name>mapred.job.tracker</name>
>          <value>ngs.uni.ac.uk:3001</value>
>      </property>
> <property>
>          <name>mapred.system.dir</name>
>          <value>/home/w1153435/hadoop-0.20.2_cluster/mapred/system</value>
>      </property>
> <property>
>          <name>mapred.map.tasks</name>
>          <value>80</value>
>      </property>
> <property>
>          <name>mapred.reduce.tasks</name>
>          <value>16</value>
>      </property>
>
> </configuration>
>
> In addition:
>
> w1153435@ngs:~/hadoop-0.20.2_cluster> bin/hadoop dfsadmin -report
> Configured Capacity: 0 (0 KB)
> Present Capacity: 0 (0 KB)
> DFS Remaining: 0 (0 KB)
> DFS Used: 0 (0 KB)
> DFS Used%: �%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
>
> Name: 161.74.12.36:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Aug 17 12:40:17 BST 2011
>
> Cheers,
> A Df
>
>>________________________________
>>From: A Df <[email protected]>
>>To: "[email protected]" <[email protected]>; 
>>"[email protected]" <[email protected]>
>>Sent: Tuesday, 16 August 2011, 16:20
>>Subject: Re: hadoop cluster mode not starting up
>>
>>
>>
>>See inline:
>>
>>
>>>________________________________
>>>From: shanmuganathan.r <[email protected]>
>>>To: [email protected]
>>>Sent: Tuesday, 16 August 2011, 13:35
>>>Subject: Re: hadoop cluster mode not starting up
>>>
>>>Hi Df,
>>>
>>>      Are you using IPs instead of names in conf/masters and
>>>conf/slaves? For running the secondary namenode on a separate machine,
>>>refer to the following link:
>>>
>>>
>>>=Yes, I use the names in those files, but the IP addresses are mapped to
>>>the names in the /extras/hosts file. Does this cause problems?
>>>
>>>
>>>http://www.hadoop-blog.com/2010/12/secondarynamenode-process-is-starting.html
>>>
>>>
>>>=I don't want to make too many changes, so I will stick to having the
>>>master be both namenode and secondary namenode. I tried starting up HDFS
>>>and MapReduce, but the jobtracker is not running on the master and there
>>>are still errors regarding the datanodes, because only 5 of 7 datanodes
>>>have a tasktracker. I ran both commands to start HDFS and MapReduce, so
>>>why is the jobtracker missing?
>>>
>>>Regards,
>>>
>>>Shanmuganathan
>>>
>>>
>>>
>>>---- On Tue, 16 Aug 2011 17:06:04 +0530 A Df<[email protected]> 
>>>wrote ----
>>>
>>>
>>>I already used a few tutorials as follows:
>>>    * Hadoop Tutorial on Yahoo Developer network which uses an old hadoop 
>>>and thus older conf files.
>>>
>>>    * 
>>>http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>> which only has two nodes and the master acts as namenode and secondary 
>>>namenode. I need one with more than that.
>>>
>>>
>>>Is there a way to prevent the nodes from using the central file system?
>>>I don't have root permission, and my user folder is on a central file
>>>system which is replicated across all the nodes.
>>>
>>>See inline too for my responses
>>>
>>>
>>>
>>>>________________________________
>>>>From: Steve Loughran <[email protected]>
>>>>To: [email protected]
>>>>Sent: Tuesday, 16 August 2011, 12:08
>>>>Subject: Re: hadoop cluster mode not starting up
>>>>
>>>>On 16/08/11 11:19, A Df wrote:
>>>>> See inline
>>>>>
>>>>>
>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Loughran<[email protected]>
>>>>>> To: [email protected]
>>>>>> Sent: Tuesday, 16 August 2011, 11:08
>>>>>> Subject: Re: hadoop cluster mode not starting up
>>>>>>
>>>>>> On 16/08/11 11:02, A Df wrote:
>>>>>>> Hello All:
>>>>>>>
>>>>>>> I used a combination of tutorials to setup hadoop but most seems to be 
>>>>>>> using either an old version of hadoop or only using 2 machines for the 
>>>>>>> cluster which isn't really a cluster. Does anyone know of a good 
>>>>>>> tutorial which setups multiple nodes for a cluster?? I already looked 
>>>>>>> at the Apache website but it does not give sample values for the conf 
>>>>>>> files. Also each set of tutorials seem to have a different set of 
>>>>>>> parameters which they indicate should be changed so now its a bit 
>>>>>>> confusing. For example, my configuration sets a dedicate namenode, 
>>>>>>> secondary namenode and 8 slave nodes but when I run the start command 
>>>>>>> it gives an error. Should I install hadoop to my user directory or on 
>>>>>>> the root? I have it in my directory but all the nodes have a central 
>>>>>>> file system as opposed to distributed so whatever I do on one node in 
>>>>>>> my user folder it affect all the others so how do i set the paths to 
>>>>>>> ensure that it uses a distributed system?
>>>>>>>
>>>>>>> For the errors below, I checked the directories and the files are 
>>>>>>> there. Am I not sure what went wrong and how to set the conf to not 
>>>>>>> have central file system. Thank you.
>>>>>>>
>>>>>>> Error message
>>>>>>> CODE
>>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  bin/start-dfs.sh
>>>>>>> bin/start-dfs.sh: line 28: 
>>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or 
>>>>>>> directory
>>>>>>> bin/start-dfs.sh: line 50: 
>>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or 
>>>>>>> directory
>>>>>>> bin/start-dfs.sh: line 51: 
>>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or 
>>>>>>> directory
>>>>>>> bin/start-dfs.sh: line 52: 
>>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or 
>>>>>>> directory
>>>>>>> CODE
>>>>>>
>>>>>> there's  No such file or directory as
>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh
>>>>>>
>>>>>>
>>>>>> There is, I checked as shown
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  ls bin
>>>>>> hadoop            rcc                start-dfs.sh      stop-dfs.sh
>>>>>> hadoop-config.sh  slaves.sh          start-mapred.sh  stop-mapred.sh
>>>>>> hadoop-daemon.sh  start-all.sh      stop-all.sh
>>>>>> hadoop-daemons.sh  start-balancer.sh  stop-balancer.sh
>>>>
>>>>try "pwd" to print out where the OS thinks you are, as it doesn't seem
>>>>to be where you think you are
>>>>
>>>>
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster> pwd
>>>>/home/w1153435/hadoop-0.20.2_cluster
>>>>
>>>>
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd
>>>>/home/w1153435/hadoop-0.20.2_cluster/bin
>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I had tried running this command below earlier but also got problems:
>>>>>>> CODE
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster>  export 
>>>>>>> HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster>  export 
>>>>>>> HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster>  ${HADOOP_HOME}/bin/slaves.sh 
>>>>>>> "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>>> -bash: /bin/slaves.sh: No such file or directory
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster>  export 
>>>>>>> HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster>  ${HADOOP_HOME}/bin/slaves.sh 
>>>>>>> "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>>> cat: /conf/slaves: No such file or directory
>>>>>>> CODE
>>>>>>>
>>>>>> there's  No such file or directory as /conf/slaves because you set
>>>>>> HADOOP_HOME after setting the other env variables, which are expanded at
>>>>>> set-time, not run-time.
>>>>>>
>>>>>> I redid the command but still have errors on the slaves
>>>>>>
>>>>>>
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  export 
>>>>>> HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  export 
>>>>>> HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  export 
>>>>>> HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster>  ${HADOOP_HOME}/bin/slaves.sh 
>>>>>> "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>> privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>>> privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: 
>>>>>> No such file or directory
>>>>
>>>>try ssh-ing in, do it by hand, make sure you have the right permissions etc
>>>>
>>>>
>>>>I reset the above path variables again and checked that they existed and 
>>>>tried the command above but same error. I used ssh with no problems and no 
>>>>password request so that is fine. What else could be wrong?
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_HOME                     
>>>>    /home/w1153435/hadoop-0.20.2_cluster
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_CONF_DIR                 
>>>>    /home/w1153435/hadoop-0.20.2_cluster/conf
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_SLAVES                   
>>>>    /home/w1153435/hadoop-0.20.2_cluster/conf/slaves
>>>>w1153435@ngs:~/hadoop-0.20.2_cluster>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>



-- 
Harsh J
