A Df,

Setting up a proper cluster on a sane network environment is as easy as setting up a pseudo-distributed one.
Some questions:
- What OS are you deploying hadoop on here?
- Do you have bash? What version of bash is available?
- What user/group are you running hadoop as? Is it consistent across all slaves+master?

What I usually do to bring up a fresh cluster is:
- Ensure that I can ssh from my master to any slave, without a password (as hadoop's scripts require).
- Place Hadoop at a common location across all the machines (be wary of NFS mounts here; you don't want the datanode's dfs.data.dir directories on NFS mounts, for example).
- Write out a configuration set and pass it to all nodes.
- Issue a namenode format, and then run start-all.sh from the master.

Perhaps, if your environment supports it, you can ease things with the free tool SCM Express [1] and the like. These tools have a wizard-like interface and point out common issues as you go about setting up and running your cluster.

[1] - http://www.cloudera.com/products-services/scm-express/

On Wed, Aug 17, 2011 at 5:12 PM, A Df <[email protected]> wrote:
> Hello Everyone:
>
> I am adding the contents of my config files in the hopes that someone will be
> able to help. See inline for the discussions. I really don't understand why
> it works in pseudo-mode but gives so many problems in cluster mode. I have tried
> the instructions from the Apache cluster setup, the Yahoo Developer Network, and
> from Michael Noll's tutorial.
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat core-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file.
> -->
>
> <configuration>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://ngs.uni.ac.uk:3000</value>
> </property>
> <property>
> <name>HADOOP_LOG_DIR</name>
> <value>/home/w1153435/hadoop-0.20.2_cluster/var/log/hadoop</value>
> </property>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop</value>
> </property>
> </configuration>
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat hdfs-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
> <property>
> <name>dfs.replication</name>
> <value>3</value>
> </property>
> <property>
> <name>dfs.http.address</name>
> <value>0.0.0.0:3500</value>
> </property>
> <property>
> <name>dfs.data.dir</name>
> <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/data</value>
> <final>true</final>
> </property>
> <property>
> <name>dfs.name.dir</name>
> <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/name</value>
> <final>true</final>
> </property>
> </configuration>
>
> w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat mapred-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file.
> -->
>
> <configuration>
> <property>
> <name>mapred.job.tracker</name>
> <value>ngs.uni.ac.uk:3001</value>
> </property>
> <property>
> <name>mapred.system.dir</name>
> <value>/home/w1153435/hadoop-0.20.2_cluster/mapred/system</value>
> </property>
> <property>
> <name>mapred.map.tasks</name>
> <value>80</value>
> </property>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>16</value>
> </property>
> </configuration>
>
> In addition:
>
> w1153435@ngs:~/hadoop-0.20.2_cluster> bin/hadoop dfsadmin -report
> Configured Capacity: 0 (0 KB)
> Present Capacity: 0 (0 KB)
> DFS Remaining: 0 (0 KB)
> DFS Used: 0 (0 KB)
> DFS Used%: �%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
>
> Name: 161.74.12.36:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0 (0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Aug 17 12:40:17 BST 2011
>
> Cheers,
> A Df
>
>> From: A Df <[email protected]>
>> To: "[email protected]" <[email protected]>;
>> "[email protected]" <[email protected]>
>> Sent: Tuesday, 16 August 2011, 16:20
>> Subject: Re: hadoop cluster mode not starting up
>>
>> See inline:
>>
>>> From: shanmuganathan.r <[email protected]>
>>> To: [email protected]
>>> Sent: Tuesday, 16 August 2011, 13:35
>>> Subject: Re: hadoop cluster mode not starting up
>>>
>>> Hi Df,
>>>
>>> Are you using the IPs instead of names in conf/masters and conf/slaves?
>>> For running the secondary namenode on a separate machine, refer to the
>>> following link.
>>>
>>> = Yes, I use the names in those files, but the IP addresses are mapped to the
>>> names in the /extras/hosts file. Does this cause problems?
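[Editor's note: given the names-vs-IPs question above, one quick sanity check is to confirm that every hostname listed in conf/slaves actually resolves on the master, since Hadoop's ssh-based scripts depend on it. A minimal sketch; `check_slaves` is a hypothetical helper, not part of Hadoop, and the conf/slaves path is an assumption.]

```shell
# check_slaves: hypothetical helper. Reads a slaves file (one hostname
# per line) and reports whether each name resolves via the system
# resolver, which is what passwordless ssh from the master relies on.
check_slaves() {
  while read -r host; do
    [ -z "$host" ] && continue
    if getent hosts "$host" > /dev/null 2>&1; then
      echo "$host: resolves"
    else
      echo "$host: does NOT resolve"
    fi
  done < "$1"
}

# Usage: check_slaves /home/w1153435/hadoop-0.20.2_cluster/conf/slaves
```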
>>>
>>> http://www.hadoop-blog.com/2010/12/secondarynamenode-process-is-starting.html
>>>
>>> = I don't want to make too many changes, so I will stick to having the master be
>>> both namenode and secondary namenode. I tried starting up HDFS and
>>> MapReduce, but the jobtracker is not running on the master, and there are still
>>> errors regarding the datanodes because only 5 of 7 datanodes have a
>>> tasktracker. I ran both commands to start HDFS and MapReduce, so why
>>> is the jobtracker missing?
>>>
>>> Regards,
>>>
>>> Shanmuganathan
>>>
>>> ---- On Tue, 16 Aug 2011 17:06:04 +0530 A Df <[email protected]> wrote ----
>>>
>>> I already used a few tutorials, as follows:
>>> * The Hadoop Tutorial on the Yahoo Developer Network, which uses an old
>>> version of hadoop and thus older conf files.
>>> * http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>> which only has two nodes, with the master acting as namenode and secondary
>>> namenode. I need one with more than that.
>>>
>>> Is there a way to prevent the nodes from using the central file system?
>>> I don't have root permission, and my user folder is on a central file
>>> system which is replicated on all the nodes.
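[Editor's note: on keeping datanode storage off the shared home directory: dfs.data.dir (and hadoop.tmp.dir) can point at any node-local path the hadoop user can write to; root permission is not required. A sketch in the same style as the quoted hdfs-site.xml, where /local/scratch/w1153435 is an assumed node-local directory; substitute whatever local disk your nodes actually expose.]

```xml
<!-- hdfs-site.xml fragment: point datanode storage at a node-local
     disk instead of the NFS-replicated home directory.
     /local/scratch/w1153435 is an assumed path. -->
<property>
  <name>dfs.data.dir</name>
  <value>/local/scratch/w1153435/dfs/data</value>
  <final>true</final>
</property>
```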
>>>
>>> See inline too for my responses
>>>
>>>> From: Steve Loughran <[email protected]>
>>>> To: [email protected]
>>>> Sent: Tuesday, 16 August 2011, 12:08
>>>> Subject: Re: hadoop cluster mode not starting up
>>>>
>>>> On 16/08/11 11:19, A Df wrote:
>>>>> See inline
>>>>>
>>>>>> From: Steve Loughran <[email protected]>
>>>>>> To: [email protected]
>>>>>> Sent: Tuesday, 16 August 2011, 11:08
>>>>>> Subject: Re: hadoop cluster mode not starting up
>>>>>>
>>>>>> On 16/08/11 11:02, A Df wrote:
>>>>>>> Hello All:
>>>>>>>
>>>>>>> I used a combination of tutorials to set up hadoop, but most seem to
>>>>>>> use either an old version of hadoop or only two machines for the
>>>>>>> cluster, which isn't really a cluster. Does anyone know of a good
>>>>>>> tutorial which sets up multiple nodes for a cluster? I already looked
>>>>>>> at the Apache website, but it does not give sample values for the conf
>>>>>>> files. Also, each set of tutorials seems to have a different set of
>>>>>>> parameters which they indicate should be changed, so now it's a bit
>>>>>>> confusing. For example, my configuration sets a dedicated namenode,
>>>>>>> secondary namenode and 8 slave nodes, but when I run the start command
>>>>>>> it gives an error. Should I install hadoop in my user directory or on
>>>>>>> the root? I have it in my directory, but all the nodes share a central
>>>>>>> file system as opposed to a distributed one, so whatever I do in my
>>>>>>> user folder on one node affects all the others. How do I set the paths
>>>>>>> to ensure that it uses a distributed system?
>>>>>>>
>>>>>>> For the errors below, I checked the directories and the files are
>>>>>>> there. I am not sure what went wrong or how to set the conf to not
>>>>>>> use the central file system. Thank you.
>>>>>>>
>>>>>>> Error message
>>>>>>> CODE
>>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh
>>>>>>> bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory
>>>>>>> bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory
>>>>>>> bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>>> bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>>> CODE
>>>>>>
>>>>>> There is no such file or directory as
>>>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh
>>>>>>
>>>>>> There is, I checked, as shown:
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> ls bin
>>>>>> hadoop             rcc                start-dfs.sh     stop-dfs.sh
>>>>>> hadoop-config.sh   slaves.sh          start-mapred.sh  stop-mapred.sh
>>>>>> hadoop-daemon.sh   start-all.sh       stop-all.sh
>>>>>> hadoop-daemons.sh  start-balancer.sh  stop-balancer.sh
>>>>
>>>> try "pwd" to print out where the OS thinks you are, as it doesn't seem
>>>> to be where you think you are
>>>>
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> pwd
>>>> /home/w1153435/hadoop-0.20.2_cluster
>>>>
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd
>>>> /home/w1153435/hadoop-0.20.2_cluster/bin
>>>>
>>>>>>>
>>>>>>> I had tried running this command below earlier but also got problems:
>>>>>>> CODE
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>>> -bash: /bin/slaves.sh: No such file or directory
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>>> cat: /conf/slaves: No such file or directory
>>>>>>> CODE
>>>>>>
>>>>>> There is no such file or directory as /conf/slaves because you set
>>>>>> HADOOP_HOME after setting the other env variables, which are expanded at
>>>>>> set-time, not run-time.
>>>>>>
>>>>>> I redid the commands but still have errors on the slaves:
>>>>>>
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>> privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>>> privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>
>>>> try ssh-ing in, do it by hand, make sure you have the right permissions etc
>>>>
>>>> I reset the above
>>>> path variables again and checked that they existed, then tried the
>>>> command above, but got the same error. I used ssh with no problems and no
>>>> password request, so that is fine. What else could be wrong?
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_HOME
>>>> /home/w1153435/hadoop-0.20.2_cluster
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_CONF_DIR
>>>> /home/w1153435/hadoop-0.20.2_cluster/conf
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_SLAVES
>>>> /home/w1153435/hadoop-0.20.2_cluster/conf/slaves

--
Harsh J
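[Editor's note: one plausible reading of the quoted mkdir failures, a guess from the 0.20.x scripts rather than a confirmed diagnosis: bin/slaves.sh backslash-escapes the spaces in each of its arguments before handing the line to ssh, so passing the whole command as one quoted string makes the remote shell treat it as a single command *name* containing spaces. A minimal sketch of that substitution:]

```shell
# slaves.sh in 0.20.x forwards its arguments to ssh roughly as
# $"${@// /\\ }", i.e. every space inside an argument gets escaped.
# A single quoted argument therefore reaches the remote bash as one word:
cmd="mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
escaped="${cmd// /\\ }"   # same substitution slaves.sh applies
echo "$escaped"
# The remote bash then looks for a command literally named
# "mkdir -p /home/...", hence "bash: mkdir -p ...: No such file or directory".
# Passing the command unquoted (slaves.sh mkdir -p /path) avoids this.
```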
