The problem is that start-all.sh isn't all that intelligent. The way that start-all.sh works is by running start-dfs.sh and start-mapred.sh. The start-mapred.sh script always starts a job tracker on the local host and a task tracker on all of the hosts listed in slaves (it uses SSH to do the remote execution). The start-dfs.sh script always starts a name node on the local host, a data node on all of the hosts listed in slaves, and a secondary name node on all of the hosts listed in masters.
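To make that concrete, here is a rough sketch (not the actual Hadoop scripts) that only prints which daemon would end up on which host. The function name plan_start_all and its three arguments are made up for illustration; the real scripts launch the daemons over SSH rather than echoing anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: print the "daemon host" pairs
# that start-all.sh would produce when run on a given host.
#   $1 = host the script is run on
#   $2 = path to the slaves file
#   $3 = path to the masters file
plan_start_all() {
    psa_local=$1
    psa_slaves=$2
    psa_masters=$3

    # start-dfs.sh part: name node on the local host, data nodes on the
    # slaves, secondary name nodes on the masters.
    echo "namenode $psa_local"
    while read -r psa_host; do
        if [ -n "$psa_host" ]; then echo "datanode $psa_host"; fi
    done < "$psa_slaves"
    while read -r psa_host; do
        if [ -n "$psa_host" ]; then echo "secondarynamenode $psa_host"; fi
    done < "$psa_masters"

    # start-mapred.sh part: job tracker on the local host, task trackers
    # on the slaves.
    echo "jobtracker $psa_local"
    while read -r psa_host; do
        if [ -n "$psa_host" ]; then echo "tasktracker $psa_host"; fi
    done < "$psa_slaves"
}
```

Called as plan_start_all slave3 conf/slaves conf/masters with slave2 and slave3 in both files, this lists "jobtracker slave3". The job tracker is placed on whatever host the script runs on, regardless of what mapred.job.tracker says, and that mismatch is what produces the bind error.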
In your case, you'll want to run start-dfs.sh on slave3 and start-mapred.sh on slave2.

-Joey

On Tue, May 31, 2011 at 5:07 PM, Juan P. <gordoslo...@gmail.com> wrote:
> Hi Guys,
> I recently configured my cluster to have 2 VMs. I configured 1
> machine (slave3) to be the namenode and another to be the
> jobtracker (slave2). They both work as datanode/tasktracker as well.
>
> Both configs have the following contents in their masters and slaves file:
> slave2
> slave3
>
> Both machines have the following contents on their mapred-site.xml file:
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>slave2:9001</value>
>   </property>
> </configuration>
>
> Both machines have the following contents on their core-site.xml file:
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://slave3:9000</value>
>   </property>
> </configuration>
>
> When I log into the namenode and I run the start-all.sh script, everything
> but the jobtracker starts. In the log files I get the following exception:
>
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = slave3/10.20.11.112
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2011-05-31 13:54:06,940 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2011-05-31 13:54:07,086 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to slave2/10.20.11.166:9001 : Cannot assign requested address
>     at org.apache.hadoop.ipc.Server.bind(Server.java:190)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
>     at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
>     at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1595)
>     at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
>     at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
>     at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)
> Caused by: java.net.BindException: Cannot assign requested address
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:188)
>     ... 8 more
>
> 2011-05-31 13:54:07,096 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down JobTracker at slave3/10.20.11.112
> ************************************************************/
>
> As I see it, from the lines
>
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = slave3/10.20.11.112
>
> the namenode (slave3) is trying to run the jobtracker locally but when it
> starts the jobtracker server it binds it to the slave2 address and of course
> fails:
>
> Problem binding to slave2/10.20.11.166:9001
>
> What do you guys think could be going wrong?
>
> Thanks!
> Pony
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434