If I replace the mapred.job.tracker value in hadoop-site.xml with local, then the job seems to work:
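For reference, the change amounts to this property in conf/hadoop-site.xml (a sketch of the edit described above, not a recommended production setting). Note that with this value the JobTracker and TaskTracker daemons started by start-all have no host:port to parse, which is consistent with the "Not a host:port pair: local" errors quoted further down:

```xml
<property>
  <name>mapred.job.tracker</name>
  <!-- "local" makes the job client run the job in-process via
       LocalJobRunner instead of submitting it to a separate
       JobTracker daemon at host:port -->
  <value>local</value>
</property>
```

In local mode the wordcount job runs entirely inside the client JVM, so it succeeds even though the JobTracker and TaskTracker daemons themselves fail to start.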
[EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop jar hadoop-0.18.1-examples.jar wordcount books booksOutput
08/11/14 12:06:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process : 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process : 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process : 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process : 3
08/11/14 12:06:14 INFO mapred.JobClient: Running job: job_local_0001
08/11/14 12:06:14 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:14 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:14 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:14 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:14 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:14 INFO mapred.MapTask: bufstart = 0; bufend = 1086784; bufvoid = 99614720
08/11/14 12:06:14 INFO mapred.MapTask: kvstart = 0; kvend = 109855; length = 327680
08/11/14 12:06:14 INFO mapred.MapTask: Index: (0, 267034, 267034)
08/11/14 12:06:14 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:15 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hadoop/books/one.txt:0+662001
08/11/14 12:06:15 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
08/11/14 12:06:15 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000000_0' to hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:15 INFO mapred.JobClient:  map 100% reduce 0%
08/11/14 12:06:15 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:15 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:15 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:15 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:15 INFO mapred.MapTask: Spilling map output: buffer full = false and record full = true
08/11/14 12:06:15 INFO mapred.MapTask: bufstart = 0; bufend = 2545957; bufvoid = 99614720
08/11/14 12:06:15 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
08/11/14 12:06:15 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 717078, 717078)
08/11/14 12:06:16 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:16 INFO mapred.MapTask: bufstart = 2545957; bufend = 2601773; bufvoid = 99614720
08/11/14 12:06:16 INFO mapred.MapTask: kvstart = 262144; kvend = 267975; length = 327680
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 23156, 23156)
08/11/14 12:06:16 INFO mapred.MapTask: Finished spill 1
08/11/14 12:06:16 INFO mapred.Merger: Merging 2 sorted segments
08/11/14 12:06:16 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 740234 bytes
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 740232, 740232)
08/11/14 12:06:16 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hadoop/books/three.txt:0+1539989
08/11/14 12:06:16 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
08/11/14 12:06:16 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000001_0' to hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:17 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:17 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:17 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:17 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:17 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:17 INFO mapred.MapTask: bufstart = 0; bufend = 2387689; bufvoid = 99614720
08/11/14 12:06:17 INFO mapred.MapTask: kvstart = 0; kvend = 251356; length = 327680
08/11/14 12:06:18 INFO mapred.MapTask: Index: (0, 466648, 466648)
08/11/14 12:06:18 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:18 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hadoop/books/two.txt:0+1391690
08/11/14 12:06:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
08/11/14 12:06:18 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000002_0' to hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:18 INFO mapred.ReduceTask: Initiating final on-disk merge with 3 files
08/11/14 12:06:18 INFO mapred.Merger: Merging 3 sorted segments
08/11/14 12:06:18 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 1473914 bytes
08/11/14 12:06:18 INFO mapred.LocalJobRunner: reduce > reduce
08/11/14 12:06:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
08/11/14 12:06:18 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:19 INFO mapred.JobClient: Job complete: job_local_0001
08/11/14 12:06:19 INFO mapred.JobClient: Counters: 13
08/11/14 12:06:19 INFO mapred.JobClient:   File Systems
08/11/14 12:06:19 INFO mapred.JobClient:     HDFS bytes read=10104027
08/11/14 12:06:19 INFO mapred.JobClient:     HDFS bytes written=1299408
08/11/14 12:06:19 INFO mapred.JobClient:     Local bytes read=4082248
08/11/14 12:06:19 INFO mapred.JobClient:     Local bytes written=6548037
08/11/14 12:06:19 INFO mapred.JobClient:   Map-Reduce Framework
08/11/14 12:06:19 INFO mapred.JobClient:     Reduce input groups=82301
08/11/14 12:06:19 INFO mapred.JobClient:     Combine output records=102297
08/11/14 12:06:19 INFO mapred.JobClient:     Map input records=77934
08/11/14 12:06:19 INFO mapred.JobClient:     Reduce output records=82301
08/11/14 12:06:19 INFO mapred.JobClient:     Map output bytes=6076246
08/11/14 12:06:19 INFO mapred.JobClient:     Map input bytes=3593680
08/11/14 12:06:19 INFO mapred.JobClient:     Combine input records=629186
08/11/14 12:06:19 INFO mapred.JobClient:     Map output records=629186
08/11/14 12:06:19 INFO mapred.JobClient:     Reduce input records=102297

However, I do get the following error in the jobtracker log:

2008-11-14 12:05:55,663 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.RuntimeException: Not a host:port pair: local

And this in the tasktracker log:

2008-11-14 12:05:56,042 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.RuntimeException: Not a host:port pair: local

KevinAWorkman wrote:
>
> Hello everybody,
>
> I’m sorry if this has already been covered somewhere else, but I’ve been
> searching the web for weeks to no avail. :(
>
> Anyway, I am attempting to set up a single-node cluster following the
> directions here:
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster).
> I get everything set up fine and attempt to start the first example
> program (running the wordcount job). I format the namenode, start-all,
> and copy the files from local. I then try to execute the example job, but
> this is the output:
>
> [EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop jar hadoop-0.18.1-examples.jar wordcount books booksOutput
> 08/11/13 18:21:41 INFO mapred.FileInputFormat: Total input paths to process : 3
> 08/11/13 18:21:41 INFO mapred.FileInputFormat: Total input paths to process : 3
> 08/11/13 18:21:42 INFO mapred.JobClient: Running job: job_200811131821_0001
> 08/11/13 18:21:43 INFO mapred.JobClient:  map 0% reduce 0%
> 08/11/13 18:21:49 INFO mapred.JobClient:  map 66% reduce 0%
> 08/11/13 18:21:52 INFO mapred.JobClient:  map 100% reduce 0%
> 08/11/13 18:21:52 INFO mapred.JobClient: Task Id : attempt_200811131821_0001_m_000001_0, Status : FAILED
> Map output lost, rescheduling: getMapOutput(attempt_200811131821_0001_m_000001_0,0) failed :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200811131821_0001/attempt_200811131821_0001_m_000001_0/output/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2402)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>         at
> org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>         at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>         at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>         at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>         at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>         at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>         at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>         at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534):
>
> So apparently the map output is being lost, and reduce can not find it.
> Upon viewing the logs, I also find this error in the secondary namenode
> log file:
>
> 2008-11-13 17:41:40,518 WARN org.mortbay.jetty.servlet.WebApplicationContext: Configuration error on file:/home/staff/hadoop/hadoop-0.18.1/webapps/secondary
> java.io.FileNotFoundException: file:/home/staff/hadoop/hadoop-0.18.1/webapps/secondary
>         at org.mortbay.jetty.servlet.WebApplicationContext.resolveWebApp(WebApplicationContext.java:266)
>         at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:449)
>         at org.mortbay.util.Container.start(Container.java:72)
>         at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
>         at org.mortbay.util.Container.start(Container.java:72)
>         at org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:207)
>         at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:156)
>         at org.apache.hadoop.dfs.SecondaryNameNode.<init>(SecondaryNameNode.java:108)
>         at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)
>
> I have the following defined in hadoop-site:
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>tmp</value>
>   <description>A base for other temporary directories</description>
> </property>
>
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://localhost:54310</value>
>   <description>The name of the default file system. A URI whose
>   scheme and authority determine the FileSystem implementation. The
>   uri's scheme determines the config property (fs.SCHEME.impl) naming
>   the FileSystem implementation class. The uri's authority is used to
>   determine the host, port, etc. for a filesystem.</description>
> </property>
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>localhost:54311</value>
>   <description>The host and port that the MapReduce job tracker runs
>   at. If "local", then jobs are run in-process as a single map
>   and reduce task.
>   </description>
> </property>
>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
>   <description>Default block replication.
>   The actual number of replications can be specified when the file is created.
>   The default is used if replication is not specified in create time.
>   </description>
> </property>
>
> <property>
>   <name>mapred.local.dir</name>
>   <value>stores</value>
>   <description>The local directory where MapReduce stores intermediate
>   data files. May be a comma-separated list of directories on different
>   devices in order to spread disk i/o. Directories that do not exist
>   are ignored.
>   </description>
> </property>
>
> Does any of this look familiar to anybody? My ultimate goal is to have a
> cluster running on 20 or so linux machines. I thought that running a
> single-node cluster would be a good start, but so far I’ve been tackling
> error after error for the past few weeks. I appreciate any help anybody
> can give me!
>
> - Kevin Workman
>

--
View this message in context: http://www.nabble.com/Could-Not-Find-file.out.index-%28Help-starting-Hadoop%21%29-tp20492507p20505093.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
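One observation on the configuration quoted above, offered as a guess rather than a verified fix: hadoop.tmp.dir and mapred.local.dir are set to the relative paths tmp and stores. Each daemon resolves a relative path against its own working directory, so the TaskTracker's map-output servlet may search a different stores directory from the one the map task wrote file.out.index into. A sketch of the same properties with absolute paths (the /home/staff/hadoop prefix is only an assumption based on the paths in the secondary namenode log):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <!-- absolute path instead of the relative "tmp" (assumed location) -->
  <value>/home/staff/hadoop/hadoop-tmp</value>
  <description>A base for other temporary directories</description>
</property>

<property>
  <name>mapred.local.dir</name>
  <!-- absolute path instead of the relative "stores" (assumed location);
       may be a comma-separated list to spread disk i/o -->
  <value>/home/staff/hadoop/mapred-local</value>
</property>
```

After changing these, the old relative directories and the HDFS format/start cycle would need to be redone so all daemons agree on the new locations.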