Re: Could Not Find file.out.index (Help starting Hadoop!)

2008-11-14 Thread KevinAWorkman

If I replace the mapred.job.tracker value in hadoop-site.xml with local, the
job seems to work. The only change from the configuration quoted in my
original message is that one property:
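
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>

As the property's own description notes, with local the job runs in-process
as a single map and reduce task, which matches the LocalJobRunner messages
and the job_local_0001 id in the output below: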

[EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop jar hadoop-0.18.1-examples.jar
wordcount books booksOutput
08/11/14 12:06:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/14 12:06:13 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/14 12:06:14 INFO mapred.JobClient: Running job: job_local_0001
08/11/14 12:06:14 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:14 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:14 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:14 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:14 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:14 INFO mapred.MapTask: bufstart = 0; bufend = 1086784;
bufvoid = 99614720
08/11/14 12:06:14 INFO mapred.MapTask: kvstart = 0; kvend = 109855; length =
327680
08/11/14 12:06:14 INFO mapred.MapTask: Index: (0, 267034, 267034)
08/11/14 12:06:14 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:15 INFO mapred.LocalJobRunner:
hdfs://localhost:54310/user/hadoop/books/one.txt:0+662001
08/11/14 12:06:15 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_00_0' done.
08/11/14 12:06:15 INFO mapred.TaskRunner: Saved output of task
'attempt_local_0001_m_00_0' to
hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:15 INFO mapred.JobClient:  map 100% reduce 0%
08/11/14 12:06:15 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:15 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:15 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:15 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:15 INFO mapred.MapTask: Spilling map output: buffer full =
false and record full = true
08/11/14 12:06:15 INFO mapred.MapTask: bufstart = 0; bufend = 2545957;
bufvoid = 99614720
08/11/14 12:06:15 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length =
327680
08/11/14 12:06:15 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 717078, 717078)
08/11/14 12:06:16 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:16 INFO mapred.MapTask: bufstart = 2545957; bufend = 2601773;
bufvoid = 99614720
08/11/14 12:06:16 INFO mapred.MapTask: kvstart = 262144; kvend = 267975;
length = 327680
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 23156, 23156)
08/11/14 12:06:16 INFO mapred.MapTask: Finished spill 1
08/11/14 12:06:16 INFO mapred.Merger: Merging 2 sorted segments
08/11/14 12:06:16 INFO mapred.Merger: Down to the last merge-pass, with 2
segments left of total size: 740234 bytes
08/11/14 12:06:16 INFO mapred.MapTask: Index: (0, 740232, 740232)
08/11/14 12:06:16 INFO mapred.LocalJobRunner:
hdfs://localhost:54310/user/hadoop/books/three.txt:0+1539989
08/11/14 12:06:16 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_01_0' done.
08/11/14 12:06:16 INFO mapred.TaskRunner: Saved output of task
'attempt_local_0001_m_01_0' to
hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:17 INFO mapred.MapTask: numReduceTasks: 1
08/11/14 12:06:17 INFO mapred.MapTask: io.sort.mb = 100
08/11/14 12:06:17 INFO mapred.MapTask: data buffer = 79691776/99614720
08/11/14 12:06:17 INFO mapred.MapTask: record buffer = 262144/327680
08/11/14 12:06:17 INFO mapred.MapTask: Starting flush of map output
08/11/14 12:06:17 INFO mapred.MapTask: bufstart = 0; bufend = 2387689;
bufvoid = 99614720
08/11/14 12:06:17 INFO mapred.MapTask: kvstart = 0; kvend = 251356; length =
327680
08/11/14 12:06:18 INFO mapred.MapTask: Index: (0, 466648, 466648)
08/11/14 12:06:18 INFO mapred.MapTask: Finished spill 0
08/11/14 12:06:18 INFO mapred.LocalJobRunner:
hdfs://localhost:54310/user/hadoop/books/two.txt:0+1391690
08/11/14 12:06:18 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_02_0' done.
08/11/14 12:06:18 INFO mapred.TaskRunner: Saved output of task
'attempt_local_0001_m_02_0' to
hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:18 INFO mapred.ReduceTask: Initiating final on-disk merge
with 3 files
08/11/14 12:06:18 INFO mapred.Merger: Merging 3 sorted segments
08/11/14 12:06:18 INFO mapred.Merger: Down to the last merge-pass, with 3
segments left of total size: 1473914 bytes
08/11/14 12:06:18 INFO mapred.LocalJobRunner: reduce  reduce
08/11/14 12:06:18 INFO mapred.TaskRunner: Task
'attempt_local_0001_r_00_0' done.
08/11/14 12:06:18 INFO mapred.TaskRunner: Saved output of task
'attempt_local_0001_r_00_0' to
hdfs://localhost:54310/user/hadoop/booksOutput
08/11/14 12:06:19 INFO mapred.JobClient: Job complete: job_local_0001
08/11/14 12:06:19 INFO mapred.JobClient: Counters: 13
08/11/14 12:06:19 INFO mapred.JobClient:   File 

Could Not Find file.out.index (Help starting Hadoop!)

2008-11-13 Thread KevinAWorkman

Hello everybody,

I’m sorry if this has already been covered somewhere else, but I’ve been
searching the web for weeks to no avail. :(

Anyway, I am attempting to set up a single-node cluster following the
directions here:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
Everything gets set up fine, and I move on to the first example program (the
wordcount job). I format the namenode, start all the daemons, and copy the
input files in from the local filesystem, i.e. roughly the following (the
local source path is a placeholder):
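
[EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop namenode -format
[EMAIL PROTECTED] hadoop-0.18.1]$ bin/start-all.sh
# /tmp/books stands in for wherever the input text files live locally
[EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop dfs -copyFromLocal /tmp/books books

I then try to execute the example job, but this is the output: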

[EMAIL PROTECTED] hadoop-0.18.1]$ bin/hadoop jar hadoop-0.18.1-examples.jar
wordcount books booksOutput
08/11/13 18:21:41 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/13 18:21:41 INFO mapred.FileInputFormat: Total input paths to process
: 3
08/11/13 18:21:42 INFO mapred.JobClient: Running job: job_200811131821_0001
08/11/13 18:21:43 INFO mapred.JobClient:  map 0% reduce 0%
08/11/13 18:21:49 INFO mapred.JobClient:  map 66% reduce 0%
08/11/13 18:21:52 INFO mapred.JobClient:  map 100% reduce 0%
08/11/13 18:21:52 INFO mapred.JobClient: Task Id :
attempt_200811131821_0001_m_01_0, Status : FAILED
Map output lost, rescheduling:
getMapOutput(attempt_200811131821_0001_m_01_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200811131821_0001/attempt_200811131821_0001_m_01_0/output/file.out.index
in any of the configured local directories
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2402)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
at
org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
at
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534):



So apparently the map output is being lost, and the reduce cannot find it.
Looking through the logs, I also find this error in the secondary namenode
log file:

2008-11-13 17:41:40,518 WARN
org.mortbay.jetty.servlet.WebApplicationContext: Configuration error on
file:/home/staff/hadoop/hadoop-0.18.1/webapps/secondary
java.io.FileNotFoundException:
file:/home/staff/hadoop/hadoop-0.18.1/webapps/secondary
at
org.mortbay.jetty.servlet.WebApplicationContext.resolveWebApp(WebApplicationContext.java:266)
at
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:449)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
at org.mortbay.util.Container.start(Container.java:72)
at
org.apache.hadoop.mapred.StatusHttpServer.start(StatusHttpServer.java:207)
at
org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:156)
at
org.apache.hadoop.dfs.SecondaryNameNode.init(SecondaryNameNode.java:108)
at 
org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)

I have the following defined in hadoop-site.xml (I have not set
mapred.local.dir explicitly, so it should fall back to its default of
${hadoop.tmp.dir}/mapred/local):

<property>
  <name>hadoop.tmp.dir</name>
  <value>tmp</value>
  <description>A base for other temporary directories</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If local, then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>dfs.replication</name>