Hello,

When I run the following command on Mahout 0.9 and Hadoop 1.2.1, I get multiple errors and I cannot figure out what the problem is. Sorry for the long post.
[hadoop@solaris ~]$ mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c ~/categories.txt
Running on hadoop, using /export/home/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/18 20:28:28 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
14/03/18 20:28:29 INFO wikipedia.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /export/home/hadoop/categories.txt
14/03/18 20:28:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/18 20:28:32 INFO input.FileInputFormat: Total input paths to process : 699
14/03/18 20:28:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/18 20:28:32 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/18 20:28:37 INFO mapred.JobClient: Running job: job_201403181916_0001
14/03/18 20:28:38 INFO mapred.JobClient: map 0% reduce 0%
14/03/18 20:41:44 INFO mapred.JobClient: map 1% reduce 0%
14/03/18 20:52:57 INFO mapred.JobClient: map 2% reduce 0%
14/03/18 21:04:02 INFO mapred.JobClient: map 3% reduce 0%
14/03/18 21:15:13 INFO mapred.JobClient: map 4% reduce 0%
14/03/18 21:26:30 INFO mapred.JobClient: map 5% reduce 0%
14/03/18 21:29:07 INFO mapred.JobClient: map 5% reduce 1%
14/03/18 21:34:45 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_m_000040_0, Status : FAILED
14/03/18 21:34:46 WARN mapred.JobClient: Error reading task outputhttp://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_000040_0&filter=stdout
14/03/18 21:34:46 WARN mapred.JobClient: Error reading task outputhttp://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_000040_0&filter=stderr
14/03/18 21:38:29 INFO mapred.JobClient: map 6% reduce 1%
14/03/18 21:41:48 INFO mapred.JobClient: map 6% reduce 2%
14/03/18 21:50:05 INFO mapred.JobClient: map 7% reduce 2%
14/03/18 22:00:59 INFO mapred.JobClient: map 8% reduce 2%
14/03/18 22:12:38 INFO mapred.JobClient: map 9% reduce 2%
14/03/18 22:14:53 INFO mapred.JobClient: map 9% reduce 3%
14/03/18 22:24:30 INFO mapred.JobClient: map 10% reduce 3%
14/03/18 22:35:49 INFO mapred.JobClient: map 11% reduce 3%
14/03/18 22:47:41 INFO mapred.JobClient: map 12% reduce 3%
14/03/18 22:48:18 INFO mapred.JobClient: map 12% reduce 4%
14/03/18 22:59:26 INFO mapred.JobClient: map 13% reduce 4%
14/03/18 23:10:39 INFO mapred.JobClient: map 14% reduce 4%
14/03/18 23:21:32 INFO mapred.JobClient: map 15% reduce 4%
14/03/18 23:24:54 INFO mapred.JobClient: map 15% reduce 5%
14/03/18 23:32:48 INFO mapred.JobClient: map 16% reduce 5%
14/03/18 23:43:53 INFO mapred.JobClient: map 17% reduce 5%
14/03/18 23:54:57 INFO mapred.JobClient: map 18% reduce 5%
14/03/18 23:58:59 INFO mapred.JobClient: map 18% reduce 6%
14/03/19 00:05:59 INFO mapred.JobClient: map 19% reduce 6%
14/03/19 00:16:43 INFO mapred.JobClient: map 20% reduce 6%
14/03/19 00:17:30 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_m_000137_0, Status : FAILED
Map output lost, rescheduling: getMapOutput(attempt_201403181916_0001_m_000137_0,0) failed :
java.io.IOException: Error Reading IndexFile
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:113)
    at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:66)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4070)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:914)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": error=12, Not enough space
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
    at org.apache.hadoop.util.Shell.run(Shell.java:182)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
    at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:712)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:448)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:110)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:61)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:54)
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:109)
    at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:66)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4070)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:914)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: error=12, Not enough space
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:79)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
    ... 35 more
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:473)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:110)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:61)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:54)
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:109)
    ... 23 more
14/03/19 00:17:38 INFO mapred.JobClient: map 19% reduce 6%
14/03/19 00:18:29 INFO mapred.JobClient: map 20% reduce 6%
14/03/19 00:29:27 INFO mapred.JobClient: map 21% reduce 6%
14/03/19 00:30:37 INFO mapred.JobClient: map 21% reduce 7%
14/03/19 00:40:25 INFO mapred.JobClient: map 22% reduce 7%
14/03/19 00:51:52 INFO mapred.JobClient: map 23% reduce 7%
14/03/19 01:02:57 INFO mapred.JobClient: map 24% reduce 7%
14/03/19 01:06:10 INFO mapred.JobClient: map 24% reduce 8%
14/03/19 01:14:12 INFO mapred.JobClient: map 25% reduce 8%
14/03/19 01:25:18 INFO mapred.JobClient: map 26% reduce 8%
14/03/19 01:35:29 INFO mapred.JobClient: map 27% reduce 8%
14/03/19 01:36:57 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_m_000191_0, Status : FAILED
14/03/19 01:36:57 WARN mapred.JobClient: Error reading task outputhttp://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_000191_0&filter=stdout
14/03/19 01:36:58 WARN mapred.JobClient: Error reading task outputhttp://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_000191_0&filter=stderr
14/03/19 01:38:34 INFO mapred.JobClient: map 27% reduce 9%
14/03/19 01:46:26 INFO mapred.JobClient: map 28% reduce 9%
14/03/19 01:57:37 INFO mapred.JobClient: map 29% reduce 9%
14/03/19 02:08:24 INFO mapred.JobClient: map 30% reduce 9%
14/03/19 02:10:46 INFO mapred.JobClient: map 30% reduce 10%
14/03/19 02:11:50 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_r_000000_0, Status : FAILED
java.io.IOException: Task: attempt_201403181916_0001_r_000000_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201403181916_0001/attempt_201403181916_0001_r_000000_0/output/map_105.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2690)
14/03/19 02:11:54 INFO mapred.JobClient: map 30% reduce 5%
14/03/19 02:12:33 INFO mapred.JobClient: map 30% reduce 6%
14/03/19 02:13:00 INFO mapred.JobClient: map 30% reduce 7%
14/03/19 02:13:24 INFO mapred.JobClient: map 30% reduce 8%
14/03/19 02:13:48 INFO mapred.JobClient: map 30% reduce 9%
14/03/19 02:14:16 INFO mapred.JobClient: map 30% reduce 10%
14/03/19 02:19:40 INFO mapred.JobClient: map 31% reduce 10%
14/03/19 02:21:21 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_r_000000_1, Status : FAILED
java.io.IOException: Task: attempt_201403181916_0001_r_000000_1 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201403181916_0001/attempt_201403181916_0001_r_000000_1/output/map_195.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2690)
14/03/19 02:21:25 INFO mapred.JobClient: map 31% reduce 5%
14/03/19 02:22:05 INFO mapred.JobClient: map 31% reduce 6%
14/03/19 02:22:27 INFO mapred.JobClient: map 31% reduce 7%
14/03/19 02:22:54 INFO mapred.JobClient: map 31% reduce 8%
14/03/19 02:23:22 INFO mapred.JobClient: map 31% reduce 9%
14/03/19 02:23:51 INFO mapred.JobClient: map 31% reduce 10%
14/03/19 02:24:08 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_r_000000_2, Status : FAILED
java.io.IOException: Task: attempt_201403181916_0001_r_000000_2 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201403181916_0001/attempt_201403181916_0001_r_000000_2/output/map_180.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2690)
14/03/19 02:24:11 INFO mapred.JobClient: map 31% reduce 5%
14/03/19 02:24:47 INFO mapred.JobClient: map 31% reduce 6%
14/03/19 02:25:12 INFO mapred.JobClient: map 31% reduce 7%
14/03/19 02:25:39 INFO mapred.JobClient: map 31% reduce 8%
14/03/19 02:26:03 INFO mapred.JobClient: map 31% reduce 9%
14/03/19 02:26:28 INFO mapred.JobClient: map 31% reduce 10%
14/03/19 02:26:43 INFO mapred.JobClient: map 31% reduce 5%
14/03/19 02:26:48 INFO mapred.JobClient: Job complete: job_201403181916_0001
14/03/19 02:26:48 INFO mapred.JobClient: Counters: 20
14/03/19 02:26:48 INFO mapred.JobClient:   Job Counters
14/03/19 02:26:48 INFO mapred.JobClient:     Launched reduce tasks=5
14/03/19 02:26:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=85832128
14/03/19 02:26:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/19 02:26:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/19 02:26:48 INFO mapred.JobClient:     Launched map tasks=226
14/03/19 02:26:48 INFO mapred.JobClient:     Data-local map tasks=226
14/03/19 02:26:48 INFO mapred.JobClient:     Failed reduce tasks=1
14/03/19 02:26:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=35806681
14/03/19 02:26:48 INFO mapred.JobClient:   FileSystemCounters
14/03/19 02:26:48 INFO mapred.JobClient:     HDFS_BYTES_READ=14761431506
14/03/19 02:26:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2670797786
14/03/19 02:26:48 INFO mapred.JobClient:   File Input Format Counters
14/03/19 02:26:48 INFO mapred.JobClient:     Bytes Read=14761399970
14/03/19 02:26:48 INFO mapred.JobClient:   Map-Reduce Framework
14/03/19 02:26:48 INFO mapred.JobClient:     Map output materialized bytes=2658182182
14/03/19 02:26:48 INFO mapred.JobClient:     Combine output records=0
14/03/19 02:26:48 INFO mapred.JobClient:     Map input records=4444339
14/03/19 02:26:48 INFO mapred.JobClient:     Spilled Records=557011
14/03/19 02:26:48 INFO mapred.JobClient:     Map output bytes=2655965336
14/03/19 02:26:48 INFO mapred.JobClient:     Total committed heap usage (bytes)=39178272768
14/03/19 02:26:48 INFO mapred.JobClient:     Combine input records=0
14/03/19 02:26:48 INFO mapred.JobClient:     Map output records=557011
14/03/19 02:26:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=28689
Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:187)
    at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:115)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
[hadoop@solaris ~]$

Regards,
Mahmood
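P.P.S. If /tmp does turn out to be too small, one thing I am considering is pointing mapred.local.dir at a larger filesystem in mapred-site.xml before rerunning. The path below is hypothetical, just to illustrate what I mean:

```xml
<!-- mapred-site.xml fragment: move MapReduce scratch space off /tmp.
     /export/home/hadoop/mapred/local is a made-up path; any local
     filesystem with enough free space for the shuffle should do. -->
<property>
  <name>mapred.local.dir</name>
  <value>/export/home/hadoop/mapred/local</value>
</property>
```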
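P.S. The two recurring failures are "Cannot run program \"/bin/ls\": error=12, Not enough space" (which I believe means fork() could not reserve enough swap on this Solaris box) and "Could not find any valid local directory" under /tmp/hadoop-hadoop/mapred/local (which looks like the shuffle filling up /tmp). These are the checks I plan to run on the node while the job is at its peak; the swap command is Solaris-specific, so I guard it:

```shell
#!/bin/sh
# Sanity checks for the two suspected causes (my assumption, not confirmed):
# 1) /tmp running out of space -- Hadoop 1.x defaults mapred.local.dir under /tmp.
df -k /tmp
# 2) swap exhaustion -- error=12 (ENOMEM) from fork() on Solaris usually means
#    no swap could be reserved for the child process. swap(1M) is Solaris-only,
#    so skip it gracefully elsewhere.
command -v swap >/dev/null 2>&1 && swap -s || echo "swap command not available on this system"
```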