On Oct 27, 2008, at 4:26 PM, Philippe Lamarche wrote:
Hi,

My goal is to run the KMeans example. I must download the synthetic control data and put it on the dfs in "testdata".

To be sure that everything is ok, I started from a clean state on my laptop.

I downloaded hadoop 0.18.1. I changed the conf/hadoop-site.xml to this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

I changed JAVA_HOME in hadoop-env.sh.

I downloaded mahout from SVN, at revision 708282. I built both core and examples from the ant script.

I copied apache-mahout-core-0.1-dev.jar to {hadoop-home}/lib.

What happens if you don't do this but use the "job" file instead (ant job in the examples dir)? I'm trying to replicate this, but am stuck at the moment.

I downloaded http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

I added the file to the dfs:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata

I ran the example jar, but it failed:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/27 15:34:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process: 1
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process: 1
08/10/27 15:34:55 INFO mapred.JobClient: Running job: job_200810271532_0001
08/10/27 15:34:56 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:34:59 INFO mapred.JobClient: Job complete: job_200810271532_0001
08/10/27 15:34:59 INFO mapred.JobClient: Counters: 7
08/10/27 15:34:59 INFO mapred.JobClient:   File Systems
08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes read=291644
08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes written=323660
08/10/27 15:34:59 INFO mapred.JobClient:   Job Counters
08/10/27 15:34:59 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:34:59 INFO mapred.JobClient:     Map input records=600
08/10/27 15:34:59 INFO mapred.JobClient:     Map input bytes=288374
08/10/27 15:34:59 INFO mapred.JobClient:     Map output records=600
08/10/27 15:34:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:00 INFO mapred.JobClient: Running job: job_200810271532_0002
08/10/27 15:35:01 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:10 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:16 INFO mapred.JobClient: Job complete: job_200810271532_0002
08/10/27 15:35:16 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:16 INFO mapred.JobClient:   File Systems
08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes read=323660
08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes written=1447
08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes read=1389
08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes written=37878
08/10/27 15:35:16 INFO mapred.JobClient:   Job Counters
08/10/27 15:35:16 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/27 15:35:16 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input groups=1
08/10/27 15:35:16 INFO mapred.JobClient:     Combine output records=29
08/10/27 15:35:16 INFO mapred.JobClient:     Map input records=600
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce output records=1
08/10/27 15:35:16 INFO mapred.JobClient:     Map output bytes=943020
08/10/27 15:35:16 INFO mapred.JobClient:     Map input bytes=323660
08/10/27 15:35:16 INFO mapred.JobClient:     Combine input records=1760
08/10/27 15:35:16 INFO mapred.JobClient:     Map output records=1732
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input records=1
08/10/27 15:35:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:16 INFO mapred.JobClient: Running job: job_200810271532_0003
08/10/27 15:35:17 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:24 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:28 INFO mapred.JobClient: Job complete: job_200810271532_0003
08/10/27 15:35:28 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:28 INFO mapred.JobClient:   File Systems
08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes read=326554
08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes written=1137260
08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes read=1147358
08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes written=2304490
08/10/27 15:35:28 INFO mapred.JobClient:   Job Counters
08/10/27 15:35:28 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/27 15:35:28 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input groups=1
08/10/27 15:35:28 INFO mapred.JobClient:     Combine output records=0
08/10/27 15:35:28 INFO mapred.JobClient:     Map input records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce output records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Map output bytes=1139660
08/10/27 15:35:28 INFO mapred.JobClient:     Map input bytes=323660
08/10/27 15:35:28 INFO mapred.JobClient:     Combine input records=0
08/10/27 15:35:28 INFO mapred.JobClient:     Map output records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input records=600
08/10/27 15:35:28 INFO kmeans.KMeansDriver: Iteration 0
08/10/27 15:35:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process: 2
08/10/27 15:35:29 INFO mapred.JobClient: Running job: job_200810271532_0004
08/10/27 15:35:30 INFO mapred.JobClient: map 0% reduce 0%
08/10/27 15:35:37 INFO mapred.JobClient: map 100% reduce 0%
08/10/27 15:35:45 INFO mapred.JobClient: Task Id : attempt_200810271532_0004_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

The failed attempts logs contain this:

2008-10-27 15:35:40,133 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 2524 bytes (2524 raw bytes) into RAM from attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Read 2524 bytes from map-output for attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200810271532_0004_m_000000_0 -> (1358, 1158) from phil
2008-10-27 15:35:41,110 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2008-10-27 15:35:41,125 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2008-10-27 15:35:41,173 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 2 segments...
2008-10-27 15:35:41,177 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-27 15:35:41,178 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-27 15:35:41,197 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810271532_0004_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
    at java.lang.Double.parseDouble(Double.java:510)
    at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
    at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
    ... 1 more
2008-10-27 15:35:41,197 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-27 15:35:41,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

However, I can run the org.apache.mahout.clustering.kmeans unit tests without problems. I truly do not understand where the problem lies.

Thanks for the help.

On Sun, Oct 26, 2008 at 8:24 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

Same Mahout code, though, right? Can you provide details on how you were running it?

On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:

Unfortunately, I went straight from 0.17.2 to 0.18.1. It was working on 0.17.2.

On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

Did this work with 0.18.0 or other prior versions for you?

On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:

Hi,

I just updated to hadoop 0.18.1 and got a clean version of mahout from svn.

However, I am having problems with KMeans that can be traced down to:

2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
    at java.lang.Double.parseDouble(Double.java:510)
    at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
    at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
    at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
    ... 1 more
2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

This is while running the synthetic_control.data example, but I have the same problems with any other input data. I am able to do other map-reduce jobs without problems.

Here is the output of the jar task:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
08/10/25 19:09:29 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:31 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
08/10/25 19:09:33 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:37 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:39 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:09:44 INFO mapred.JobClient: map 100% reduce 16%
08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
08/10/25 19:09:54 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:09:56 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:09:58 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
08/10/25 19:10:04 INFO mapred.JobClient: map 0% reduce 0%
08/10/25 19:10:08 INFO mapred.JobClient: map 50% reduce 0%
08/10/25 19:10:09 INFO mapred.JobClient: map 100% reduce 0%
08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

I am not sure if I am doing something wrong here.

Thanks for the help,

Philippe.

--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
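
A note on the stack trace in this thread: the NumberFormatException comes from Double.parseDouble, reached through DenseVector.decodeFormat and KMeansCombiner.reduce, and that combiner call is itself made from ReduceCopier.combineAndSpill during the reduce-side in-memory merge. The sketch below only reproduces the parse failure in isolation. It is an illustration under assumptions: decodeStrict is a hypothetical stand-in, not Mahout's actual DenseVector.decodeFormat, and the sample strings are made up.

```java
import java.util.Arrays;

public class VectorDecodeSketch {

    // Hypothetical decoder (NOT Mahout's DenseVector.decodeFormat):
    // it expects comma-separated numbers and nothing else.
    static double[] decodeStrict(String encoded) {
        String[] tokens = encoded.split(",");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // parseDouble ignores surrounding whitespace but throws
            // NumberFormatException on a non-numeric token such as "[".
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Input in the format the decoder expects.
        System.out.println(Arrays.toString(decodeStrict("1.0, 2.5, 3.0")));

        // Input in a different textual format (made-up example with a
        // leading bracket token), as if produced by a different encoder.
        try {
            decodeStrict("[, 1.0, 2.5");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

If something like this is what is happening, the decoder is being handed text in a format other than the one it was written for, which would be worth checking against what the combiner receives on the reduce side.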
