No, I am doing some work with text clustering. I was struggling with memory problems on Hadoop 0.17.2 and an unknown version of Mahout. I update Mahout often, but I could not say when I last did or to which exact version. I decided to update everything, and my memory problems were transformed into this.
I feel that I updated Hadoop first, got the error and then, while trying to resolve it, updated Mahout. There is only one way to find out: I'll try it out on hadoop 0.17.2.

On Mon, Oct 27, 2008 at 9:33 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>
>> Unfortunately, I went straight from 0.17.2 to 0.18.1. It was working on
>> 0.17.2.
>
> BTW, are you saying the same exact code was working on 0.17.2 or are you
> referring to some older Mahout code that worked on 17.2?
>
>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>
>>> Did this work with 0.18.0 or other prior versions for you?
>>>
>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>
>>>> Hi,
>>>>
>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from svn.
>>>> However, I am having problems with KMeans, that can be traced down to :
>>>>
>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>     at java.lang.Double.parseDouble(Double.java:510)
>>>>     at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>     at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>     ... 1 more
>>>>
>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>
>>>> This is while running the synthetic_control.data example, but I have the
>>>> same problems with any other input data.
>>>>
>>>> I am able to do other map-reduce job without problems.
>>>>
>>>> Here is the output of the jar task:
>>>>
>>>> [EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>> 08/10/25 19:09:29 INFO mapred.JobClient: map 0% reduce 0%
>>>> 08/10/25 19:09:31 INFO mapred.JobClient: map 50% reduce 0%
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: File Systems
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes read=291644
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes written=323660
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job Counters
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Launched map tasks=2
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Data-local map tasks=2
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map-Reduce Framework
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map input records=600
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map input bytes=288374
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map output records=600
>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>> 08/10/25 19:09:33 INFO mapred.JobClient: map 0% reduce 0%
>>>> 08/10/25 19:09:37 INFO mapred.JobClient: map 50% reduce 0%
>>>> 08/10/25 19:09:39 INFO mapred.JobClient: map 100% reduce 0%
>>>> 08/10/25 19:09:44 INFO mapred.JobClient: map 100% reduce 16%
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: File Systems
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes read=323660
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes written=1447
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Local bytes read=1389
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Local bytes written=37878
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job Counters
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Launched reduce tasks=1
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Launched map tasks=2
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Data-local map tasks=2
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map-Reduce Framework
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce input groups=1
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Combine output records=29
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map input records=600
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce output records=1
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map output bytes=943020
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map input bytes=323660
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Combine input records=1760
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map output records=1732
>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce input records=1
>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>> 08/10/25 19:09:54 INFO mapred.JobClient: map 0% reduce 0%
>>>> 08/10/25 19:09:56 INFO mapred.JobClient: map 50% reduce 0%
>>>> 08/10/25 19:09:58 INFO mapred.JobClient: map 100% reduce 0%
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: File Systems
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes read=326554
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes written=1137260
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Local bytes read=1147358
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Local bytes written=2304490
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job Counters
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Launched reduce tasks=1
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Launched map tasks=2
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Data-local map tasks=2
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map-Reduce Framework
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce input groups=1
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Combine output records=0
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map input records=600
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce output records=600
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map output bytes=1139660
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map input bytes=323660
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Combine input records=0
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map output records=600
>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce input records=600
>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>> 08/10/25 19:10:04 INFO mapred.JobClient: map 0% reduce 0%
>>>> 08/10/25 19:10:08 INFO mapred.JobClient: map 50% reduce 0%
>>>> 08/10/25 19:10:09 INFO mapred.JobClient: map 100% reduce 0%
>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>
>>>> I am not sure if I am doing something wrong here.
>>>>
>>>> Thanks for the help,
>>>>
>>>> Philippe.
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>>> http://www.lucenebootcamp.com
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --------------------------
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
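
For reference, the innermost exception in the quoted trace (a NumberFormatException for the input string "[" raised from Double.parseDouble) can be reproduced in isolation. The following is only a minimal sketch and not Mahout's code: the class ParseFailureSketch, the method decodePlain and both sample strings are hypothetical, made up to show how a decoder that expects bare comma-separated numbers fails when it is handed text that starts with a bracket. It says nothing about why the combiner received such text in the first place.

```java
public class ParseFailureSketch {

    // Hypothetical decoder (an assumption, not Mahout's DenseVector.decodeFormat):
    // expects comma-separated doubles with no surrounding brackets.
    static double[] decodePlain(String text) {
        String[] parts = text.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Double.parseDouble throws NumberFormatException on any token
            // that is not a valid floating-point literal, such as "[".
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Input in the format the decoder expects: parses without error.
        double[] ok = decodePlain("1.0, 2.5, 3.0");
        System.out.println("decoded " + ok.length + " values");

        // Input in a different, bracketed format: the first token is "[".
        try {
            decodePlain("[, 1.0, 2.5, 3.0, ]");
        } catch (NumberFormatException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If the real failure has the same shape, the thing to check would be whether the text reaching KMeansCombiner.reduce is in the format its decoder expects.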
