Re: Problems with KMeans clustering

Philippe Lamarche Sun, 26 Oct 2008 07:46:42 -0700

Unfortunately, I went straight from Hadoop 0.17.2 to 0.18.1, so I never
tried 0.18.0.  KMeans was working on 0.17.2.




On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Did this work with 0.18.0 or other prior versions for you?
>
>
>
> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>
>> Hi,
>>
>> I just updated to Hadoop 0.18.1 and got a clean version of Mahout from
>> svn.
>> However, I am having problems with KMeans that can be traced to:
>>
>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging
>> 2 sorted segments
>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to
>> the last merge-pass, with 2 segments left of total size: 5011 bytes
>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask:
>> attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw
>> an exception: java.io.IOException: Intermedate merge failed
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>> Caused by: java.lang.NumberFormatException: For input string: "["
>>        at
>> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>        at java.lang.Double.parseDouble(Double.java:510)
>>        at
>> org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>        at
>> org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>        at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>        ... 1 more
>>
>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask:
>> In-memory merge complete: 0 files left.
>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker:
>> Error running child
>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce
>> copier failed
>>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>        at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>
>>
>> This is while running the synthetic_control.data example, but I have the
>> same problems with any other input data.
>>
>> I am able to run other map-reduce jobs without problems.
>>
>> Here is the output of the jar task:
>>
>> [EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar
>> /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job:
>> job_200810251826_0010
>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete:
>> job_200810251826_0010
>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job:
>> job_200810251826_0011
>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete:
>> job_200810251826_0011
>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job:
>> job_200810251826_0012
>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete:
>> job_200810251826_0012
>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to
>> process : 2
>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job:
>> job_200810251826_0013
>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id :
>> attempt_200810251826_0013_r_000000_0, Status : FAILED
>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier
>> failed
>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>   at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>
>>
>> I am not sure if I am doing something wrong here.
>>
>> Thanks for the help,
>>
>> Philippe.
>>
>
> --------------------------
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
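
Note on the quoted trace: it shows KMeansCombiner.reduce being called from
ReduceTask$ReduceCopier.combineAndSpill, that is, the combiner running inside
the reduce task's in-memory merge, and the decode step then failing on the
token "[". One possible reading, which is an assumption and is not confirmed
anywhere in this thread, is that the combiner is being handed text in a
different format from the one it expects, for example its own earlier output.
The sketch below only illustrates that general failure mode. The class, the
method names decodePlain and encodeBracketed, and both string formats are
hypothetical; none of this is Mahout's actual code or vector format.

```java
// Hypothetical illustration only: not Mahout's DenseVector or KMeansCombiner.
class CombinerFormatSketch {

    // Assumed input format: comma-separated doubles, e.g. "1.0, 2.0, 3.0".
    static double[] decodePlain(String s) {
        String[] tokens = s.split(", ");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException if a token is not a number.
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    // Assumed output format: the same values wrapped in brackets,
    // e.g. "[, 1.0, 2.0, 3.0, ]". It differs from the input format.
    static String encodeBracketed(double[] values) {
        StringBuilder sb = new StringBuilder("[, ");
        for (double v : values) {
            sb.append(v).append(", ");
        }
        return sb.append("]").toString();
    }

    // Simulates a combine step applied twice: the first pass decodes the
    // plain format and writes the bracketed one; the second pass tries to
    // decode that bracketed text with the same plain-format decoder.
    static boolean failsOnSecondPass(String mapOutput) {
        String combined = encodeBracketed(decodePlain(mapOutput));
        try {
            decodePlain(combined);
            return false;
        } catch (NumberFormatException e) {
            // The first token of the bracketed text is "[", not a number.
            System.out.println("second pass failed: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnSecondPass("1.0, 2.0, 3.0"));
    }
}
```

If something like this is what is happening, a combine step whose output can
be read back by the same step would avoid the second-pass failure, but that
would need to be checked against the real KMeansCombiner source.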
