[ https://issues.apache.org/jira/browse/MAHOUT-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suneel Marthi updated MAHOUT-1469:
----------------------------------
    Fix Version/s: 0.10.0 (was: 1.0)

> Streaming KMeans fails when executed in MapReduce mode and REDUCE_STREAMING_KMEANS is set to true
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1469
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1469
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Suneel Marthi
>            Assignee: Suneel Marthi
>              Labels: legacy
>             Fix For: 0.10.0
>
>
> No centroids are generated when the job runs in MapReduce (MR) mode with the -rskm flag set.
> {Code}
> 14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
> 14/03/20 02:42:12 INFO mapred.JobClient: map 100% reduce 0%
> 14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
> 14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors.
> Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> 14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
> 14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
> 14/03/20 02:42:14 INFO mapred.JobClient: File Input Format Counters
> 14/03/20 02:42:14 INFO mapred.JobClient: Bytes Read=17156391
> 14/03/20 02:42:14 INFO mapred.JobClient: FileSystemCounters
> 14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_READ=41925624
> 14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=25974741
> 14/03/20 02:42:14 INFO mapred.JobClient: Map-Reduce Framework
> 14/03/20 02:42:14 INFO mapred.JobClient: Map output materialized bytes=956293
> 14/03/20 02:42:14 INFO mapred.JobClient: Map input records=21578
> 14/03/20 02:42:14 INFO mapred.JobClient: Reduce shuffle bytes=0
> 14/03/20 02:42:14 INFO mapred.JobClient: Spilled Records=282
> 14/03/20 02:42:14 INFO mapred.JobClient: Map output bytes=1788012
> 14/03/20 02:42:14 INFO mapred.JobClient: Total committed heap usage (bytes)=217214976
> 14/03/20 02:42:14 INFO mapred.JobClient: Combine input records=0
> 14/03/20 02:42:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=163
> 14/03/20 02:42:14 INFO mapred.JobClient: Reduce input records=0
> 14/03/20 02:42:14 INFO mapred.JobClient: Reduce input groups=0
> 14/03/20 02:42:14 INFO mapred.JobClient: Combine output records=0
> 14/03/20 02:42:14 INFO mapred.JobClient: Reduce output records=0
> 14/03/20 02:42:14 INFO mapred.JobClient: Map output records=282
> 14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
> {Code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
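The trace shows a clear failure path. The reducer logged "Number of Centroids: 0", and the counters show "Reduce input records=0". That means BallKMeans.splitTrainTest was handed an empty list. Any test fraction of zero points yields zero test vectors, and the precondition rejects that.

The unusual message text also has an explanation. Guava's Preconditions.checkArgument substitutes only %s placeholders and appends any unmatched arguments in square brackets. As a result, "%.1f %% of %d" is printed literally, followed by [10.000000149011612, 0]. That reads as roughly a 10% test fraction of 0 vectors.

The plain-Java sketch below mimics both behaviors without depending on Guava. It is not Mahout's code. The following are all hypothetical assumptions made for illustration:

- the class SplitGuardSketch;
- the methods checkSplit and guavaStyleFormat;
- the 0.1 fraction, inferred from the logged percentage;
- the exact form of the check.

```java
/**
 * Hypothetical sketch, not Mahout code: mimics the train/test split check
 * that fails in BallKMeans.splitTrainTest when the reducer has no centroids.
 */
public class SplitGuardSketch {

  /**
   * Mimics Guava's Preconditions message formatting: each %s is replaced in
   * order, other placeholders (%.1f, %d) are left untouched, and unmatched
   * arguments are appended in square brackets.
   */
  public static String guavaStyleFormat(String template, Object... args) {
    StringBuilder sb = new StringBuilder();
    int templateStart = 0;
    int i = 0;
    while (i < args.length) {
      int placeholder = template.indexOf("%s", templateStart);
      if (placeholder == -1) {
        break; // no %s left: remaining args go into the bracketed suffix
      }
      sb.append(template, templateStart, placeholder);
      sb.append(args[i++]);
      templateStart = placeholder + 2;
    }
    sb.append(template.substring(templateStart));
    if (i < args.length) {
      sb.append(" [").append(args[i++]);
      while (i < args.length) {
        sb.append(", ").append(args[i++]);
      }
      sb.append(']');
    }
    return sb.toString();
  }

  /**
   * Hypothetical split check: the test set must be non-empty and must leave
   * at least one training vector. Returns the number of test vectors.
   */
  public static int checkSplit(double testFraction, int numPoints) {
    int numTest = (int) (testFraction * numPoints);
    if (numTest <= 0 || numTest >= numPoints) {
      throw new IllegalArgumentException(guavaStyleFormat(
          "Must have nonzero number of training and test vectors. "
              + "Asked for %.1f %% of %d vectors for test",
          testFraction * 100, numPoints));
    }
    return numTest;
  }

  public static void main(String[] args) {
    // Failing case from the report: the reducer received 0 centroids.
    try {
      checkSplit(0.1, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("IllegalArgumentException: " + e.getMessage());
    }
    // For contrast: 282 input vectors (the map output record count in the log).
    System.out.println("test vectors: " + checkSplit(0.1, 282));
  }
}
```

Running main first prints the bracketed message for the empty input, with the %.1f and %d placeholders left unexpanded. It then prints "test vectors: 28", because (int) (0.1 * 282) is 28. An empty-input check before clustering would avoid the exception. Whether the MAHOUT-1469 fix took that approach is not stated in this report.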