Jeff Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reopened MAHOUT-99:
-----------------------------------

Hi Pallavi,

I'm getting:

09/03/18 11:13:56 WARN mapred.LocalJobRunner: job_local_0001
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1938)
    at org.apache.mahout.clustering.kmeans.Cluster.decodeCluster(Cluster.java:81)
    at org.apache.mahout.clustering.kmeans.KMeansUtil.configureWithClusterInfo(KMeansUtil.java:80)
    at org.apache.mahout.clustering.kmeans.KMeansMapper.configure(KMeansMapper.java:66)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)

when running http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html

Improving speed of KMeans
-------------------------

                Key: MAHOUT-99
                URL: https://issues.apache.org/jira/browse/MAHOUT-99
            Project: Mahout
         Issue Type: Improvement
         Components: Clustering
           Reporter: Pallavi Palleti
           Assignee: Grant Ingersoll
            Fix For: 0.1
        Attachments: MAHOUT-99-1.patch, Mahout-99.patch, MAHOUT-99.patch

Improved the speed of KMeans by passing only the cluster ID from the mapper to the reducer. Previously, the whole cluster info was sent as a formatted string. Also removed the implicit assumption that the Combiner runs exactly once, and modified the code accordingly so that it does not break when the Combiner runs zero times or more than once.
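To illustrate why a combiner that may run zero or more times still gives correct centroids, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes, without confirmation from the patch, that the mapper emits the nearest cluster ID together with a (sum, count) partial, and that the combiner emits the same type it consumes. Every name below (`KMeansPartialSums`, `PartialSum`, `nearest`, `combine`, `centroid`, `map`) is hypothetical and is not Mahout's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Mahout's classes: the map side emits only a cluster
 * ID plus a partial sum, and merging partial sums is associative and
 * commutative, so the combiner can run zero, one, or many times.
 */
class KMeansPartialSums {

  /** Running vector sum of assigned points and how many points contributed. */
  static final class PartialSum {
    final double[] sum;
    long count;

    PartialSum(int dim) {
      this.sum = new double[dim];
    }

    /** Partial sum for a single point (count = 1). */
    static PartialSum of(double[] point) {
      PartialSum p = new PartialSum(point.length);
      System.arraycopy(point, 0, p.sum, 0, point.length);
      p.count = 1;
      return p;
    }

    /** Returns a new merged partial; inputs are not mutated. */
    PartialSum merge(PartialSum other) {
      PartialSum out = new PartialSum(sum.length);
      for (int i = 0; i < sum.length; i++) {
        out.sum[i] = sum[i] + other.sum[i];
      }
      out.count = count + other.count;
      return out;
    }
  }

  /** Mapper step: index of the nearest centroid by squared Euclidean distance. */
  static int nearest(double[] point, double[][] centroids) {
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int c = 0; c < centroids.length; c++) {
      double d = 0;
      for (int i = 0; i < point.length; i++) {
        double diff = point[i] - centroids[c][i];
        d += diff * diff;
      }
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    return best;
  }

  /** Map phase: one PartialSum per point, grouped under its cluster ID only. */
  static Map<Integer, List<PartialSum>> map(double[][] points, double[][] centroids) {
    Map<Integer, List<PartialSum>> out = new HashMap<>();
    for (double[] p : points) {
      out.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(PartialSum.of(p));
    }
    return out;
  }

  /** Combiner and reducer both fold partials; output type equals input type. */
  static PartialSum combine(List<PartialSum> values) {
    PartialSum acc = values.get(0); // a reduce/combine call always gets >= 1 value
    for (int i = 1; i < values.size(); i++) {
      acc = acc.merge(values.get(i));
    }
    return acc;
  }

  /** Final reducer output: new centroid = sum / count. */
  static double[] centroid(PartialSum p) {
    double[] c = new double[p.sum.length];
    for (int i = 0; i < c.length; i++) {
      c[i] = p.sum[i] / p.count;
    }
    return c;
  }
}
```

The design point is that the combiner never turns partials into a centroid. Only the final reducer divides by the count, so the division happens exactly once regardless of how many combiner passes Hadoop schedules.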