[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pallavi Palleti updated MAHOUT-99:
----------------------------------

    Attachment: MAHOUT-99.patch

I have fixed the SequenceFile issue and modified the code to use SequenceFile wherever possible. Also, with the new KMeansClusterMapper, we no longer need the outputMapper code in Job.java in SyntheticControl, so I have commented it out.

Thanks,
Pallavi

> Improving speed of KMeans
> -------------------------
>
>                 Key: MAHOUT-99
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-99
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>             Fix For: 0.1
>
>         Attachments: MAHOUT-99-1.patch, MAHOUT-99.patch, Mahout-99.patch, MAHOUT-99.patch
>
>
> Improved the speed of KMeans by passing only the cluster ID from mapper to
> reducer. Previously, the whole cluster info was being sent as a formatted string.
> Also removed the implicit assumption that the combiner runs exactly once; the
> code is modified so that it stays correct when the combiner runs zero times
> or more than once.
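
To illustrate the combiner point in the description, here is a minimal sketch in plain Java. It is not taken from the attached patch and uses no Hadoop or Mahout classes; the names CombinerSafeKMeans, PartialSum, ofPoint, merge and mergeAll are hypothetical. It assumes the value emitted per cluster ID is a (vector sum, point count) pair, so the same merge step can serve as both combiner and reducer and gives the same centroid however many times the combiner runs.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafeKMeans {

    /** Partial result for one cluster: sum of the points seen so far and their count. */
    static final class PartialSum {
        final double[] sum;
        final long count;

        PartialSum(double[] sum, long count) {
            this.sum = sum.clone();
            this.count = count;
        }

        /** What a mapper would emit for one point (the cluster ID would be the key). */
        static PartialSum ofPoint(double[] point) {
            return new PartialSum(point, 1);
        }

        /** Merge step; its input and output types match, so it can be applied repeatedly. */
        PartialSum merge(PartialSum other) {
            double[] merged = sum.clone();
            for (int i = 0; i < merged.length; i++) {
                merged[i] += other.sum[i];
            }
            return new PartialSum(merged, count + other.count);
        }

        /** Final step, done only in the reducer: divide the sum by the count. */
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** Folds a non-empty list of partial sums into one. */
    static PartialSum mergeAll(List<PartialSum> parts) {
        PartialSum acc = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            acc = acc.merge(parts.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<PartialSum> points = Arrays.asList(
                PartialSum.ofPoint(new double[] {1, 2}),
                PartialSum.ofPoint(new double[] {3, 4}),
                PartialSum.ofPoint(new double[] {5, 6}));

        // Combiner runs zero times: the reducer sees the raw mapper output.
        double[] zeroRuns = mergeAll(points).centroid();

        // Combiner runs once over the first two points, then the reducer finishes.
        PartialSum combined = mergeAll(points.subList(0, 2));
        double[] oneRun = mergeAll(Arrays.asList(combined, points.get(2))).centroid();

        // Combiner runs again over its own output before the reducer sees it.
        PartialSum recombined = mergeAll(Arrays.asList(combined));
        double[] twoRuns = mergeAll(Arrays.asList(recombined, points.get(2))).centroid();

        System.out.println(Arrays.toString(zeroRuns)); // [3.0, 4.0]
        System.out.println(Arrays.equals(zeroRuns, oneRun) && Arrays.equals(oneRun, twoRuns)); // true
    }
}
```

Keeping the count alongside the sum is what makes this safe: a combiner that emitted an already-averaged centroid would lose the weights, and a second pass over its output would give a wrong result.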