Jeff
Palleti, Pallavi wrote:
Yeah. But I am wondering how the test cases succeeded? I ran them using the "mvn clean install" command.

Thanks
Pallavi

-----Original Message-----
From: Jeff Eastman [mailto:j...@windwardsolutions.com]
Sent: Thursday, March 19, 2009 9:56 AM
To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

The Synthetic Control kMeans job calls the Canopy job to build its initial clusters, as is commonly done. If the kMeans record format was changed and the Canopy output was not changed accordingly, then everything would still compile, but there would be a mismatch when the kMeans mapper tried to read in the clusters.

Jeff

Richard Tomsett (JIRA) wrote:

[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ]

Richard Tomsett commented on MAHOUT-99:
---------------------------------------

Yup, just downloaded the latest trunk and ran it with Hadoop 0.19.1, and I get the same error on the Synthetic Control example. It seems to be because the new KMeans code uses a KeyValueLineRecordReader object to read the input cluster centres from the canopy clustering output, but the canopy clustering job outputs a SequenceFile (and the old KMeans code read in a SequenceFile for the cluster centres). Think that's the problem at least, I'll have a quick play.

Improving speed of KMeans
-------------------------

Key: MAHOUT-99
URL: https://issues.apache.org/jira/browse/MAHOUT-99
Project: Mahout
Issue Type: Improvement
Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
Fix For: 0.1
Attachments: MAHOUT-99-1.patch, Mahout-99.patch, MAHOUT-99.patch

Improved the speed of KMeans by passing only the cluster ID from mapper to reducer. Previously, the whole cluster info was being sent as a formatted string.
Also removed the implicit assumption that the combiner runs exactly once; the code is modified accordingly so that it does not produce a bug when the combiner runs zero times or more than once.
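
For illustration only, here is a minimal Java sketch of one common way to make the per-cluster aggregation independent of how often the combiner runs. It is not the code from the attached patches: the class and method names (CombinerSafeKMeans, Partial, merge, centroid) are hypothetical, and it uses plain Java collections rather than the Hadoop API. The idea is that the mapper, combiner and reducer all exchange the same kind of value, a point count plus a per-dimension sum, so merging is associative and can be applied zero, one or many times before the final division.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Mahout's actual classes. */
final class CombinerSafeKMeans {

    /** Partial aggregate for one cluster ID: a point count and a per-dimension sum. */
    public static final class Partial {
        public final long count;
        public final double[] sum;

        public Partial(long count, double[] sum) {
            this.count = count;
            this.sum = sum;
        }

        /** What a mapper would emit for a single point assigned to a cluster. */
        public static Partial ofPoint(double[] point) {
            return new Partial(1, point.clone());
        }
    }

    /**
     * Merges partials for one cluster. Input and output have the same type,
     * so the same logic can run in a combiner any number of times and again
     * in the reducer.
     */
    public static Partial merge(List<Partial> partials) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("at least one partial is required");
        }
        double[] sum = new double[partials.get(0).sum.length];
        long count = 0;
        for (Partial p : partials) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
            count += p.count;
        }
        return new Partial(count, sum);
    }

    /** Final reducer step: divide the summed coordinates by the total count. */
    public static double[] centroid(Partial total) {
        double[] c = new double[total.sum.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = total.sum[i] / total.count;
        }
        return c;
    }

    public static void main(String[] args) {
        List<Partial> raw = Arrays.asList(
                Partial.ofPoint(new double[] {1.0, 1.0}),
                Partial.ofPoint(new double[] {3.0, 5.0}),
                Partial.ofPoint(new double[] {5.0, 3.0}));

        // Case 1: the combiner never runs; the reducer sees the raw mapper output.
        double[] direct = centroid(merge(raw));

        // Case 2: the combiner runs twice over part of the output before the reducer.
        Partial once = merge(raw.subList(0, 2));
        Partial twice = merge(Arrays.asList(once));
        double[] combined = centroid(merge(Arrays.asList(twice, raw.get(2))));

        System.out.println(Arrays.toString(direct));
        System.out.println(Arrays.toString(combined));
    }
}
```

Both cases print the same centroid. A design that instead averaged inside the combiner, or that counted values in the reducer as if each were a single point, would give different answers depending on how many times the combiner ran.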