Hi Grant,

I am Rohini, and I work on the same team as Pallavi. Pallavi is out of the office until the end of this month, so I will be taking care of this issue now.
I will look into the issue you have pointed out and get back to you.

Thanks,
-Rohini

-----Original Message-----
From: Grant Ingersoll (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 07, 2008 7:32 AM
To: mahout-dev@lucene.apache.org
Subject: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654168#action_12654168 ]

Grant Ingersoll commented on MAHOUT-99:
---------------------------------------

Hi Pallavi,

The core code works, but the change to the KMeansDriver causes a compile error in the KMeans demo code in examples, because the driver now asks for the number of map tasks and the number of centroids. Could you document these new parameters, put in reasonable defaults, and update the patch?

One thing I'm not certain of, though, is why we need to pass in the number of map tasks. Isn't that already a config setting when you set up Hadoop?

> Improving speed of KMeans
> -------------------------
>
> Key: MAHOUT-99
> URL: https://issues.apache.org/jira/browse/MAHOUT-99
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Reporter: Pallavi Palleti
> Assignee: Grant Ingersoll
> Attachments: MAHOUT-99.patch
>
>
> Improved the speed of KMeans by passing only the cluster ID from the mapper to the reducer. Previously, the whole cluster info was being sent as a formatted string.
> Also removed the implicit assumption that the combiner runs exactly once; the code is modified accordingly so that it won't create a bug when the combiner runs zero times or more than once.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.