[ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654168#action_12654168
 ] 

Grant Ingersoll commented on MAHOUT-99:
---------------------------------------

Hi Pallavi,

The core code works, but the change to KMeansDriver causes a compile error in 
the k-means demo code in examples, because the driver now asks for the number 
of map tasks and the number of centroids.  Could you document these new 
parameters, put in reasonable defaults, and update the patch?

One thing I'm not certain of, though, is why we need to pass in the number of 
map tasks.  Isn't that already a config setting when you set up Hadoop?

> Improving speed of KMeans
> -------------------------
>
>                 Key: MAHOUT-99
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-99
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: MAHOUT-99.patch
>
>
> Improved the speed of KMeans by passing only the cluster ID from mapper to 
> reducer. Previously, the whole cluster info was being sent as a formatted 
> string. Also removed the implicit assumption that the combiner runs exactly 
> once, and modified the code accordingly so that it won't create a bug when 
> the combiner runs zero times or more than once.
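
For readers unfamiliar with the combiner issue mentioned in the description, 
here is a minimal sketch of one way to make the aggregation independent of 
how many times a combiner runs. It is not taken from the MAHOUT-99 patch; the 
class and method names (CombinerSafeCentroid, Partial, combine) are 
hypothetical, and plain Java collections stand in for the Hadoop API. The 
idea is to emit partial (sum, count) pairs per cluster ID and merge them with 
an associative operation, so the same merge can serve as combiner and reducer.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafeCentroid {

    /** Hypothetical partial aggregate: component-wise sum of points and the point count. */
    static final class Partial {
        final double[] sum;
        final long count;

        Partial(double[] sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        /** Associative, commutative merge, so it is safe to apply any number of times. */
        Partial merge(Partial other) {
            double[] s = new double[sum.length];
            for (int i = 0; i < s.length; i++) {
                s[i] = sum[i] + other.sum[i];
            }
            return new Partial(s, count + other.count);
        }

        /** Final centroid: divide the summed coordinates by the number of points. */
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** The same merge logic can act as both the combiner and the reducer. */
    static Partial combine(List<Partial> parts) {
        Partial acc = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            acc = acc.merge(parts.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // Simulated mapper output for a single cluster ID: one Partial per point.
        List<Partial> mapped = Arrays.asList(
                new Partial(new double[] {1, 2}, 1),
                new Partial(new double[] {3, 4}, 1),
                new Partial(new double[] {5, 6}, 1));

        // Case 1: the combiner never runs; the reducer sees the raw mapper output.
        double[] noCombiner = combine(mapped).centroid();

        // Case 2: the combiner runs on the first two records, then the reducer runs.
        Partial combined = combine(mapped.subList(0, 2));
        double[] withCombiner = combine(Arrays.asList(combined, mapped.get(2))).centroid();

        System.out.println(Arrays.toString(noCombiner));
        System.out.println(Arrays.toString(withCombiner));
    }
}
```

Both cases yield the same centroid because only sums and counts are merged; 
the division happens once, at the end, in centroid().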

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
