[ https://issues.apache.org/jira/browse/SPARK-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Pentreath closed SPARK-6000.
---------------------------------
    Resolution: Duplicate

> Batch K-Means clusters should support "mini-batch" updates
> ----------------------------------------------------------
>
>                 Key: SPARK-6000
>                 URL: https://issues.apache.org/jira/browse/SPARK-6000
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.1
>            Reporter: Derrick Burns
>            Priority: Minor
>
> One way to improve the performance of the K-means clustering algorithm is
> to sample the points on each round of Lloyd's algorithm and to use only
> those samples to update the cluster centers. (Note that this is similar to
> the update algorithm of streaming K-means.) The Spark K-Means clusterer
> should support this mini-batch algorithm for large data sets. The K-Means
> implementation at
> https://github.com/derrickburns/generalized-kmeans-clustering supports the
> mini-batch algorithm.
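
The sampled-update idea described above can be sketched as follows. This is a minimal, single-machine illustration in plain Python, not Spark MLlib's API and not the generalized-kmeans-clustering implementation. It assumes the per-center learning rate of 1/count used in Sculley's mini-batch k-means; the function names `nearest` and `mini_batch_kmeans` and their parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nearest(centers: Sequence[Sequence[float]], p: Sequence[float]) -> int:
    # Index of the center with the smallest squared Euclidean distance to p
    # (ties go to the lowest index).
    return min(
        range(len(centers)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], p)),
    )


def mini_batch_kmeans(
    points: Sequence[Tuple[float, ...]],
    k: int,
    batch_size: int,
    iterations: int,
    seed: int = 0,
) -> List[Tuple[float, ...]]:
    # Hypothetical helper: mini-batch K-means on an in-memory list of points.
    rng = random.Random(seed)
    # Initialize the centers with k distinct randomly chosen input points.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each center has absorbed so far
    for _ in range(iterations):
        # Each round looks only at a random sample, not the full data set.
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assignment step, computed against the centers at the start of the round.
        assignments = [nearest(centers, p) for p in batch]
        # Update step: move each assigned center toward its point with
        # learning rate 1/count, so each center becomes the running mean
        # of every point assigned to it so far.
        for p, c in zip(batch, assignments):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], p)]
    return [tuple(c) for c in centers]
```

Because the learning rate shrinks as a center absorbs more points, later samples move it less and less, which lets the centers settle even though no round ever scans the whole data set.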



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
