[ https://issues.apache.org/jira/browse/SPARK-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-3424:
---------------------------------
    Assignee: Derrick Burns

> KMeans Plus Plus is too slow
> ----------------------------
>
>                 Key: SPARK-3424
>                 URL: https://issues.apache.org/jira/browse/SPARK-3424
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.2
>            Reporter: Derrick Burns
>            Assignee: Derrick Burns
>
> The KMeansPlusPlus algorithm is implemented in time O(m k^2), where m is
> the number of rounds of the KMeansParallel algorithm and k is the number
> of clusters. This can be dramatically improved by maintaining, from round
> to round, the distance from each point to its closest cluster center and
> then incrementally updating that value for each point when a new center
> is added. This incremental update takes O(1) time, which reduces the
> running time of K Means Plus Plus to O(m k). For large k, this is
> significant.
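
The incremental update described in the issue can be illustrated with a minimal sketch in plain Python. This is not the Spark MLlib code: the names `sq_dist`, `kmeans_pp_incremental`, and `cost` are hypothetical, and the sketch runs ordinary single-machine k-means++ seeding rather than the KMeansParallel rounds. The idea it shows is that `cost[i]` caches the squared distance from point i to its nearest chosen center, so adding a center needs only one new distance computation per point instead of a rescan of all existing centers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_incremental(points, k, seed=0):
    """Pick up to k seed centers, caching each point's nearest-center cost."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # cost[i] = squared distance from points[i] to its closest chosen center.
    cost = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(cost)
        if total == 0.0:
            break  # every point coincides with an already chosen center

        # Sample an index with probability proportional to its cached cost.
        r = rng.random() * total
        acc = 0.0
        # Fallback for floating-point round-off: the farthest point.
        chosen = max(range(len(points)), key=cost.__getitem__)
        for i, c in enumerate(cost):
            acc += c
            if acc > r:
                chosen = i
                break

        new_center = points[chosen]
        centers.append(new_center)
        # Incremental update: compare the cached value against the distance
        # to the new center only, not against all previously chosen centers.
        cost = [min(c, sq_dist(p, new_center)) for c, p in zip(cost, points)]

    return centers, cost
```

Calling `kmeans_pp_incremental(points, k)` returns the chosen centers together with the final cached costs. Without the cache, each round would recompute the distance from every point to every center chosen so far, which is where the extra factor of k comes from.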