[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140434#comment-15140434 ]
Tsai Li Ming commented on SPARK-3220:
-------------------------------------

[~derrickburns], is your private fork at https://github.com/derrickburns/generalized-kmeans-clustering ?

I am having the same problem here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Kmeans-using-1-core-only-Was-Slowness-in-Kmeans-calculating-fastSquaredDistance-td16304.html

> K-Means clusterer should perform K-Means initialization in parallel
> -------------------------------------------------------------------
>
>                 Key: SPARK-3220
>                 URL: https://issues.apache.org/jira/browse/SPARK-3220
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Derrick Burns
>              Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation. As
> it stands now, it becomes a bottleneck for large data sets.
> I have implemented this functionality in my version of the clusterer.
> However, I see that there are hundreds of outstanding pull requests. If
> someone on the team wants to sponsor the pull request, I will create one.
> Otherwise, I will just maintain my own private fork of the clusterer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)