Derrick Burns created SPARK-3220:
------------------------------------

             Summary: K-Means clusterer should perform K-Means initialization in parallel
                 Key: SPARK-3220
                 URL: https://issues.apache.org/jira/browse/SPARK-3220
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Derrick Burns


The LocalKMeans method should be replaced with a parallel implementation.  As 
it stands now, it becomes a bottleneck for large data sets. 
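
For illustration only, here is a minimal sketch of what a parallel seeding step could look like. It is not the implementation mentioned in this issue and it does not use Spark's API. It assumes a k-means++ style seeding in which the per-point cost computation is mapped over data partitions; the names `parallel_seed`, `partition_costs`, and `sq_dist` are hypothetical. A thread pool stands in for cluster workers, so it shows the structure rather than a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_costs(points, centers):
    """For one partition, squared distance from each point to its nearest center."""
    return [min(sq_dist(p, c) for c in centers) for p in points]


def parallel_seed(partitions, k, seed=0, workers=4):
    """Pick up to k seed centers; the cost pass runs once per partition in the pool."""
    rng = random.Random(seed)
    flat = [p for part in partitions for p in part]
    centers = [rng.choice(flat)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(centers) < k:
            # Map step: each partition computes its own costs independently.
            per_part = pool.map(lambda part: partition_costs(part, centers), partitions)
            costs = [c for part_costs in per_part for c in part_costs]
            total = sum(costs)
            if total == 0:
                break  # every point coincides with an already chosen center
            # Reduce step: sample one point with probability proportional to its cost.
            r = rng.random() * total
            acc = 0.0
            for p, c in zip(flat, costs):
                acc += c
                if acc > r:
                    centers.append(p)
                    break
    return centers
```

In a distributed setting, the map step would run on the workers that hold each partition, and only the per-partition cost totals and the sampled point would need to travel back to the driver.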

I have implemented this functionality in my version of the clusterer.  However, 
I see that there are hundreds of outstanding pull requests.  If someone on the 
team wants to sponsor the pull request, I will create one.  Otherwise, I will 
just maintain my own private fork of the clusterer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
