wenweijian created SPARK-43297:
----------------------------------

             Summary: Make improvement to LocalKMeans
                 Key: SPARK-43297
                 URL: https://issues.apache.org/jira/browse/SPARK-43297
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 3.3.0
            Reporter: wenweijian


KMeans has two values for initializationMode: random and parallel (k-means||).

Parallel mode uses LocalKMeans.kMeansPlusPlus to generate the initial centers, 
but kMeansPlusPlus is a local method that runs on the driver.

If the number of points is large, kMeansPlusPlus takes a long time, because it 
is a single-threaded method running on the driver.

We can make this method faster by running it in parallel, for example by using 
parallel collections.
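
As a rough illustration of the idea (not the Spark code and not a proposed 
patch), the sketch below parallelizes the per-point cost update of 
k-means++ seeding, which is the loop that dominates when there are many points. 
It uses Java parallel streams as a stand-in for Scala parallel collections. 
The class ParallelSeedingSketch and all of its method names are hypothetical, 
and the sketch assumes unweighted points, squared Euclidean distance, 
k >= 1, and a non-empty input.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical sketch: k-means++ seeding with a parallel cost update.
final class ParallelSeedingSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Lowers costs[i] to the squared distance from points[i] to newCenter
    // when that distance is smaller. Each index is written by exactly one
    // task, so the parallel loop needs no synchronization.
    static void updateCosts(double[][] points, double[] costs, double[] newCenter) {
        IntStream.range(0, points.length).parallel().forEach(i ->
            costs[i] = Math.min(costs[i], squaredDistance(points[i], newCenter)));
    }

    // Samples an index with probability proportional to costs[i].
    // Kept sequential so that results are reproducible for a given seed.
    static int sampleProportional(double[] costs, Random rng) {
        double total = 0.0;
        for (double c : costs) {
            total += c;
        }
        double r = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < costs.length; i++) {
            cumulative += costs[i];
            if (cumulative > r) {
                return i;
            }
        }
        return costs.length - 1; // only reached when every cost is zero
    }

    // Picks k initial centers: the first uniformly at random, the rest
    // proportionally to each point's squared distance to its nearest center.
    static double[][] chooseCenters(double[][] points, int k, long seed) {
        Random rng = new Random(seed);
        double[][] centers = new double[k][];
        double[] costs = new double[points.length];
        Arrays.fill(costs, Double.POSITIVE_INFINITY);

        centers[0] = points[rng.nextInt(points.length)];
        updateCosts(points, costs, centers[0]);
        for (int c = 1; c < k; c++) {
            centers[c] = points[sampleProportional(costs, rng)];
            updateCosts(points, costs, centers[c]);
        }
        return centers;
    }
}
```

Only updateCosts runs in parallel here. The sampling step stays sequential 
because a parallel floating-point sum can vary with evaluation order, which 
would make the chosen centers non-deterministic for a fixed seed.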



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
