Distance Calculation in Spark K means clustering

ashensw Sun, 30 Aug 2015 23:59:28 -0700

Hi all,

I am currently working on a K-means clustering project. I want to get the
distance from each data point to its cluster center after building the
K-means model. Currently I get the cluster assignment of each data point by
passing the JavaRDD<Vector> that contains all the data points to the K-means
predict function, which returns a JavaRDD<Integer> holding the cluster index
of each data point. I then convert both JavaRDDs to lists using the collect
function and use them to calculate the Euclidean distances. But since this
process involves a collect, it seems to be quite time consuming.


Is there a more efficient way to calculate the distance from each data
point to its cluster center?
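
To make the question concrete, here is a minimal plain-Java sketch of the
per-point computation, not a confirmed Spark recipe. It assumes the K
cluster centers are available as a small array (in MLlib they would
presumably come from KMeansModel.clusterCenters()), and it uses double[] as
a stand-in for Vector. The class and method names are hypothetical. Because
each call needs only one point plus the centers, a function of this shape
could in principle be applied per element inside an RDD map instead of on
collected lists.

```java
/** Plain-Java stand-in: double[] plays the role of an MLlib Vector (assumption). */
class NearestCenterDistance {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the point (ties go to the lowest index). */
    static int nearestCenter(double[] point, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalArgumentException("no centers given");
        }
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double d = squaredDistance(point, centers[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    /** Euclidean distance from the point to its closest center. */
    static double distanceToNearestCenter(double[] point, double[][] centers) {
        int k = nearestCenter(point, centers);
        return Math.sqrt(squaredDistance(point, centers[k]));
    }

    public static void main(String[] args) {
        // Hypothetical centers and point, for illustration only.
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        double[] point = {3.0, 4.0};
        System.out.println("cluster index: " + nearestCenter(point, centers));
        System.out.println("distance: " + distanceToNearestCenter(point, centers));
    }
}
```

The design point is that the centers are tiny compared with the data, so
only they would need to be shipped to the workers; the data points
themselves would never have to be collected to the driver.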

I would also like to know which distance measure the Spark K-means
algorithm uses to build the model. Is it Euclidean distance or squared
Euclidean distance?
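
For reference, the two measures in the question differ only by a square
root. The small plain-Java sketch below (hypothetical class and method
names, double[] standing in for Vector) shows both. Since the square root
is monotonic, either measure ranks candidate centers in the same order, so
the choice affects reported distances and summed costs rather than which
center is nearest. It says nothing about which one Spark uses internally.

```java
/** Euclidean vs. squared Euclidean distance on plain arrays (illustrative only). */
class DistanceMeasures {

    /** Sum of squared coordinate differences. */
    static double squaredEuclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Square root of the squared Euclidean distance. */
    static double euclidean(double[] a, double[] b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }

    public static void main(String[] args) {
        // Hypothetical point and centers, for illustration only.
        double[] point = {3.0, 4.0};
        double[] near = {0.0, 0.0};
        double[] far = {10.0, 10.0};

        System.out.println("squared: " + squaredEuclidean(point, near)
                + " vs " + squaredEuclidean(point, far));
        System.out.println("euclidean: " + euclidean(point, near)
                + " vs " + euclidean(point, far));
    }
}
```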

Thank you



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Distance-Calculation-in-Spark-K-means-clustering-tp24516.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

