Hi all, I am currently working on a K-means clustering project. I want to get the distance from each data point to its cluster center after building the K-means model. Currently I get the cluster index of each data point by passing the JavaRDD<Vector> that contains all the data points to the K-means predict function. It returns a JavaRDD<Integer> containing the cluster index of each data point. I then convert both JavaRDDs to lists using the collect function and use them to calculate the Euclidean distances. But since this process involves a collect, it seems to be quite time consuming.
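To make the calculation concrete, here is a minimal sketch in plain Java of the per-point step described above: given a point, its cluster index, and the array of cluster centers, compute the Euclidean distance to the assigned center. The class and method names (ClusterDistances, squaredDistance, distanceToCenter) and the sample data are hypothetical, and plain double[] arrays stand in for Spark's Vector type so the sketch runs without Spark.

```java
public class ClusterDistances {

    // Squared Euclidean distance between two points of equal dimension.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Euclidean distance from a point to the center of its assigned cluster.
    // clusterIndex plays the role of the Integer returned by predict.
    public static double distanceToCenter(double[] point, int clusterIndex, double[][] centers) {
        return Math.sqrt(squaredDistance(point, centers[clusterIndex]));
    }

    public static void main(String[] args) {
        // Hypothetical data: two cluster centers and two points with known assignments.
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        double[][] points = { {3.0, 4.0}, {10.0, 12.0} };
        int[] assignments = {0, 1};

        for (int i = 0; i < points.length; i++) {
            double dist = distanceToCenter(points[i], assignments[i], centers);
            System.out.println("point " + i + " -> cluster " + assignments[i] + ", distance " + dist);
        }
    }
}
```

Since this function only needs one point plus the small array of centers, the same logic could presumably be applied inside a map over the JavaRDD<Vector> instead of on collected lists, assuming the centers (for example from the model's clusterCenters()) are small enough to ship to the executors. That is an assumption about how it would fit into Spark, not something I have measured.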
Is there a more efficient way to calculate the distance from each data point to its cluster center? I would also like to know which distance measure the Spark K-means algorithm uses to build the model. Is it Euclidean distance or squared Euclidean distance? Thank you