Hi,

For a fairly large dataset (30 MB), KMeansModel.computeCost takes a long time
(16+ minutes).

Most of the time is spent in this task:

org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)

Can this be improved?
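
For reference, here is a minimal sketch of how it is being invoked (the input
path, k, and iteration count are placeholders, not my actual values):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansCostExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeansCostExample"))

    // Parse the input into dense vectors (one point per line, space-separated).
    val data = sc.textFile("/path/to/data.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache() // without caching, computeCost re-reads and re-parses the input

    val model = KMeans.train(data, 10, 20) // k = 10, maxIterations = 20

    // computeCost sums the squared distances of all points to their nearest
    // centres; this is the call that shows up under DoubleRDDFunctions.sum
    // in the stack trace above.
    val wssse = model.computeCost(data)
    println(s"Within Set Sum of Squared Errors = $wssse")

    sc.stop()
  }
}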

-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
