Hi, For a fairly large dataset, 30MB, KMeansModel.computeCost takes lot of time (16+ mints).
It takes lot of time at this task; org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33) org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70) Can this be improved? -- Thanks & regards, Nirmal Associate Technical Lead - Data Technologies Team, WSO2 Inc. Mobile: +94715779733 Blog: http://nirmalfdo.blogspot.com/