Hi,

How are you running K-Means? What is your k? What is the dimension of your dataset (columns)? Which Spark version are you using?
Thanks,
Burak

On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
> Hi,
>
> For a fairly large dataset (30MB), KMeansModel.computeCost takes a lot of
> time (16+ minutes).
>
> Most of that time is spent in this task:
>
> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>
> Can this be improved?
>
> --
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
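For context, KMeansModel.computeCost returns the within-cluster sum of squared distances: each point's squared Euclidean distance to its nearest cluster center, summed over the dataset. The sum itself is a single pass over the data, so long runtimes usually mean the input RDD's lineage is being recomputed; calling .cache() on the input RDD before computeCost is a common fix. A minimal pure-Python sketch of the quantity computed (an illustration, not Spark's implementation):

```python
def compute_cost(points, centers):
    """Sum of squared distances from each point to its nearest center
    (the quantity KMeansModel.computeCost returns)."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest center.
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centers
        )
    return total

points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0)]
centers = [(0.5, 0.5), (9.0, 8.0)]
print(compute_cost(points, centers))  # 0.5 + 0.5 + 0.0 = 1.0
```

Since this is O(n * k * d) per pass, the answers to the questions above (k, dimensionality, Spark version) matter for diagnosing where the 16+ minutes go.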