Hi David,

Can you also try with Spark 1.3 if possible? I believe there was a 2x improvement on K-Means between 1.2 and 1.3.
Thanks,
Burak

On Sat, Mar 28, 2015 at 9:04 PM, davidshen84 <davidshe...@gmail.com> wrote:
> Hi Jao,
>
> Sorry to pop up this old thread. I have the same problem you did, and I
> want to know if you have figured out how to improve k-means on Spark.
>
> I am using Spark 1.2.0. My data set is about 270k vectors, each with
> about 350 dimensions. If I set k=500, the job takes about 3 hrs on my
> cluster. The cluster has 7 executors, each with 8 cores...
>
> If I set k=5000, which is the required value for my task, the job goes
> on forever...
>
> Thanks,
> David
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Why-KMeans-with-mllib-is-so-slow-tp20480p22273.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
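For anyone wondering why the jump from k=500 to k=5000 hurts so much: each Lloyd-style k-means iteration compares every point against every center across all dimensions, so the per-iteration work grows roughly as n * k * d. A quick back-of-the-envelope check using the numbers from David's email (the function name below is just illustrative, not anything from MLlib):

```python
def distance_ops_per_iteration(n_points, k, dims):
    """Rough count of point-to-center distance operations in one
    k-means iteration: every point vs. every center, over all dims."""
    return n_points * k * dims

# Figures from the email: ~270k vectors, ~350 dimensions.
n, d = 270_000, 350

small = distance_ops_per_iteration(n, 500, d)    # k=500
large = distance_ops_per_iteration(n, 5000, d)   # k=5000

print(f"k=500:  {small:.2e} ops/iteration")
print(f"k=5000: {large:.2e} ops/iteration")
print(f"ratio:  {large // small}x")
```

So at k=5000 each iteration does 10x the work of the k=500 run, before accounting for any extra iterations needed to converge or for shipping the much larger center array to each executor.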