OH, the job I talked about has now run for more than 11 hours without a result... it doesn't make sense.
On Fri, Mar 27, 2015 at 9:48 AM Xi Shen <davidshe...@gmail.com> wrote:

> Hi Burak,
>
> My iterations setting is 500. But I think it should also stop once the
> centroids converge, right?
>
> My Spark is 1.2.0, running on 64-bit Windows. My data set is about 40k
> vectors; each vector has about 300 features, all normalised. All worker
> nodes have sufficient memory and disk space.
>
> Thanks,
> David
>
> On Fri, 27 Mar 2015 02:48 Burak Yavuz <brk...@gmail.com> wrote:
>
>> Hi David,
>>
>> In my experience, K-Means seems to hang when the number of runs is
>> large and the data is not properly partitioned. In particular, setting
>> the number of runs to something high drastically increases the work
>> done by the executors. If that's not the case, can you give more info
>> on the Spark version you are using, your setup, and your dataset?
>>
>> Thanks,
>> Burak
>>
>> On Mar 26, 2015 5:10 AM, "Xi Shen" <davidshe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> When I run k-means clustering with Spark, these are the last two
>>> lines in the log:
>>>
>>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned broadcast 26
>>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned shuffle 5
>>>
>>> Then it hangs for a long time. There is no active job, and the driver
>>> machine is idle. I cannot access the worker nodes, so I am not sure
>>> whether they are busy.
>>>
>>> I understand k-means may take a long time to finish. But why is there
>>> no active job and no log output?
>>>
>>> Thanks,
>>> David
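Burak's point about `runs` can be illustrated with a plain-Python toy version of Lloyd's k-means (this is just a sketch, not Spark MLlib's actual implementation): every extra run repeats the entire assign/update loop until convergence, so total work grows roughly with runs x iterations x points x k, which is why a high runs value can keep executors busy long after the driver looks idle. All names below are illustrative.

```python
import math
import random

def dist2(a, b):
    # Squared Euclidean distance between two points (tuples).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    # Component-wise mean of a non-empty list of points.
    n = len(pts)
    return tuple(sum(xs) / n for xs in zip(*pts))

def kmeans(points, k, max_iterations=500, tol=1e-4, runs=1, seed=0):
    """Toy Lloyd's k-means with multiple random restarts.

    Each run repeats the full iterate-until-convergence loop, so the
    cost of the whole call scales linearly with `runs` -- the effect
    Burak describes. Iteration stops early once no centroid moves more
    than `tol`, so max_iterations=500 is rarely hit in practice.
    """
    rng = random.Random(seed)
    best_cost, best_centroids = float("inf"), None
    for _ in range(runs):
        centroids = rng.sample(points, k)       # random restart
        for _ in range(max_iterations):
            # Assignment step: each point joins its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda i: dist2(p, centroids[i]))
                clusters[i].append(p)
            # Update step: centroids become cluster means
            # (empty clusters keep their old centroid).
            new_centroids = [mean(c) if c else centroids[i]
                             for i, c in enumerate(clusters)]
            shift = max(math.sqrt(dist2(a, b))
                        for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift < tol:                     # converged: stop early
                break
        # Keep the restart with the lowest within-cluster cost.
        cost = sum(min(dist2(p, c) for c in centroids) for p in points)
        if cost < best_cost:
            best_cost, best_centroids = cost, centroids
    return best_centroids

# Two well-separated groups; both centroids should land on them exactly.
points = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
print(sorted(kmeans(points, 2, runs=2)))
```

Note that the convergence check (`shift < tol`) is why David's expectation is reasonable: with converging centroids the iteration cap should not matter much. The hang he is seeing is therefore more consistent with partitioning/scheduling overhead than with the iteration count itself.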