OH, the job I talked about has now run for more than 11 hours without a result... it doesn't make sense.
On Fri, Mar 27, 2015 at 9:48 AM Xi Shen <davidshe...@gmail.com> wrote:

> Hi Burak,
>
> My iterations setting is 500. But I think it should also stop once the
> centroids converge, right?
>
> My Spark is 1.2.0, running on 64-bit Windows. My data set is about 40k
> vectors; each vector has about 300 features, all normalised. All worker
> nodes have sufficient memory and disk space.
>
> Thanks,
> David
>
> On Fri, 27 Mar 2015 02:48 Burak Yavuz <brk...@gmail.com> wrote:
>
>> Hi David,
>>
>> In my experience, K-Means seems to hang when the number of runs is
>> large and the data is not properly partitioned. In particular, setting
>> the number of runs to something high drastically increases the work
>> done by the executors. If that's not the case, can you give more info
>> on the Spark version you are using, your setup, and your dataset?
>>
>> Thanks,
>> Burak
>>
>> On Mar 26, 2015 5:10 AM, "Xi Shen" <davidshe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> When I run k-means clustering with Spark, these are the last two
>>> lines in the log:
>>>
>>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned broadcast 26
>>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned shuffle 5
>>>
>>> Then it hangs for a long time. There is no active job, and the driver
>>> machine is idle. I cannot access the worker nodes, so I am not sure
>>> whether they are busy.
>>>
>>> I understand k-means may take a long time to finish. But why is there
>>> no active job and no log output?
>>>
>>> Thanks,
>>> David
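Burak's point about `runs` can be illustrated with a plain-Python toy version of Lloyd's k-means (this is just a sketch, not Spark MLlib's actual implementation): every extra run repeats the entire assign/update loop until convergence, so total work grows roughly with runs x iterations x points x k, which is why a high runs value can keep executors busy long after the driver looks idle. All names below are illustrative.

```python
import math
import random

def dist2(a, b):
    # Squared Euclidean distance between two points (tuples).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    # Component-wise mean of a non-empty list of points.
    n = len(pts)
    return tuple(sum(xs) / n for xs in zip(*pts))

def kmeans(points, k, max_iterations=500, tol=1e-4, runs=1, seed=0):
    """Toy Lloyd's k-means with multiple random restarts.

    Each run repeats the full iterate-until-convergence loop, so the
    cost of the whole call scales linearly with `runs` -- the effect
    Burak describes. Iteration stops early once no centroid moves more
    than `tol`, so max_iterations=500 is rarely hit in practice.
    """
    rng = random.Random(seed)
    best_cost, best_centroids = float("inf"), None
    for _ in range(runs):
        centroids = rng.sample(points, k)       # random restart
        for _ in range(max_iterations):
            # Assignment step: each point joins its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda i: dist2(p, centroids[i]))
                clusters[i].append(p)
            # Update step: centroids become cluster means
            # (empty clusters keep their old centroid).
            new_centroids = [mean(c) if c else centroids[i]
                             for i, c in enumerate(clusters)]
            shift = max(math.sqrt(dist2(a, b))
                        for a, b in zip(centroids, new_centroids))
            centroids = new_centroids
            if shift < tol:                     # converged: stop early
                break
        # Keep the restart with the lowest within-cluster cost.
        cost = sum(min(dist2(p, c) for c in centroids) for p in points)
        if cost < best_cost:
            best_cost, best_centroids = cost, centroids
    return best_centroids

# Two well-separated groups; both centroids should land on them exactly.
points = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
print(sorted(kmeans(points, 2, runs=2)))
```

Note that the convergence check (`shift < tol`) is why David's expectation is reasonable: with converging centroids the iteration cap should not matter much. The hang he is seeing is therefore more consistent with partitioning/scheduling overhead than with the iteration count itself.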