So you were able to execute the minimal example I posted? What I mean is that the application doesn't progress at all: it hangs (I would be OK if it were just slower). It doesn't seem to me to be a configuration issue.

On Fri, Sep 2, 2016 at 1:07 AM, Sean Owen <so...@cloudera.com> wrote:
> Hm, what do you mean? k-means|| init is certainly slower because it's
> making passes over the data in order to pick better initial centroids.
> The idea is that you might then spend fewer iterations converging
> later, and converge to a better clustering.
>
> Your problem doesn't seem to be related to scale. You aren't even
> running out of memory it seems. Your memory settings are causing YARN
> to kill the executors for using more memory than they advertise. That
> could mean it never proceeds if this happens a lot.
>
> I don't have any problems with it.
>
> On Thu, Sep 1, 2016 at 11:35 PM, Georgios Samaras
> <georgesamaras...@gmail.com> wrote:
> > Dear all,
> >
> > the random initialization works well, but the default initialization is
> > k-means|| and has made me struggle. Also, I had heard people one year ago
> > struggling with it too, and everybody would just skip it and use random,
> > but I cannot keep it inside me!
> >
> > I have posted a minimal example here..
> >
> > Please advice,
> > George Samaras
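
For readers of the thread, here is a rough sketch of why k-means|| initialization costs more than random initialization: it makes one full pass over the data per sampling round, then reduces the oversampled candidates to k seeds. This is plain Python for illustration only, not Spark's implementation; the function names and the `rounds` and `oversample` parameters are assumptions made up for this example, and points are assumed to be tuples of numbers.

```python
import random

def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cost_to_nearest(p, centers):
    # Squared distance from p to its closest center.
    return min(sq_dist(p, c) for c in centers)

def random_init(points, k, rng):
    # Random initialization: pick k points uniformly, no distance passes.
    return rng.sample(points, k)

def kmeans_parallel_init(points, k, rng, rounds=5, oversample=None):
    # Hypothetical sketch of k-means||-style seeding.
    l = oversample if oversample is not None else 2 * k
    centers = [rng.choice(points)]
    for _ in range(rounds):
        # One full pass over the data per round: this is the extra cost.
        costs = [cost_to_nearest(p, centers) for p in points]
        total = sum(costs)
        if total == 0:
            break
        # Keep each point with probability proportional to its cost.
        centers.extend(p for p, c in zip(points, costs)
                       if rng.random() < min(1.0, l * c / total))
    centers = list(dict.fromkeys(centers))  # drop duplicate candidates

    # Weight each candidate by how many points are closest to it.
    weights = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        weights[nearest] += 1

    # Reduce the candidates to k seeds with weighted k-means++ sampling.
    chosen = [rng.choices(centers, weights=weights, k=1)[0]]
    while len(chosen) < k:
        scores = [w * cost_to_nearest(c, chosen)
                  for c, w in zip(centers, weights)]
        if sum(scores) == 0:
            break  # fewer than k distinct candidates were sampled
        chosen.append(rng.choices(centers, weights=scores, k=1)[0])
    return chosen
```

Calling `kmeans_parallel_init(points, 3, random.Random(0))` returns up to three seed points taken from `points`. The extra passes make initialization slower than `random_init`, but they should not by themselves make a job hang.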