Hi Xiangrui,
I have created a JIRA, https://issues.apache.org/jira/browse/SPARK-6706, and
attached the sample code, but I could not attach the test data. I will
update the bug once I find a place to host the test data.
Thanks,
David
On Tue, Mar 31, 2015 at 8:18 AM Xiangrui Meng men...@gmail.com wrote:
This PR updated the k-means|| initialization:
https://github.com/apache/spark/commit/ca7910d6dd7693be2a675a0d6a6fcc9eb0aaeb5d,
which was included in 1.3.0. It should fix k-means|| initialization with
large k. Please create a JIRA for this issue and send me the code and the
dataset to produce this
Hi,
I have opened a couple of threads asking about a k-means performance problem
in Spark. I think I have made a little progress.
Previously I used the simplest approach, KMeans.train(rdd, k, maxIterations). It
uses the k-means|| initialization algorithm, which is supposed to be a
faster version of k-means++ and