Re: KMeans questions

2015-07-02 Thread Feynman Liang
SPARK-7879 (https://issues.apache.org/jira/browse/SPARK-7879) seems to address your use case (running KMeans on a DataFrame and having the results added as an additional column).

On Wed, Jul 1, 2015 at 5:53 PM, Eric Friedman eric.d.fried...@gmail.com wrote:
> In preparing a DataFrame (spark 1.4) to

KMeans questions

2015-07-01 Thread Eric Friedman
In preparing a DataFrame (Spark 1.4) to use with MLlib's KMeans.train method, is there a cleaner way to create the Vectors than this?

    data.map { r => Vectors.dense(r.getDouble(0), r.getDouble(3), r.getDouble(4), r.getDouble(5), r.getDouble(6)) }

Second, once I train the model and call predict on my
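
For reference, one possible way to tidy the vector construction from the question is to keep the column positions in a single array and map over them. This is only a sketch, not a method confirmed anywhere in the thread: the object name KMeansFeatures, the helper trainAndLabel, and the parameters k and maxIterations are hypothetical, and the selected columns are assumed to be non-null Doubles. The second helper pairs each row with its predicted cluster, which is one reading of the use case described in the reply (results attached to the original rows).

```scala
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object; the names here are illustrative only.
object KMeansFeatures {
  // Column positions taken from the snippet in the question.
  val featureIndices: Array[Int] = Array(0, 3, 4, 5, 6)

  // Build one dense vector per row from the selected Double columns.
  // Assumes every selected column holds a non-null Double.
  def toFeatures(r: Row): Vector =
    Vectors.dense(featureIndices.map(i => r.getDouble(i)))

  // Train on the extracted vectors, then pair each original row with
  // the index of the cluster it is assigned to.
  // k and maxIterations are placeholders to be chosen by the caller.
  def trainAndLabel(data: DataFrame, k: Int, maxIterations: Int): RDD[(Row, Int)] = {
    val features: RDD[Vector] = data.rdd.map(r => toFeatures(r)).cache()
    val model: KMeansModel = KMeans.train(features, k, maxIterations)
    data.rdd.map(r => (r, model.predict(toFeatures(r))))
  }
}
```

Going through data.rdd rather than calling map on the DataFrame directly keeps the sketch independent of how map on a DataFrame behaves in a given Spark release. Changing the feature set then only means editing featureIndices.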