Re: getting the cluster elements from kmeans run

2015-02-11 Thread Suneel Marthi
KMeansModel only returns the cluster centroids. To get the number of elements in each cluster, try calling kmeans.predict() on each of the points in the data used to build the model, then count the points assigned to each cluster index. See
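The predict-and-count idea can be sketched without a cluster. This is a minimal plain-Python stand-in, not MLlib code: `nearest_center`, `cluster_members`, and the sample data are hypothetical names and values made up for illustration, and `nearest_center` only mimics what a per-point `predict()` call does (return the index of the closest centroid).

```python
from collections import Counter, defaultdict


def nearest_center(point, centers):
    """Return the index of the closest center by squared Euclidean distance.

    Hypothetical stand-in for calling predict() on a trained k-means model.
    """
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def cluster_members(points, centers):
    """Group the points by the index of the cluster they are assigned to."""
    members = defaultdict(list)
    for point in points:
        members[nearest_center(point, centers)].append(point)
    return dict(members)


# Assumed sample data: two centroids, as a trained model would return them,
# and the points that were used to build the model.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.1, 0.2), (9.5, 10.1), (0.3, -0.1), (10.2, 9.8), (-0.2, 0.0)]

members = cluster_members(points, centers)
sizes = Counter({index: len(group) for index, group in members.items()})
```

With a real model, the same grouping would presumably be done by mapping each point to `(model.predict(point), point)` over the training RDD and then counting or grouping by key.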

Re: K-Means final cluster centers

2015-02-05 Thread Suneel Marthi
There's a kMeansModel.clusterCenters() available if you are looking to get the centers from KMeansModel. From: SK skrishna...@gmail.com To: user@spark.apache.org Sent: Thursday, February 5, 2015 5:35 PM Subject: K-Means final cluster centers Hi, I am trying to get the final cluster

Re: Row similarities

2015-01-17 Thread Suneel Marthi
Andrew, you would be better off using Mahout's RowSimilarityJob for what you are trying to accomplish. 1. It does give you pair-wise distances. 2. You can specify the distance measure you want to use. 3. There's the old MapReduce impl and the Spark DSL impl, per your preference. From: Andrew
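To make "pair-wise distances with a selectable distance measure" concrete, here is a small plain-Python sketch. It is not Mahout's RowSimilarityJob and does not use its API; `pairwise_distances`, the two measure functions, and the sample rows are hypothetical names and values for illustration. It assumes every row is non-zero, since cosine distance is undefined for a zero vector.

```python
import math
from itertools import combinations


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(rows, measure):
    """Return {(i, j): distance} for every unordered pair of row indices."""
    return {
        (i, j): measure(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


# Assumed sample matrix with three rows.
rows = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]

by_euclidean = pairwise_distances(rows, euclidean_distance)
by_cosine = pairwise_distances(rows, cosine_distance)
```

Passing the measure as a function keeps the pairing logic independent of the metric. This brute-force version compares every pair in memory, so it only suits small matrices; a distributed job is the better fit for large ones.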

Re: Clustering text data with MLlib

2014-12-29 Thread Suneel Marthi
Here's the Streaming KMeans from Spark 1.2: http://spark.apache.org/docs/latest/mllib-clustering.html#examples-1 Streaming KMeans still needs an initial 'k' to be specified; it then progresses to come up with an optimal 'k' IIRC. From: Sean Owen so...@cloudera.com To: jatinpreet

Re: K-means faster on Mahout than on Spark

2014-03-25 Thread Suneel Marthi
Mahout does have a kmeans which can be executed in mapreduce and iterative modes. On Mar 25, 2014, at 9:25 AM, Prashant Sharma scrapco...@gmail.com wrote: I think Mahout uses FuzzyKmeans, which is a different algorithm and it is not iterative. Prashant Sharma On