Re: Spark Mllib kmeans execution

2016-03-02 Thread Sonal Goyal
It will run distributed. On Mar 2, 2016 3:00 PM, "Priya Ch" wrote: > Hi All, I am running the k-means clustering algorithm as - val conf = new SparkConf; val sc = new SparkContext(conf); ... val kmeans = new ...

Spark Mllib kmeans execution

2016-03-02 Thread Priya Ch
Hi All, I am running the k-means clustering algorithm. Now, when I am running the algorithm as - val conf = new SparkConf; val sc = new SparkContext(conf); ... val kmeans = new KMeans(); val model = kmeans.run(RDD[Vector]); ... The 'kmeans' object gets created on the driver. Now, does kmeans.run() get ...
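For reference, a minimal, self-contained sketch of the pattern in the question (app name, k, and the toy data are placeholders). As the reply above notes, the KMeans object itself is created on the driver, while run() launches distributed jobs over the RDD's partitions on the executors:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  object KMeansSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("KMeansSketch")
      val sc = new SparkContext(conf)

      // Toy data; in practice this would be loaded from HDFS, S3, etc.
      val data = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0),
        Vectors.dense(0.1, 0.1),
        Vectors.dense(9.0, 9.0),
        Vectors.dense(9.1, 9.1)
      )).cache()

      // The KMeans object lives on the driver; run() executes distributed
      // jobs over the RDD's partitions.
      val kmeans = new KMeans().setK(2).setMaxIterations(20)
      val model = kmeans.run(data)

      model.clusterCenters.foreach(println)
      sc.stop()
    }
  }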

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2016-01-01 Thread Yanbo Liang
...partitions that don't fit on disk and read them from there when they are needed. Actually, it's not necessary to set such a large driver memory in your case, because KMeans uses little driver memory if your k is not very large. Cheers, Yanbo
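For context, a sketch of the kind of caching this advice refers to, assuming the input is a text file of space-separated features (the path and k are placeholders): persisting with MEMORY_AND_DISK lets Spark spill partitions that don't fit in memory to disk instead of recomputing them.

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.storage.StorageLevel

  // Parse each line into a feature vector, then persist with MEMORY_AND_DISK
  // so partitions that do not fit in memory are stored on disk.
  val points = sc.textFile("hdfs:///path/to/input")   // path is a placeholder
    .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
    .persist(StorageLevel.MEMORY_AND_DISK)

  val model = new KMeans().setK(10).run(points)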

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2015-12-31 Thread Jia Zou
...GMT+08:00 Jia Zou <jacqueline...@gmail.com>: > I am running Spark MLLib KMeans on one EC2 M3.2xlarge instance with 8 CPU cores and 30GB memory. Executor memory is set to 15GB, and driver memory is set to 15GB. The observation is that, when input ...

Re: Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2015-12-30 Thread Yanbo Liang
...driver memory in your case, because KMeans uses little driver memory if your k is not very large. Cheers, Yanbo. 2015-12-30 22:20 GMT+08:00 Jia Zou <jacqueline...@gmail.com>: > I am running Spark MLLib KMeans on one EC2 M3.2xlarge instance with 8 CPU cores and 30GB memory. Executor memory ...

Spark MLLib KMeans Performance on Amazon EC2 M3.2xlarge

2015-12-30 Thread Jia Zou
I am running Spark MLLib KMeans on one EC2 M3.2xlarge instance with 8 CPU cores and 30GB memory. Executor memory is set to 15GB, and driver memory is set to 15GB. The observation is that, when the input data size is smaller than 15GB, performance is quite stable. However, when the input data becomes ...
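A sketch of one way to express the executor-memory setting described above when building the SparkConf (the app name is a placeholder). Driver memory is normally passed to spark-submit with --driver-memory instead, since the driver JVM is already running by the time the conf is read:

  import org.apache.spark.{SparkConf, SparkContext}

  // Executor memory can be set through SparkConf; driver memory is usually
  // supplied at submit time with --driver-memory.
  val conf = new SparkConf()
    .setAppName("KMeansBenchmark")           // illustrative name
    .set("spark.executor.memory", "15g")
  val sc = new SparkContext(conf)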

Re: spark mllib kmeans

2015-05-21 Thread Pa Rö
I want to evaluate several different distance measures for time-space clustering, so I need an API to implement my own function in Java. 2015-05-19 22:08 GMT+02:00 Xiangrui Meng men...@gmail.com: Just curious, what distance measure do you need? -Xiangrui. On Mon, May 11, 2015 at 8:28 AM, Jaonary ...

Re: spark mllib kmeans

2015-05-19 Thread Xiangrui Meng
Just curious, what distance measure do you need? -Xiangrui. On Mon, May 11, 2015 at 8:28 AM, Jaonary Rabarisoa jaon...@gmail.com wrote: Take a look at this: https://github.com/derrickburns/generalized-kmeans-clustering Best, Jao. On Mon, May 11, 2015 at 3:55 PM, Driesprong, Fokko ...

spark mllib kmeans

2015-05-11 Thread Pa Rö
Hi, is it possible to use a custom distance measure and another data type as the vector? I want to cluster temporal geo data. Best regards, Paul

Re: spark mllib kmeans

2015-05-11 Thread Driesprong, Fokko
Hi Paul, I would say that it should be possible, but you'll need a distance measure that conforms to your coordinate system. 2015-05-11 14:59 GMT+02:00 Pa Rö paul.roewer1...@googlemail.com: Hi, is it possible to use a custom distance measure and another data type as the vector? I ...
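MLlib's built-in KMeans is built around Euclidean distance, so one common workaround, sketched below with made-up weights, is to fold time and coordinates into a single feature vector whose scaling makes Euclidean distance approximate the intended measure; a genuinely different metric needs a custom implementation such as the generalized-kmeans-clustering project linked above. The GeoEvent case class, the weights, and the sample points are illustrative only, and sc is assumed to be an existing SparkContext.

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Encode (lat, lon, timestamp) as a weighted feature vector; the weights
  // control how much time counts versus space under Euclidean distance.
  case class GeoEvent(lat: Double, lon: Double, timestampSec: Double)

  val spaceWeight = 1.0
  val timeWeight  = 0.001   // shrink seconds so time does not dominate

  val events = sc.parallelize(Seq(
    GeoEvent(52.52, 13.40, 1431351000.0),
    GeoEvent(48.14, 11.58, 1431354600.0),
    GeoEvent(52.50, 13.45, 1431352000.0),
    GeoEvent(48.10, 11.60, 1431355600.0)
  ))

  val vectors = events.map { e =>
    Vectors.dense(e.lat * spaceWeight, e.lon * spaceWeight, e.timestampSec * timeWeight)
  }.cache()

  val model = new KMeans().setK(2).run(vectors)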

Spark MLLib KMeans Top Terms

2015-03-19 Thread mvsundaresan
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLLib-KMeans-Top-Terms-tp22154.html