Re: KMeans for large training data

2014-07-12 Thread durin
for this behavior? Best regards, Simon -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-tp9407p9508.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: KMeans for large training data

2014-07-12 Thread Aaron Davidson

KMeans for large training data

2014-07-11 Thread durin
code (where it gets slow) is this: What could I do to use more executors, and generally speed this up? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-for-large-training-data-tp9407.html

Re: KMeans for large training data

2014-07-11 Thread Sean Owen
it wrong. The relevant code (where it gets slow) is this: What could I do to use more executors, and generally speed this up?

Re: KMeans for large training data

2014-07-11 Thread Sean Owen
On Fri, Jul 11, 2014 at 7:32 PM, durin m...@simon-schaefer.net wrote: How would you get more partitions? You can specify this as the second arg to methods that read your data originally, like: sc.textFile(..., 20) I ran broadcastVector.value.repartition(5), but