Re: Computing mean and standard deviation by key

2014-09-12 Thread rzykov
. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p14062.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Computing mean and standard deviation by key

2014-09-12 Thread David Rowe
and std dev for Paired RDDs (key, value)? Now I'm using an approach with ReduceByKey but want to make my code more concise and readable.

Re: Computing mean and standard deviation by key

2014-09-12 Thread Sean Owen
[Double]] .values.stats -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p14065.html

Re: Computing mean and standard deviation by key

2014-09-12 Thread David Rowe
]] .values.stats

Re: Computing mean and standard deviation by key

2014-08-04 Thread Ron Gonzalez
          print("stddev: " + stddev)           stddev         } I hope that helps -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11334.html

Computing mean and standard deviation by key

2014-08-01 Thread kriskalish
going down the wrong path? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192.html

Re: Computing mean and standard deviation by key

2014-08-01 Thread Kristopher Kalish
The reason I want an RDD is because I'm assuming that iterating the individual elements of an RDD on the driver of the cluster is much slower than coming up with the mean and standard deviation using a map-reduce-based algorithm. I don't know the intimate details of Spark's implementation, but it

Re: Computing mean and standard deviation by key

2014-08-01 Thread Sean Owen
You're certainly not iterating on the driver. The Iterable you process in your function is on the cluster and done in parallel. On Fri, Aug 1, 2014 at 8:36 PM, Kristopher Kalish k...@kalish.net wrote: The reason I want an RDD is because I'm assuming that iterating the individual elements of an
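To make the "map-reduce-based" idea in this exchange concrete, here is a minimal sketch of a mergeable per-key accumulator that yields mean and standard deviation in one pass. It is not code from the thread: plain Scala collections stand in for a paired RDD, and the names `MeanStdDevByKey`, `Stats`, `add`, `merge`, `empty`, and `statsByKey` are hypothetical, chosen only for illustration. The sketch assumes population (not sample) standard deviation.

```scala
object MeanStdDevByKey {
  // Running statistics for one key: count, mean, and the sum of squared
  // deviations from the mean (m2). All names here are illustrative.
  final case class Stats(count: Long, mean: Double, m2: Double) {
    // Fold a single value into the running statistics (Welford's update).
    def add(x: Double): Stats = {
      val n = count + 1
      val delta = x - mean
      val newMean = mean + delta / n
      Stats(n, newMean, m2 + delta * (x - newMean))
    }

    // Combine two partial results, as a reduce step would do when the
    // values for a key are spread across partitions.
    def merge(other: Stats): Stats =
      if (count == 0) other
      else if (other.count == 0) this
      else {
        val n = count + other.count
        val delta = other.mean - mean
        Stats(
          n,
          mean + delta * other.count / n,
          m2 + other.m2 + delta * delta * count * other.count / n
        )
      }

    // Population variance and standard deviation (an assumption of this sketch).
    def variance: Double = if (count == 0) Double.NaN else m2 / count
    def stddev: Double = math.sqrt(variance)
  }

  val empty: Stats = Stats(0L, 0.0, 0.0)

  // Local stand-in for a by-key aggregation over (key, value) pairs.
  def statsByKey[K](pairs: Seq[(K, Double)]): Map[K, Stats] =
    pairs.foldLeft(Map.empty[K, Stats]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, empty).add(v))
    }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("a", 1.0), ("a", 3.0), ("b", 10.0))
    statsByKey(pairs).foreach { case (k, s) =>
      println(s"$k: mean=${s.mean} stddev=${s.stddev}")
    }
  }
}
```

Because `merge` is associative, partial results computed on different workers can be combined in any order, which is what makes this shape suitable for a reduce-style aggregation rather than collecting all values for a key in one place.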

Re: Computing mean and standard deviation by key

2014-08-01 Thread Evan R. Sparks
?

Re: Computing mean and standard deviation by key

2014-08-01 Thread Sean Owen
iterable.foreach { y => sum = sum + y.foo; count = count + 1 } val mean = sum / count; // save mean to database... } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11207.html

Re: Computing mean and standard deviation by key

2014-08-01 Thread Evan R. Sparks
mean = sum/count; // save mean to database... }

Re: Computing mean and standard deviation by key

2014-08-01 Thread Ron Gonzalez
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11214.html