[SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Haopu Wang
Hi,

I have a DataFrame object and I want to run several kinds of
aggregations on it, such as count, sum, variance, and stddev.

The DataFrame DSL covers simple aggregations like count and sum.

What about variance and stddev?

Thank you for any suggestions!

 



Re: [SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Corey Nolet
I would keep a running sum of squares. That lets you maintain an ongoing
value as an associative operation (in an aggregator) and then calculate the
variance and standard deviation after the fact.
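
To make the idea concrete, here is a minimal sketch in plain Python rather than Spark. It assumes you carry the count and the plain sum alongside the sum of squares. The `Moments` class and the `summarize` helper are hypothetical names made up for this illustration, not part of any Spark API. Each partition builds its own partial state, the states are merged associatively, and the variance is computed only at the end as E[x^2] - (E[x])^2.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Hypothetical running state: count, sum, and sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> "Moments":
        # Fold a single value into the running state.
        return Moments(self.n + 1, self.s + x, self.ss + x * x)

    def merge(self, other: "Moments") -> "Moments":
        # Associative and commutative, so partial states from
        # different partitions can be combined in any order.
        return Moments(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def variance(self) -> float:
        # Population variance: E[x^2] - (E[x])^2.
        mean = self.s / self.n
        return self.ss / self.n - mean * mean

    def stddev(self) -> float:
        # Clamp at zero to guard against tiny negative rounding errors.
        return math.sqrt(max(self.variance(), 0.0))


def summarize(values):
    """Build the state for one partition of values (hypothetical helper)."""
    return reduce(Moments.add, values, Moments())


# Two "partitions" aggregated separately, then merged.
part_a = summarize([2.0, 4.0, 4.0, 4.0])
part_b = summarize([5.0, 5.0, 7.0, 9.0])
total = part_a.merge(part_b)

print(total.variance())  # 4.0
print(total.stddev())    # 2.0
```

This gives the population variance; for the sample variance, scale the result by n / (n - 1). One caveat: the sum-of-squares formula can lose precision when the mean is large relative to the spread, so a merge based on running means and squared deviations may be safer for such data.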






Re: [SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Denny Lee
Perhaps this earlier email thread can help from a DataFrame perspective:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E

