[ 
https://issues.apache.org/jira/browse/SPARK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945877#comment-14945877
 ] 

Seth Hendrickson commented on SPARK-10641:
------------------------------------------

[~mengxr] I submitted a PR as a work in progress. I wrote my implementation 
before stddev was merged, so the two are currently separate. The main 
difference is how the subclasses implement `evaluateExpression`; the 
lower-order moments are computed the same way, with some syntax differences. I 
also added functionality to skip computing higher-order moments when they are 
not requested.
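
To illustrate the idea of skipping higher-order moments, here is a minimal 
sketch in plain Python of the one-pass update from the Wikipedia algorithm 
linked in the issue description. It is not the code in the PR. The class name 
`MomentState` and the `max_order` parameter are hypothetical names chosen for 
this example, and the variance shown is the population variance.

```python
import math


class MomentState:
    """Running central-moment sums for one column (hypothetical helper).

    max_order is an illustrative knob: 2 keeps only the sums needed for
    the variance, 3 adds skewness, 4 adds kurtosis.
    """

    def __init__(self, max_order=4):
        if max_order not in (2, 3, 4):
            raise ValueError("max_order must be 2, 3 or 4")
        self.max_order = max_order
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of (x - mean)^2
        self.m3 = 0.0  # sum of (x - mean)^3
        self.m4 = 0.0  # sum of (x - mean)^4

    def update(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Higher moments must be updated before the lower ones they read.
        if self.max_order >= 4:
            self.m4 += (term1 * delta_n * delta_n * (n * n - 3 * n + 3)
                        + 6 * delta_n * delta_n * self.m2
                        - 4 * delta_n * self.m3)
        if self.max_order >= 3:
            self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def variance_pop(self):
        return self.m2 / self.n

    def skewness(self):
        if self.max_order < 3:
            raise ValueError("third moment was not tracked")
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        # Excess kurtosis (a normal distribution gives 0).
        if self.max_order < 4:
            raise ValueError("fourth moment was not tracked")
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0


def accumulate(values, max_order=4):
    state = MomentState(max_order)
    for v in values:
        state.update(v)
    return state
```

With `max_order=2` the update does only the work needed for the mean and 
variance, while a single pass with `max_order=4` leaves enough state to read 
off the variance, skewness, and kurtosis together.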

The optimization you suggested for the duplicate computation between skewness 
and kurtosis has not been addressed yet. I believe the same duplicated 
computation would occur for 

{{df.groupBy("key").agg(var("a"), avg("a"))}}

since both aggregates compute the average. We will also need to keep an eye on 
the benchmark results, per your comment below. Thanks for the feedback!

> skewness and kurtosis support
> -----------------------------
>
>                 Key: SPARK-10641
>                 URL: https://issues.apache.org/jira/browse/SPARK-10641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>            Reporter: Jihong MA
>            Assignee: Seth Hendrickson
>
> Implementing skewness and kurtosis support based on the following algorithm:
> https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics



