[ https://issues.apache.org/jira/browse/SPARK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945877#comment-14945877 ]
Seth Hendrickson commented on SPARK-10641:
------------------------------------------

[~mengxr] I submitted a PR as a work in progress. I wrote my implementation before stddev was merged, so for now the two are separate. The main difference is in how the subclasses implement `evaluateExpression`; the lower-order moments are computed the same way, apart from some syntax differences. I added functionality to avoid computing higher-order moments when they are not requested.

The optimization you suggested for the duplicate computation between skewness and kurtosis has not been addressed yet. I believe the same duplication would occur for {{df.groupBy("key").agg(var("a"), avg("a"))}}, since both aggregates compute the average.

We'll also have to keep an eye on the benchmark testing, per your comment below. Thanks for the feedback!

> skewness and kurtosis support
> -----------------------------
>
>                 Key: SPARK-10641
>                 URL: https://issues.apache.org/jira/browse/SPARK-10641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>            Reporter: Jihong MA
>            Assignee: Seth Hendrickson
>
> Implementing skewness and kurtosis support based on the following algorithm:
> https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics
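
For readers unfamiliar with the one-pass algorithm linked in the issue description, here is a minimal sketch in plain Python. It is not the code from the PR and does not use any Spark API; the class name `Moments` and the `max_order` parameter are hypothetical, chosen only to illustrate how the higher-order updates can be skipped when they are not requested. It assumes population (not sample) statistics and excess kurtosis, following the formulas on the Wikipedia page.

```python
import math


class Moments:
    """One-pass accumulator for central moment sums up to max_order (2, 3, or 4).

    Hypothetical illustration only; not Spark's implementation.
    """

    def __init__(self, max_order=4):
        if max_order not in (2, 3, 4):
            raise ValueError("max_order must be 2, 3, or 4")
        self.max_order = max_order
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of (x - mean)^2
        self.m3 = 0.0  # sum of (x - mean)^3
        self.m4 = 0.0  # sum of (x - mean)^4

    def update(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Higher moments are updated first because they read the old m2 and m3.
        if self.max_order >= 4:
            self.m4 += (term1 * delta_n * delta_n * (n * n - 3 * n + 3)
                        + 6 * delta_n * delta_n * self.m2
                        - 4 * delta_n * self.m3)
        if self.max_order >= 3:
            self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def variance(self):
        # Population variance.
        return self.m2 / self.n if self.n > 0 else float("nan")

    def skewness(self):
        if self.max_order < 3:
            raise ValueError("third moment was not tracked")
        if self.m2 == 0:
            return float("nan")
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        # Excess kurtosis (normal distribution gives 0).
        if self.max_order < 4:
            raise ValueError("fourth moment was not tracked")
        if self.m2 == 0:
            return float("nan")
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0
```

With `max_order=2` only the mean and the second-moment sum are maintained, which mirrors the idea of not paying for skewness or kurtosis when only the variance is asked for. Sharing one accumulator between skewness and kurtosis would likewise avoid recomputing the lower-order sums twice.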