[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6297 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139798279 LGTM, merging this into master, thanks!
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139736176 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139736177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/ Test PASSed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139735880
[Test build #42366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/console) for PR 6297 at commit [`6351fc8`](https://github.com/apache/spark/commit/6351fc89874426d2fb83606c6547cde4b64427a2).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139721982 [Test build #42366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/consoleFull) for PR 6297 at commit [`6351fc8`](https://github.com/apache/spark/commit/6351fc89874426d2fb83606c6547cde4b64427a2).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139721907 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-139721904 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r39227939
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -295,6 +295,60 @@ object functions {
   def min(columnName: String): Column = min(Column(columnName))
 
   /**
+   * Aggregate function: returns the unbiased sample standard deviation
+   * of the expression in a group.
+   *
+   * @group agg_funcs
+   * @since 1.6.0
+   */
+  def stddev(e: Column): Column = Stddev(e.expr)
+
+  /**
+   * Aggregate function: returns the unbiased sample standard deviation
+   * of the column in a group.
+   *
+   * @group agg_funcs
+   * @since 1.6.0
+   */
+  def stddev(columnName: String): Column = stddev(Column(columnName))
--- End diff --
We may not want this one anymore.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r39227597
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -249,6 +249,155 @@ case class Min(child: Expression) extends AlgebraicAggregate {
   override val evaluateExpression = min
 }
 
+// Compute the sample standard deviation of a column
+case class Stddev(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = true
+  override def prettyName: String = "stddev"
+}
+
+// Compute the population standard deviation of a column
+case class StddevPop(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = false
+  override def prettyName: String = "stddev_pop"
+}
+
+// Compute the sample standard deviation of a column
+case class StddevSamp(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = true
+  override def prettyName: String = "stddev_samp"
+}
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+abstract class StddevAgg(child: Expression) extends AlgebraicAggregate {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def nullable: Boolean = true
+
+  def isSample: Boolean
+
+  // Return data type.
+  override def dataType: DataType = resultType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
+  // new version at planning time (after analysis phase). For now, NullType is added at here
+  // to make it resolved when we have cases like `select stddev(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType at here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  private val resultType = child.dataType match {
+    case DecimalType.Fixed(p, s) =>
--- End diff --
I think it should always return Double, because Sqrt() only works with Double; other databases also just return Double/float.
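For readers following the review comment above, here is a minimal standalone sketch of the kind of online computation the diff's comment points to (the Wikipedia page on algorithms for calculating variance). It is not the code from this pull request: `StddevState`, `update`, `merge`, and `of` are hypothetical names, the state is plain Scala rather than Catalyst expressions, and returning `Option[Double]` for the undefined case is a choice made only for this illustration. It uses Welford's single-value update plus the pairwise merge of partial states, and the result is always a `Double`, in line with the suggestion in the comment.

```scala
// Hypothetical illustration only; not the implementation in the pull request.
// Running state: number of values, their mean, and the sum of squared
// deviations from the mean (often called M2).
final case class StddevState(count: Long, mean: Double, m2: Double) {

  // Welford's update: fold one more value into the running state.
  def update(x: Double): StddevState = {
    val n = count + 1
    val delta = x - mean
    val newMean = mean + delta / n
    StddevState(n, newMean, m2 + delta * (x - newMean))
  }

  // Combine two partial states, as a partial aggregation step would.
  def merge(other: StddevState): StddevState =
    if (other.count == 0) this
    else if (count == 0) other
    else {
      val n = count + other.count
      val delta = other.mean - mean
      StddevState(
        n,
        mean + delta * other.count / n,
        m2 + other.m2 + delta * delta * count * other.count / n)
    }

  // Sample stddev divides by (count - 1), population stddev by count.
  // The result is a Double regardless of the input type; None marks the
  // undefined case (an assumption of this sketch).
  def stddev(isSample: Boolean): Option[Double] = {
    val divisor = if (isSample) count - 1 else count
    if (divisor <= 0) None else Some(math.sqrt(m2 / divisor))
  }
}

object StddevState {
  val empty: StddevState = StddevState(0L, 0.0, 0.0)

  // Build a state from a batch of values by repeated updates.
  def of(xs: Seq[Double]): StddevState = xs.foldLeft(empty)(_ update _)
}
```

A caller could build one state per partition with `StddevState.of(...)`, combine them with `merge`, and then ask for `stddev(isSample = true)` or `stddev(isSample = false)`, which mirrors the `isSample` flag that separates `StddevSamp` from `StddevPop` in the diff.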
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138036449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/ Test PASSed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138036447 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138036422
[Test build #42062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/console) for PR 6297 at commit [`6035648`](https://github.com/apache/spark/commit/603564855c47f081988727dd6fbab9da0ab3ff63).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138015991 [Test build #42062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/consoleFull) for PR 6297 at commit [`6035648`](https://github.com/apache/spark/commit/603564855c47f081988727dd6fbab9da0ab3ff63).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138015843 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-138015838 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137802508 The R style check failure is caused by the commit for SPARK-8951.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137796647 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137796648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/ Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137796643
[Test build #42006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/console) for PR 6297 at commit [`a81d0fc`](https://github.com/apache/spark/commit/a81d0fc13532c9fdf484e2627f4605ff57f5046c).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137796024 [Test build #42006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/consoleFull) for PR 6297 at commit [`a81d0fc`](https://github.com/apache/spark/commit/a81d0fc13532c9fdf484e2627f4605ff57f5046c).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137794776 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137794798 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135852765 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/ Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135852763 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135852699
[Test build #41748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/console) for PR 6297 at commit [`0902ceb`](https://github.com/apache/spark/commit/0902ceb5cc00e73da9ceefb38b8d7b6531033fde).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135820655 [Test build #41748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/consoleFull) for PR 6297 at commit [`0902ceb`](https://github.com/apache/spark/commit/0902ceb5cc00e73da9ceefb38b8d7b6531033fde).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135817886 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135817978 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135693660 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/ Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135693657 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135693383
[Test build #41732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/console) for PR 6297 at commit [`f4c725c`](https://github.com/apache/spark/commit/f4c725c47d179f66971ac63b18fe2c35f7432ffa).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135657532 [Test build #41732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/consoleFull) for PR 6297 at commit [`f4c725c`](https://github.com/apache/spark/commit/f4c725c47d179f66971ac63b18fe2c35f7432ffa).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135657176 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135657157 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135655316
[Test build #41730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/console) for PR 6297 at commit [`25425ac`](https://github.com/apache/spark/commit/25425ac742e231bd45b62ee31bf217a94e568e55).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Stddev(child: Expression) extends StddevAgg(child)`
   * `case class StddevPop(child: Expression) extends StddevAgg(child)`
   * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
   * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
   * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
   * `case class Stddev(child: Expression) extends StddevAgg1(child)`
   * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
   * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
   * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
   * `case class ComputePartialStdFunction (`
   * `case class MergePartialStd(child: Expression, isSample: Boolean) extends UnaryExpression with AggregateExpression1`
   * `case class MergePartialStdFunction(`
   * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135655318 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135655321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/ Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135654368 [Test build #41730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/consoleFull) for PR 6297 at commit [`25425ac`](https://github.com/apache/spark/commit/25425ac742e231bd45b62ee31bf217a94e568e55).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135650454 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-135650541 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-125978916 @JihongMA Will you get time to implement the function based on the new API? It will be good if we can merge it before the 1.5 deadline for new features (end of this month).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124724353 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124724335
[Test build #38399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38399/console) for PR 6297 at commit [`87fd2dc`](https://github.com/apache/spark/commit/87fd2dcf3720d25106d265333089120b377de655).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class InternalRow extends Serializable`
   * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
   * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
   * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
   * `case class ComputePartialStdFunction (`
   * `case class CombinePartialStdFunction(`
   * `case class StddevFunction(`
   * `class GenericRow(protected[sql] val values: Array[Any]) extends Row`
   * `class GenericInternalRow(protected[sql] val values: Array[Any]) extends InternalRow`
   * `class GenericInternalRowWithSchema(values: Array[Any], val schema: StructType)`
   * `class GenericMutableRow(val values: Array[Any]) extends MutableRow`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124722519 [Test build #38399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38399/consoleFull) for PR 6297 at commit [`87fd2dc`](https://github.com/apache/spark/commit/87fd2dcf3720d25106d265333089120b377de655).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124720733 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124720700 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124720386 Please don't test it yet; I need to make changes to accommodate an API change introduced by another JIRA.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121797696 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121797646
[Test build #37428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37428/console) for PR 6297 at commit [`a4cfe74`](https://github.com/apache/spark/commit/a4cfe747e29868c7a733f4b88329255ab208).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
   * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
   * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
   * `case class ComputePartialStdFunction (`
   * `case class CombinePartialStdFunction(`
   * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121785283 [Test build #37428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37428/consoleFull) for PR 6297 at commit [`a4cfe74`](https://github.com/apache/spark/commit/a4cfe747e29868c7a733f4b88329255ab208).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121784799 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121784886 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121773151
[Test build #37411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37411/console) for PR 6297 at commit [`43fb84f`](https://github.com/apache/spark/commit/43fb84f5a6a55c65b9e262ef9ce573c78073f557).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
   * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
   * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
   * `case class ComputePartialStdFunction (`
   * `case class CombinePartialStdFunction(`
   * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121773207 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121757829 [Test build #37411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37411/consoleFull) for PR 6297 at commit [`43fb84f`](https://github.com/apache/spark/commit/43fb84f5a6a55c65b9e262ef9ce573c78073f557).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121757701 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121757668 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121727475
[Test build #37398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37398/console) for PR 6297 at commit [`1ca4373`](https://github.com/apache/spark/commit/1ca437381c8eae045e234eacd780ffd1d2ff9f48).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
   * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
   * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
   * `case class ComputePartialStdFunction (`
   * `case class CombinePartialStdFunction(`
   * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121727545 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34717145
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _ =>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+    expr: Expression,
+    base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization
+
+  private val computeType = expr.dataType
--- End diff --
The result of ComputePartialStd is an array of (partial count, partial avg, partial Mk). The input data type of expr will be either DecimalType.Unlimited or Double, depending on whether the child is a Decimal type or any other numeric type.
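The partial state described in the comment above, an array of (partial count, partial avg, partial Mk), can be illustrated with a minimal standalone sketch. It assumes that Mk denotes the running sum of squared deviations from the mean, as in the online algorithm from the Wikipedia article linked in the diff. The names `PartialStd`, `update`, `merge`, `stddevSamp`, and `stddevPop` are hypothetical and are not the PR's classes; the sketch uses plain Doubles only and ignores the Decimal path.

```scala
// Hypothetical illustration only; not the code from the pull request.
// Assumption: mk is the running sum of squared deviations from the mean.
final case class PartialStd(count: Long, avg: Double, mk: Double) {

  // Online (Welford-style) update: fold one value into the running state.
  def update(x: Double): PartialStd = {
    val n = count + 1
    val delta = x - avg
    val newAvg = avg + delta / n
    PartialStd(n, newAvg, mk + delta * (x - newAvg))
  }

  // Combine two partial states computed on disjoint subsets of the data.
  def merge(other: PartialStd): PartialStd =
    if (count == 0) other
    else if (other.count == 0) this
    else {
      val n = count + other.count
      val delta = other.avg - avg
      PartialStd(
        n,
        avg + delta * other.count / n,
        mk + other.mk + delta * delta * count * other.count / n)
    }

  // Sample standard deviation (divides by n - 1); None when undefined.
  def stddevSamp: Option[Double] =
    if (count < 2) None else Some(math.sqrt(mk / (count - 1)))

  // Population standard deviation (divides by n); None for empty input.
  def stddevPop: Option[Double] =
    if (count < 1) None else Some(math.sqrt(mk / count))
}

object PartialStd {
  val empty: PartialStd = PartialStd(0L, 0.0, 0.0)

  // Build the partial state for one subset (one partition) of the data.
  def of(values: Seq[Double]): PartialStd =
    values.foldLeft(empty)((state, x) => state.update(x))
}
```

Under these assumptions, merging the states of two subsets gives the same result as a single pass over all the values, which is what lets a partial step and a combining step be evaluated separately.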
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34714426
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _ =>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+    expr: Expression,
+    base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization
+
+  private val computeType = expr.dataType
--- End diff --
I guess I missed something. What is `expr` here?
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121707474 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121707803 [Test build #37398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37398/consoleFull) for PR 6297 at commit [`1ca4373`](https://github.com/apache/spark/commit/1ca437381c8eae045e234eacd780ffd1d2ff9f48).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121707510 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34706204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null } } + +// Compute standard deviation based on online algorithm specified here: +// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance +case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] { + override def nullable: Boolean = true + override def dataType: DataType = child.dataType match { +case DecimalType.Fixed(_, _) | DecimalType.Unlimited => + DecimalType.Unlimited +case _ => + DoubleType + } + override def toString: String = s"STDDEV($child)" + override def asPartial: SplitEvaluation = { +val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")() +SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil) + } + override def newInstance(): StddevFunction = new StddevFunction(child, this) +} + +case class ComputePartialStd(child: Expression) extends AggregateExpression { +def this() = this(null) + +override def children: Seq[Expression] = child :: Nil +override def nullable: Boolean = false +override def dataType: DataType = child.dataType match { + case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited) + case _ => ArrayType(DoubleType) +} +override def toString: String = s"computePartialStddev($child)" +override def newInstance(): ComputePartialStdFunction = + new ComputePartialStdFunction(child, this) +} + +case class CombinePartialStd(child: Expression) extends AggregateExpression { + def this() = this(null) + + override def children: Seq[Expression] = child:: Nil + override def nullable: Boolean = false + override def dataType: DataType = child.dataType match 
{ +case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited +case _ => DoubleType + } + override def toString: String = s"CombinePartialStd($child)" + override def newInstance(): CombinePartialStdFunction = { +new CombinePartialStdFunction(child, this) + } +} + +case class ComputePartialStdFunction ( +expr: Expression, +base: AggregateExpression +) extends AggregateFunction { + def this() = this(null, null) // Required for serialization + + private val computeType = expr.dataType --- End diff -- @yhuai At line 777, the expression is cast to the data type that will be used to compute the partial result: override def dataType: DataType = child.dataType match { case DecimalType.Fixed(_, _) | DecimalType.Unlimited => DecimalType.Unlimited case _ => DoubleType } override def toString: String = s"STDDEV($child)" override def asPartial: SplitEvaluation = { val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")() SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil) }
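The diff quoted above cites the online variance algorithm from Wikipedia and splits the aggregate into a partial step and a combine step. The following is a minimal sketch of that general technique on plain doubles, not the code from this pull request: `PartialStd`, `update`, `merge` and `stddev` are hypothetical names introduced here for illustration, and the sketch assumes the sample (n - 1) divisor.

```scala
// Illustrative sketch only: PartialStd, update, merge and stddev are
// hypothetical names, not the classes added by this pull request.
object OnlineStddevSketch {
  // Running state for one partition: row count, running mean, and
  // m2 = sum of squared deviations from the running mean.
  final case class PartialStd(count: Long, mean: Double, m2: Double)

  val empty: PartialStd = PartialStd(0L, 0.0, 0.0)

  // Welford's single-pass update for one new value.
  def update(s: PartialStd, x: Double): PartialStd = {
    val n = s.count + 1
    val delta = x - s.mean
    val mean = s.mean + delta / n
    PartialStd(n, mean, s.m2 + delta * (x - mean))
  }

  // Merge two partial states (the parallel variant of the same algorithm).
  def merge(a: PartialStd, b: PartialStd): PartialStd =
    if (a.count == 0) b
    else if (b.count == 0) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      val mean = a.mean + delta * b.count / n
      val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
      PartialStd(n, mean, m2)
    }

  // Sample standard deviation; None when fewer than two values were seen.
  def stddev(s: PartialStd): Option[Double] =
    if (s.count < 2) None else Some(math.sqrt(s.m2 / (s.count - 1)))
}
```

Each partition would fold its values with `update` starting from `empty`, and the partial states would then be reduced with `merge` before `stddev` is called once on the final state.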
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34627743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null } } + +// Compute standard deviation based on online algorithm specified here: +// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance +case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] { + override def nullable: Boolean = true + override def dataType: DataType = child.dataType match { +case DecimalType.Fixed(_, _) | DecimalType.Unlimited => + DecimalType.Unlimited +case _ => + DoubleType + } + override def toString: String = s"STDDEV($child)" + override def asPartial: SplitEvaluation = { +val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")() +SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil) + } + override def newInstance(): StddevFunction = new StddevFunction(child, this) +} + +case class ComputePartialStd(child: Expression) extends AggregateExpression { +def this() = this(null) + +override def children: Seq[Expression] = child :: Nil +override def nullable: Boolean = false +override def dataType: DataType = child.dataType match { + case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited) + case _ => ArrayType(DoubleType) +} +override def toString: String = s"computePartialStddev($child)" +override def newInstance(): ComputePartialStdFunction = + new ComputePartialStdFunction(child, this) +} + +case class CombinePartialStd(child: Expression) extends AggregateExpression { + def this() = this(null) + + override def children: Seq[Expression] = child:: Nil + override def nullable: Boolean = false + override def dataType: DataType = child.dataType match { 
+case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited +case _ => DoubleType + } + override def toString: String = s"CombinePartialStd($child)" + override def newInstance(): CombinePartialStdFunction = { +new CombinePartialStdFunction(child, this) + } +} + +case class ComputePartialStdFunction ( +expr: Expression, +base: AggregateExpression +) extends AggregateFunction { + def this() = this(null, null) // Required for serialization + + private val computeType = expr.dataType --- End diff -- Is `computeType` used for the intermediate result? Why does it just take the data type of `expr`?
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121405267 @JihongMA Can you change your test to use fixed-precision decimal types (or the double type) for now? Our decimal types need a fundamental fix, and I think we should not block this work on that.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121399547 [Test build #37267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37267/console) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121399587 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121394398 [Test build #37267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37267/consoleFull) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121392628 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121392716 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121392552 test this please
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121391850 #7212 has been merged.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121385974 I will merge https://github.com/apache/spark/pull/7212 soon. So this one will be unblocked.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121326693 Thanks for testing out the code changes. The test failure is caused by SPARK-8800; I am waiting for that fix to be merged.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121323391 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121323347 [Test build #37243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37243/console) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121318863 [Test build #37243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37243/consoleFull) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121318114 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121318157 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121072605 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-117799636 The issue introduced by SPARK-8359 was fixed via SPARK-8677, but that fix causes an accuracy issue with Decimal data, which needs to be fixed first.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-117268484 Sorry, the code is not ready to be merged: I noticed one more issue with the Decimal type. I am fixing it, along with the code style, and will let you know once it is ready.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116959293 [Test build #36089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36089/console) for PR 6297 at commit [`e46c964`](https://github.com/apache/spark/commit/e46c9648aeede81680ca091ee6860ff7d9766cfa).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116959304 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116957360 [Test build #36089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36089/consoleFull) for PR 6297 at commit [`e46c964`](https://github.com/apache/spark/commit/e46c9648aeede81680ca091ee6860ff7d9766cfa).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116955519 Merged build triggered.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116955534 Merged build started.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-116955337 ok to test
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-115977926 While preparing the code change to address the review comments, I noticed that the fix for SPARK-8359 causes an issue with the decimal type. I left a comment on that JIRA and hope it gets fixed, because one of the test cases I added for the decimal type is failing due to that fix.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-115422767 I will incorporate the comments shortly. Thank you, Michael, for reviewing the code.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user sujkh85 commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33313527 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest { val describeResult = Seq( Row("count", "4", "4"), Row("mean","33.0","178.0"), - Row("stddev", "16.583123951777", "10.0"), + Row("stddev", "19.148542155126762", "11.547005383792516"), --- End diff -- NAVER - http://www.naver.com/ [Automated bounce notice, translated from Korean] The mail you sent to su...@naver.com could not be delivered for the following reason: the recipient has blocked mail from you.
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33313504 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest { val describeResult = Seq( Row("count", "4", "4"), Row("mean","33.0","178.0"), - Row("stddev", "16.583123951777", "10.0"), + Row("stddev", "19.148542155126762", "11.547005383792516"), --- End diff -- In this case, the divisor is (count - 1).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33313463 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest { val describeResult = Seq( Row("count", "4", "4"), Row("mean","33.0","178.0"), - Row("stddev", "16.583123951777", "10.0"), + Row("stddev", "19.148542155126762", "11.547005383792516"), --- End diff -- Yes, the answer was wrong before: the calculation should divide by n - 1, not by n (the number of rows).
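To make the n versus n - 1 distinction concrete, here is a small sketch that computes both variants on plain doubles. The input values are an assumption: they were chosen only to be consistent with the count (4) and mean (33.0) shown in the diff and are not quoted from this thread. The object and method names are likewise hypothetical.

```scala
// Illustrative sketch: the sample values below are assumptions chosen to be
// consistent with the count (4) and mean (33.0) shown in the diff; they are
// not quoted from this thread.
object StddevDivisorSketch {
  // Sum of squared deviations from the mean.
  private def sumSquaredDev(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum
  }

  // Population standard deviation: divide by n.
  def populationStddev(xs: Seq[Double]): Double =
    math.sqrt(sumSquaredDev(xs) / xs.size)

  // Sample standard deviation: divide by n - 1 (needs at least two values).
  def sampleStddev(xs: Seq[Double]): Double = {
    require(xs.size >= 2, "sample standard deviation needs at least two values")
    math.sqrt(sumSquaredDev(xs) / (xs.size - 1))
  }

  def main(args: Array[String]): Unit = {
    val ages = Seq(16.0, 32.0, 60.0, 24.0) // hypothetical data, mean 33.0
    println(populationStddev(ages)) // approximately 16.583 (divide by n)
    println(sampleStddev(ages))     // approximately 19.149 (divide by n - 1)
  }
}
```

With these assumed inputs, the two results line up with the old and new `stddev` values in the first column of the diff, which matches the explanation that the earlier expectation divided by n.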
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-114325358 Thanks for working on this! I made some style comments, mostly from stuff you can find here: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide One thing I'll add is that we are hoping to do a pretty large refactoring of aggregates in [SPARK-4233](https://issues.apache.org/jira/browse/SPARK-4233). If you don't have time to update now, it might be easier to close this for now and reopen once that refactoring is done (hopefully in a few weeks).
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33001725

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
     val describeResult = Seq(
       Row("count", "4", "4"),
       Row("mean","33.0","178.0"),
-      Row("stddev", "16.583123951777", "10.0"),
+      Row("stddev", "19.148542155126762", "11.547005383792516"),
--- End diff --

Were we calculating the wrong answer before?
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33001587

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -746,3 +746,219 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[Row]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _=>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+  expr: Expression,
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization
+
+  private val computeType = expr.dataType
+  private val zero = Cast(Literal(0), computeType)
+  private var partialCount: Long = 0L
+
+  // the mean of data processed so far
+  private val partialAvg :MutableLiteral = MutableLiteral(zero.eval(null), computeType)
+
+  // update average based on this formula:
+  // avg = avg + (value - avg)/count
+  private def avgAddFunction (value: Literal) : Expression= {
+    val delta = Subtract(Cast(value, computeType), partialAvg)
+    Add (partialAvg, Divide(delta, Cast(Literal(partialCount), computeType)))
+  }
+
+  // the sum of squares of difference from mean
+  private val partialMk :MutableLiteral = MutableLiteral(zero.eval(null), computeType)
+
+  // update sum of square of difference from mean based on following formula:
+  // Mk = Mk + (value - preAvg) * (value - updatedAvg)
+  private def mkAddFunction(value: Literal, prePartialAvg: MutableLiteral) : Expression = {
+    val delta1 = Subtract(Cast(value, computeType), prePartialAvg)
+    val delta2 = Subtract(Cast(value, computeType), partialAvg)
+    Add(partialMk, Multiply(delta1, delta2))
+  }
+
+  override def update(input: Row): Unit = {
+    val evaluatedExpr = expr.eval(input)
+    if (evaluatedExpr != null) {
+      val exprValue = Literal.create(evaluatedExpr, expr.dataType)
+      val prePartialAvg = partialAvg.copy()
+      partialCount += 1
+      partialAvg.update(avgAddFunction(exprValue), input)
+      partialMk.update(mkAddFunction(exprValue, prePartialAvg), input)
+    }
+  }
+
+  override def eval(input: Row): Any = {
+    Seq(Cast(Literal(partialCount), computeType).eval(null),
+    partialAvg.eval(null),
+    partialMk.eval(null))
+  }
+}
+
+case class CombinePartialStdFunction(
+  expr: Expression,
--- End diff --

indent 4 spaces.
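The two update formulas in the diff comments (`avg = avg + (value - avg)/count` and `Mk = Mk + (value - preAvg) * (value - updatedAvg)`) can be tried outside Catalyst with a small plain-`Double` sketch. Everything named here (`Moments`, `Welford`, `merge`, `stddev`) is hypothetical and is not the pull request's code. The `merge` step in particular is an assumption: it uses the standard pairwise combination formula for variance, since the PR's own combine logic is not visible in the quoted diff.

```scala
// Hypothetical names; a plain-Double sketch, not the Catalyst expression code.
final case class Moments(count: Long, avg: Double, mk: Double) {
  // Fold one value in, using the two update formulas quoted in the diff comments.
  def update(value: Double): Moments = {
    val n = count + 1
    val newAvg = avg + (value - avg) / n
    val newMk = mk + (value - avg) * (value - newAvg)
    Moments(n, newAvg, newMk)
  }

  // Merge two partial results. Assumed: the standard pairwise formula,
  // which is not shown in the quoted diff.
  def merge(other: Moments): Moments =
    if (count == 0) other
    else if (other.count == 0) this
    else {
      val n = count + other.count
      val delta = other.avg - avg
      Moments(
        n,
        avg + delta * other.count / n,
        mk + other.mk + delta * delta * count * other.count / n)
    }

  // Sample standard deviation (divide by n - 1); NaN with fewer than two values.
  def stddev: Double =
    if (count < 2) Double.NaN else math.sqrt(mk / (count - 1))
}

object Welford {
  val empty: Moments = Moments(0L, 0.0, 0.0)

  // Run the online update over a whole sequence.
  def of(values: Seq[Double]): Moments = values.foldLeft(empty)(_.update(_))
}
```

As a usage sketch, `Welford.of(Seq(16.0, 32.0)).merge(Welford.of(Seq(60.0, 24.0)))` should carry the same count, mean, and sum of squared differences as `Welford.of(Seq(16.0, 32.0, 60.0, 24.0))`, which is the property a partial aggregate relies on when each partition computes its own (count, avg, Mk) triple.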
[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33001595

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -746,3 +746,219 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[Row]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _=>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+  expr: Expression,
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null) // Required for serialization
+
+  private val computeType = expr.dataType
+  private val zero = Cast(Literal(0), computeType)
+  private var partialCount: Long = 0L
+
+  // the mean of data processed so far
+  private val partialAvg :MutableLiteral = MutableLiteral(zero.eval(null), computeType)
+
+  // update average based on this formula:
+  // avg = avg + (value - avg)/count
+  private def avgAddFunction (value: Literal) : Expression= {
+    val delta = Subtract(Cast(value, computeType), partialAvg)
+    Add (partialAvg, Divide(delta, Cast(Literal(partialCount), computeType)))
+  }
+
+  // the sum of squares of difference from mean
+  private val partialMk :MutableLiteral = MutableLiteral(zero.eval(null), computeType)
+
+  // update sum of square of difference from mean based on following formula:
+  // Mk = Mk + (value - preAvg) * (value - updatedAvg)
+  private def mkAddFunction(value: Literal, prePartialAvg: MutableLiteral) : Expression = {
+    val delta1 = Subtract(Cast(value, computeType), prePartialAvg)
+    val delta2 = Subtract(Cast(value, computeType), partialAvg)
+    Add(partialMk, Multiply(delta1, delta2))
+  }
+
+  override def update(input: Row): Unit = {
+    val evaluatedExpr = expr.eval(input)
+    if (evaluatedExpr != null) {
+      val exprValue = Literal.create(evaluatedExpr, expr.dataType)
+      val prePartialAvg = partialAvg.copy()
+      partialCount += 1
+      partialAvg.update(avgAddFunction(exprValue), input)
+      partialMk.update(mkAddFunction(exprValue, prePartialAvg), input)
+    }
+  }
+
+  override def eval(input: Row): Any = {
+    Seq(Cast(Literal(partialCount), computeType).eval(null),
+    partialAvg.eval(null),
+    partialMk.eval(null))
+  }
+}
+
+case class CombinePartialStdFunction(
+  expr: Expression,
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this (null, null) // Required for serialization
+
+  private val computeType = expr.dataType match {
+    case ArrayType(DecimalType.U