[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6297


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-12 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139798279
  
LGTM, merging this into master, thanks!





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139736176
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139736177
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139735880
  
  [Test build #42366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/console) for PR 6297 at commit [`6351fc8`](https://github.com/apache/spark/commit/6351fc89874426d2fb83606c6547cde4b64427a2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`
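
The class names listed above (`ComputePartialStd`, `MergePartialStd`) suggest a two-step aggregation: compute a partial state per partition, then merge the partial states. The following is only a sketch of that general idea in plain Scala, not the PR's code. `Partial`, `compute`, and `merge` are hypothetical names, and the merge rule is assumed to be the standard pairwise combination of (count, mean, M2) states.

```scala
// Partial aggregation state for one chunk of data: count, mean, and
// M2 (the sum of squared differences from the mean).
final case class Partial(n: Long, mean: Double, m2: Double)

object Partial {
  // Build a partial state from one chunk (two passes, for clarity).
  def compute(xs: Seq[Double]): Partial =
    if (xs.isEmpty) Partial(0L, 0.0, 0.0)
    else {
      val mean = xs.sum / xs.size
      Partial(xs.size.toLong, mean, xs.map(x => (x - mean) * (x - mean)).sum)
    }

  // Combine two partial states without revisiting the underlying rows.
  def merge(a: Partial, b: Partial): Partial =
    if (a.n == 0) b
    else if (b.n == 0) a
    else {
      val n = a.n + b.n
      val delta = b.mean - a.mean
      Partial(
        n,
        a.mean + delta * b.n / n,
        a.m2 + b.m2 + delta * delta * a.n * b.n / n
      )
    }
}
```

Merging `Partial.compute(Seq(2.0, 4.0, 4.0, 4.0))` with `Partial.compute(Seq(5.0, 5.0, 7.0, 9.0))` gives the same count, mean, and M2 (up to floating-point rounding) as computing a single state over all eight values.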






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139721982
  
  [Test build #42366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42366/consoleFull) for PR 6297 at commit [`6351fc8`](https://github.com/apache/spark/commit/6351fc89874426d2fb83606c6547cde4b64427a2).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139721907
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-139721904
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-10 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r39227939
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -295,6 +295,60 @@ object functions {
   def min(columnName: String): Column = min(Column(columnName))
 
   /**
+   * Aggregate function: returns the unbiased sample standard deviation
+   * of the expression in a group.
+   *
+   * @group agg_funcs
+   * @since 1.6.0
+   */
+  def stddev(e: Column): Column = Stddev(e.expr)
+
+  /**
+   * Aggregate function: returns the unbiased sample standard deviation
+   * of the column in a group.
+   *
+   * @group agg_funcs
+   * @since 1.6.0
+   */
+  def stddev(columnName: String): Column = stddev(Column(columnName))
--- End diff --

We may not want this one anymore.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-10 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r39227597
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -249,6 +249,155 @@ case class Min(child: Expression) extends AlgebraicAggregate {
   override val evaluateExpression = min
 }
 
+// Compute the sample standard deviation of a column
+case class Stddev(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = true
+  override def prettyName: String = "stddev"
+}
+
+// Compute the population standard deviation of a column
+case class StddevPop(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = false
+  override def prettyName: String = "stddev_pop"
+}
+
+// Compute the sample standard deviation of a column
+case class StddevSamp(child: Expression) extends StddevAgg(child) {
+
+  override def isSample: Boolean = true
+  override def prettyName: String = "stddev_samp"
+}
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+abstract class StddevAgg(child: Expression) extends AlgebraicAggregate {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def nullable: Boolean = true
+
+  def isSample: Boolean
+
+  // Return data type.
+  override def dataType: DataType = resultType
+
+  // Expected input data type.
+  // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the
+  // new version at planning time (after analysis phase). For now, NullType is added at here
+  // to make it resolved when we have cases like `select stddev(null)`.
+  // We can use our analyzer to cast NullType to the default data type of the NumericType once
+  // we remove the old aggregate functions. Then, we will not need NullType at here.
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType))
+
+  private val resultType = child.dataType match {
+    case DecimalType.Fixed(p, s) =>
--- End diff --

I think it should always return Double, because Sqrt() only works with
Double. Other databases also just return Double/float.
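
To illustrate the points raised in this review (an online algorithm, a sample/population switch, and a result that is always a Double), here is a minimal sketch in plain Scala. It is not the PR's Catalyst implementation. `StddevState`, `update`, `result`, and `of` are hypothetical names chosen for this example, and the update rule is assumed to be Welford's method as described on the Wikipedia page cited in the diff.

```scala
// Running state for Welford's online algorithm: count, mean, and M2
// (the sum of squared differences from the current mean).
final case class StddevState(n: Long, mean: Double, m2: Double) {

  // Fold one more value into the state in a single pass.
  def update(x: Double): StddevState = {
    val n1 = n + 1
    val delta = x - mean
    val mean1 = mean + delta / n1
    StddevState(n1, mean1, m2 + delta * (x - mean1))
  }

  // The result is always a Double, whatever the input type was.
  // Returns None when the divisor would be zero (no rows, or a single
  // row for the sample variant).
  def result(isSample: Boolean): Option[Double] = {
    val divisor = if (isSample) n - 1 else n
    if (divisor <= 0) None else Some(math.sqrt(m2 / divisor))
  }
}

object StddevState {
  val empty: StddevState = StddevState(0L, 0.0, 0.0)

  // Convenience helper: fold a whole sequence into one state.
  def of(xs: Seq[Double]): StddevState = xs.foldLeft(empty)(_.update(_))
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, `StddevState.of(...).result(isSample = false)` yields a population standard deviation of 2 (up to floating-point rounding), while `isSample = true` divides by n - 1 and yields a slightly larger value.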





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138036449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138036447
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138036422
  
  [Test build #42062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/console) for PR 6297 at commit [`6035648`](https://github.com/apache/spark/commit/603564855c47f081988727dd6fbab9da0ab3ff63).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138015991
  
  [Test build #42062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42062/consoleFull) for PR 6297 at commit [`6035648`](https://github.com/apache/spark/commit/603564855c47f081988727dd6fbab9da0ab3ff63).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138015843
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-138015838
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137802508
  
The R style check failure is caused by the commit for SPARK-8951.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137796647
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137796648
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137796643
  
  [Test build #42006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/console) for PR 6297 at commit [`a81d0fc`](https://github.com/apache/spark/commit/a81d0fc13532c9fdf484e2627f4605ff57f5046c).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137796024
  
  [Test build #42006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42006/consoleFull) for PR 6297 at commit [`a81d0fc`](https://github.com/apache/spark/commit/a81d0fc13532c9fdf484e2627f4605ff57f5046c).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137794776
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-137794798
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135852765
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135852763
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135852699
  
  [Test build #41748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/console) for PR 6297 at commit [`0902ceb`](https://github.com/apache/spark/commit/0902ceb5cc00e73da9ceefb38b8d7b6531033fde).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135820655
  
  [Test build #41748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41748/consoleFull) for PR 6297 at commit [`0902ceb`](https://github.com/apache/spark/commit/0902ceb5cc00e73da9ceefb38b8d7b6531033fde).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135817886
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135817978
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135693660
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135693657
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135693383
  
  [Test build #41732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/console) for PR 6297 at commit [`f4c725c`](https://github.com/apache/spark/commit/f4c725c47d179f66971ac63b18fe2c35f7432ffa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135657532
  
  [Test build #41732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41732/consoleFull) for PR 6297 at commit [`f4c725c`](https://github.com/apache/spark/commit/f4c725c47d179f66971ac63b18fe2c35f7432ffa).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135657176
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135657157
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135655316
  
  [Test build #41730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/console) for PR 6297 at commit [`25425ac`](https://github.com/apache/spark/commit/25425ac742e231bd45b62ee31bf217a94e568e55).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends StddevAgg(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg(child)`
  * `abstract class StddevAgg(child: Expression) extends AlgebraicAggregate`
  * `abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1`
  * `case class Stddev(child: Expression) extends StddevAgg1(child)`
  * `case class StddevPop(child: Expression) extends StddevAgg1(child)`
  * `case class StddevSamp(child: Expression) extends StddevAgg1(child)`
  * `case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1`
  * `case class ComputePartialStdFunction (`
  * `case class MergePartialStd(child: Expression, isSample: Boolean) extends UnaryExpression with AggregateExpression1`
  * `case class MergePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135655318
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135655321
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135654368
  
  [Test build #41730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41730/consoleFull) for PR 6297 at commit [`25425ac`](https://github.com/apache/spark/commit/25425ac742e231bd45b62ee31bf217a94e568e55).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135650454
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-135650541
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-29 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-125978916
  
@JihongMA Will you have time to implement the function based on the new API?
It would be good if we could merge it before the 1.5 deadline for new features
(end of this month).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124724353
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124724335
  
  [Test build #38399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38399/console) for PR 6297 at commit [`87fd2dc`](https://github.com/apache/spark/commit/87fd2dcf3720d25106d265333089120b377de655).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class InternalRow extends Serializable`
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`
  * `class GenericRow(protected[sql] val values: Array[Any]) extends Row`
  * `class GenericInternalRow(protected[sql] val values: Array[Any]) extends InternalRow`
  * `class GenericInternalRowWithSchema(values: Array[Any], val schema: StructType)`
  * `class GenericMutableRow(val values: Array[Any]) extends MutableRow`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124722519
  
  [Test build #38399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38399/consoleFull) for PR 6297 at commit [`87fd2dc`](https://github.com/apache/spark/commit/87fd2dcf3720d25106d265333089120b377de655).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124720733
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124720700
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-124720386
  
Please don't test this yet; I need to make changes to accommodate an API
change introduced by another JIRA.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121797696
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121797646
  
  [Test build #37428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37428/console) for PR 6297 at commit [`a4cfe74`](https://github.com/apache/spark/commit/a4cfe747e29868c7a733f4b88329255ab208).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121785283
  
  [Test build #37428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37428/consoleFull) for PR 6297 at commit [`a4cfe74`](https://github.com/apache/spark/commit/a4cfe747e29868c7a733f4b88329255ab208).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121784799
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121784886
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121773151
  
  [Test build #37411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37411/console) for PR 6297 at commit [`43fb84f`](https://github.com/apache/spark/commit/43fb84f5a6a55c65b9e262ef9ce573c78073f557).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121773207
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121757829
  
  [Test build #37411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37411/consoleFull) for PR 6297 at commit [`43fb84f`](https://github.com/apache/spark/commit/43fb84f5a6a55c65b9e262ef9ce573c78073f557).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121757701
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121757668
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121727475
  
  [Test build #37398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37398/console) for PR 6297 at commit [`1ca4373`](https://github.com/apache/spark/commit/1ca437381c8eae045e234eacd780ffd1d2ff9f48).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]`
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression`
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression`
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121727545
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread JihongMA
Github user JihongMA commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r34717145
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _ =>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+    expr: Expression,
+    base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null)  // Required for serialization
+
+  private val computeType = expr.dataType
--- End diff --

The result of `ComputePartialStd` is an array of (partial count, partial avg,
partial Mk). The input data type of `expr` will be either
`DecimalType.Unlimited` or `DoubleType`, depending on whether the child is a
Decimal type or any other numeric type.
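
For readers following the thread, here is a minimal standalone sketch of how a (count, avg, Mk) triple can be built per partition and then combined, using the online and parallel variance formulas from the Wikipedia page cited in the diff. It is not the code from this PR: `PartialStd`, `update`, `merge`, `stddevPop`, and `stddevSamp` are hypothetical names, the sketch works on plain `Double` values rather than Catalyst expressions, and it assumes that "Mk" denotes the running sum of squared deviations from the mean.

```scala
// Hypothetical illustration only; none of these names come from the PR.
// State kept per partition: (count, mean, m2), where m2 is assumed to be
// the running sum of squared deviations from the mean.
final case class PartialStd(count: Double, mean: Double, m2: Double)

object PartialStd {
  val empty: PartialStd = PartialStd(0.0, 0.0, 0.0)

  // Online (Welford-style) update: fold one value into a partial state.
  def update(s: PartialStd, x: Double): PartialStd = {
    val n = s.count + 1.0
    val delta = x - s.mean
    val mean = s.mean + delta / n
    PartialStd(n, mean, s.m2 + delta * (x - mean))
  }

  // Parallel merge of two partial states computed on disjoint data.
  def merge(a: PartialStd, b: PartialStd): PartialStd =
    if (a.count == 0.0) b
    else if (b.count == 0.0) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      PartialStd(
        n,
        a.mean + delta * b.count / n,
        a.m2 + b.m2 + delta * delta * a.count * b.count / n)
    }

  // Population standard deviation; undefined for empty input.
  def stddevPop(s: PartialStd): Option[Double] =
    if (s.count > 0.0) Some(math.sqrt(s.m2 / s.count)) else None

  // Sample standard deviation; undefined for fewer than two values.
  def stddevSamp(s: PartialStd): Option[Double] =
    if (s.count > 1.0) Some(math.sqrt(s.m2 / (s.count - 1.0))) else None
}
```

Folding each partition with `update` and combining the results with `merge` should give the same state as a single pass over all the data, up to floating-point rounding. For example, splitting `Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)` into two halves and merging gives a population standard deviation of approximately 2.0.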





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r34714426
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala ---
@@ -761,3 +761,216 @@ case class LastFunction(expr: Expression, base: AggregateExpression) extends Agg
     if (result != null) expr.eval(result.asInstanceOf[InternalRow]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] {
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
+      DecimalType.Unlimited
+    case _ =>
+      DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
+    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, this)
+}
+
+case class ComputePartialStd(child: Expression) extends AggregateExpression {
+    def this() = this(null)
+
+    override def children: Seq[Expression] = child :: Nil
+    override def nullable: Boolean = false
+    override def dataType: DataType = child.dataType match {
+      case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+      case _ => ArrayType(DoubleType)
+    }
+    override def toString: String = s"computePartialStddev($child)"
+    override def newInstance(): ComputePartialStdFunction =
+      new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+    case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+    case _ => DoubleType
+  }
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+    new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+    expr: Expression,
+    base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null)  // Required for serialization
+
+  private val computeType = expr.dataType
--- End diff --

I guess I missed something. What is `expr` here?





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121707474
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121707803
  
  [Test build #37398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37398/consoleFull) for PR 6297 at commit [`1ca4373`](https://github.com/apache/spark/commit/1ca437381c8eae045e234eacd780ffd1d2ff9f48).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121707510
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread JihongMA
Github user JihongMA commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r34706204
  

@yhuai At line 777, `expr` is cast to the data type that will be used to
compute the partial result:

  override def dataType: DataType = child.dataType match {
    case DecimalType.Fixed(_, _) | DecimalType.Unlimited =>
      DecimalType.Unlimited
    case _ =>
      DoubleType
  }
  override def toString: String = s"STDDEV($child)"
  override def asPartial: SplitEvaluation = {
    val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), "PartialStddev")()
    SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd :: Nil)
  }
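
To make the point about the cast concrete, the type selection can be modelled in isolation. The sketch below uses simplified stand-in types rather than the real Catalyst classes: `SimpleType`, `Column`, `CastTo`, and `ComputeTypeSketch` are hypothetical names introduced only for illustration. Because the child is wrapped in a cast before it reaches the partial aggregate, the partial function can read its compute type straight from `expr.dataType`.

```scala
// Simplified stand-ins for Catalyst data types; not the real Spark classes.
sealed trait SimpleType
case object IntT extends SimpleType
case object DoubleT extends SimpleType
final case class FixedDecimalT(precision: Int, scale: Int) extends SimpleType
case object UnlimitedDecimalT extends SimpleType

// Simplified stand-ins for expressions: a typed column and a cast around a child.
sealed trait Expr { def dataType: SimpleType }
final case class Column(name: String, dataType: SimpleType) extends Expr
final case class CastTo(child: Expr, dataType: SimpleType) extends Expr

object ComputeTypeSketch {
  // Mirrors the quoted dataType match: decimals widen to the unlimited decimal,
  // every other numeric type is computed as a double.
  def resultType(child: Expr): SimpleType = child.dataType match {
    case FixedDecimalT(_, _) | UnlimitedDecimalT => UnlimitedDecimalT
    case _ => DoubleT
  }

  // Mirrors Cast(child, dataType): the expression handed to the partial
  // aggregate already carries the compute type.
  def partialInput(child: Expr): Expr = CastTo(child, resultType(child))
}
```

With this model, an integer column yields a partial input of type `DoubleT`, and a fixed-precision decimal column yields `UnlimitedDecimalT`.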







[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r34627743
  

Is `computeType` used for the intermediate result? Why does it just take the
data type of `expr`?





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121405267
  
@JihongMA Can you change your test to use fixed-precision decimal types (or
the double type) for now? Our decimal types need a fundamental fix, and I think
we should not block this work on that.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121399547
  
  [Test build #37267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37267/console) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121399587
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121394398
  
  [Test build #37267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37267/consoleFull) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121392628
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121392716
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121392552
  
test this please





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121391850
  
#7212 has been merged.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121385974
  
I will merge https://github.com/apache/spark/pull/7212 soon. So this one 
will be unblocked.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121326693
  
Thanks for testing the code changes. The test failure is caused by
SPARK-8800; I am waiting for that fix to be merged.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121323391
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121323347
  
  [Test build #37243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37243/console) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121318863
  
  [Test build #37243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37243/consoleFull) for PR 6297 at commit [`c752054`](https://github.com/apache/spark/commit/c7520546671215b3064ecc131f117f5677c6d9fd).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121318114
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121318157
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-121072605
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-01 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-117799636
  
The issue introduced by SPARK-8359 was fixed via SPARK-8677, but that fix
causes an accuracy issue with Decimal data, which needs to be fixed first.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-30 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-117268484
  
Sorry, the code is not ready to be merged: I noticed one more issue with the
Decimal type. I am fixing it and will let you know once it is ready, along with
the code style fix.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116959293
  
  [Test build #36089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36089/console) for PR 6297 at commit [`e46c964`](https://github.com/apache/spark/commit/e46c9648aeede81680ca091ee6860ff7d9766cfa).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Stddev(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] `
  * `case class ComputePartialStd(child: Expression) extends AggregateExpression `
  * `case class CombinePartialStd(child: Expression) extends AggregateExpression `
  * `case class ComputePartialStdFunction (`
  * `case class CombinePartialStdFunction(`
  * `case class StddevFunction(`






[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116959304
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116957360
  
  [Test build #36089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36089/consoleFull) for PR 6297 at commit [`e46c964`](https://github.com/apache/spark/commit/e46c9648aeede81680ca091ee6860ff7d9766cfa).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116955519
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116955534
  
Merged build started.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-29 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-116955337
  
ok to test





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-27 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-115977926
  
While preparing the code change to address the review comments, I noticed that
the fix for SPARK-8359 causes an issue with the decimal type. I commented on
that JIRA in the hope that it gets fixed, since one of the test cases I added
for the decimal type is failing because of that fix.





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-115422767
  
I will incorporate the comments shortly. Thank you Michael for reviewing 
the code. 





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread sujkh85
Github user sujkh85 commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33313527
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
 val describeResult = Seq(
   Row("count",   "4",   "4"),
   Row("mean","33.0","178.0"),
-  Row("stddev",  "16.583123951777", "10.0"),
+  Row("stddev",  "19.148542155126762", "11.547005383792516"),
--- End diff --


NAVER - http://www.naver.com/

The mail you sent to su...@naver.com could not be delivered for the
following reason:

The recipient has blocked your mail.




[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33313504
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
 val describeResult = Seq(
   Row("count",   "4",   "4"),
   Row("mean","33.0","178.0"),
-  Row("stddev",  "16.583123951777", "10.0"),
+  Row("stddev",  "19.148542155126762", "11.547005383792516"),
--- End diff --

In this case the divisor is (count - 1).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33313463
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
 val describeResult = Seq(
   Row("count",   "4",   "4"),
   Row("mean","33.0","178.0"),
-  Row("stddev",  "16.583123951777", "10.0"),
+  Row("stddev",  "19.148542155126762", "11.547005383792516"),
--- End diff --

Yes, the answer was wrong before: the calculation should divide by n - 1,
not by n (the number of rows).
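
To make the n versus n - 1 distinction concrete, here is a minimal sketch in
plain Scala (no Spark). The input column is hypothetical: it is not quoted in
this thread and was chosen only because it has count 4 and mean 178.0, which
matches the second column of the test above. The object and helper names are
illustrative as well.

```scala
object StddevDivisors {
  // Sum of squared deviations from the mean.
  private def sumSquaredDev(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum
  }

  // Population standard deviation: divide by n.
  def populationStddev(xs: Seq[Double]): Double =
    math.sqrt(sumSquaredDev(xs) / xs.size)

  // Sample standard deviation: divide by n - 1.
  def sampleStddev(xs: Seq[Double]): Double = {
    require(xs.size > 1, "sample stddev needs at least two values")
    math.sqrt(sumSquaredDev(xs) / (xs.size - 1))
  }

  // Hypothetical data (an assumption), consistent with count = 4 and mean = 178.0.
  val heights: Seq[Double] = Seq(176.0, 164.0, 192.0, 180.0)
}
```

With this input, `populationStddev(heights)` gives 10.0, the old expected
value, and `sampleStddev(heights)` gives roughly 11.547, the new one.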





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-22 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/6297#issuecomment-114325358
  
Thanks for working on this!  I made some style comments, mostly from stuff 
you can find here: 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

One thing I'll add is that we are hoping to do a pretty large refactoring 
of aggregates in 
[SPARK-4233](https://issues.apache.org/jira/browse/SPARK-4233).  If you don't 
have time to update now, it might be easier to close this for now and reopen 
once that refactoring is done (hopefully in a few weeks).





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33001725
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
 val describeResult = Seq(
   Row("count",   "4",   "4"),
   Row("mean","33.0","178.0"),
-  Row("stddev",  "16.583123951777", "10.0"),
+  Row("stddev",  "19.148542155126762", "11.547005383792516"),
--- End diff --

Were we calculating the wrong answer before?





[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33001587
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
 ---
@@ -746,3 +746,219 @@ case class LastFunction(expr: Expression, base: 
AggregateExpression) extends Agg
 if (result != null) expr.eval(result.asInstanceOf[Row]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with 
trees.UnaryNode[Expression] {
+
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+case DecimalType.Fixed(_, _) | DecimalType.Unlimited  => 
+  DecimalType.Unlimited 
+case _=> 
+  DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), 
"PartialStddev")()
+SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd 
:: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, 
this)
+}
+
+case class ComputePartialStd(child: Expression) extends 
AggregateExpression {
+def this() = this(null)
+
+override def children: Seq[Expression] = child :: Nil
+override def nullable: Boolean = false
+override def dataType: DataType = child.dataType match {
+  case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+  case _ => ArrayType(DoubleType)
+}
+override def toString: String = s"computePartialStddev($child)"
+override def newInstance(): ComputePartialStdFunction = 
+  new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends 
AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+case _ => DoubleType
+  } 
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+  expr: Expression,
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null)  // Required for serialization
+
+  private val computeType  =  expr.dataType
+  private val zero = Cast(Literal(0), computeType)
+  private var partialCount: Long = 0L
+
+  // the mean of data processed so far
+  private val partialAvg :MutableLiteral = MutableLiteral(zero.eval(null), 
computeType)
+
+  // update average based on this formula:
+  // avg = avg + (value - avg)/count
+  private def avgAddFunction (value: Literal) : Expression= {
+val delta = Subtract(Cast(value, computeType), partialAvg)
+Add (partialAvg, Divide(delta, Cast(Literal(partialCount), 
computeType)))
+  }
+
+  // the sum of squares of difference from mean
+  private val partialMk :MutableLiteral = MutableLiteral(zero.eval(null), 
computeType)
+
+  // update sum of square of difference from mean based on following 
formula:
+  // Mk = Mk + (value - preAvg) * (value - updatedAvg)
+  private def mkAddFunction(value: Literal, prePartialAvg: MutableLiteral) 
: Expression = {
+val delta1 = Subtract(Cast(value, computeType), prePartialAvg)
+val delta2 = Subtract(Cast(value, computeType), partialAvg)
+Add(partialMk, Multiply(delta1, delta2))
+  }
+
+  override def update(input: Row): Unit = {
+val evaluatedExpr = expr.eval(input)
+if (evaluatedExpr != null) {
+  val exprValue = Literal.create(evaluatedExpr, expr.dataType)
+  val prePartialAvg = partialAvg.copy()
+  partialCount += 1
+  partialAvg.update(avgAddFunction(exprValue), input)
+  partialMk.update(mkAddFunction(exprValue, prePartialAvg), input)
+}
+  }
+
+  override def eval(input: Row): Any = {
+Seq(Cast(Literal(partialCount), computeType).eval(null), 
+partialAvg.eval(null), 
+partialMk.eval(null))
+  }
+}
+
+case class CombinePartialStdFunction(
+  expr: Expression, 
--- End diff --

indent 4 spaces.
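
For readers following the diff, the two update formulas in its comments
(avg = avg + (value - avg)/count and Mk = Mk + (value - preAvg) * (value -
updatedAvg)) can be sketched without Catalyst expressions. This is only an
illustration on plain doubles, not the code in the pull request; the
`RunningStd` name and its fields are hypothetical.

```scala
// Running state for a one-pass standard deviation.
// count: values seen so far, mean: running average,
// m2: running sum of squared differences from the mean (Mk in the diff).
final case class RunningStd(count: Long = 0L, mean: Double = 0.0, m2: Double = 0.0) {

  // Fold one value into the state.
  def update(value: Double): RunningStd = {
    val newCount = count + 1
    // avg = avg + (value - avg) / count
    val newMean = mean + (value - mean) / newCount
    // Mk = Mk + (value - preAvg) * (value - updatedAvg)
    val newM2 = m2 + (value - mean) * (value - newMean)
    RunningStd(newCount, newMean, newM2)
  }

  // Sample standard deviation (divides by count - 1); None for fewer than two values.
  def sampleStddev: Option[Double] =
    if (count > 1) Some(math.sqrt(m2 / (count - 1))) else None
}
```

A column can be folded through it with
`values.foldLeft(RunningStd())(_.update(_))`. Using an immutable state here is
a design choice for readability; the diff keeps mutable literals instead.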



[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/6297#discussion_r33001595
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
 ---
@@ -746,3 +746,219 @@ case class LastFunction(expr: Expression, base: 
AggregateExpression) extends Agg
 if (result != null) expr.eval(result.asInstanceOf[Row]) else null
   }
 }
+
+// Compute standard deviation based on online algorithm specified here:
+// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+case class Stddev(child: Expression) extends PartialAggregate with 
trees.UnaryNode[Expression] {
+
+  override def nullable: Boolean = true
+  override def dataType: DataType = child.dataType match {
+case DecimalType.Fixed(_, _) | DecimalType.Unlimited  => 
+  DecimalType.Unlimited 
+case _=> 
+  DoubleType
+  }
+  override def toString: String = s"STDDEV($child)"
+  override def asPartial: SplitEvaluation = {
+val partialStd = Alias(ComputePartialStd(Cast(child, dataType)), 
"PartialStddev")()
+SplitEvaluation(CombinePartialStd(partialStd.toAttribute), partialStd 
:: Nil)
+  }
+  override def newInstance(): StddevFunction = new StddevFunction(child, 
this)
+}
+
+case class ComputePartialStd(child: Expression) extends 
AggregateExpression {
+def this() = this(null)
+
+override def children: Seq[Expression] = child :: Nil
+override def nullable: Boolean = false
+override def dataType: DataType = child.dataType match {
+  case DecimalType.Unlimited => ArrayType(DecimalType.Unlimited)
+  case _ => ArrayType(DoubleType)
+}
+override def toString: String = s"computePartialStddev($child)"
+override def newInstance(): ComputePartialStdFunction = 
+  new ComputePartialStdFunction(child, this)
+}
+
+case class CombinePartialStd(child: Expression) extends 
AggregateExpression {
+  def this() = this(null)
+
+  override def children: Seq[Expression] = child:: Nil
+  override def nullable: Boolean = false
+  override def dataType: DataType = child.dataType match {
+case ArrayType(DecimalType.Unlimited, _) => DecimalType.Unlimited
+case _ => DoubleType
+  } 
+  override def toString: String = s"CombinePartialStd($child)"
+  override def newInstance(): CombinePartialStdFunction = {
+new CombinePartialStdFunction(child, this)
+  }
+}
+
+case class ComputePartialStdFunction (
+  expr: Expression,
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this(null, null)  // Required for serialization
+
+  private val computeType  =  expr.dataType
+  private val zero = Cast(Literal(0), computeType)
+  private var partialCount: Long = 0L
+
+  // the mean of data processed so far
+  private val partialAvg :MutableLiteral = MutableLiteral(zero.eval(null), 
computeType)
+
+  // update average based on this formula:
+  // avg = avg + (value - avg)/count
+  private def avgAddFunction (value: Literal) : Expression= {
+val delta = Subtract(Cast(value, computeType), partialAvg)
+Add (partialAvg, Divide(delta, Cast(Literal(partialCount), 
computeType)))
+  }
+
+  // the sum of squares of difference from mean
+  private val partialMk :MutableLiteral = MutableLiteral(zero.eval(null), 
computeType)
+
+  // update sum of square of difference from mean based on following 
formula:
+  // Mk = Mk + (value - preAvg) * (value - updatedAvg)
+  private def mkAddFunction(value: Literal, prePartialAvg: MutableLiteral) 
: Expression = {
+val delta1 = Subtract(Cast(value, computeType), prePartialAvg)
+val delta2 = Subtract(Cast(value, computeType), partialAvg)
+Add(partialMk, Multiply(delta1, delta2))
+  }
+
+  override def update(input: Row): Unit = {
+val evaluatedExpr = expr.eval(input)
+if (evaluatedExpr != null) {
+  val exprValue = Literal.create(evaluatedExpr, expr.dataType)
+  val prePartialAvg = partialAvg.copy()
+  partialCount += 1
+  partialAvg.update(avgAddFunction(exprValue), input)
+  partialMk.update(mkAddFunction(exprValue, prePartialAvg), input)
+}
+  }
+
+  override def eval(input: Row): Any = {
+Seq(Cast(Literal(partialCount), computeType).eval(null), 
+partialAvg.eval(null), 
+partialMk.eval(null))
+  }
+}
+
+case class CombinePartialStdFunction(
+  expr: Expression, 
+  base: AggregateExpression
+) extends AggregateFunction {
+  def this() = this (null, null) // Required for serialization
+
+  private val computeType = expr.dataType match {
+case ArrayType(DecimalType.U
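
The quoted diff is cut off at this point in the archive, so the body of
`CombinePartialStdFunction` is not visible here. As a rough illustration of
what combining partial results involves, the sketch below merges
(count, mean, Mk) triples using the pairwise formula from the Wikipedia page
the diff cites. It is an assumption about the general technique, not a
reconstruction of the missing code; `PartialStd` and its methods are
hypothetical names.

```scala
// One partition's partial result: count, mean, and m2, where m2 is the sum of
// squared differences from that partition's own mean.
final case class PartialStd(count: Long, mean: Double, m2: Double)

object PartialStd {
  val empty: PartialStd = PartialStd(0L, 0.0, 0.0)

  // Build a partial result from one partition's values (two passes, for brevity).
  def of(xs: Seq[Double]): PartialStd =
    if (xs.isEmpty) empty
    else {
      val mean = xs.sum / xs.size
      PartialStd(xs.size.toLong, mean, xs.map(x => (x - mean) * (x - mean)).sum)
    }

  // Merge two partial results with the pairwise (parallel) update.
  def merge(a: PartialStd, b: PartialStd): PartialStd =
    if (a.count == 0L) b
    else if (b.count == 0L) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      val mean = a.mean + delta * b.count / n
      val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
      PartialStd(n, mean, m2)
    }

  // Sample standard deviation of a merged result; None for fewer than two values.
  def sampleStddev(p: PartialStd): Option[Double] =
    if (p.count > 1) Some(math.sqrt(p.m2 / (p.count - 1))) else None
}
```

Partial results from any number of partitions can then be reduced with
`partials.foldLeft(PartialStd.empty)(PartialStd.merge)`.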
