[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13403





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68188989
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   * @since 2.1.0
--- End diff --

Sure, no doubt about that.
I'm just curious whether we will need to replace them someday.
Anyway, I've almost finished the fix.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68188408
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   * @since 2.1.0
--- End diff --

It's valid, though I don't think we generally use them. In Scala I would just 
use the `@Since` annotation, since it is processed differently on purpose.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68185536
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   * @since 2.1.0
--- End diff --

By the way, some Scala code also uses the Java-style `@since` tag:
- SparkSession.scala, StreamingQuery.scala, StreamingQueryManager.scala
Were those designed that way for Java compatibility?





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68184739
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   * @since 2.1.0
--- End diff --

Oh, I made a stupid mistake again. So the correct form is `@Since("2.1.0")`?





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68184371
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   * @since 2.1.0
--- End diff --

We need to use the Scala `@Since` annotation. Have a look at how this is 
annotated elsewhere in Scala. For Java, yeah, we only use the `@since` javadoc 
tag because that's all that is available.
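For reference, a minimal sketch of what the Scala-side annotation could look like on the new method (assuming it sits in DoubleRDDFunctions.scala as in the diff above and that StatCounter gains a matching popStdev; a sketch, not the exact merged code):

```scala
// org.apache.spark.annotation.Since is the Scala annotation being referred to;
// it replaces the javadoc-style "@since 2.1.0" line inside the doc comment.
import org.apache.spark.annotation.Since

/** Compute the population standard deviation of this RDD's elements. */
@Since("2.1.0")
def popStdev(): Double = self.withScope {
  stats().popStdev
}
```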





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68008205
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,20 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   */
+  def popStdev(): Double = self.withScope {
--- End diff --

Sorry, I fixed the previous comment.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68007866
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,20 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   */
+  def popStdev(): Double = self.withScope {
--- End diff --

I agree with you. I'll add `@since 2.0.0` here.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r68007343
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -74,6 +74,20 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
   }
 
   /**
+   * Compute the population standard deviation of this RDD's elements.
+   */
+  def popStdev(): Double = self.withScope {
--- End diff --

Oh, I forgot: these need `@Since` annotations. Hmm, I'm not sure whether we 
should merge this for 2.0.0 now; I wouldn't mind, but we already have an RC, and 
it's not super urgent. I might mark this as since 2.1.0.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67931444
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(rdd.variance - rdd.popVariance) < 1e-14)
+assert(abs(rdd.stdev - rdd.popStdev) < 1e-14)
+assert(abs(2.0 - rdd.sampleVariance) < 0.01)
+assert(abs(1.41 - rdd.sampleStdev) < 0.01)
--- End diff --

Right. I'll fix that too, e.g. `assert(abs(Math.sqrt(2.0) - rdd.sampleStdev) < 1e-14)`.
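For context, the arithmetic behind the `sqrt(2)` question, assuming the suite's RDD holds the two elements {2.0, 4.0} implied by the asserted mean of 6.0/2 and population variance of 1.0 (a plain Scala check, not part of the PR):

```scala
val xs = Seq(2.0, 4.0)
val mean = xs.sum / xs.size                           // 3.0
val sumSq = xs.map(x => (x - mean) * (x - mean)).sum  // 2.0
val popVariance = sumSq / xs.size                     // 1.0
val sampleVariance = sumSq / (xs.size - 1)            // 2.0  (n/(n-1) correction)
val sampleStdev = math.sqrt(sampleVariance)           // 1.4142... == sqrt(2)
```

So comparing against `Math.sqrt(2.0)` with a `1e-14` tolerance checks the exact expected value, whereas `1.41` only passes thanks to the loose `0.01` tolerance.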





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67931118
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(rdd.variance - rdd.popVariance) < 1e-14)
+assert(abs(rdd.stdev - rdd.popStdev) < 1e-14)
+assert(abs(2.0 - rdd.sampleVariance) < 0.01)
--- End diff --

Yep.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67928879
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(rdd.variance - rdd.popVariance) < 1e-14)
+assert(abs(rdd.stdev - rdd.popStdev) < 1e-14)
+assert(abs(2.0 - rdd.sampleVariance) < 0.01)
--- End diff --

The tolerance should be smaller too.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67928887
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(rdd.variance - rdd.popVariance) < 1e-14)
+assert(abs(rdd.stdev - rdd.popStdev) < 1e-14)
+assert(abs(2.0 - rdd.sampleVariance) < 0.01)
+assert(abs(1.41 - rdd.sampleStdev) < 0.01)
--- End diff --

Is it `sqrt(2)`?





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67924857
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -733,8 +733,10 @@ public Boolean call(Double x) {
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(6.2, rdd.variance(), 0.01);
+assertEquals(rdd.variance(), rdd.popVariance());
 assertEquals(7.46667, rdd.sampleVariance(), 0.01);
 assertEquals(2.49444, rdd.stdev(), 0.01);
+assertEquals(rdd.stdev(), rdd.popStdev(), 0.01);
--- End diff --

I think it should be approximate equality but with a very small tolerance, 
e.g. 1e-14. Both calls trigger reduce jobs and we don't have guarantees on the 
ordering of the reduce operations, which might lead to small numerical errors. 
Maybe it doesn't apply to the test case here, but it is still best practice 
not to test strict equality of floating-point numbers.
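A quick, self-contained illustration of that point (not from the PR): double addition is not associative, so two different reduce orders can produce values that differ by a few ulps, which is why a tiny tolerance such as 1e-14 is safer than strict equality.

```scala
val xs = Seq(0.1, 0.2, 0.3)
val leftToRight = (xs(0) + xs(1)) + xs(2)             // 0.6000000000000001
val rightToLeft = xs(0) + (xs(1) + xs(2))             // 0.6
println(leftToRight == rightToLeft)                   // false
println(math.abs(leftToRight - rightToLeft) < 1e-14)  // true
```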





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67922737
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -733,8 +733,10 @@ public Boolean call(Double x) {
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(6.2, rdd.variance(), 0.01);
+assertEquals(rdd.variance(), rdd.popVariance());
 assertEquals(7.46667, rdd.sampleVariance(), 0.01);
 assertEquals(2.49444, rdd.stdev(), 0.01);
+assertEquals(rdd.stdev(), rdd.popStdev(), 0.01);
--- End diff --

Oops. Sorry, I'll review my code again!





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67922274
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -733,8 +733,10 @@ public Boolean call(Double x) {
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(6.2, rdd.variance(), 0.01);
+assertEquals(rdd.variance(), rdd.popVariance());
 assertEquals(7.46667, rdd.sampleVariance(), 0.01);
 assertEquals(2.49444, rdd.stdev(), 0.01);
+assertEquals(rdd.stdev(), rdd.popStdev(), 0.01);
--- End diff --

This still asserts approximate equality, when the values should be exactly equal.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67922312
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(rdd.variance - rdd.popVariance) < 0.01)
--- End diff --

Same here: these should be exactly equal.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67904001
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -47,12 +47,12 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
 stats().mean
   }
 
-  /** Compute the variance of this RDD's elements. */
+  /** Compute the population variance of this RDD's elements. */
   def variance(): Double = self.withScope {
 stats().variance
   }
 
-  /** Compute the standard deviation of this RDD's elements. */
+  /** Compute the population standard deviation of this RDD's elements. */
   def stdev(): Double = self.withScope {
--- End diff --

Oh, for this one I'd like to keep the current code for consistency, since all 
the other functions (`variance`/`mean`/`sampleStdev`/...) have the same shape, 
`self.withScope { stats().~~~ }`.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67903762
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -733,8 +733,10 @@ public Boolean call(Double x) {
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(6.2, rdd.variance(), 0.01);
+assertEquals(rdd.variance(), rdd.popVariance(), 0.01);
--- End diff --

I agree!





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67822629
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -47,12 +47,12 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
 stats().mean
   }
 
-  /** Compute the variance of this RDD's elements. */
+  /** Compute the population variance of this RDD's elements. */
   def variance(): Double = self.withScope {
 stats().variance
   }
 
-  /** Compute the standard deviation of this RDD's elements. */
+  /** Compute the population standard deviation of this RDD's elements. */
   def stdev(): Double = self.withScope {
--- End diff --

One last tweak: these methods should call the "pop" equivalents rather than 
implement the same thing twice, even though it's so simple.
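A sketch of that suggestion (assuming the `popVariance`/`popStdev` methods added in this PR; the merged code may instead keep the delegation inside StatCounter):

```scala
/** Compute the population variance of this RDD's elements. */
def variance(): Double = popVariance()

/** Compute the population standard deviation of this RDD's elements. */
def stdev(): Double = popStdev()
```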





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67822577
  
--- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
@@ -733,8 +733,10 @@ public Boolean call(Double x) {
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(20/6.0, rdd.mean(), 0.01);
 assertEquals(6.2, rdd.variance(), 0.01);
+assertEquals(rdd.variance(), rdd.popVariance(), 0.01);
--- End diff --

I think these need to assert exact equality; there's no reason that they 
would ever be different at all.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67597273
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(1.0 - rdd.popVariance) < 0.01)
--- End diff --

That sounds better since it's more explicit.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67597228
  
--- Diff: core/src/main/scala/org/apache/spark/util/StatCounter.scala ---
@@ -125,9 +128,12 @@ class StatCounter(values: TraversableOnce[Double]) 
extends Serializable {
 }
   }
 
-  /** Return the standard deviation of the values. */
+  /** Return the population standard deviation of the values. */
   def stdev: Double = math.sqrt(variance)
--- End diff --

Sure!





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67597157
  
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite with 
SharedSparkContext with Priva
 assert(abs(6.0/2 - rdd.mean) < 0.01)
 assert(abs(1.0 - rdd.variance) < 0.01)
 assert(abs(1.0 - rdd.stdev) < 0.01)
+assert(abs(1.0 - rdd.popVariance) < 0.01)
--- End diff --

Perhaps assert that variance == popVariance instead, and likewise for stdev? 
Because we really need them to be identical. Otherwise LGTM.





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67597151
  
--- Diff: core/src/main/scala/org/apache/spark/util/StatCounter.scala ---
@@ -125,9 +128,12 @@ class StatCounter(values: TraversableOnce[Double]) 
extends Serializable {
 }
   }
 
-  /** Return the standard deviation of the values. */
+  /** Return the population standard deviation of the values. */
   def stdev: Double = math.sqrt(variance)
--- End diff --

For consistency, could this call `popStdev`?
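A sketch of the shape being asked for in StatCounter.scala (assuming `popVariance`, `popStdev`, and `sampleVariance` are the accessors added in this PR; not necessarily the exact merged code):

```scala
/** Return the population variance of the values. */
def variance: Double = popVariance

/** Return the population standard deviation of the values. */
def stdev: Double = popStdev

/** Return the population standard deviation of the values. */
def popStdev: Double = math.sqrt(popVariance)

/** Return the sample standard deviation of the values. */
def sampleStdev: Double = math.sqrt(sampleVariance)
```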





[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13403#discussion_r67567009
  
--- Diff: core/src/main/scala/org/apache/spark/util/StatCounter.scala ---
@@ -104,8 +104,11 @@ class StatCounter(values: TraversableOnce[Double]) 
extends Serializable {
 
   def min: Double = minValue
 
-  /** Return the variance of the values. */
-  def variance: Double = {
+  /** Return the population variance of the values. */
+  def variance: Double = popVariance
--- End diff --

Here is the only changed part of the legacy code.

