[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-06 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69770334
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val varianceData: RDD[LabeledPoint] = TreeTests.varianceData(sc)
+val varianceDF = TreeTests.setMetadata(varianceData, Map.empty[Int, 
Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val transformVarDF = dt.fit(varianceDF).transform(varianceDF)
+val calculatedVariances = 
transformVarDF.select(dt.getVarianceCol).collect().map {
--- End diff --

Ah, I see. Thanks for the note. I wasn't familiar with the Dataset API till 
now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69705146
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val varianceData: RDD[LabeledPoint] = TreeTests.varianceData(sc)
+val varianceDF = TreeTests.setMetadata(varianceData, Map.empty[Int, 
Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val transformVarDF = dt.fit(varianceDF).transform(varianceDF)
+val calculatedVariances = 
transformVarDF.select(dt.getVarianceCol).collect().map {
--- End diff --

@jaceklaskowski I think it's better to do such improvement in a separate PR 
which will involve lots of similar changes for all test suites? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13981


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-06 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69703134
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val varianceData: RDD[LabeledPoint] = TreeTests.varianceData(sc)
+val varianceDF = TreeTests.setMetadata(varianceData, Map.empty[Int, 
Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val transformVarDF = dt.fit(varianceDF).transform(varianceDF)
+val calculatedVariances = 
transformVarDF.select(dt.getVarianceCol).collect().map {
--- End diff --

Before `collect` so you work with type-safe `Dataset` (not a `DataFrame` 
which is `Dataset[Row]`)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-05 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69660324
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val varianceData: RDD[LabeledPoint] = TreeTests.varianceData(sc)
+val varianceDF = TreeTests.setMetadata(varianceData, Map.empty[Int, 
Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val transformVarDF = dt.fit(varianceDF).transform(varianceDF)
+val calculatedVariances = 
transformVarDF.select(dt.getVarianceCol).collect().map {
--- End diff --

You mean after the collect?

It fails with `is not a member of Seq[org.apache.spark.sql.Row]`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69631781
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +97,25 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val varianceData: RDD[LabeledPoint] = TreeTests.varianceData(sc)
+val varianceDF = TreeTests.setMetadata(varianceData, Map.empty[Int, 
Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val transformVarDF = dt.fit(varianceDF).transform(varianceDF)
+val calculatedVariances = 
transformVarDF.select(dt.getVarianceCol).collect().map {
--- End diff --

I wonder if `toDS` worked here and you'd have `variance` "simpler".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-04 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69422844
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

You are right, please ignore my last comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-04 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69419165
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

If you would like to reduce the number of warnings, then this should be 
kept as is (unless I am misunderstanding something)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-03 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69410786
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -36,10 +37,21 @@ class DecisionTreeRegressorSuite
 
   private var categoricalDataPointsRDD: RDD[LabeledPoint] = _
 
+  private var toyData: RDD[LabeledPoint] = _
+
   override def beforeAll() {
 super.beforeAll()
+
 categoricalDataPointsRDD =
   
sc.parallelize(OldDecisionTreeSuite.generateCategoricalDataPoints().map(_.asML))
+toyData = sc.parallelize(Seq(
--- End diff --

Move ```toyData``` to ```TreeTests```. You can refer [Feature importance 
with toy 
data](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala#L108).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-03 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69410553
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

I'd like to remove the explicit setting since the default value(32) meets 
your needs. We want to make Jenkins logging clean and reduce the number of 
warnings if possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69363260
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

"Explicit is better than implicit" ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69350493
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,22 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val calculatedVariances = 
dt.fit(toyDF).transform(toyDF).select("variance").collect().map {
--- End diff --

Can we use `.select(dt.getVarianceCol)` instead? Also, the closing brace 
should go on the next line, like:

```scala
.map {
  case Row(...)
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69348071
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

Ok, not sure it needs to be set since the test is fine without it. The 
warning is more just helpful info logging. I'm fine to leave it since it 
doesn't matter either way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69333966
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

I verified and my intuition was correct. I get this warning for the default 
setting:

 WARN DecisionTreeMetadata: DecisionTree reducing maxBins from 32 to 6 (= 
number of training instances)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread MechCoder
Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69332114
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

Because there are 6 datapoints, and I want each datapoint to be a split.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69298953
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val expectVariances = 
dt.fit(toyDF).transform(toyDF).select("variance").collect().map {
+  case Row(variance: Double) => variance }
+val trueVariances = Array(0.667, 0.667, 0.667, 2.667, 2.667, 2.667)
+trueVariances.zip(expectVariances).foreach(x => x._1 ~== x._2 absTol 
1e-3)
--- End diff --

Although this technically works, it is less confusing if use `assert` and 
unpack the tuple. Like

```scala
...foreach { case (actual, expected) =>
  assert(actual ~== expected absTol 1e-3)
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69298648
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
+  .setSeed(0)
+val expectVariances = 
dt.fit(toyDF).transform(toyDF).select("variance").collect().map {
--- End diff --

`expectedVariances` and `trueVariances` are mixed up here. Expected should 
be the theoretical value computed below. Also, it would be good to leave a 
comment explaining where those expected values came from.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-07-01 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13981#discussion_r69298384
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala
 ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
   assert(variance === expectedVariance,
 s"Expected variance $expectedVariance but got $variance.")
 }
+
+val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+dt.setMaxDepth(1)
+  .setMaxBins(6)
--- End diff --

Not sure why we need to set maxBins here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...

2016-06-29 Thread MechCoder
GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/13981

[SPARK-16307] [ML] Add test to verify the predicted variances of a DT on 
toy data

## What changes were proposed in this pull request?

The current tests assumes that `impurity.calculate()` returns the variance 
correctly. It should be better to make the tests independent of this 
assumption. In other words verify that the variance computed equals the 
variance computed manually on a small tree.

## How was this patch tested?

The patch is a test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MechCoder/spark dt_variance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13981






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org