[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050269
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,22 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("SAPRK-20043: " +
--- End diff --

oops, copy & paste. thanks for correction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050236
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,22 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("SAPRK-20043: " +
--- End diff --

Sorry, I had a typo: SPARK, not SAPRK


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050110
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
+
+// BUG: see SPARK-20043
+val dt = new DecisionTreeClassifier().setImpurity("Gini")
--- End diff --

set maxDepth = 2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050099
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
--- End diff --

done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050103
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
+
+// BUG: see SPARK-20043
--- End diff --

removed the comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108050101
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
--- End diff --

done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108048393
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
--- End diff --

To be more specific, how about:
"SAPRK-20043: ImpurityCalculator builder fails for uppercase impurity type 
Gini in model read/write"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108048376
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
+
+// BUG: see SPARK-20043
--- End diff --

I'd put the JIRA number in the test title.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108048375
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
--- End diff --

To simplify this, you can write ```TreeTests.setMetadata(rdd, 
Map.empty[Int, Int], numClasses = 2)```.
Since this is not testing categorical features, there's no need to throw 
them in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel can't recongn...

2017-03-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/17407#discussion_r108048398
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
 testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ 
Map("maxDepth" -> 0),
   allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("read/write: ImpurityCalculator builder did not recognize impurity 
type: Gini") {
+val rdd = TreeTests.getTreeReadWriteData(sc)
+
+val categoricalData: DataFrame =
+  TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
+
+// BUG: see SPARK-20043
+val dt = new DecisionTreeClassifier().setImpurity("Gini")
--- End diff --

To make this faster, set maxDepth = 2 (something small)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org