[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...
GitHub user facaiy closed the pull request at:

    https://github.com/apache/spark/pull/17503
[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...
GitHub user facaiy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17503#discussion_r113360409

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala ---
    @@ -61,6 +61,8 @@ import org.apache.spark.mllib.tree.impurity.{Entropy, Gini, Impurity, Variance}
      * @param subsamplingRate Fraction of the training data used for learning decision tree.
      * @param useNodeIdCache If this is true, instead of passing trees to executors, the algorithm will
      *                       maintain a separate RDD of node Id cache for each row.
    + * @param canMergeChildren Merge pairs of leaf nodes of the same parent which
    --- End diff --

A new parameter is added to the Strategy class, which makes the MiMa (binary compatibility) tests fail. How should this be handled?

```bash
[error] * synthetic method $default$13()Int in object org.apache.spark.mllib.tree.configuration.Strategy has a different result type in current version, where it is Boolean rather than Int
```

[see failed logs](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/consoleFull)
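The error above is consistent with a new Boolean parameter having been placed before an existing Int parameter, so that the synthetic default-value getter at position 13 now returns a different type. That reading is inferred from the message only, not confirmed by the thread. The sketch below reproduces the effect on a small scale with plain methods; `StrategyV1`, `StrategyV2`, `create`, and `DefaultGetterDemo` are hypothetical names for illustration and are not part of Spark. Constructor defaults are compiled along similar lines, but the exact synthetic names differ.

```scala
// Hypothetical stand-in for the "previous version": two parameters with defaults.
object StrategyV1 {
  def create(maxDepth: Int = 5, checkpointInterval: Int = 10): String =
    s"maxDepth=$maxDepth, checkpointInterval=$checkpointInterval"
}

// Hypothetical "current version": a Boolean parameter is inserted in the middle,
// which shifts checkpointInterval from position 2 to position 3.
object StrategyV2 {
  def create(
      maxDepth: Int = 5,
      canMergeChildren: Boolean = false,
      checkpointInterval: Int = 10): String =
    s"maxDepth=$maxDepth, canMergeChildren=$canMergeChildren, " +
      s"checkpointInterval=$checkpointInterval"
}

object DefaultGetterDemo {
  // The Scala compiler emits one synthetic method per default argument,
  // named <method>$default$<position>. This looks one up by reflection
  // and reports its JVM return type.
  def defaultType(owner: AnyRef, index: Int): Class[_] =
    owner.getClass.getMethod("create$default$" + index).getReturnType
}
```

With these definitions, `DefaultGetterDemo.defaultType(StrategyV1, 2)` is the primitive `int` class while `DefaultGetterDemo.defaultType(StrategyV2, 2)` is `boolean`, which is the kind of result-type change a binary compatibility checker flags. Appending the new parameter after the existing ones would leave the existing getters unchanged, although the method signature itself still changes.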
[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...
GitHub user facaiy opened a pull request:

    https://github.com/apache/spark/pull/17503

    [SPARK-3159][MLlib] Check for reducible DecisionTree

    Add a canMergeChildren param: find pairs of leaves of the same parent that output the same prediction, and merge them.

    ## How was this patch tested?

    1. [x] add unit test: verify whether the implementation is correct.
    2. [ ] add unit test: verify whether setCanMergeChildren works.
    3. [ ] perhaps we need to create a sample that can produce a reducible tree, and test it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark CLN/check_for_reducible_decision_tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17503

commit fab2a0e5a3c4db8beeaa78d98253d11e408f3b56
Author: 颜发才（Yan Facai）
Date:   2017-03-31T02:33:38Z

    TST: create new test suite

commit f5d52cce500290165ac7d8bad5aa38041ed21c54
Author: 颜发才（Yan Facai）
Date:   2017-03-31T02:35:26Z

    TST: helper method for construcing binary tree

commit b9248b7ae2e1048d93e44e2d3687c2f9fd286ce8
Author: 颜发才（Yan Facai）
Date:   2017-03-31T07:09:51Z

    TST: helper method, show tree node info

commit be12f4f23a5fd53870bc97b6cbdb8fa0a094f2c1
Author: 颜发才（Yan Facai）
Date:   2017-03-31T07:21:41Z

    TST: helper method, check if pairs of leave with same prediction exists

commit b52420201576610613c782146dc9d6c2dc6ebb0c
Author: 颜发才（Yan Facai）
Date:   2017-03-31T07:28:28Z

    TST: helper method for modifying nodes

commit 98a73f952d1a199cf581cde2636d6dc831ae4ee3
Author: 颜发才（Yan Facai）
Date:   2017-03-31T07:41:46Z

    ENH: merge the pairs of leave with same prediction of same parent

commit 632325d0e0d45d7fe9325686f90dbdc64b149960
Author: 颜发才（Yan Facai）
Date:   2017-04-01T01:10:07Z

    ENH: add mergeLeave param in Strategy

commit 12052958d30d015be537fbd1169da4406869fb3d
Author: 颜发才（Yan Facai）
Date:   2017-04-01T01:18:50Z

    ENH: support mergeChild when training

commit 434c762de76be2f1b4ec939ccba9c2ecb45c1c04
Author: 颜发才（Yan Facai）
Date:   2017-04-01T02:48:37Z

    ENH: add canMergeChildren param in DecisionTreeParams

commit 5162552a8db92283a514e94adeef439f6fb8f80e
Author: 颜发才（Yan Facai）
Date:   2017-04-01T02:54:57Z

    ENH: add set method in tree classifier

commit 21b1a851c89cd0a060720503bdc4a9441155236b
Author: 颜发才（Yan Facai）
Date:   2017-04-01T04:19:07Z

    ENH: stat: merge counts of each tree

commit 25b712a37bbc31cc9b8ff2b6330d79fd437cb17c
Author: 颜发才（Yan Facai）
Date:   2017-04-01T06:16:01Z

    BUG: depth=0 tree has none of children
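The merging idea described in this pull request (collapse two sibling leaves that output the same prediction into their parent) can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the patch's code: `Node`, `Leaf`, `Internal`, and `MergeLeaves.mergeChildren` are hypothetical names, and Spark's real node classes also carry splits, impurity statistics, and other fields that are omitted here.

```scala
// Hypothetical, simplified tree model; not Spark's internal node classes.
sealed trait Node { def prediction: Double }
final case class Leaf(prediction: Double) extends Node
final case class Internal(prediction: Double, left: Node, right: Node) extends Node

object MergeLeaves {
  // Process the children first, then collapse this node into a single leaf
  // if both children ended up as leaves with the same prediction.
  // A tree of depth 0 (a lone leaf) has no children and is returned unchanged.
  def mergeChildren(node: Node): Node = node match {
    case leaf: Leaf => leaf
    case Internal(pred, left, right) =>
      (mergeChildren(left), mergeChildren(right)) match {
        case (Leaf(l), Leaf(r)) if l == r => Leaf(l)
        case (newLeft, newRight)          => Internal(pred, newLeft, newRight)
      }
  }
}
```

Working bottom-up lets merges cascade: once a subtree collapses into a leaf, that leaf can in turn be merged with its sibling one level higher. For example, `MergeLeaves.mergeChildren(Internal(1.0, Internal(1.0, Leaf(1.0), Leaf(1.0)), Leaf(1.0)))` reduces the whole tree to `Leaf(1.0)`.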