[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2018-03-03 Thread facaiy
Github user facaiy closed the pull request at:

https://github.com/apache/spark/pull/17503


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2017-04-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17503#discussion_r113360409
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala ---
@@ -61,6 +61,8 @@ import org.apache.spark.mllib.tree.impurity.{Entropy, Gini, Impurity, Variance}
  * @param subsamplingRate Fraction of the training data used for learning decision tree.
  * @param useNodeIdCache If this is true, instead of passing trees to executors, the algorithm will
  *   maintain a separate RDD of node Id cache for each row.
+ * @param canMergeChildren Merge pairs of leaf nodes of the same parent which
--- End diff --

Adding a new parameter to the Strategy class makes the MiMa binary-compatibility tests fail. How should this be handled?

```bash
[error]  * synthetic method $default$13()Int in object org.apache.spark.mllib.tree.configuration.Strategy has a different result type in current version, where it is Boolean rather than Int
```
[See the failed logs](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/consoleFull)
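
For context, here is a minimal sketch of why inserting a parameter with a default value can produce this kind of report. The Scala compiler names default-argument getters by parameter position, so a parameter added before an existing one shifts every later getter. `Before`, `After`, `make`, and `defaultGetterType` are hypothetical names for illustration, not Spark code, and the parameter order in `Strategy` is only inferred from the error message above.

```scala
// Hypothetical stand-ins for two versions of a method with default arguments.
object Before {
  def make(rate: Double = 1.0, interval: Int = 10): String =
    s"$rate/$interval"
}

object After {
  // A Boolean parameter inserted before `interval` takes over position 2.
  def make(rate: Double = 1.0, canMerge: Boolean = false, interval: Int = 10): String =
    s"$rate/$canMerge/$interval"
}

// Look up the compiler-generated default getter for the parameter at
// `position` (1-based) and return the name of its result type, if present.
def defaultGetterType(owner: AnyRef, position: Int): Option[String] =
  owner.getClass.getMethods
    .find(_.getName == "make$default$" + position)
    .map(_.getReturnType.getName)
```

Comparing `defaultGetterType(Before, 2)` with `defaultGetterType(After, 2)` shows the getter at position 2 changing its result type, which is the shape of the change MiMa flags. Under that reading, appending the new parameter after the existing ones, or exposing it only through a setter, would leave the existing getters untouched. Whether either option suits this pull request is a question for the reviewers.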


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2017-04-01 Thread facaiy
GitHub user facaiy opened a pull request:

https://github.com/apache/spark/pull/17503

[SPARK-3159][MLlib] Check for reducible DecisionTree

Add a canMergeChildren param: find pairs of leaves under the same parent that output the same prediction, and merge them.
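
The idea can be sketched on a simplified tree model. This is an illustrative sketch only, not the code in this patch: `Node`, `Leaf`, `Internal`, and `mergeChildren` are hypothetical names that do not match the MLlib node classes. It also assumes that merging is applied bottom-up, so a parent whose children collapse into equal leaves is collapsed as well.

```scala
// Hypothetical, simplified binary tree; not the Spark MLlib node classes.
sealed trait Node { def prediction: Double }
final case class Leaf(prediction: Double) extends Node
final case class Internal(prediction: Double, left: Node, right: Node) extends Node

// Bottom-up pass: simplify both subtrees first, then collapse this node
// into a single leaf if both children are leaves with the same prediction.
def mergeChildren(node: Node): Node = node match {
  case leaf: Leaf => leaf
  case Internal(pred, left, right) =>
    (mergeChildren(left), mergeChildren(right)) match {
      case (Leaf(l), Leaf(r)) if l == r => Leaf(l)
      case (newLeft, newRight)          => Internal(pred, newLeft, newRight)
    }
}
```

For example, a root whose left child splits into two leaves that both predict 1.0, and whose right child is a leaf predicting 0.0, keeps the root split but replaces the left subtree with a single leaf.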

## How was this patch tested?

1. [x] Add unit test: verify that the implementation is correct.
2. [ ] Add unit test: verify that setCanMergeChildren works.
3. [ ] Perhaps we need to create a sample that produces a reducible tree, and test against it.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/facaiy/spark CLN/check_for_reducible_decision_tree

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17503


commit fab2a0e5a3c4db8beeaa78d98253d11e408f3b56
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T02:33:38Z

TST: create new test suite

commit f5d52cce500290165ac7d8bad5aa38041ed21c54
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T02:35:26Z

TST: helper method for construcing binary tree

commit b9248b7ae2e1048d93e44e2d3687c2f9fd286ce8
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T07:09:51Z

TST: helper method, show tree node info

commit be12f4f23a5fd53870bc97b6cbdb8fa0a094f2c1
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T07:21:41Z

TST: helper method, check if pairs of leave with same prediction exists

commit b52420201576610613c782146dc9d6c2dc6ebb0c
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T07:28:28Z

TST: helper method for modifying nodes

commit 98a73f952d1a199cf581cde2636d6dc831ae4ee3
Author: 颜发才(Yan Facai) 
Date:   2017-03-31T07:41:46Z

ENH: merge the pairs of leave with same prediction of same parent

commit 632325d0e0d45d7fe9325686f90dbdc64b149960
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T01:10:07Z

ENH: add mergeLeave param in Strategy

commit 12052958d30d015be537fbd1169da4406869fb3d
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T01:18:50Z

ENH: support mergeChild when training

commit 434c762de76be2f1b4ec939ccba9c2ecb45c1c04
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T02:48:37Z

ENH: add canMergeChildren param in DecisionTreeParams

commit 5162552a8db92283a514e94adeef439f6fb8f80e
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T02:54:57Z

ENH: add set method in tree classifier

commit 21b1a851c89cd0a060720503bdc4a9441155236b
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T04:19:07Z

ENH: stat: merge counts of each tree

commit 25b712a37bbc31cc9b8ff2b6330d79fd437cb17c
Author: 颜发才(Yan Facai) 
Date:   2017-04-01T06:16:01Z

BUG: depth=0 tree has none of children



