[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 OK. I will wait for @smurching to get the split-out parts of #19433 merged first, and then I will update this PR.

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-13 Thread smurching
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19666 Discussed with @jkbradley, I'll split up #19433 so that the parts of it that'd potentially conflict with this PR (refactoring RandomForest.scala into utility classes) can be merged first.

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19666 Btw, this is going to conflict with https://github.com/apache/spark/pull/19433 a lot. @WeichenXu123 and @smurching, have you planned to merge one before the other?

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-09 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 Thank you, @WeichenXu123 . You can also use the condition "include the first bin" to filter left splits. Perhaps it is better. ---

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Your idea also looks reasonable. So we can use the condition "exclude the first bin" to do the pruning (filtering out the symmetric other half of the splits). This condition looks simpler than

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-09 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 In fact, I'm not sure whether the idea is right, so don't hesitate to correct me. I assume the algorithm requires O(N^2) complexity.

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-09 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 Hi, I wrote a demo in Python. I'd be happy if it turns out to be useful. For N bins, say `[x_1, x_2, ..., x_N]`, every split either contains `x_1` or does not, so we can choose the half
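
The symmetry argument in this comment can be illustrated with a short sketch. This is not facaiy's demo (which is not reproduced in this archive) and not the code in the PR; the function name `unordered_splits` is hypothetical. It assumes the bins are distinct and enumerates each binary partition exactly once by pinning the first bin to the left side:

```python
from itertools import combinations


def unordered_splits(bins):
    """Yield (left, right) once for every binary partition of `bins`.

    Hypothetical helper for illustration: the first bin is always placed
    on the left, so the mirror image of each split is never generated.
    """
    first, rest = bins[0], tuple(bins[1:])
    # k = number of additional bins joining `first` on the left.
    # k stops before len(rest) so the right side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(b for b in rest if b not in extra)
            yield left, right


if __name__ == "__main__":
    for left, right in unordered_splits(["x_1", "x_2", "x_3"]):
        print(left, "|", right)
```

For N distinct bins this yields 2^(N-1) - 1 splits, half of the 2^N - 2 non-trivial subsets, because each partition and its mirror image describe the same split.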

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 Also cc @smurching Thanks!

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Thanks for your review! I added more explanation of the design purpose of `traverseUnorderedSplits`. But if you have a better solution, don't hesitate to tell me!

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-07 Thread facaiy
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 I believe that unordered features will benefit a lot from the idea; however, I have two questions: 1. I'm a little confused by 964L in `traverseUnorderedSplits`. Is it a backtracking algorithm?

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @smurching I guess iterating over Gray code would have a higher time complexity, O(n * 2^n) (I'm not very sure; maybe there are some more efficient algorithms?); the recursive traversal in my PR only
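
For reference, a minimal sketch of what iterating over subsets in Gray code order looks like. It is not taken from the PR or from @smurching's suggestion, and the name `gray_code_subsets` is hypothetical. It uses the standard reflected Gray code `i ^ (i >> 1)`, in which consecutive codes differ in exactly one bit:

```python
def gray_code_subsets(n):
    """Yield (mask, flipped_bit) for all 2**n subsets of n items.

    Hypothetical helper for illustration. `mask` encodes a subset as a
    bit set; `flipped_bit` is the index of the single item that entered
    or left the subset relative to the previous mask (None for the first).
    """
    prev = 0
    yield prev, None
    for i in range(1, 1 << n):
        mask = i ^ (i >> 1)  # reflected binary Gray code of i
        flipped = (mask ^ prev).bit_length() - 1  # index of the changed bit
        yield mask, flipped
        prev = mask


if __name__ == "__main__":
    for mask, flipped in gray_code_subsets(3):
        print(format(mask, "03b"), flipped)
```

Because only one item changes per step, a running statistic over the current subset could in principle be updated by adding or removing that one item's contribution. How that compares with the recursive traversal in the PR is not settled by this sketch.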

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83481/ Test FAILed. ---

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19666 Merged build finished. Test FAILed.

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19666 **[Test build #83481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83481/testReport)** for PR 19666 at commit

[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...

2017-11-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19666 **[Test build #83481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83481/testReport)** for PR 19666 at commit