Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19666
OK. I will wait for @smurching's split-out parts of #19433 to get merged
first, and then I will update this PR.
---
Github user smurching commented on the issue:
https://github.com/apache/spark/pull/19666
I discussed this with @jkbradley; I'll split up #19433 so that the parts of
it that'd potentially conflict with this PR (refactoring RandomForest.scala
into utility classes) can be merged first.
---
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/19666
Btw, this is going to conflict with
https://github.com/apache/spark/pull/19433 a lot. @WeichenXu123 and @smurching,
have you decided which one to merge first?
---
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/19666
Thank you, @WeichenXu123. You can also use the condition "include the
first bin" to filter the left splits. Perhaps that is better.
---
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19666
@facaiy Your idea also looks reasonable. So we can use the condition
"exclude the first bin" to do the pruning (filter out the other half of the
symmetric splits). This condition looks simpler than
---
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/19666
In fact, I'm not sure whether the idea is right, so don't hesitate to
correct me. I assume the algorithm requires O(N^2) complexity.
---
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/19666
Hi, I wrote a demo in Python. I'll be happy if it's useful.
For N bins, say `[x_1, x_2, ..., x_N]`, every split either contains
`x_1` or it doesn't, so we can choose the half
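A minimal Python sketch of this half-enumeration idea (my own illustration with a hypothetical helper `half_splits`, not the actual demo from this comment): a split and its mirror give the same impurity gain, so keeping only the splits whose left side excludes the first bin visits each symmetric pair exactly once.

```python
from itertools import combinations

def half_splits(bins):
    """Enumerate one representative of each symmetric split pair.

    A split (left, right) and its mirror (right, left) yield the same
    impurity gain, so we keep only the splits whose left side excludes
    the first bin; the mirror of each kept split is thereby skipped.
    """
    rest = bins[1:]
    for size in range(1, len(bins)):  # non-empty, proper left sides
        for left in combinations(rest, size):
            right = tuple(b for b in bins if b not in left)
            yield left, right

# For 3 bins there are 2^(3-1) - 1 = 3 distinct splits, and the first
# bin always ends up on the right side.
print(list(half_splits(["x1", "x2", "x3"])))
```
---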
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19666
Also cc @smurching Thanks!
---
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19666
@facaiy Thanks for your review! I added more explanation of the design
purpose of `traverseUnorderedSplits`. But if you have a better solution,
don't hesitate to tell me!
---
Github user facaiy commented on the issue:
https://github.com/apache/spark/pull/19666
I believe that unordered features will benefit a lot from this idea;
however, I have two questions:
1. I'm a little confused by 964L in `traverseUnorderedSplits`. Is it a
backtracking algorithm?
---
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19666
@smurching I guess iterating over Gray codes would have a higher time
complexity, O(n * 2^n) (not very sure, maybe there are some more efficient
algorithms?); the recursive traversal in my PR only
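For reference, a sketch (my own illustration, not code from either PR) of what Gray-code subset iteration could look like: consecutive subsets in reflected Gray code order differ by exactly one element, so an additive per-subset statistic can be maintained in O(1) per step, giving O(2^n) total rather than the O(n * 2^n) of recomputing each subset from scratch.

```python
def gray_subset_sums(values):
    """Visit all 2^n subsets of indices in reflected Gray code order.

    gray(k) = k ^ (k >> 1); consecutive Gray codes differ in exactly
    one bit, so each step adds or removes a single element, and the
    running sum is updated in O(1) instead of being recomputed.
    """
    n = len(values)
    current, total = set(), 0.0
    out = [(frozenset(current), total)]
    for k in range(1, 1 << n):
        # The single bit that differs between gray(k-1) and gray(k).
        flip = (k ^ (k >> 1)) ^ ((k - 1) ^ ((k - 1) >> 1))
        idx = flip.bit_length() - 1
        if idx in current:
            current.remove(idx)
            total -= values[idx]
        else:
            current.add(idx)
            total += values[idx]
        out.append((frozenset(current), total))
    return out

# All 2^3 = 8 subsets appear once each, with their sums maintained
# incrementally.
print(gray_subset_sums([1.0, 2.0, 4.0]))
```
---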
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19666
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83481/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19666
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19666
**[Test build #83481 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83481/testReport)**
for PR 19666 at commit
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19666
**[Test build #83481 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83481/testReport)**
for PR 19666 at commit