GitHub user facaiy reopened a pull request:

    https://github.com/apache/spark/pull/17383

    [SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data

    ## What changes were proposed in this pull request?
    
    DecisionTree should take advantage of sparse feature vectors. Aggregation
    over training data could handle the empty/zero-valued data elements more
    efficiently.
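
The idea in the description above can be illustrated with a small sketch. This is not the code from the pull request and not Spark's Scala implementation; it is a Python toy under stated assumptions. The function names (aggregate_dense, aggregate_sparse, to_sparse) are hypothetical, and the sketch assumes that a zero-valued feature always falls into bin 0, which need not hold for every binning scheme.

```python
def aggregate_dense(points, num_features, num_bins):
    """Count (feature, bin) occurrences by visiting every feature of every point."""
    counts = [[0] * num_bins for _ in range(num_features)]
    for binned in points:
        for feature, bin_index in enumerate(binned):
            counts[feature][bin_index] += 1
    return counts


def to_sparse(binned):
    """Keep only the features whose bin is nonzero (hypothetical helper)."""
    return {feature: b for feature, b in enumerate(binned) if b != 0}


def aggregate_sparse(points, num_features, num_bins):
    """Same counts as aggregate_dense, but each point is a {feature: bin} dict
    that stores only nonzero bins, so the inner loop skips zero entries."""
    counts = [[0] * num_bins for _ in range(num_features)]
    for sparse in points:
        for feature, bin_index in sparse.items():
            counts[feature][bin_index] += 1
    # Assumption: zero values map to bin 0. Bin 0 was never stored, so
    # recover it as "all points minus the points seen in nonzero bins".
    total = len(points)
    for row in counts:
        row[0] = total - sum(row[1:])
    return counts


dense_points = [[0, 2, 0, 0], [0, 0, 0, 1], [0, 2, 0, 0]]
sparse_points = [to_sparse(p) for p in dense_points]
dense_counts = aggregate_dense(dense_points, num_features=4, num_bins=3)
sparse_counts = aggregate_sparse(sparse_points, num_features=4, num_bins=3)
```

Both functions produce the same per-feature histograms, but the sparse variant does work proportional to the number of nonzero entries plus one pass over the histogram, rather than to the number of points times the number of features.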
    
    
    ## How was this patch tested?
    
    Modifying the internal implementation should not change the behavior of
    the DecisionTree module, so all existing unit tests should pass.
    
    A performance benchmark is probably still needed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark ENH/use_sparsity_in_decision_tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17383
    
----
commit d2eea0645110b3bcc6c0b905bc55e43e0af9debb
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-22T05:45:58Z

    CLN: use Vector to implement binnedFeatures in TreePoint

commit 9ce6b813beffb9d58e7b2907425a1262610256be
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-22T09:15:30Z

    BUG: fix for incompatible argument of predictImpl method

commit 37f05f9b0386acc8bea048e72aff2b9c37ca4ca6
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-22T09:18:04Z

    CLN: create sparse vector when converting to TreePoint

commit c9664ce6c94b98cbc76253817e637d9a968e4bd6
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-22T09:21:59Z

    CLN: change Array to Vector in TreePoint when created

commit d6ef9e512ea4a58db2dccf3e7cca95f9e8b0df8f
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-23T02:12:22Z

    PREP: use Vector[Int] to store binnedFeature

commit 59eb779a9d4f711e7b28d31d579cc49e3d3cc370
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-23T03:50:14Z

    CLN: change binnedFeatures from def to val

commit 9cbe577b408e987f3026d01316f5a7f2d4c5cfb2
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-28T00:57:42Z

    CLN: use filter to select non-zero bits

commit b5b0dc8683b6e2d7d274aa8d39932dec61e6193d
Author: 颜发才(Yan Facai) <facai....@gmail.com>
Date:   2017-03-28T01:03:55Z

    BUG: fix, compile fails

commit cf7e3d8e03f73df725336d0d5a9dd6cc16e7bf95
Author: Yan Facai (颜发才) <facai....@gmail.com>
Date:   2017-07-05T05:42:09Z

    Merge branch 'master' into ENH/use_sparsity_in_decision_tree

commit 032d50d8c8a851671ba2754cec817d0f6e9ae70f
Author: Yan Facai (颜发才) <facai....@gmail.com>
Date:   2017-07-05T06:20:38Z

    CLN: use BSV in predictImpl

commit 257ddf773eb47499962d6cc57fd1323324dd4ab8
Author: Yan Facai (颜发才) <facai....@gmail.com>
Date:   2017-07-05T06:42:24Z

    ENH: create subclass TreeSparsePoint

commit 8a919735f9474283d263df78feb2e176f66917f3
Author: Yan Facai (颜发才) <facai....@gmail.com>
Date:   2017-07-05T06:58:54Z

    ENH: use TreeDensePoint when numFeatures < 10000

----



