GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/17383
[SPARK-3165][MLlib][WIP] DecisionTree does not use sparsity in data

## What changes were proposed in this pull request?

DecisionTree should take advantage of sparse feature vectors. Aggregation over the training data could then handle empty/zero-valued data elements more efficiently.

## How was this patch tested?

Modifying the inner implementation does not change the behavior of the DecisionTree module, so all existing unit tests should still pass. Some performance benchmarks are probably needed as well.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark ENH/use_sparsity_in_decision_tree

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17383.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17383

----

commit d2eea0645110b3bcc6c0b905bc55e43e0af9debb
Author: 颜发才 (Yan Facai) <facai....@gmail.com>
Date: 2017-03-22T05:45:58Z

    CLN: use Vector to implement binnedFeatures in TreePoint
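The core idea, aggregating only over non-zero binned feature values and crediting the implicit zeros in a single pass, can be sketched as below. This is a minimal illustrative model, not Spark's actual implementation: `SparseBinnedPoint` and `aggregate` are hypothetical names, and the real `TreePoint`/`DTStatsAggregator` machinery is considerably more involved.

```scala
// Hypothetical sketch of sparsity-aware bin aggregation for a decision tree.
// A training point stores only the bin indices of its non-zero features;
// all zero-valued features are assumed to land in bin 0.
case class SparseBinnedPoint(numFeatures: Int, indices: Array[Int], bins: Array[Int])

object SparseAgg {
  // Returns per-feature, per-bin counts. Instead of iterating every
  // feature of every point, we touch only non-zero entries, then add the
  // skipped zeros to bin 0 with one pass per feature.
  def aggregate(points: Seq[SparseBinnedPoint], numBins: Int): Array[Array[Long]] = {
    require(points.nonEmpty, "need at least one training point")
    val numFeatures = points.head.numFeatures
    val stats = Array.fill(numFeatures, numBins)(0L)
    val nonZeroTouches = Array.fill(numFeatures)(0L)
    points.foreach { p =>
      var i = 0
      while (i < p.indices.length) {
        val f = p.indices(i)
        stats(f)(p.bins(i)) += 1L
        nonZeroTouches(f) += 1L
        i += 1
      }
    }
    // Each feature not present in a point's index list was zero there,
    // so credit those implicit zeros to bin 0 in bulk.
    var f = 0
    while (f < numFeatures) {
      stats(f)(0) += points.length - nonZeroTouches(f)
      f += 1
    }
    stats
  }
}
```

With mostly-zero features, the inner loop cost is proportional to the number of non-zeros rather than `numPoints * numFeatures`, which is the efficiency gain the PR description alludes to.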