[GitHub] spark issue #17383: [SPARK-3165][MLlib] DecisionTree use sparsity in data

facaiy Tue, 26 Sep 2017 20:16:38 -0700

Github user facaiy commented on the issue:

    https://github.com/apache/spark/pull/17383
  
    Hi, since this work has been open for a long time, I reviewed it again
    myself.
    
    After a careful review: since SparseVector stores each row in a compressed
    sparse row format (indices plus values), the only benefit of this PR would
    be reduced data storage, and that comes at the cost of performance. Tree
    methods rarely need to handle features of very high dimension, so I am not
    satisfied with this trade-off.
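The trade-off above can be illustrated with a minimal, hypothetical Python sketch. It is not Spark's `SparseVector` or `TreePoint` code; `SparseRow` and `to_sparse` are invented names. It assumes a row is stored as sorted feature indices plus their nonzero values. That layout saves memory when most entries are zero, but every per-feature lookup during split finding turns into a binary search (O(log nnz)) instead of direct array indexing (O(1)).

```python
from bisect import bisect_left


class SparseRow:
    """Hypothetical row stored CSR-style: sorted feature indices plus nonzero values."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def get(self, feature):
        # Binary search over the stored indices: O(log nnz) per lookup,
        # whereas a dense array answers dense[feature] in O(1).
        pos = bisect_left(self.indices, feature)
        if pos < len(self.indices) and self.indices[pos] == feature:
            return self.values[pos]
        return 0  # features not stored are implicitly zero


def to_sparse(dense):
    # Keep only the nonzero entries and their positions.
    idx = [i for i, v in enumerate(dense) if v != 0]
    return SparseRow(len(dense), idx, [dense[i] for i in idx])


dense = [0, 0, 3, 0, 0, 0, 7, 0]
sparse = to_sparse(dense)
# Storage drops from 8 values to 2 (index, value) pairs, but each lookup now searches.
print(sparse.indices, sparse.values)
```

A split-finding loop queries every candidate feature of every row. For tree methods with moderate feature counts, the memory saving is therefore small compared with the added lookup cost.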
    
    I prefer [SPARK-3717: DecisionTree, RandomForest: Partition by
    feature](https://issues.apache.org/jira/browse/SPARK-3717) as an
    alternative, which, if I understand correctly, would benefit both
    performance and storage. So I am closing this PR. Thanks, everyone, for
    the reviews and comments.
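The partition-by-feature idea can be sketched roughly as follows. This is a hedged Python illustration of my reading of SPARK-3717, not its actual design. `partition_by_feature` and `assign_features_to_workers` are hypothetical helpers. The sketch transposes row-oriented sparse data into per-feature columns, storing only nonzero entries. It then assigns whole features to workers round-robin, so each worker could scan its own columns contiguously when evaluating splits.

```python
from collections import defaultdict


def partition_by_feature(rows):
    """rows: list of (indices, values) pairs, one per training example.

    Returns a dict mapping feature -> list of (row_id, value), nonzeros only,
    so each feature's data is contiguous and still stored sparsely.
    """
    columns = defaultdict(list)
    for row_id, (indices, values) in enumerate(rows):
        for feature, value in zip(indices, values):
            columns[feature].append((row_id, value))
    return dict(columns)


def assign_features_to_workers(num_features, num_workers):
    # Hypothetical round-robin placement of whole features onto workers.
    return {f: f % num_workers for f in range(num_features)}


rows = [([0, 2], [1.0, 5.0]), ([2], [3.0]), ([1], [4.0])]
columns = partition_by_feature(rows)
placement = assign_features_to_workers(num_features=3, num_workers=2)
print(columns)
print(placement)
```

With this layout, a worker that owns a feature reads one column sequentially, with no per-row index search. Storage also stays proportional to the number of nonzeros.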