GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/20758

    [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes

    ## What changes were proposed in this pull request?
    
    Provide label/impurity stats for spark.ml decision tree nodes.
    
    API:
    ```
    class TreeClassifierStatInfo
       def getLabelCount(label: Int): Double
    
    class TreeRegressorStatInfo
       def getCount(): Double
       def getSum(): Double
       def getSquareSum(): Double
    
    class Node
       ....
       +++ def statInfo: TreeStatInfo
    
    trait TreeStatInfo
       def asTreeClassifierStatInfo: TreeClassifierStatInfo
       def asTreeRegressorStatInfo: TreeRegressorStatInfo
    ```
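    
    To illustrate how the stats listed above could be consumed, here is a minimal
    sketch. The classes below are hypothetical stand-ins that only mirror the
    signatures in the API summary; their constructors, the `StatInfoSketch`
    helper object, and the exception behaviour of the `as...` casts are
    assumptions, not the code in this patch.
    
    ```scala
    // Hypothetical stand-ins mirroring the signatures listed above.
    // Constructors and cast-failure behaviour are assumptions for illustration.
    trait TreeStatInfo {
      def asTreeClassifierStatInfo: TreeClassifierStatInfo
      def asTreeRegressorStatInfo: TreeRegressorStatInfo
    }
    
    class TreeClassifierStatInfo(labelCounts: Array[Double]) extends TreeStatInfo {
      def getLabelCount(label: Int): Double = labelCounts(label)
      def asTreeClassifierStatInfo: TreeClassifierStatInfo = this
      def asTreeRegressorStatInfo: TreeRegressorStatInfo =
        throw new UnsupportedOperationException("not a regressor node")
    }
    
    class TreeRegressorStatInfo(count: Double, sum: Double, squareSum: Double)
        extends TreeStatInfo {
      def getCount(): Double = count
      def getSum(): Double = sum
      def getSquareSum(): Double = squareSum
      def asTreeClassifierStatInfo: TreeClassifierStatInfo =
        throw new UnsupportedOperationException("not a classifier node")
      def asTreeRegressorStatInfo: TreeRegressorStatInfo = this
    }
    
    object StatInfoSketch {
      // Gini impurity from per-label counts: 1 - sum_k p_k^2.
      def gini(info: TreeClassifierStatInfo, numClasses: Int): Double = {
        val counts = (0 until numClasses).map(label => info.getLabelCount(label))
        val total = counts.sum
        if (total == 0.0) 0.0
        else 1.0 - counts.map(c => (c / total) * (c / total)).sum
      }
    
      // Variance from the sufficient statistics: E[y^2] - (E[y])^2.
      def variance(info: TreeRegressorStatInfo): Double = {
        val n = info.getCount()
        if (n == 0.0) 0.0
        else {
          val mean = info.getSum() / n
          info.getSquareSum() / n - mean * mean
        }
      }
    }
    ```
    
    With the proposed `Node.statInfo`, a caller would presumably write something
    like `StatInfoSketch.gini(node.statInfo.asTreeClassifierStatInfo, numClasses)`
    to recompute a node's impurity from its label counts.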
    
    ## How was this patch tested?
    
    Unit tests added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark tree_stat_api

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20758
    
----
commit e57ffaaad1666577d956c1f8f734f97569b93969
Author: WeichenXu <weichen.xu@...>
Date:   2018-03-07T10:37:22Z

    init pr

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org