GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/20786

    [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes

    ## What changes were proposed in this pull request?
    
    API:
    ```
    trait ClassificationNode extends Node {
      def getLabelCount(label: Int): Double
    }
    
    trait RegressionNode extends Node {
      def getCount(): Double
      def getSum(): Double
      def getSquareSum(): Double
    }
    
    // LeafNode becomes a trait
    trait LeafNode extends Node {
      def prediction: Double
      def impurity: Double
      ...
    }
    
    class ClassificationLeafNode extends ClassificationNode with LeafNode
    
    class RegressionLeafNode extends RegressionNode with LeafNode
    
    // InternalNode becomes a trait
    trait InternalNode extends Node {
      def gain: Double
      def leftChild: Node
      def rightChild: Node
      def split: Split
      ...
    }
    
    class ClassificationInternalNode extends ClassificationNode with InternalNode
    
    class RegressionInternalNode extends RegressionNode with InternalNode
    
    ```
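    
    To make the intended shape concrete, here is a minimal, self-contained sketch of how one of the new leaf classes might expose label statistics. The constructor parameters, the `labelCounts` array, and the `NodeStatsExample` object are assumptions made for illustration only; the actual Spark implementation may store and compute these values differently.
    
    ```scala
    // Sketch only: constructor parameters and the labelCounts field are
    // hypothetical, not the actual implementation in this PR.
    trait Node {
      def prediction: Double
      def impurity: Double
    }
    
    trait ClassificationNode extends Node {
      // Count of training instances with the given label at this node
      // (assumed semantics).
      def getLabelCount(label: Int): Double
    }
    
    trait LeafNode extends Node
    
    // Hypothetical leaf that keeps per-label counts in an array indexed by label.
    class ClassificationLeafNode(
        val prediction: Double,
        val impurity: Double,
        labelCounts: Array[Double]) extends ClassificationNode with LeafNode {
      def getLabelCount(label: Int): Double = {
        require(label >= 0 && label < labelCounts.length, s"label $label out of range")
        labelCounts(label)
      }
    }
    
    object NodeStatsExample {
      def main(args: Array[String]): Unit = {
        // Two instances of label 0 and eight of label 1; Gini impurity is 0.32.
        val leaf = new ClassificationLeafNode(1.0, 0.32, Array(2.0, 8.0))
        println(leaf.getLabelCount(1))
      }
    }
    ```
    
    Under these assumptions, a caller holding a `ClassificationNode` can read per-label counts without knowing whether the node is a leaf or an internal node.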
    
    ### Note: this class hierarchy change will break binary compatibility but will keep source compatibility.
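    
    As a rough illustration of the source-compatibility claim: code that only refers to `LeafNode` and `InternalNode` as types, such as the pattern match below, should compile unchanged whether they are classes or traits. It still needs recompiling, because turning a class into a trait changes how the JVM links calls to its members. The trimmed-down traits, the constructor parameters, and the `CountLeaves` helper are hypothetical and exist only to keep the sketch self-contained.
    
    ```scala
    // Sketch only: simplified stand-ins for the proposed traits.
    trait Node
    
    trait LeafNode extends Node {
      def prediction: Double
    }
    
    trait InternalNode extends Node {
      def leftChild: Node
      def rightChild: Node
    }
    
    // Hypothetical constructors, for illustration.
    class RegressionLeafNode(val prediction: Double) extends LeafNode
    
    class RegressionInternalNode(
        val leftChild: Node,
        val rightChild: Node) extends InternalNode
    
    object CountLeaves {
      // Type patterns work the same way against traits as against classes.
      def countLeaves(node: Node): Int = node match {
        case _: LeafNode => 1
        case n: InternalNode => countLeaves(n.leftChild) + countLeaves(n.rightChild)
      }
    }
    ```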
    
    ## How was this patch tested?
    
    Unit tests will be added soon.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark tree_stat_api_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20786
    
----
commit 9dacad30df747add2317d10484550f8f0173a672
Author: WeichenXu <weichen.xu@...>
Date:   2018-03-09T14:38:30Z

    init pr

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org