GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20786

    [SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes

## What changes were proposed in this pull request?

API:
```
trait ClassificationNode extends Node {
  def getLabelCount(label: Int): Double
}

trait RegressionNode extends Node {
  def getCount(): Double
  def getSum(): Double
  def getSquareSum(): Double
}

// LeafNode becomes a trait
trait LeafNode extends Node {
  def prediction: Double
  def impurity: Double
  ...
}

class ClassificationLeafNode extends ClassificationNode with LeafNode

class RegressionLeafNode extends RegressionNode with LeafNode

// InternalNode becomes a trait
trait InternalNode extends Node {
  def gain: Double
  def leftChild: Node
  def rightChild: Node
  def split: Split
  ...
}

class ClassificationInternalNode extends ClassificationNode with InternalNode

class RegressionInternalNode extends RegressionNode with InternalNode
```

### Note: this class hierarchy change will break binary compatibility but will keep source compatibility.

## How was this patch tested?

Unit tests will be added soon.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark tree_stat_api_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20786

----
commit 9dacad30df747add2317d10484550f8f0173a672
Author: WeichenXu <weichen.xu@...>
Date:   2018-03-09T14:38:30Z

    init pr

----
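
For readers who want to see how the proposed hierarchy composes, here is a minimal standalone sketch in plain Scala. It is not the code in this patch and does not depend on Spark: the constructor parameters, the `labelCounts`, `count`, `sum` and `squareSum` backing fields, and the `TreeStatsDemo` object are assumptions made for illustration, and `split: Split` is left out because `Split` is not defined here.

```scala
// Illustrative model of the proposed node hierarchy; not Spark's implementation.
trait Node {
  def prediction: Double
  def impurity: Double
}

trait ClassificationNode extends Node {
  // Hypothetical backing field: (weighted) instance count per label index.
  protected def labelCounts: Array[Double]

  def getLabelCount(label: Int): Double = {
    require(label >= 0 && label < labelCounts.length, s"label $label out of range")
    labelCounts(label)
  }
}

trait RegressionNode extends Node {
  // Hypothetical backing fields: count, sum and sum of squares of the labels.
  protected def count: Double
  protected def sum: Double
  protected def squareSum: Double

  def getCount(): Double = count
  def getSum(): Double = sum
  def getSquareSum(): Double = squareSum
}

// LeafNode and InternalNode are traits so they can be mixed with either stats trait.
trait LeafNode extends Node

trait InternalNode extends Node {
  def gain: Double
  def leftChild: Node
  def rightChild: Node
}

class ClassificationLeafNode(
    val prediction: Double,
    val impurity: Double,
    protected val labelCounts: Array[Double])
  extends ClassificationNode with LeafNode

class RegressionLeafNode(
    val prediction: Double,
    val impurity: Double,
    protected val count: Double,
    protected val sum: Double,
    protected val squareSum: Double)
  extends RegressionNode with LeafNode

class ClassificationInternalNode(
    val prediction: Double,
    val impurity: Double,
    val gain: Double,
    val leftChild: ClassificationNode,
    val rightChild: ClassificationNode,
    protected val labelCounts: Array[Double])
  extends ClassificationNode with InternalNode

object TreeStatsDemo {
  def main(args: Array[String]): Unit = {
    // Two pure leaves under a root that saw 3 instances of label 0 and 2 of label 1.
    val left = new ClassificationLeafNode(0.0, 0.0, Array(3.0, 0.0))
    val right = new ClassificationLeafNode(1.0, 0.0, Array(0.0, 2.0))
    val root = new ClassificationInternalNode(0.0, 0.48, 0.48, left, right, Array(3.0, 2.0))
    println(root.getLabelCount(1)) // prints 2.0
  }
}
```

Because the leaf and internal classes share `ClassificationNode` or `RegressionNode`, caller code can read the statistics from any node of a tree without first matching on whether it is a leaf.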