[ 
https://issues.apache.org/jira/browse/SPARK-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293259#comment-17293259
 ] 

Julian King commented on SPARK-3159:
------------------------------------

I also need the probability estimates for the tree, not the classifier output.

Does the code (after the accepted PR) mean that sibling leaf nodes will always be 
merged when their classification output is the same? This radically reduces the 
utility of decision trees for insight generation. 

We are encountering a situation where the decision tree refuses to split even a 
single node in situations where it should, and are wondering whether it relates 
to this behaviour.

Is there any way to disable this? [~asolimando]
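
For anyone trying to reproduce the behaviour I'm asking about, here is a minimal 
pure-Python sketch of the post-training reduction as I understand it from the 
issue description. The `Node` class is a hypothetical stand-in for Spark's 
internal node classes, not the actual MLlib API:

```python
class Node:
    def __init__(self, prediction, left=None, right=None):
        self.prediction = prediction  # majority label for classification
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def reduce_tree(node):
    """Collapse an internal node into a leaf when both children are leaves
    that output the same prediction. The split may have improved the
    impurity measure (e.g., Gini), but the predicted label is unchanged."""
    if node is None or node.is_leaf():
        return node
    node.left = reduce_tree(node.left)
    node.right = reduce_tree(node.right)
    if (node.left.is_leaf() and node.right.is_leaf()
            and node.left.prediction == node.right.prediction):
        return Node(node.left.prediction)  # merge the redundant split
    return node


# Example: a split whose children both predict class 1 collapses to one leaf.
tree = Node(1, left=Node(1), right=Node(1))
reduced = reduce_tree(tree)
```

Note that a merge like this discards the per-child class distributions, which is 
exactly why the probability estimates at the leaves are lost.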

> Check for reducible DecisionTree
> --------------------------------
>
>                 Key: SPARK-3159
>                 URL: https://issues.apache.org/jira/browse/SPARK-3159
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Alessandro Solimando
>            Priority: Minor
>             Fix For: 2.4.0
>
>         Attachments: image-2020-05-24-23-00-38-419.png
>
>
> Improvement: test-time computation
> Currently, pairs of leaf nodes with the same parent can both output the same 
> prediction.  This happens since the splitting criterion (e.g., Gini) is not 
> the same as prediction accuracy/MSE; the splitting criterion can sometimes be 
> improved even when both children would still output the same prediction 
> (e.g., based on the majority label for classification).
> We could check the tree and reduce it if possible after training.
> Note: This happens with scikit-learn as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
