[ https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492906#comment-14492906 ]
Max Kaznady commented on SPARK-3727:
------------------------------------

Yes, probabilities have to be added to other models too, like LogisticRegression. Right now they are hardcoded in two places but are not output in PySpark.

I think it makes sense to split this into PySpark, then classification, then probabilities, and then group the different types of algorithms that output probabilities: Logistic Regression, Random Forest, etc. We can also add probabilities for trees by counting the number of 1's and 0's among the training labels at each leaf.

What do you think?

> DecisionTree, RandomForest: More prediction functionality
> ---------------------------------------------------------
>
>                 Key: SPARK-3727
>                 URL: https://issues.apache.org/jira/browse/SPARK-3727
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> DecisionTree and RandomForest currently predict the most likely label for
> classification and the mean for regression. Other info about predictions
> would be useful.
> For classification: estimated probability of each possible label
> For regression: variance of estimate
> RandomForest could also create aggregate predictions in multiple ways:
> * Predict mean or median value for regression.
> * Compute variance of estimates (across all trees) for both classification
> and regression.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
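To make the leaf-counting idea and the forest aggregations from the issue description concrete, here is a minimal plain-Python sketch. It is not the MLlib or PySpark API: every function name below is hypothetical, labels are assumed to be encoded as 0 .. numClasses-1, each leaf is assumed to hold at least one training label, and "variance" is taken to mean the population variance across trees (dividing by the number of trees). Forest class probabilities are computed by averaging per-tree probability vectors (soft voting), which is one possible choice rather than a confirmed design.

```python
from collections import Counter
from statistics import mean, median, pvariance


def leaf_class_probabilities(leaf_labels, num_classes):
    """Hypothetical helper: estimate P(label = k) at one leaf as the
    fraction of training labels reaching that leaf which equal k."""
    if not leaf_labels:
        raise ValueError("a leaf must contain at least one training label")
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts[k] / total for k in range(num_classes)]


def forest_class_probabilities(per_tree_probs):
    """Hypothetical helper: average the per-tree probability vectors."""
    n_trees = len(per_tree_probs)
    num_classes = len(per_tree_probs[0])
    return [sum(p[k] for p in per_tree_probs) / n_trees
            for k in range(num_classes)]


def forest_class_variance(per_tree_probs):
    """Hypothetical helper: variance across trees of each class probability."""
    num_classes = len(per_tree_probs[0])
    return [pvariance([p[k] for p in per_tree_probs])
            for k in range(num_classes)]


def forest_regression_summary(per_tree_predictions):
    """Hypothetical helper: mean, median and variance of the trees'
    regression estimates for a single input point."""
    return {
        "mean": mean(per_tree_predictions),
        "median": median(per_tree_predictions),
        "variance": pvariance(per_tree_predictions),
    }


if __name__ == "__main__":
    # A leaf reached by training labels 1, 0, 1, 1 gives P(0)=0.25, P(1)=0.75.
    print(leaf_class_probabilities([1, 0, 1, 1], num_classes=2))
    # Two trees voting with probability vectors for the same input point.
    trees = [[0.25, 0.75], [0.5, 0.5]]
    print(forest_class_probabilities(trees))
    print(forest_class_variance(trees))
    # Three trees' regression estimates for the same input point.
    print(forest_regression_summary([2.0, 3.0, 7.0]))
```

For the binary case this reduces to the suggestion above: the probability of label 1 at a leaf is the count of 1's divided by the total count of 1's and 0's at that leaf.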