[ https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611244#comment-14611244 ]
Joseph K. Bradley commented on SPARK-8069:
------------------------------------------

I like the idea of including it in an abstraction like ClassificationModel and ProbabilisticClassificationModel, unless it is too difficult. If a developer does not want to support thresholds/cutoffs (or wants to modify the API), the developer does not have to use the abstraction.

The main difficulty I see is in trying to specify thresholds in a uniform way:
* Thresholding rawPrediction vs. probability: It would be easy to mimic the R randomForest package for thresholding probabilities, since we know those values lie in the range [0,1]. That won't work well for rawPrediction values, which could be negative.
** We could initially support thresholding only for ProbabilisticClassificationModel. I expect to modify trees & tree ensembles to subclass ProbabilisticClassificationModel in release 1.5 (WIP).
** Do you have ideas for thresholding rawPrediction?
* Binary vs. multiclass: It would be nice to find a way to support the binary case naturally, though it might mean modifying or deprecating HasThreshold. Once we decide on a good way to specify thresholds, perhaps the binary case can be handled by providing a setter as in HasThreshold ({{setThreshold(value: Double)}}) but returning the generalized threshold in the getter ({{Vector getThreshold}}).
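A minimal sketch of the probability-thresholding and binary-setter ideas, in plain Scala with no Spark dependency so it runs standalone. It assumes R randomForest's `cutoff` semantics (the predicted class is the one that maximizes the ratio of its vote proportion to its cutoff), applied here to class probabilities, and it stores a binary threshold t as the generalized per-class vector (1 - t, t). The class `ThresholdSketch` and its methods are hypothetical illustrations, not an existing Spark API, and `Array[Double]` stands in for Spark's `Vector`.

```scala
// Hypothetical sketch (not a Spark API): R-randomForest-style cutoffs on class probabilities.
// Array[Double] stands in for Spark's Vector so the example runs without Spark.
class ThresholdSketch {

  // Generalized per-class cutoffs; (0.5, 0.5) reproduces plain argmax for two classes.
  private var thresholds: Array[Double] = Array(0.5, 0.5)

  // Binary setter in the spirit of HasThreshold: a threshold t on P(class 1)
  // is stored as the generalized cutoff vector (1 - t, t).
  def setThreshold(t: Double): this.type = {
    require(t > 0.0 && t < 1.0, s"threshold must lie in (0, 1), got $t")
    thresholds = Array(1.0 - t, t)
    this
  }

  // Multiclass setter: one strictly positive cutoff per class.
  def setThresholds(ts: Array[Double]): this.type = {
    require(ts.nonEmpty && ts.forall(_ > 0.0), "cutoffs must be non-empty and positive")
    thresholds = ts.clone()
    this
  }

  // The getter always returns the generalized form, even after setThreshold(t).
  def getThresholds: Array[Double] = thresholds.clone()

  // R randomForest rule: predict the class i that maximizes probability(i) / cutoff(i).
  // maxBy keeps the first maximum, so ties go to the lower class index.
  def predict(probability: Array[Double]): Int = {
    require(probability.length == thresholds.length, "need exactly one cutoff per class")
    probability.indices.maxBy(i => probability(i) / thresholds(i))
  }
}
```

With the (1 - t, t) encoding, class 1 wins exactly when p1 / t > (1 - p1) / (1 - t), which simplifies to p1 > t, so the binary setter keeps the usual single-threshold meaning while the getter exposes the same generalized form used in the multiclass case.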
> Add support for cutoff to RandomForestClassifier
> ------------------------------------------------
>
>                 Key: SPARK-8069
>                 URL: https://issues.apache.org/jira/browse/SPARK-8069
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: holdenk
>            Priority: Minor
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Consider adding support for cutoffs similar to
> http://cran.r-project.org/web/packages/randomForest/randomForest.pdf

--
This message was sent by Atlassian JIRA (v6.3.4#6332)