Thanks for your note, Max. Part of the picture is how the predictions would be used. If they are used in a "forced choice" way (quite a shame, because the best decision is often no decision: get more data), things are different. If there are gray zones, or if predicted probabilities are of interest, then I'd avoid ROC area as a measure and use penalized likelihood (speaking in crude generality).
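A small illustration of why ROC area says little about predicted probabilities: it depends only on the ordering of the predictions, so any strictly increasing distortion of the probabilities leaves it unchanged, while the deviance changes. This is only a sketch in plain Python; the toy data, the names `p_moderate` and `p_extreme`, the `sharpen` helper and the exponent 4 are made up for the example and do not come from this thread or from any package.

```python
import math

def auc(y, p):
    """ROC area via the rank (Mann-Whitney) form: the share of
    (event, non-event) pairs in which the event gets the higher prediction."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def deviance(y, p):
    """Binomial deviance, i.e. -2 times the Bernoulli log-likelihood."""
    return -2.0 * sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                      for yi, pi in zip(y, p))

def sharpen(p, k=4):
    """Hypothetical distortion: multiply the log-odds by k.
    Strictly increasing in p, so the ranking of predictions is preserved."""
    return 1.0 / (1.0 + ((1.0 - p) / p) ** k)

# Toy outcomes and predicted probabilities (invented for illustration).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p_moderate = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
p_extreme = [sharpen(pi) for pi in p_moderate]  # same order, overconfident

print("AUC      moderate:", auc(y, p_moderate), " extreme:", auc(y, p_extreme))
print("Deviance moderate:", round(deviance(y, p_moderate), 2),
      " extreme:", round(deviance(y, p_extreme), 2))
```

Both sets of predictions get the same ROC area, but the overconfident set has the larger deviance, because the two observations it ranks wrongly are now predicted with near certainty. If only a forced choice at a fixed cutoff matters, that difference may not show up; if the probabilities themselves are used, it does.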
Frank

Max Kuhn wrote:
>
> Frank,
>
> It depends on how you define "optimal". While I'm not a big fan of
> using the area under the ROC to characterize performance, there are a
> lot of times when likelihood measures are clearly sub-optimal in
> performance. Using resampled accuracy (or Kappa) instead of deviance
> (out-of-bag or not) is likely to produce more accurate models (not
> shocking, right?).
>
> The best example is determining the number of boosting iterations.
> From Friedman (2001): ``[...] degrading the likelihood by overfitting
> actually improves misclassification error rates. Although perhaps
> counterintuitive, this is not a contradiction; likelihood and error
> rate measure different aspects of fit quality.''
>
> My argument here assumes that you are fitting a model for the purposes
> of prediction rather than interpretation. This particular case
> involves random forests, so I'm hoping that statistical inference is
> not the goal.
>
> Ref: Friedman. Greedy function approximation: a gradient boosting
> machine. Annals of Statistics (2001) pp. 1189-1232
>
> Thanks,
>
> Max
>
> On Fri, May 13, 2011 at 8:11 AM, Frank Harrell
> <f.harr...@vanderbilt.edu> wrote:
>> Using anything other than deviance (or likelihood) as the objective
>> function will result in a suboptimal model.
>> Frank
>>
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt University
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Can-ROC-be-used-as-a-metric-for-optimal-model-selection-for-randomForest-tp3519003p3520043.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context:
http://r.789695.n4.nabble.com/Can-ROC-be-used-as-a-metric-for-optimal-model-selection-for-randomForest-tp3519003p3521274.html
Sent from the R help mailing list archive at Nabble.com.