On Fri, 18 Jul 2003 17:26:44 -0400, Frank E Harrell Jr wrote:

> The choice of a single cutpoint presents a host of statistical and
> subject matter deficiencies, but if you really need one (which implies
> that your internal utility function is the same as the consumers') you
> are right that you need to make the cross-validation take the cutpoint
> search into account. The bootstrap is probably the best approach. Have
> an algorithm for choosing the "best" cutpoint and repeat that algorithm
> 200 times by replacing the original dataset with samples with
> replacement from the original (using the same total number of
> observations). You can get a confidence interval for the cutpoint this
> way.
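In code, the bootstrap procedure described above would amount to roughly the
following sketch (Python/NumPy; choose_cutpoint() is only a stand-in for
whatever rule picks the "best" cutpoint, here maximizing Youden's J purely as
an illustration, and none of these names come from the original posts):

import numpy as np

def choose_cutpoint(x, y):
    # Placeholder cutpoint rule: maximize sensitivity + specificity - 1
    # (Youden's J) over the observed score values.
    best_c, best_j = None, -np.inf
    for c in np.unique(x):
        pred = x > c
        sens = np.mean(pred[y == 1])       # true positive rate at this C
        spec = np.mean(~pred[y == 0])      # true negative rate at this C
        if sens + spec - 1 > best_j:
            best_c, best_j = c, sens + spec - 1
    return best_c

def bootstrap_cutpoint_ci(x, y, n_boot=200, alpha=0.05, seed=1):
    # Repeat the cutpoint search on n_boot resamples drawn with replacement
    # (same total number of observations) and return a percentile interval.
    rng = np.random.default_rng(seed)
    n = len(x)
    cuts = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if len(np.unique(y[idx])) < 2:
            continue                       # skip degenerate resamples
        cuts.append(choose_cutpoint(x[idx], y[idx]))
    cuts = np.asarray(cuts)
    return np.quantile(cuts, alpha / 2), np.quantile(cuts, 1 - alpha / 2)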
Thanks for your reply. If I could choose a single cutpoint, I'd certainly do
that; there would be no need to create an ROC at all. However, I do not know
the utility function. I am merely trying to find a way to create a 'fair'
ROC: one that gives an estimate of the generalization error that is not
distorted by the bias introduced by selecting the optimal cutpoint.

Maybe some clarification of this bias helps. Suppose we have a very simple
classification algorithm: f(x) = x. Now we create an ROC based on the
cut-off value C (f(x) > C is class A, f(x) < C is class B). How is the point
on the ROC determined for each value of C? I'd say any validation is
unnecessary in this case, since we did not estimate any parameter other than
the cut-off value. However, if we draw the ROC, pick a cut-off value, and
then test the algorithm on a different data set, the error is likely to be
larger than the one read off the ROC.

Regards,
Koen
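P.S. To make the example concrete, here is a minimal sketch (again
Python/NumPy, names purely illustrative) of how each point on the ROC is
obtained: for every candidate cut-off C, classify f(x) > C as class A and
compute the true and false positive rates on the data at hand.

import numpy as np

def roc_points(scores, y):
    # scores = f(x); y = 1 for class A, 0 for class B.
    points = []
    for c in np.unique(scores):
        pred = scores > c                  # predicted class A at cut-off C
        tpr = np.mean(pred[y == 1])        # sensitivity at this C
        fpr = np.mean(pred[y == 0])        # 1 - specificity at this C
        points.append((fpr, tpr, c))
    return points

Picking the C that looks best on this curve and then reporting that same
point's error rates is exactly where the optimism comes in; the cut-off
selection would have to be repeated inside each bootstrap resample (or
cross-validation fold) to get an honest estimate.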
