One option is to define a cost function for a given threshold k as

   cost(k) = FP(k) + lambda * FN(k)

and then choose the k that minimises the cost. FP(k) and FN(k) are the
false positives and false negatives at threshold k.

Set lambda to a value greater than 1 if you want to penalise FN more
than FP. There are many situations where this is desirable, for example
when the class sizes are highly unbalanced. Consider a problem where you
want to predict rare events and you are penalised much more heavily for
missing an event than for falsely flagging a non-event.
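
To make this concrete, here is a minimal sketch in R. The data and the
names 'score', 'truth', 'cost' and 'best_threshold' are made up for
illustration and do not come from the ROC package. I assume that a case
is called positive when its score is >= k, and that FP and FN are
counts.

```r
# Hypothetical toy data: 'score' is the classifier output, 'truth' is 1 for
# an event and 0 for a non-event.
score <- c(0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90)
truth <- c(0,    0,    0,    1,    0,    0,    1,    0,    1,    1)

# cost(k) = FP(k) + lambda * FN(k), with FP and FN counted at threshold k.
# Assumption: cases with score >= k are called positive.
cost <- function(k, score, truth, lambda = 1) {
  pred <- as.integer(score >= k)
  FP <- sum(pred == 1 & truth == 0)   # non-events called positive
  FN <- sum(pred == 0 & truth == 1)   # events that were missed
  FP + lambda * FN
}

# Evaluate the cost at every observed score and return the minimiser.
# which.min() returns the first minimum, so ties go to the lowest threshold.
best_threshold <- function(score, truth, lambda = 1) {
  ks <- sort(unique(score))
  costs <- sapply(ks, cost, score = score, truth = truth, lambda = lambda)
  ks[which.min(costs)]
}

k1 <- best_threshold(score, truth, lambda = 1)   # FP and FN weighted equally
k5 <- best_threshold(score, truth, lambda = 5)   # a miss costs 5 false alarms
```

With these toy data k1 is 0.60 and k5 is 0.35: weighting FN more heavily
pushes the threshold down, so fewer events are missed at the price of
more false positives.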


I believe the ROC curve was designed to compare two methods over a range
of thresholds and not for choosing the threshold itself.

Regards, Adai



On Fri, 2006-03-31 at 08:01 -0500, Tim Howard wrote:
> Jose - 
> 
> I've struggled a bit with the same question, said another way: "how do you 
> find the value in a ROC curve that minimizes false positives while maximizing 
> true positives"?
> 
> Here's something I've come up with. I'd be curious to hear from the list 
> whether anyone thinks this code might get stuck in local minima, or if it 
> does find the global minimum each time. (I think it's ok).
> 
> From your ROC object you need to grab the sensitivity (= true positive rate) 
> and specificity (= 1 - false positive rate) and the cutoff levels.  Then find 
> the value that minimizes abs(sensitivity - specificity), or 
> sqrt((1-sens)^2 + (1-spec)^2), as follows:
> 
> absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]
> sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
> 
> In this example, 'extract' is a dataframe containing three columns: 
> extract$sens = sensitivity values, extract$spec = specificity values, 
> extract$votes = cutoff values. The command subsets the dataframe to a single 
> row containing the desired cutoff and the sens and spec values that are 
> associated with it.
> 
> Most of the time these two answers (abs or sqrt) are the same, sometimes they 
> differ quite a bit. 
> 
> I do not see this application of ROC curves very often. A question for those 
> much more knowledgeable than I.... is there a problem with using ROC curves 
> in this manner?
> 
> Tim Howard
> 
> 
> 
> 
> Date: Fri, 31 Mar 2006 11:58:14 +0200
> From: "Anadon Herrera, Jose Daniel" <[EMAIL PROTECTED]>
> Subject: [R] ROC optimal threshold
> To: "'r-help@stat.math.ethz.ch'" <r-help@stat.math.ethz.ch>
> Message-ID:
>       <[EMAIL PROTECTED]>
> Content-Type: text/plain;     charset=iso-8859-1
> 
> hello,
> 
> I am using the ROC package to evaluate predictive models.
> I have successfully plotted the ROC curve; however,
> 
> is there any way to obtain the value of the operating point = optimal threshold
> value (i.e. the nearest point of the curve to the top-left corner of the
> axes)?
> 
> thank you very much,
> 
> 
> jose daniel anadon
> area de ecologia
> universidad miguel hernandez
> 
> españa
> 
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
