I'm not clear on why we are emphasizing the trapezoidal rule when the Wilcoxon approach gives you everything plus a standard error.
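
For readers who want to try the rank-based route, here is a minimal sketch in base R. It assumes you have the raw predicted scores and the 0/1 outcomes rather than only sensitivity/specificity arrays. The helper name `wilcoxon_auc` is hypothetical, and the standard error uses the Hanley-McNeil (1982) approximation, which is one common choice and not necessarily the one meant above.

```r
# Hypothetical helper (not from the thread): AUC from the Wilcoxon rank-sum
# statistic, with the Hanley-McNeil (1982) approximate standard error.
# score: numeric predictions; label: 0/1 outcomes (1 = event).
wilcoxon_auc <- function(score, label) {
  stopifnot(length(score) == length(label), all(label %in% c(0, 1)))
  n1 <- sum(label == 1)            # number of events
  n0 <- sum(label == 0)            # number of non-events
  stopifnot(n1 > 0, n0 > 0)
  r <- rank(score)                 # midranks, so ties are handled
  auc <- (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  q1 <- auc / (2 - auc)
  q2 <- 2 * auc^2 / (1 + auc)
  se <- sqrt((auc * (1 - auc) + (n1 - 1) * (q1 - auc^2) +
              (n0 - 1) * (q2 - auc^2)) / (n1 * n0))
  c(auc = auc, se = se)
}

# Toy data: 3 of the 4 event/non-event pairs are ordered correctly
score <- c(0.1, 0.4, 0.35, 0.8)
label <- c(0, 0, 1, 1)
res <- wilcoxon_auc(score, label)
res
```

With so few observations the standard error is only illustrative; the approximation is intended for reasonably large samples.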

Frank


Ravi Varadhan wrote:
I assume that you have ordered pairs (x, y) of data, where x = 1 - specificity and 
y = sensitivity.  Your `x' values may or may not be equally spaced.  Here 
is how you could solve your problem.  I show this with an example where we can 
compute the area under the curve exactly:

# Area under the curve
#
# Trapezoidal rule
# x values need not be equally spaced
#
trapezoid <- function(x,y) sum(diff(x)*(y[-1]+y[-length(y)]))/2
#
#
# Simpson's rule when `n' is odd
# Composite Simpson and Trapezoidal rules when `n' is even
# x values must be equally spaced
#
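simpson <- function(x, y) {
  n <- length(y)
  stopifnot(length(x) == n, n >= 3)
  # Simpson's rule covers the first m points, where m is the largest odd number <= n
  m <- if (n %% 2 == 1) n else n - 1
  inner <- seq_len(m)[-c(1, m)]            # interior points 2, ..., m - 1
  area <- (y[1] + 4 * sum(y[inner[inner %% 2 == 0]]) +
           2 * sum(y[inner[inner %% 2 == 1]]) + y[m]) / 3
  # For even n, add a trapezoid over the last interval
  if (m < n) area <- area + (y[n - 1] + y[n]) / 2
  dx <- x[2] - x[1]
  area * dx
}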
#
# An example for AUC calculation
x <- seq(0, 1, length=21)

roc <- function(x, a) x + a * x * (1 - x)

plot(x, roc(x, a=0.5), type="l")
lines(x, roc(x, a=0.8), col=2)
lines(x, roc(x, a=1.2), col=3)
abline(a=0, b=1, lty=2)  # chance diagonal; abline needs the intercept as well as the slope

y <- roc(x, a=1)

trapezoid(x, y)  # exact answer is 2/3

simpson(x, y) # exact answer is 2/3

As you can see, Simpson's rule is more accurate, but the difference should not matter 
in applications, as long as you have a sufficient number of points for sensitivity and 
specificity.  Also, note that the improved accuracy of Simpson's rule is more fully 
realized when there is an odd number of `x' values.  If the number of points is 
even, the trapezoidal rule applied to the last interval degrades the accuracy of the 
Simpson approximation.
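
To make the odd/even point concrete, here is a small self-contained sketch that compares the errors for 21 and 20 equally spaced points on the same test curve. The helper names `simpson_odd` and `simpson_even` are hypothetical and are written out here only so the snippet runs on its own. Note that the test curve with a = 1 is a quadratic, for which Simpson's rule is exact with an odd number of points, so this is a best case and not a typical one.

```r
# Trapezoidal rule, repeated here so the snippet is self-contained
trapezoid <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)])) / 2

# Composite Simpson's rule for an odd number of equally spaced points
simpson_odd <- function(x, y) {
  n <- length(y)
  stopifnot(n >= 3, n %% 2 == 1)
  i <- seq_len(n)
  w <- ifelse(i %% 2 == 0, 4, 2)   # interior weights 4, 2, 4, ...
  w[c(1, n)] <- 1                  # end-point weights
  (x[2] - x[1]) / 3 * sum(w * y)
}

# Even n: Simpson on the first n - 1 points, trapezoid on the last interval
simpson_even <- function(x, y) {
  n <- length(y)
  stopifnot(n >= 4, n %% 2 == 0)
  simpson_odd(x[-n], y[-n]) + trapezoid(x[(n - 1):n], y[(n - 1):n])
}

roc <- function(x, a) x + a * x * (1 - x)
exact <- 2 / 3                     # integral of roc(x, a = 1) over [0, 1]

x21 <- seq(0, 1, length.out = 21)
x20 <- seq(0, 1, length.out = 20)

# Signed errors (approximation minus exact value)
err <- c(
  trapezoid_21 = trapezoid(x21, roc(x21, 1)) - exact,
  simpson_21   = simpson_odd(x21, roc(x21, 1)) - exact,
  trapezoid_20 = trapezoid(x20, roc(x20, 1)) - exact,
  simpson_20   = simpson_even(x20, roc(x20, 1)) - exact
)
signif(err, 3)
```

With 20 points, the only error in the Simpson-type estimate comes from the single trapezoid at the end, so it is still much smaller than the error of the pure trapezoidal rule, but it is no longer zero.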

Hope this helps,
Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


----- Original Message -----
From: "olivier.abz" <0509...@rgu.ac.uk>
Date: Thursday, October 22, 2009 10:24 am
Subject: [R]  How to calculate the area under the curve
To: r-help@r-project.org


Hi all,
I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, I do
not know how to get the value of the area under it. Does anybody know of a
function that would give the result I want, using an array of specificity
and an array of sensitivity as input?

Thanks,
Olivier

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
