On Wed, 12 Jan 2005, Frank E Harrell Jr wrote: >Dan Bolser wrote: >> Hello, >> >> I am making some use of ROC curve analysis. >> >> I find much help on the mailing list, and I have used the Area Under the >> Curve (AUC) functions from the ROC function in the bioconductor project... >> >> http://www.bioconductor.org/repository/release1.5/package/Source/ >> ROC_1.0.13.tar.gz >> >> However, I read here... >> >> http://www.medcalc.be/manual/mpage06-13b.php >> >> "The 95% confidence interval for the area can be used to test the >> hypothesis that the theoretical area is 0.5. If the confidence interval >> does not include the 0.5 value, then there is evidence that the laboratory >> test does have an ability to distinguish between the two groups (Hanley & >> McNeil, 1982; Zweig & Campbell, 1993)." >> >> But aside from early on the above article is short on details. Can anyone >> tell me how to calculate the CI of the AUC calculation? >> >> >> I read this... >> >> http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf >> >> Which talks about resampling (by showing R code), but I can't understand >> what is going on, or what is calculated (the example given is specific to >> microarray analysis I think). >> >> I think a general AUC CI function would be a good addition to the ROC >> package. >> >> >> >> >> One more thing, in calculating the AUC I see the splines function is >> recomended over the approx function. Here... >> >> http://tolstoy.newcastle.edu.au/R/help/04/10/6138.html >> >> How would I rewrite the following AUC functions (adapted from bioconductor >> source) to use splines (or approxfun or splinefun) ... >> >> >>>spe # Specificity >> >> [1] 0.02173913 0.13043478 0.21739130 0.32608696 0.43478261 0.54347826 >> [7] 0.65217391 0.76086957 0.89130435 1.00000000 1.00000000 1.00000000 >> [13] 1.00000000 >> >> >>>sen # Sensitivity >> >> [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.9302326 0.8139535 >> [8] 0.6976744 0.5581395 0.4418605 0.3488372 0.2325581 0.1162791 >> >> trapezint(1-spe,sen) >> my.integrate(1-spe,sen) >> >> ## Functions >> ## Nicked (and modified) from the ROC function in bioconductor. >> "trapezint" <- >> function (x, y, a = 0, b = 1) >> { >> if (x[1] > x[length(x)]) { >> x <- rev(x) >> y <- rev(y) >> } >> y <- y[x >= a & x <= b] >> x <- x[x >= a & x <= b] >> if (length(unique(x)) < 2) >> return(NA) >> ya <- approx(x, y, a, ties = max, rule = 2)$y >> yb <- approx(x, y, b, ties = max, rule = 2)$y >> x <- c(a, x, b) >> y <- c(ya, y, yb) >> h <- diff(x) >> lx <- length(x) >> 0.5 * sum(h * (y[-1] + y[-lx])) >> } >> >> "my.integrate" <- >> function (x, y, t0 = 1) >> { >> f <- function(j) approx(x,y,j,rule=2,ties=max)$y >> integrate(f, 0, t0)$value >> } >> >> >> >> >> >> Thanks for any pointers, >> Dan. > >I don't see why the above formulas are being used. The >Bamber-Hanley-McNeil-Wilcoxon-Mann-Whitney nonparametric method works >great. Just get the U statistic (concordance probability) used in >Wilcoxon. As Somers' Dxy rank correlation coefficient is 2*(1-C) where >C is the concordance or ROC area, the Hmisc package function rcorr.cens >uses U statistic methods to get the standard error of Dxy. You can >easily translate this to a standard error of C.
I am sure I could do this easily, except I can't. The good thing about ROC is that I understand it (I can see it). I know why the area means what it means, and I could even imagine how sampling the data could give a CI on the area. However, I don't know why "the area under the ROC curve is well known to be equivalent to the numerator of the Mann-Whitney U statistic" - from http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf Nor do I know how to calculate "the numerator of the Mann-Whitney U statistic". Can you point me at some ? pages or tutorials or even give an example of what you suggested so I can try to follow it through? I tried the following... x <- rnorm(100,5,1) # REAL NEGATIVE # y <- rnorm(100,8,1) # REAL POSITIVE t <- wilcox.test(x,y,paired=FALSE,conf.int=0.95) > t Wilcoxon rank sum test with continuity correction data: x and y W = 132, p-value < 2.2e-16 alternative hypothesis: true mu is not equal to 0 95 percent confidence interval: -3.232207 -2.664620 sample estimates: difference in location -2.957496 And from ?wilcox.test ... "if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is carried out." But I don't know what to do next. Sorry for all the questions, but I am a dumb biologist. Thanks for the help, Dan. > >Frank > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html