Hi, I was lately debugging parts of my 'colAUC' function in caTools package, and in a process looked into other packages for calculating Areas Under ROC Curves (AUC). To my surprise I found at least 6 other functions: * wilcox.test * AUC from ROC package, * performance from ROCR package, * auROC from limma package, * ROC from Epi package, * roc.area from verification package And I belive I found problems with 3 of them: 'ROC', 'auROC' & 'roc.area'
I wrote short code to compare them all: # Compare calAUC with other functions designed for similar purpose library(caTools) library(verification) library(ROC) library(ROCR) library(Epi) library(limma) library(MASS) data(cats) colAUC(cats[,2:3], cats[,1], plotROC=TRUE) auc = matrix(NA,9,3) pval = matrix(NA,4,3) rownames(auc) = c("colAUC(alg='ROC')", "colAUC(alg='Wilcox')", "wilcox.test", "rank", "roc.area", "AUC", "performance", "ROC", "auROC") colnames(auc ) = c("AUC(x)", "AUC(-x)", "AUC(x+noise)") rownames(pval) = c("wilcox.test(exact=1)","wilcox.test(exact=0)", "roc.area()$p.adj","roc.area()$p") colnames(pval) = c("p(x)", "p(-x)", "p(x+noise)") X = cbind(cats[,2], -cats[,2], cats[,2]+rnorm(nrow(cats))/10 ) y = ifelse(cats[,1]=='F',0,1) for (i in 1:3) { x = X[,i] x1 = x[y==1]; n1 = length(x1); # prepare input data ... x2 = x[y==0]; n2 = length(x2); # ... into required format r = rank(c(x1,x2)) auc[1,i] = colAUC(x, y, alg="ROC") auc[2,i] = colAUC(x, y, alg="Wilcox") auc[3,i] = wilcox.test(x1, x2, exact=0)$statistic / (n1*n2) auc[4,i] = (sum(r[1:n1]) - n1*(n1+1)/2) / (n1*n2) auc[5,i] = roc.area(y, x)$A.tilda auc[6,i] = AUC(rocdemo.sca(y, x, dxrule.sca)) auc[7,i] = performance(prediction( x, y),"auc")@y.values[[1]] auc[8,i] = ROC(x,y,grid=0)$AUC # get AUC by 'ROC' auc[9,i] = auROC(y, x) # get AUC by 'auROC' pval[1,i] = wilcox.test(x1, x2, exact=0)$p.value pval[2,i] = wilcox.test(x1, x2, exact=1)$p.value pval[3,i] = roc.area(y, x)$p.adj pval[4,i] = roc.area(y, x)$p } print(auc) print(pval) Which gave the following results: > print(auc) AUC(x) AUC(-x) AUC(x+noise) colAUC(alg='ROC') 0.8338451 0.8338451 0.8225488 colAUC(alg='Wilcox') 0.8338451 0.8338451 0.8225488 wilcox.test 0.8338451 0.1661549 0.8225488 rank 0.8338451 0.1661549 0.8225488 roc.area 0.8338451 0.1661549 0.8225488 AUC 0.8338451 0.1661549 0.8225488 performance 0.8338451 0.1661549 0.8225488 ROC 0.8338451 0.1654968 0.8225488 auROC 0.8131169 0.1454266 0.8225488 > print(pval) p(x) p(-x) p(x+noise) wilcox.test(exact=1) 8.200e-11 8.200e-11 3.774e-10 wilcox.test(exact=0) 8.200e-11 8.200e-11 3.817e-11 roc.area()$p.adj 4.042e-11 1.000e+00 1.861e-10 roc.area()$p 4.446e-11 1.000e+00 1.861e-10 Some thoughts about those results: - Data used for the testing: column 1 (x) - has ties, column 2 (-x) - negative of the same data column 3 (x+noise) - similar data but with no ties - All AUC functions gave the same results for data with no ties - 'ROC' and 'auROC' gave different results for data with ties - 'colAUC' (my function) returns 'max(auc, 1-auc)' that's why AUC(x) and AUC(-x) are the same - I assume that "roc.area()$p.adj" and "wilcox.test" p.values suppose to return the same results, but they are different. - At least 'roc.area(x)$p.adj' and 'roc.area(-x)$p.adj' should be the same. Comments? Corrections? Jarek ====================================================\==== Jarek Tuszynski, PhD. o / \ Science Applications International Corporation <\__,| (703) 676-4192 "> \ [EMAIL PROTECTED] ` \ ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html