Hello Lyndon, I'm still puzzled what x and y actually are and how they are related. Your "bins" are calculated from x only, so you might missing some range of y and since you use a fixed number of bins of equal size, some bins might be empty (and therefore reproducing the result from the bin before), and some might contain more than one distinct value (and so are ignoring an intermediate step). But this might be intentional and the reason for you to build up your own function. If not, have a look at pROC, there is a clever way to calculate the required bins, in particular look at
getAnywhere("roc.utils.thresholds") Some tweaking of your code is possible anyway: > for(i in 1:length(threshold)) { > tp[i] <- length(x2) - length(x2[x2 <= bins[i]]) > fp[i] <- length(y2) - length(y2[y2 <= bins[i]]) > } calculates length(x2) and length(y2) in every single loop, but this is a constant, so it is wasting time. Anyway, you get away even without calculating this once, using tp[i]<-sum(x2>bins[i]) #indexing is not necessary, #and 1-x2[j]<=tr == x2[j]>tr and on top one that, the fanciest or at least R'ish way, by common understanding, is avoiding loops and using *apply, but that's a matter of taste, when it comes to that ;) tp<-sapply(bins,function(b)sum(x>b)) fp<-sapply(bins,function(b)sum(y>b)) #or all at once ctch<-t(sapply(bins,function(b)c(b,sum(x>b),sum(y>b)))) A last minor remark: sorting x and y is not necessary in the new code fragment. hth. Am 13.07.2011 16:46, schrieb Lyndon Estes: > Hello Eik, > > Thanks very much for your response and for directing me to a useful > explanation. > > To make sure I am grasping your answer correctly, the two problems I > was experiencing are related to the fact that the floating point > numbers I was calculating and using in subsequent indices were not > entirely equivalent, even if they were calculated using the same > function (quantile). > > I confirmed this as follows: > > j <- 0.055 # bins value for 5.5% threshold > bin.ct <- as.numeric(as.character(quantile(x, j, na.rm = TRUE))) > bin.ct > #430.29 > bin.ct2 <- quantile(x, j, na.rm = TRUE) > bin.ct2 > # 5.5% > #430.29 > bin.ct - bin.ct2 > # 5.5% > #5.684342e-14 > length(x) - length(x[x <= bin.ct]) > length(x) - length(x[x <= bin.ct]) # 2875 > length(x) - length(x[x <= bin.ct2]) # 3029 > > Testing the unname() option does not fix the result, I should however note: > bin.ct <- as.numeric(as.character(quantile(x, j, na.rm = TRUE))) > bin.ct2 <- unname(quantile(x, j, na.rm = TRUE)) > bin.ct - bin.ct2 # 5.684342e-14 > > But rounding, as an alternative approach, works: > bin.ct2 <- round(quantile(x, j, na.rm = TRUE), 2) > bin.ct - bin.ct2 > #5.5% > # 0 > length(x) - length(x[x <= bin.ct]) # 2875 > length(x) - length(x[x <= bin.ct2]) # 2875 > > As to my code, it is part of a custom ROC function I created a while > back and started using again recently (the data are rainfall > values). I can't remember why I did this rather than using one of the > existing ROC functions, but I thought (probably incorrectly) that I > had some compelling reason. In > any case, it is quite unwieldy, so I will explore those other > packages, or try revise this to be more efficient > > (e.g. maybe this is a better approach, although the answers are fairly > different?). > > x2 <- x[order(x)] > y2 <- y[order(y)] > bins <- round(seq(min(x2), max(x2), by = diff(range(x2)) / 200), 2) > threshold <- seq(0, 100, by = 0.5) > tp <- rep(0, length(bins)) > fp <- rep(0, length(bins)) > for(i in 1:length(threshold)) { > tp[i] <- length(x2) - length(x2[x2 <= bins[i]]) > fp[i] <- length(y2) - length(y2[y2 <= bins[i]]) > } > ctch <- cbind(threshold, bins, tp, fp) > ctch[1:20, ] > > Thanks again for your help. > > Cheers, Lyndon > > > On Tue, Jul 12, 2011 at 5:09 AM, Eik Vettorazzi > <e.vettora...@uke.uni-hamburg.de> wrote: >> Hi, >> >> Am 11.07.2011 22:57, schrieb Lyndon Estes: >>> ctch[ctch$threshold == 3.5, ] >>> # [1] threshold val tp fp tn fn tpr >>> fpr tnr fnr >>> #<0 rows> (or 0-length row.names) >> >> this is the very effective FAQ 7.31 trap. >> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f >> >> Welcome to the first circle of Patrick Burns' R Inferno! >> >> Also, unname() is a more intuitive way of removing names. >> >> And I think your code is quite inefficient, because you calculate >> quantiles many times, which involves repeated ordering of x, and you may >> use a inefficient size of bin (either to small and therefore calculating >> the same split many times or to large and then missing some splits). >> I'm a bit puzzled what is x and y in your code, so any further advise is >> vague but you might have a look at any package that calculates >> ROC-curves such as ROCR or pROC (and many more). >> >> Hth >> >> -- >> Eik Vettorazzi >> >> Department of Medical Biometry and Epidemiology >> University Medical Center Hamburg-Eppendorf >> >> Martinistr. 52 >> 20246 Hamburg >> >> T ++49/40/7410-58243 >> F ++49/40/7410-57790 >> > > > -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.