Hi,

I was working on a classification problem using the pamr package. I used the pamr.adaptthresh() function to find the optimal accuracy of the classifier. I must not be doing it right, since it doesn't return the threshold values for optimum classification.

For example, if I run it on a dataset, I get the following result using pamr.adaptthresh():

          predicted
true      (1)   (2)
  (1)      32     8
  (2)       5    17

i.e. a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17) = 13/62, or about 21%.

However, if I just use an arbitrary threshold (in this case, I chose 2), I get the following result:

          predicted
true      (1)   (2)
  (1)      35     5
  (2)       5    17

i.e. a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17) = 10/62, or about 16%, which is clearly better than the one that I got from using pamr.adaptthresh().

Am I doing something wrong? What do I need to do to ensure that pamr.adaptthresh() returns the least misclassification error rate? I have tried using different values for 'ntries' and 'reduction.factor' in pamr.adaptthresh(), without any success.

I have reproduced my code below. Any comments would be appreciated!

Thanks.

########################### CODE #################################
library(multtest)   # golub
library(siggenes)   # SAM
library(e1071)      # support vector machines
library(base)
library(graphics)
library(pamr)
library(bootstrap)

rm(list = ls())
gc()

makeColon <- function() {
  # This dataset has 24 cancer, and 9 normal samples
  # The first row of the file holds the class label of each sample (column)
  n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")
  cancdat <- n2[, n2[1, ] == 'tumor']
  normdat <- n2[, n2[1, ] == 'normal']
  # Drop the label row, keeping only the expression values
  cancdat <- cancdat[-1, ]
  normdat <- normdat[-1, ]
  mat <- as.matrix(cbind(cancdat, normdat))
  # Class 1 = tumor, class 2 = normal
  actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
  return(list(mat, actclass))
}

m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
mat <- matrix(as.numeric(mat), nrow(mat), ncol(mat))

geneid <- as.character(1:nrow(mat))
gs <- as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)

mytrain <- pamr.train(mydata)
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)
mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
mycv <- pamr.cv(mytrain2, mydata, nfold = 10)

# Result 1: using the output of pamr.adaptthresh()
res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)

# Result 2: using an arbitrary threshold of 2
res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)
########################### END CODE ###############################

R-help@stat.math.ethz.ch mailing list
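
The two error rates quoted in the post can be recomputed from the confusion tables with a few lines of base R. This is only a sketch of the arithmetic: the tables are typed in by hand from the results above, and the helper name misclass_rate is made up for illustration and is not a pamr function.

```r
# Confusion tables copied from the post (rows = true class, columns = predicted class).
labels <- list(true = c("1", "2"), predicted = c("1", "2"))

# Result obtained via pamr.adaptthresh()
conf_adapt <- matrix(c(32,  8,
                        5, 17),
                     nrow = 2, byrow = TRUE, dimnames = labels)

# Result obtained with the arbitrary threshold of 2
conf_fixed <- matrix(c(35,  5,
                        5, 17),
                     nrow = 2, byrow = TRUE, dimnames = labels)

# Hypothetical helper (not part of pamr):
# misclassified samples are the off-diagonal counts of the table.
misclass_rate <- function(tab) {
  (sum(tab) - sum(diag(tab))) / sum(tab)
}

rate_adapt <- misclass_rate(conf_adapt)  # 13 / 62
rate_fixed <- misclass_rate(conf_fixed)  # 10 / 62

print(round(c(adaptthresh = rate_adapt, threshold_2 = rate_fixed), 3))
```

Both tables contain the same 62 samples, so the comparison comes down to 13 versus 10 misclassified samples.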