Hi,
   
  I am working on a classification problem using the pamr package. I used the 
pamr.adaptthresh() function to find the thresholds that give the best 
classification accuracy. I must not be doing it right, since the values it 
returns do not give the optimal classification.
   
  For example, if I run it on a dataset, I get the following result using 
pamr.adaptthresh():
   
        predicted
true    (1)  (2)
  (1)    32    8
  (2)     5   17
   
  i.e. a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17) = 13/62
   
  However, if I just use an arbitrary threshold (in this case, I chose '2'), I 
get the following result:

        predicted
true    (1)  (2)
  (1)    35    5
  (2)     5   17
   
  i.e. a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17) = 10/62, which 
is clearly better than the one that I got from using pamr.adaptthresh().
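
  To make the comparison explicit, here is a minimal sketch in base R that 
computes both error rates from the two tables above. The names conf_adapt, 
conf_fixed and error_rate are my own labels for illustration and are not part 
of pamr.

```r
# Confusion matrices copied from the two results above
# (rows = true class, columns = predicted class).
conf_adapt <- matrix(c(32,  8,
                        5, 17), nrow = 2, byrow = TRUE)
conf_fixed <- matrix(c(35,  5,
                        5, 17), nrow = 2, byrow = TRUE)

# Misclassification rate = off-diagonal counts / all counts.
error_rate <- function(conf) 1 - sum(diag(conf)) / sum(conf)

error_rate(conf_adapt)  # 13/62, pamr.adaptthresh() result
error_rate(conf_fixed)  # 10/62, arbitrary threshold of 2
```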
   
  Am I doing something wrong? What do I need to do to ensure that 
pamr.adaptthresh() gives me the lowest misclassification error rate?
   
  I have tried using different values for 'ntries' and 'reduction.factor' in 
pamr.adaptthresh(), without any success.
   
  I have reproduced my code below. Any comments would be appreciated!
   
  Thanks.
   
  ########################### CODE #################################

library(multtest)   # golub
library(siggenes)   # SAM
library(e1071)      # support vector machine
library(pamr)
library(bootstrap)

rm(list = ls())
gc()

# Read the colon data and return the expression matrix together with the
# class labels (1 = tumor, 2 = normal).
makeColon <- function() {
  # The first row of Colon.data holds the class label of each sample (column).
  # Per the confusion matrices above, there are 40 tumor and 22 normal samples.
  n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")
  cancdat <- n2[, n2[1, ] == "tumor"]
  normdat <- n2[, n2[1, ] == "normal"]
  # Drop the label row so that only expression values remain.
  cancdat <- cancdat[-1, ]
  normdat <- normdat[-1, ]
  mat <- as.matrix(cbind(cancdat, normdat))
  actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
  return(list(mat, actclass))
}

m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
# The values were read as text; convert them to numbers, keeping the shape.
mat <- matrix(as.numeric(mat), nrow(mat), ncol(mat))

geneid <- as.character(1:nrow(mat))
gs <- as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)

mytrain <- pamr.train(mydata)
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)
mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
mycv <- pamr.cv(mytrain2, mydata, nfold = 10)

# Confusion matrix at the values stored in mytrain2$threshold.scale
res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)

# Confusion matrix at an arbitrary threshold of 2
res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)

  ########################### END CODE ###############################
   
   

                
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
