Re: [R] ROCR package finding maximum accuracy and optimal cutoff point

2009-03-28 Thread Saeed Abu Nimeh
I found the solution to my own question. To find the false positive rate
and the false negative rate that correspond to a given cutoff point
using the ROCR package, one can do the following (there are surely
simpler ways, but this works):

library(ElemStatLearn)
library(rpart)
data(spam)

##################################
# create train and test sets     #
##################################
index <- 1:nrow(spam)
testindex <- sample(index, trunc(length(index)/3))
testset <- spam[testindex, ]
trainset <- spam[-testindex, ]
rpart.model <- rpart(spam ~ ., data = trainset) # training model

##################################
# use ROCR to calculate accuracy #
# and fp, fn, tp, tn rates       #
##################################
library(ROCR)
rpart.pred2 <- predict(rpart.model, testset)[,2]  # testing model
pred <- prediction(rpart.pred2, testset[,58]) # prediction using ROCR
perf.acc <- performance(pred, "acc") # find list of accuracies
perf.fpr <- performance(pred, "fpr") # find list of fp rates
perf.fnr <- performance(pred, "fnr") # find list of fn rates

acc.rocr <- max(perf.acc@y.values[[1]])   # accuracy using ROCR

# find cutoff list for accuracies
cutoff.list.acc <- unlist(perf.acc@x.values[[1]])

# find optimal cutoff point for accuracy
optimal.cutoff.acc <- cutoff.list.acc[which.max(perf.acc@y.values[[1]])]

# find the position of that cutoff in the fpr cutoff list
# (as.numeric because a list is returned)
optimal.cutoff.fpr <- which(perf.fpr@x.values[[1]] == as.numeric(optimal.cutoff.acc))

# find list of fpr
cutoff.list.fpr <- unlist(perf.fpr@y.values[[1]])
# find fpr using ROCR
fpr.rocr <- cutoff.list.fpr[as.numeric(optimal.cutoff.fpr)]

# find the position of that cutoff in the fnr cutoff list
optimal.cutoff.fnr <- which(perf.fnr@x.values[[1]] == as.numeric(optimal.cutoff.acc))
# find list of fnr
cutoff.list.fnr <- unlist(perf.fnr@y.values[[1]])
# find fnr using ROCR
fnr.rocr <- cutoff.list.fnr[as.numeric(optimal.cutoff.fnr)]

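Now acc.rocr, fpr.rocr, and fnr.rocr hold the accuracy, the false
positive rate, and the false negative rate at the cutoff that maximizes
accuracy. ROCR reports them as proportions between 0 and 1, not
percentages.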
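
A shorter variant of the steps above, as a sketch only: it assumes that
performance objects built from the same prediction object share one
cutoff vector, so a single index from which.max() can be reused for the
fpr and fnr lists. It uses the six-row toy data from the original
question instead of the spam data; the names scores, labels, i, and best
are illustrative and not from the original post.

```r
library(ROCR)

# Toy data from the original question: a score and a binary response.
scores <- c(2.5, -1, 2, 6.3, 4.1, 3.3)
labels <- c(0, 0, 1, 1, 0, 1)

pred <- prediction(scores, labels)
perf.acc <- performance(pred, "acc")
perf.fpr <- performance(pred, "fpr")
perf.fnr <- performance(pred, "fnr")

# Check the assumption: the same cutoffs sit on the x axis of each object.
stopifnot(identical(perf.acc@x.values[[1]], perf.fpr@x.values[[1]]),
          identical(perf.acc@x.values[[1]], perf.fnr@x.values[[1]]))

# Index of the first cutoff that reaches the maximum accuracy.
i <- which.max(perf.acc@y.values[[1]])

# Read cutoff, accuracy, fpr, and fnr at that one index.
best <- c(cutoff   = perf.acc@x.values[[1]][i],
          accuracy = perf.acc@y.values[[1]][i],
          fpr      = perf.fpr@y.values[[1]][i],
          fnr      = perf.fnr@y.values[[1]][i])
print(best)
```

Note that which.max() returns only the first maximum, so if several
cutoffs tie on accuracy the others are not reported.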



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ROCR package finding maximum accuracy and optimal cutoff point

2009-03-26 Thread Saeed Abu Nimeh
If we use the ROCR package to find the accuracy of a classifier:
pred <- prediction(svm.pred, testset[,2])
perf.acc <- performance(pred, "acc")

Do we find the maximum accuracy as follows (is there a simpler way?):
 max(perf.acc@y.values[[1]])

Then, to find the cutoff point that maximizes the accuracy, do we do the
following (is there a simpler way?):
 cutoff.list <- unlist(perf.acc@x.values[[1]])
 cutoff.list[which.max(perf.acc@y.values[[1]])]

If the above is correct, how is it possible to find the average false
positive and negative rates from the following?
perf.fpr <- performance(pred, "fpr")
perf.fnr <- performance(pred, "fnr")

The dataset consists of two columns, a score and a binary response,
similar to this:
2.5, 0
-1, 0
2, 1
6.3, 1
4.1, 0
3.3, 1
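
For reference, the six rows above are small enough to work through by
hand. The following base R sketch computes accuracy, false positive
rate, and false negative rate at every candidate cutoff without ROCR. It
assumes the convention that a score at or above the cutoff is classified
as positive; rates.at is a hypothetical helper written for this
illustration, not a function from any package.

```r
# Toy data from the post: a score and a binary response (1 = positive).
scores <- c(2.5, -1, 2, 6.3, 4.1, 3.3)
labels <- c(0, 0, 1, 1, 0, 1)

# Hypothetical helper: classify as positive when score >= cutoff, then
# return accuracy, false positive rate, and false negative rate.
rates.at <- function(cutoff, scores, labels) {
  predicted <- as.integer(scores >= cutoff)
  tp <- sum(predicted == 1 & labels == 1)
  tn <- sum(predicted == 0 & labels == 0)
  fp <- sum(predicted == 1 & labels == 0)
  fn <- sum(predicted == 0 & labels == 1)
  c(cutoff = cutoff,
    acc = (tp + tn) / length(labels),
    fpr = fp / (fp + tn),
    fnr = fn / (fn + tp))
}

# Try every observed score as a cutoff, highest first.
cutoffs <- sort(unique(scores), decreasing = TRUE)
rates <- t(sapply(cutoffs, rates.at, scores = scores, labels = labels))
print(rates)

# First row that reaches the highest accuracy.
best <- rates[which.max(rates[, "acc"]), ]
print(best)
```

Several cutoffs tie on accuracy in this tiny sample, so the choice of
the first one is arbitrary here.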


Thanks,
Saeed
 ---
R 2.8.1 Win XP Pro SP2
ROCR package v1.0-2
e1071 v1.5-19
