[R] logistic model cross validation resolved

Trevor Wiens Thu, 17 Mar 2005 21:01:17 -0800

This post is NOT a question, but an answer. For readers please disregard all 
earlier posts by myself about this question.


I'm posting for two reasons. First to say thanks, especially to Dimitris, for 
suggesting the use of errorest in the ipred library. Second, so that the 
solution to this problem is in the archives in case it gets asked again. 
 
If one wants to run a k-fold cross-validation using specified folds, and get 
misclassification error and root mean squared error this is what you do. 

Below is a script that will do this returning the results of two errorest 
results combined into a data frame. Now this script assumes that the variable 
you are going to fold with is an integer. If you want to have the predicted 
values or models returned from each fold, the calls to errorest, can be 
modified. Please see the help page for control.errorest on details.

T
-- 
Trevor Wiens 
[EMAIL PROTECTED]

==========================================================
myglm <- function(formula, family, data){
ret <- glm(formula, family=binomial, data=data)
return(ret)
}

myfacpred <- function(object, newdata) {
ret <- as.factor(ifelse(predict.glm(object, newdata, type='response') < 0.5, 0, 
1))
return(ret)
}

# logerrorest takes four arguments
# mdata is a data frame holding the data to be modeled
# form is the output of the is.formula function
# rvar is the response variable as a string, for example 'birdx'
# fvar is the fold variable, for example 'recordyear'

logerrorest <- function(mdata, form, rvar, fvar) {
require(Hmisc)
require(ipred)

# determine index of variables
rpos <- match(rvar, names(mdata))
fpos <- match(fvar, names(mdata))
# get fold values and count for each group
vardesc <- describe(mdata[[fpos]])$values
fvarlist <- as.integer(dimnames(vardesc)[[2]])
k <- length(fvarlist)
countlist <- vardesc[1,1]

for (i in 2:k)
{
countlist[i] <- vardesc[1,i]
}
n <- length(mdata[[fpos]])

# get index list for each fold
cc <- list()
for (i in 1:k)
{
cc[[i]] <- as.integer(rownames(mdata[mdata[[fpos]] == fvarlist[i], ]))
}

# determine root mean squared error
ee <- errorest(form, mdata, estimator='cv', model=myglm, 
est.para=control.errorest(list.tindx = cc))

# determine misclassification error
# first convert to factor
width = length(mdata)
response <- as.factor(as.integer(mdata[[rpos]]))
newmatrix <- data.frame(cbind(mdata[1:(rpos-1)], response, 
mdata[(rpos+1):width]))
newform <- as.formula(paste('response ~ ', as.character(form)[[3]]))
me <- errorest(newform, newmatrix, estimator='cv', model=myglm, 
predict=myfacpred, est.para=control.errorest(list.tindx = cc))

ret <- data.frame(cbind(ee, me))

return(ret)
}

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] logistic model cross validation resolved

Reply via email to