This post is NOT a question, but an answer. For readers please disregard all
earlier posts by myself about this question.
I'm posting for two reasons. First to say thanks, especially to Dimitris, for
suggesting the use of errorest in the ipred library. Second, so that the
solution to this problem is in the archives in case it gets asked again.
If one wants to run a k-fold cross-validation using specified folds, and get
misclassification error and root mean squared error this is what you do.
Below is a script that will do this returning the results of two errorest
results combined into a data frame. Now this script assumes that the variable
you are going to fold with is an integer. If you want to have the predicted
values or models returned from each fold, the calls to errorest, can be
modified. Please see the help page for control.errorest on details.
T
--
Trevor Wiens
[EMAIL PROTECTED]
==========================================================
myglm <- function(formula, family, data){
ret <- glm(formula, family=binomial, data=data)
return(ret)
}
myfacpred <- function(object, newdata) {
ret <- as.factor(ifelse(predict.glm(object, newdata, type='response') < 0.5, 0,
1))
return(ret)
}
# logerrorest takes four arguments
# mdata is a data frame holding the data to be modeled
# form is the output of the is.formula function
# rvar is the response variable as a string, for example 'birdx'
# fvar is the fold variable, for example 'recordyear'
logerrorest <- function(mdata, form, rvar, fvar) {
require(Hmisc)
require(ipred)
# determine index of variables
rpos <- match(rvar, names(mdata))
fpos <- match(fvar, names(mdata))
# get fold values and count for each group
vardesc <- describe(mdata[[fpos]])$values
fvarlist <- as.integer(dimnames(vardesc)[[2]])
k <- length(fvarlist)
countlist <- vardesc[1,1]
for (i in 2:k)
{
countlist[i] <- vardesc[1,i]
}
n <- length(mdata[[fpos]])
# get index list for each fold
cc <- list()
for (i in 1:k)
{
cc[[i]] <- as.integer(rownames(mdata[mdata[[fpos]] == fvarlist[i], ]))
}
# determine root mean squared error
ee <- errorest(form, mdata, estimator='cv', model=myglm,
est.para=control.errorest(list.tindx = cc))
# determine misclassification error
# first convert to factor
width = length(mdata)
response <- as.factor(as.integer(mdata[[rpos]]))
newmatrix <- data.frame(cbind(mdata[1:(rpos-1)], response,
mdata[(rpos+1):width]))
newform <- as.formula(paste('response ~ ', as.character(form)[[3]]))
me <- errorest(newform, newmatrix, estimator='cv', model=myglm,
predict=myfacpred, est.para=control.errorest(list.tindx = cc))
ret <- data.frame(cbind(ee, me))
return(ret)
}
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html