I apologize for posting on this question again, but unfortunately, I don't have and can't get access to MASS for at least three weeks. I have found some code on the web however which implements the prediction error algorithm in cv.glm.
http://www.bioconductor.org/workshops/NGFN03/modelsel-exercise.pdf Now I've tried to adapt it to my purposes, but since I'm not deeply familiar with R programming, I don't know why it doesn't work. Now checking the r-help list faq it seems this is an appropriate question. I've included my attempted function below. The error I get is: logcv(basp.data, form, 'basp', 'recordyear') Error in order(na.last, decreasing, ...) : Argument 1 is not a vector My questions are, why doesn't this work, and how do I fix it. I'm using the formula function to create the formula that I'm sending to my function. And the mdata is a data.frame. I'm assumed that if I passed the column names as strings (response variable - rvar, fold variable - fvar) this would work. Apparently however it doesn't. Lastly, since I don't have access to MASS and there are apparently many examples of doing this kind of thing in MASS, could someone tell me if this function looks approximately correct? Thanks T ------------------------------------------------ logcv <- function(mdata, formula, rvar, fvar) { require(Hmisc) # sort by fold variable sorted <- mdata[order(mdata$fvar), ] # get fold values and count for each group vardesc <- describe(sorted$fvar)$values fvarlist <- as.integer(dimnames(vardesc)[[2]]) k <- length(fvarlist) countlist <- vardesc[1,1] for (i in 2:k) { countlist[i] <- vardesc[1,i] } n <- length(sorted$fvar) # fit to all the mdata fit.all <- glm(formula, sorted, family=binomial) pred.all <- ifelse( predict(fit.all, type="response") < 0.5, 0, 1) #setup pred.c <- list() error.i <- vector(length=k) for (i in 1:k) { fit.i <- glm(formula, subset(sorted, sorted$fvar != fvarlist[i]), family=binomial) pred.i <- ifelse(predict(fit.i, newdata=subset(sorted, sorted$fvar == fvarlist[i]), type="response") < 0.5, 0, 1) pred.c[[i]] = pred.i pred.all.i <- ifelse(predict(fit.i, newdata=sorted, type="response") < 0.5, 0, 1) error.i[i] <- sum(sorted$rvar != pred.all.i)/n } pred.cc <- unlist(pred.c) delta.cv.k <- sum(sorted$rvar != pred.cc)/n p.k <- countlist/n delta.app <- mean(sorted$rvar != pred.all)/n delta.acv.k <- delta.cv.k + delta.app - sum(p.k*error.i) print(delta.acv.k) } -- Trevor Wiens [EMAIL PROTECTED] ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
