Radford Neal wrote in news:[EMAIL PROTECTED]:

>> Cornilia wrote in news:[EMAIL PROTECTED]:
>>
>>> I have a training data set, and I want to obtain the LOOCV error
>>> rate for a linear regression model. How can I implement this in R
>>> or S-Plus? I can use a for loop and fit linear models n times, with
>>> one row left out each time. My main problem is that I don't know
>>> how to leave one row out of my data set in the lm function within
>>> the for loop.
>>>
>>> It might look like:
>>> for (i in 1:n) {
>>> fitcv<-lm(y ~ V1+V2+V3+V4+V5+V6+V7+V8+V9,data=train,
>
> Just using data=train[-i,] ought to work. I don't know how efficient
> it is.
>
> In article <[EMAIL PROTECTED]>,
> David Winsemius <[EMAIL PROTECTED]> wrote:
>
>> Not sure what your acronym means, but it sounds as though you are
>> doing a jack-knife analysis. Why not do a real bootstrap analysis?
>> If you are already using R, it should not be difficult to find the
>> boot package. I think it is in the default 1.8.1 distribution. You
>> would bring it into the workspace with library("boot")
>
> I've encountered suggestions to use the bootstrap in circumstances
> such as this before, but I've never understood them. The bootstrap
> samples will clearly violate the assumption of independent residuals
> that underlies the usual regression model.
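To spell out that suggestion, a minimal sketch of the full leave-one-out
loop might look like the following. It assumes the data frame is called
train, with response y and predictors V1..V9 as in the original post,
and uses squared prediction error on the held-out row as the loss:

    # LOOCV for a linear model, leaving out row i on each pass.
    # Assumes a data frame `train` with response y and predictors
    # V1..V9, as in the original post.
    n <- nrow(train)
    cv.err <- numeric(n)
    for (i in 1:n) {
      # Fit on all rows except row i
      fitcv <- lm(y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
                  data = train[-i, ])
      # Squared prediction error on the held-out row
      pred <- predict(fitcv, newdata = train[i, ])
      cv.err[i] <- (train$y[i] - pred)^2
    }
    mean(cv.err)   # LOOCV estimate of mean squared prediction error

Refitting lm n times is wasteful for large n, but for a one-off error
estimate it is perfectly serviceable.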
Why should the validation method leave out exactly one instance during
each validation run? My reading indicates that LOOCV performs
relatively poorly in comparisons with K-fold CV or bootstrap methods.
Although the LOOCV estimates of prediction error are approximately
unbiased, they are plagued by higher variance than competing methods,
so users of that method give up efficiency. The method proposed by the
original questioner still looks like a jackknife rather than what I
now understand LOOCV to mean after searching the web: the model is
fixed, and there is no way for model misspecification to be
identified.

In any case, LOOCV, as well as k-fold CV, can be implemented with the
boot package for R that I offered the questioner (a short sketch
appears at the end of this message):

http://www.math.mcgill.ca/sysdocs/R/library/boot/html/cv.glm.html

Harrell's Design library also has a validate.lrm function, whose
default is the bootstrap but which can also be set to use
cross-validation.

> The bootstrap samples will also have less diverse values for the
> predictor variables. So it seems to me that the bootstrap results
> will NOT be a good guide to what is going on with the actual sample.

As I understand statistics, the goal is to make plausible statements
about what is likely in the world *outside* the sample. My
understanding is that bootstrap methods use the joint distribution of
measured features of the sample to create a plausible larger
(resampled) world. It seems to be a realization of the concept of
exchangeability.

> The poster's use of leave-one-out cross validation seems more
> sensible to me.

Each person must determine what "makes sense". I have been relying on
the results of the simulation tests in Efron and Gong and in Efron and
Tibshirani. You can certainly make your choice on the basis of theory.
Given your far greater authority in this arena, I may learn quite a
bit from your response.

-- 
David Winsemius
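For reference, a sketch of the cv.glm usage linked above, assuming the
same train data frame and variable names as in the original post. A
gaussian glm() fit is the same model as lm(), which is the form
cv.glm() expects; k-fold CV is obtained simply by choosing a smaller K:

    library(boot)   # cv.glm() lives in the boot package
    # glm() with the default gaussian family fits the same model as lm()
    glmfit <- glm(y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
                  data = train)
    # K equal to the number of rows gives leave-one-out CV (this is
    # also cv.glm's default); the default cost is average squared error
    cv.loo <- cv.glm(train, glmfit, K = nrow(train))
    cv.10  <- cv.glm(train, glmfit, K = 10)   # 10-fold CV for comparison
    cv.loo$delta[1]   # raw CV estimate of prediction error
    cv.10$delta[1]

The first component of delta is the raw cross-validation estimate; the
second is a bias-adjusted version.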
