Torsten Hothorn writes:

> as long as one does not use the information in the response (the class
> variable, in this case) I don't think that one ends up with an
> optimistically biased estimate of the error

I would be a little careful, though. The left-out sample in the LDA cross-validation will still have influenced the PCA used to build the LDA on the remaining samples. The sample will tend to lie closer to the centre of the "complete" PCA than to the centre of a PCA fitted on the remaining samples. Also, if the sample has high leverage on the PCA, the directions of the two PCAs can be quite different. Thus, the LDA is built on data that "fit" the left-out sample better than they would if the sample were completely new.

I have no proofs or numerical studies showing that this gives over-optimistic error rates, but I would not recommend placing the PCA "outside" the cross-validation. (The same goes for any resampling-based validation.)

-- 
Bjørn-Helge Mevik

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
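
For illustration, here is a minimal sketch of the two placements of the PCA relative to a leave-one-out loop. It is written in plain Python rather than R so that it needs no packages, and it makes several assumptions that are not in the post: a single principal component found by power iteration, a nearest-class-mean rule standing in for LDA, and simulated data whose labels are unrelated to the predictors. All function names are hypothetical. It shows only where the PCA fit sits in each scheme and does not demonstrate or quantify any bias. The `dist` helper can be used to check the centring point directly: with n samples, the distance from the left-out sample to the mean of all samples is (n-1)/n times its distance to the mean of the others.

```python
import random

def mean_vec(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def first_pc(rows, iters=200):
    """Centre and leading principal direction (power iteration on the covariance)."""
    mu = mean_vec(rows)
    centred = [[v - m for v, m in zip(r, mu)] for r in rows]
    p = len(mu)
    cov = [[sum(r[i] * r[j] for r in centred) / (len(rows) - 1)
            for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return mu, w

def score(row, mu, w):
    """Project one sample onto the principal direction."""
    return sum((v - m) * wi for v, m, wi in zip(row, mu, w))

def nearest_mean_predict(train_scores, train_labels, s):
    """Stand-in for LDA: assign the class whose mean score is closest."""
    means = {}
    for lab in set(train_labels):
        vals = [t for t, l in zip(train_scores, train_labels) if l == lab]
        means[lab] = sum(vals) / len(vals)
    return min(means, key=lambda lab: abs(s - means[lab]))

def loo_error(X, y, pca_inside):
    """Leave-one-out error with the PCA fitted inside or outside the loop."""
    if not pca_inside:
        # "Outside": the PCA sees every sample, including each one left out later.
        mu, w = first_pc(X)
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if pca_inside:
            # "Inside": the PCA is refitted without the left-out sample.
            mu, w = first_pc(X_tr)
        tr_scores = [score(r, mu, w) for r in X_tr]
        pred = nearest_mean_predict(tr_scores, y_tr, score(X[i], mu, w))
        if pred != y[i]:
            errors += 1
    return errors / len(X)

# Simulated data (an assumption for illustration): 30 samples, 5 predictors,
# two classes assigned independently of the predictors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y = [i % 2 for i in range(30)]

print("PCA outside CV:", loo_error(X, y, pca_inside=False))
print("PCA inside CV: ", loo_error(X, y, pca_inside=True))
```

In R the same structure would mean calling the PCA and the LDA fit inside the loop over left-out samples, and using the loadings and centre from the training samples to project the left-out one.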