Dear all,

In my opinion it makes sense to use repeated k-fold cross-validation. The distribution of the statistics over the repetitions yields their confidence intervals.
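A minimal sketch of that idea, in plain Python rather than R/gstat so that it runs without any packages. Everything in it is an assumption made for illustration: the synthetic data, the straight-line model standing in for the kriging model, and the helper names (`kfold_indices`, `fit_line`, `cv_rmse`, `repeated_cv`, `percentile_interval`). The model is re-fitted on every training set, which is the analogue of re-estimating the variogram within each fold.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (hypothetical stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cv_rmse(xs, ys, k, rng):
    """One k-fold pass; the model is re-fitted on each training set."""
    sq_err = []
    for fold in kfold_indices(len(xs), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return statistics.fmean(sq_err) ** 0.5


def repeated_cv(xs, ys, k=10, repeats=200, seed=1):
    """Repeat k-fold CV with fresh random folds; return the sorted RMSEs."""
    rng = random.Random(seed)
    return sorted(cv_rmse(xs, ys, k, rng) for _ in range(repeats))


def percentile_interval(sorted_vals, level=0.95):
    """Empirical interval covering the central `level` share of the values."""
    n = len(sorted_vals)
    lo = int((1 - level) / 2 * (n - 1))
    hi = int(round((1 + level) / 2 * (n - 1)))
    return sorted_vals[lo], sorted_vals[hi]


# Synthetic data (assumption): linear trend plus Gaussian noise with sd 1.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(250)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

rmses = repeated_cv(xs, ys, k=10, repeats=200)
low, high = percentile_interval(rmses)
print(f"median RMSE {statistics.median(rmses):.3f}, "
      f"95% interval over repetitions [{low:.3f}, {high:.3f}]")
```

The spread obtained this way reflects only the randomness of the fold assignment for one fixed dataset, so the interval is probably best read as a measure of how stable the CV statistic is, not as a full confidence interval for the prediction error.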
I will try that during the next few months on a dataset with about 2500 data points. The current plan is to repeat a 10-fold cross-validation 1000 times. Or is k = 10 too small? But I may have to scale this down if it requires too much computing time.

The variogram re-estimation is something I had in mind. I'll send Edzer the code if I manage to get it working.

Cheers,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-sig-geo-boun...@stat.math.ethz.ch [mailto:r-sig-geo-boun...@stat.math.ethz.ch] On behalf of Edzer Pebesma
Sent: Monday 23 February 2009 20:32
To: dde...@sciborg.uwaterloo.ca
CC: r-sig-geo@stat.math.ethz.ch
Subject: Re: [R-sig-Geo] cross validation gstat

dde...@sciborg.uwaterloo.ca wrote:
> Hi list,
> A quick question regarding n-fold validation...
> I've seen several papers suggest the LOOCV is too optimistic. Is
> n-fold closer to a "true" validation?

I don't think "true" validation exists; could you explain what it is? If you mean having a completely independent set of observations not involved in forming the predictions, then there are two issues:

(i) how to form this set from the total set: how to select, how large should it be?
(ii) you're simply forming validation statistics without using all the information you could use.

In the book by Hastie, Tibshirani and Friedman (The Elements of Statistical Learning) it is argued (in the context of regression models) that LOOCV often results in many models that are almost identical, whereas n-fold with low n results in somewhat more different models. I don't recall that they gave a statistical/theoretical argument for why this difference among models is a good thing.

One of the issues with n-fold using random folds (as gstat does) is that the result varies if you repeat the procedure. That is obvious, but it also makes it a bit of a gamble. Which one to pick? Look at distributions of CV statistics?

I think when you look at CV statistics, you need to question why you do it; often it is because you want to find out how well the model performs in a predictive setting. In that case, predicting at locations very close to measurements is often something that cannot be cross-validated at all when the data are collected somewhat regularly in space.

> I am assuming that it uses the variogram that is constructed using ALL
> data, so my assumption is that the variogram is not re-fit for each
> n-fold before estimation...
>

That is correct. Please submit me code with variogram re-estimation when you have it. ;-)

--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251 8333081,
Fax: +49 251 8339763 http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9 e.pebe...@wwu.de

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.