Excellent! It was exactly this what I meant. I also agree that it is not needed a function to calculate such simple stuff, but certainly would be desirable (and effective) to have it in the help file from krige.cv (to clarify the less statistically minded like myself :-)) I totally agree with 'desirable', instead of expected ;-)
About the 10-fold issue (thank you for the reference. It was exactly what I have heard about), maybe it would nice if the default was to do 10-fold, because it is also substantially quicker to estimate. hihihi In relation to the use of CV, I found an article that brought my attention to the subject again, where they compared the the mean estimated by kriging and sample mean through cross-validation, as a way to show how kriging is improving the estimates. (Sales MH, Souza JCM, Kyriakidis PC, Roberts DA, Vidal E (2007) Improving spatial distribution estimation of forest biomass with geostatistics: A case study for Rondonia, Brazil. Ecol Model 205:221-230). Thus this is one of the issues (for many users, I believe...) how to quantify/prove that kriging is actually doing better than other methods? Does any one has any further alternatives? Best wishes and thank you very much once again, Marta Edzer Pebesma wrote: > Marta Rufino wrote: >> Great! > Hi Marta, > > I now have in the example section of gstat.cv help: > > # multivariable; thanks to M. Rufino: > meuse.g <- gstat(id = "zn", formula = log(zinc) ~ 1, data = meuse) > meuse.g <- gstat(meuse.g, "cu", log(copper) ~ 1, meuse) > meuse.g <- gstat(meuse.g, model = vgm(1, "Sph", 900, 1), fill.all = T) > x <- variogram(meuse.g, cutoff = 1000) > meuse.fit = fit.lmc(x, meuse.g) > out = gstat.cv(meuse.fit, nmax = 40, nfold = 5) > summary(out) > # mean error, ideally 0: > mean(out$residual) > # MSPE, ideally small > mean(out$residual^2) > # Mean square normalized error, ideally close to 1 > mean(out$zscore^2) > # correlation observed and predicted, ideally 1 > cor(out$observed, out$observed - out$residual) > # correlation predicted and residual, ideally 0 > cor(out$observed - out$residual, out$residual) > > I'm a bit hesitant to write a dedicated function for such simple > calculations. In your function below, I object to the use of > "expected", where you mean "desired". It's definitely not expected in > the statistical sense. > > The difficult questions about the cross validation results in > geostatistics are usually "can we attribute the difference of a MSNE > of 1.1 from the idealized value of 1 attribute to sampling error, or > is it an indication of a misfitting model?" In the latter case: > "should we worry?" (only if kriging errors are of importance, many > people ignore them). > > Also, if the data are nicely spread, say the shortest distance is 100m > (stay with the example above), then CV is never going to tell anything > about the variogram for distances up to 100m. > > Leave-on-out vs. n-fold? I believe Hastie, Tibshirani and Friedman > promote the 10-fold. The idea was that leave-one-out may be too > optimistic and not reveal model error, as refitted models are almost > identical for moderately sized or large data sets, in each fold. > Leave-one-out will have smaller RMSE than say 5-fold, but this is not > an indication of a better model nor of a substantially better > procedure, IMO. > > Best wishes, > -- > Edzer > >> >> This works wonderfully...maybe would be nice if you add it to the >> example in the help page :-) >> >> Further comments in /CV/... from the gstat.cv output, which >> cross-validation measures should be considered when establishing the >> performance of kriging, in relation to other methods, for example or >> to compare among kriging modelling options? >> >> I. >> http://www.ic.arizona.edu/ic/math574/class_notes/meuse%20zinc%20vs%20logzinc%20using%20gstat.pdf >> >> >> "Cross validation produces multiple statistics, >> making changes to improve one statistic may make another worse. The >> key statistics are (1) >> the mean error (expected value is zero), (2) mean square normalized >> error (in gstat and >> several other software packages, the normalized error is called the >> “zscore”). The expected >> value of this statistic is one. (3) Ideally the correlation between >> the predicted value and the >> observed value is one, however when using Ordinary kriging or >> Universal kriging, the >> maximum correlation is somewhat less (because of the Lagrange >> multipliers). (4) Ideally the >> correlation between the predicted value and the error (residual) is >> zero but again the >> Lagrange multipliers have an effect. (5) Using Chebyshev’s Inequality >> we can bound the" >> >> Thus, they state that the important checks to do with the CV, is (): >> 1) mean error (should ~0) >> 2) Mean squared normalized error (should ~1) >> 3) correlation between observed and predicted (should ~1) >> 4) correlation between predicted and residuals (should ~0) >> >> out = gstat.cv(object=meuse.fit, nmax=40, nfold=5) >> cv.results=function(x){ #function to organize CV results. x is the >> result of kriging Crossvalidation >> print(bubble(x, c("residual"), main = "leave-one-out CV residuals")) >> x=data.frame(x) >> kk=data.frame( >> ME=c(0,mean(x$res)), >> MSNE=c(1,sqrt(sum(x$zscore^2)/length(x$res))), >> cor1=c(1,cor(x$obs, x[,1])) , >> cor2=c(0,cor(x[,1], x$res))) >> row.names(kk)=c("expected", "estimated") >> return(kk) >> } >> >> cv.results(out) >> >> >> II >> On another web ref. >> (http://www.itc.nl/personal/rossiter/teach/R/R_ck.pdf): >> "Diagnostic measures are the ME (bias), RMSE (precision), and Mean >> Squared >> Deviation Ratio (MSDR) of the residuals to the prediction errors; this >> should be 1 because the residuals from cross-validation should equal >> to prediction errors at each point that was held out. For the univariate >> case (OK) we can use the krige.cv cross-validation method of the gstat >> package." >> Which is calculated by: >> >> mean(out$res) # mean error >> sqrt(mean(out$res^2)) # RMSE >> mean(out$res^2/as.data.frame(out)[,2]) # MSDR, should be ~1 (it is >> equivalent to predicted vs. sampled >> >> >> So, I consulted several text books about the subject, and there was >> none providing a clear answer about the important measures to take >> int account and which sort of cross-validation performs better >> (although I heard that it is 10-fold CV). Does anyone has further >> information about this subject? >> >> Which measures perform best? >> Does anyone has good references about the subject? >> >> Best wishes, >> Marta >> >> >> Edzer Pebesma wrote: >>> That was not the problem, the problem was that you used meuse.g >>> instead of meuse.fit to pass on to gstat.cv. For meuse.g, you have >>> perfect correlation between Cu and Zn, so that collocated >>> observations (meaning a Zn and a Cu observation at each obs >>> location) act as a duplicate in univarite kriging. >>> >>> Try: >>> >>> out = gstat.cv(object=meuse.fit, nmax=40, verbose=F, nfold=5) >>> -- >>> Edzer >>> >>> Paul Hiemstra wrote: >>>> Hi, >>>> >>>> You should check if you have duplicate observations, duplicate >>>> observations lead to a singular matrix. Use the function zerodist() >>>> to check where the observations are and remove.duplicates() to >>>> remove them. >>>> >>>> cheers, >>>> Paul >>>> >>>> Marta Rufino schreef: >>>>> Hello, >>>>> >>>>> yes, I know it is suppose to do it, but I could not find how, >>>>> because it gives me an error... for example: >>>>> >>>>> require(gstat); require(lattice) >>>>> data(meuse) >>>>> coordinates(meuse) = ~x + y >>>>> data(meuse.grid) >>>>> gridded(meuse.grid) = ~x + y >>>>> >>>>> meuse.g <- gstat(id = "zn", formula = log(zinc) ~ 1, data = meuse) >>>>> meuse.g <- gstat(meuse.g, "cu", log(copper) ~ 1, meuse) >>>>> >>>>> meuse.g <- gstat(meuse.g, model = vgm(1, "Sph", 900, 1), fill.all >>>>> = T) >>>>> x <- variogram(meuse.g, cutoff = 1000) >>>>> meuse.fit = fit.lmc(x, meuse.g) >>>>> plot(x, model = meuse.fit) >>>>> z <- predict(meuse.fit, newdata = meuse.grid) >>>>> spplot(z) #map >>>>> gstat.cv(meuse.g) #does not work... >>>>> gstat.cv(meuse.g, remove.all=T) #either >>>>> gstat.cv(meuse.g, all.residuals=T) #either >>>>> gstat.cv(object=meuse.g, formula = log(zinc) ~ 1, data = meuse, >>>>> model = vgm(1, "Sph", 900, 1), nmax=40, verbose=F) #either :-( >>>>> >>>>> # # Intrinsic Correlation found. Good. >>>>> # [using ordinary cokriging] >>>>> >>>>> # "chfactor.c", line 130: singular matrix in function LDLfactor() >>>>> # Error in predict.gstat(object, newdata = data[sel, ], ...) : >>>>> # LDLfactor >>>>> >>>>> Maybe an example on the help file would be nice (eheheh).. I >>>>> What am I missing? >>>>> >>>>> >>>>> Thank you very much in advance, >>>>> Marta >>>>> >>>>> _______________________________________________ >>>>> R-sig-Geo mailing list >>>>> R-sig-Geo@stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >>>> >>>> >>> >>> >>> >> >> -- >> ....................................................................... >> Marta M. Rufino (PhD) >> >> ..... >> Instituto Nacional de Investigação Agrária e das Pescas (INIAP/IPIMAR), >> Centro Regional de Investigação Pesqueira do Sul (CRIPSul) >> Avenida 5 de Outubro s/n >> P-8700-305 Olhão, Portugal >> +351 289 700 541 >> >> ..... Institut de Ciències del Mar - CMIMA (CSIC) >> Passeig Marítim de la Barceloneta, 37-49 08003 BARCELONA - Catalunya >> Spain >> >> >> > > > -- ....................................................................... Marta M. Rufino (PhD) ..... Instituto Nacional de Investigação Agrária e das Pescas (INIAP/IPIMAR), Centro Regional de Investigação Pesqueira do Sul (CRIPSul) Avenida 5 de Outubro s/n P-8700-305 Olhão, Portugal +351 289 700 541 ..... Institut de Ciències del Mar - CMIMA (CSIC) Passeig Marítim de la Barceloneta, 37-49 08003 BARCELONA - Catalunya Spain _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo