Re: [R-sig-Geo] cross-validation

Marta Rufino Thu, 07 Feb 2008 03:59:44 -0800

Great!

This works wonderfully...maybe would be nice if you add it to the 
example in the help page :-)

Further comments in /CV/... from the gstat.cv output, which 
cross-validation measures should be considered when establishing the 
performance of kriging, in relation to other methods, for example or to 
compare among kriging modelling options?

I.
http://www.ic.arizona.edu/ic/math574/class_notes/meuse%20zinc%20vs%20logzinc%20using%20gstat.pdf
"Cross validation produces multiple statistics,
making changes to improve one statistic may make another worse. The key 
statistics are (1)
the mean error (expected value is zero), (2) mean square normalized 
error (in gstat and
several other software packages, the normalized error is called the 
"zscore"). The expected
value of this statistic is one. (3) Ideally the correlation between the 
predicted value and the
observed value is one, however when using Ordinary kriging or Universal 
kriging, the
maximum correlation is somewhat less (because of the Lagrange 
multipliers). (4) Ideally the
correlation between the predicted value and the error (residual) is zero 
but again the
Lagrange multipliers have an effect. (5) Using Chebyshev's Inequality we 
can bound the"

Thus, they state that the important checks to do with the CV, is ():
1) mean error (should ~0)
2) Mean squared normalized error (should ~1)
3) correlation between observed and predicted (should ~1)
4) correlation between predicted and residuals (should ~0)

out = gstat.cv(object=meuse.fit, nmax=40, nfold=5)
cv.results=function(x){ #function to organize CV results. x is the 
result of kriging Crossvalidation
                    print(bubble(x, c("residual"), main = "leave-one-out 
CV residuals"))
                    x=data.frame(x)
                    kk=data.frame(                         
                        ME=c(0,mean(x$res)),
                        MSNE=c(1,sqrt(sum(x$zscore^2)/length(x$res))),
                        cor1=c(1,cor(x$obs, x[,1])) ,
                        cor2=c(0,cor(x[,1], x$res)))
                        row.names(kk)=c("expected", "estimated")
                        return(kk)                        
                    }                                

cv.results(out)

II
On another web ref. (http://www.itc.nl/personal/rossiter/teach/R/R_ck.pdf):
"Diagnostic measures are the ME (bias), RMSE (precision), and Mean Squared
Deviation Ratio (MSDR) of the residuals to the prediction errors; this
should be 1 because the residuals from cross-validation should equal
to prediction errors at each point that was held out. For the univariate
case (OK) we can use the krige.cv cross-validation method of the gstat
package."
Which is calculated by:

mean(out$res) # mean error
sqrt(mean(out$res^2)) # RMSE
mean(out$res^2/as.data.frame(out)[,2]) # MSDR, should be ~1 (it is 
equivalent to predicted vs. sampled

So, I consulted several text books about the subject, and there was none 
providing a clear answer about the important measures to take int 
account and which sort of cross-validation performs better (although I 
heard that it is 10-fold CV). Does anyone has further information about 
this subject?

Which measures perform best?
Does anyone has good references about the subject?

Best wishes,
Marta

Edzer Pebesma wrote:
> That was not the problem, the problem was that you used meuse.g 
> instead of meuse.fit to pass on to gstat.cv. For meuse.g, you have 
> perfect correlation between Cu and Zn, so that collocated observations 
> (meaning a Zn and a Cu observation at each obs location) act as a 
> duplicate in univarite kriging.
>
> Try:
>
> out = gstat.cv(object=meuse.fit, nmax=40, verbose=F, nfold=5)
> -- 
> Edzer
>
> Paul Hiemstra wrote:
>> Hi,
>>
>> You should check if you have duplicate observations, duplicate 
>> observations lead to a singular matrix. Use the function zerodist() 
>> to check where the observations are and remove.duplicates() to remove 
>> them.
>>
>> cheers,
>> Paul
>>
>> Marta Rufino schreef:
>>> Hello,
>>>
>>> yes, I know it is suppose to do it, but I could not find how, 
>>> because it gives me an error... for example:
>>>
>>> require(gstat); require(lattice)
>>> data(meuse)
>>> coordinates(meuse) = ~x + y
>>> data(meuse.grid)
>>> gridded(meuse.grid) = ~x + y
>>>
>>> meuse.g <- gstat(id = "zn", formula = log(zinc) ~ 1, data = meuse)
>>> meuse.g <- gstat(meuse.g, "cu", log(copper) ~ 1, meuse)
>>>
>>> meuse.g <- gstat(meuse.g, model = vgm(1, "Sph", 900, 1), fill.all = T)
>>> x <- variogram(meuse.g, cutoff = 1000)
>>> meuse.fit = fit.lmc(x, meuse.g)
>>> plot(x, model = meuse.fit)
>>> z <- predict(meuse.fit, newdata = meuse.grid)
>>> spplot(z) #map
>>> gstat.cv(meuse.g) #does not work...
>>> gstat.cv(meuse.g, remove.all=T) #either
>>> gstat.cv(meuse.g, all.residuals=T) #either
>>> gstat.cv(object=meuse.g, formula = log(zinc) ~ 1, data = meuse, 
>>> model = vgm(1, "Sph", 900, 1), nmax=40, verbose=F) #either :-(
>>>
>>> #                             # Intrinsic Correlation found. Good.
>>> # [using ordinary cokriging]
>>>
>>> # "chfactor.c", line 130: singular matrix in function LDLfactor()
>>> # Error in predict.gstat(object, newdata = data[sel, ], ...) :
>>> #         LDLfactor
>>>
>>> Maybe an example on the help file would be nice (eheheh).. I
>>> What am I missing?
>>>
>>>
>>> Thank you very much in advance,
>>> Marta
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo@stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>   
>>
>>
>
>
>

-- 
.......................................................................
Marta M. Rufino (PhD)

.....
Instituto Nacional de Investigação Agrária e das Pescas (INIAP/IPIMAR),
Centro Regional de Investigação Pesqueira do Sul (CRIPSul)
Avenida 5 de Outubro s/n
P-8700-305 Olhão, Portugal
+351 289 700 541

..... 
Institut de Ciències del Mar - CMIMA (CSIC)
Passeig Marítim de la Barceloneta, 37-49      
08003 BARCELONA - Catalunya
Spain 

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: [R-sig-Geo] cross-validation

Reply via email to