Re: [R-sig-Geo] spatial autocorrelation in GAM residuals for large data set

Roger Bivand Tue, 20 Aug 2019 13:44:01 -0700

On Tue, 20 Aug 2019, Elizabeth Webb wrote:

> Hello,
> I have a large data set (~100k rows) containing observations at points (MODIS pixels) across the northern hemisphere. I have created a GAM using the bam command in mgcv and I would like to check the model residuals for spatial autocorrelation.
> One idea is to use the DHARMa package (https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#spatial-autocorrelation). The code looks something like this:
>     library(DHARMa)
>     # point at which R runs into memory problems:
>     simulationOutput <- simulateResiduals(fittedModel = mymodel)
>     testSpatialAutocorrelation(simulationOutput = simulationOutput,
>                                x = data$longitude, y = data$latitude)
>
> However, this runs into memory problems.
> Another idea is to use the following code, based on this tutorial (http://www.flutterbys.com.au/stats/tut/tut8.4a.html):
>     library(ape)
>     library(fields)
>     coords <- cbind(data$longitude, data$latitude)
>     w <- 1 / rdist(coords)  # inverse-distance weights; point at which R runs into memory problems
>     diag(w) <- 0            # a point is not its own neighbour
>     Moran.I(x = residuals(mymodel), weight = w)
> But this also runs into memory problems. I have tried increasing the amount of memory allotted to R, but that just means R works for longer before timing out.


I do hope that you read

https://cran.r-project.org/web/packages/ape/vignettes/MoranI.pdf

first, because the approach used in ape has been revised.

The main problem is that by default ape uses a square (dense) matrix, and it is uncertain whether sparse matrices are accepted. This means that completely unneeded computations are carried out; dense matrices should never be used unless there is a convincing scientific argument for them (see https://edzer.github.io/UseR2019/part2.html#exercise-review-1 for a discussion of why distances are wasteful when edge counts on a graph do the same thing sparsely).
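
For scale: a dense matrix of doubles for 100k points holds 10^10 entries (about 80 GB), while a 6-nearest-neighbour graph has only about 600k edges. A minimal sketch of building such sparse weights, assuming the spdep package and random lon/lat coordinates standing in for the MODIS pixel centres; n is kept small here so the two memory footprints can be compared directly:

```r
library(spdep)

set.seed(42)
n <- 5000
# Synthetic lon/lat points in the northern hemisphere (placeholders for MODIS pixels)
coords <- cbind(longitude = runif(n, -180, 180),
                latitude  = runif(n, 30, 80))

# k = 6 nearest neighbours using great-circle distances, then made symmetric
knn <- knearneigh(coords, k = 6, longlat = TRUE)
nb  <- make.sym.nb(knn2nb(knn))
lw  <- nb2listw(nb, style = "W")   # row-standardised sparse weights

# Memory of the sparse neighbour/weights lists versus an n x n matrix of doubles
sparse_mb <- as.numeric(object.size(lw)) / 2^20
dense_mb  <- n^2 * 8 / 2^20
cat(sprintf("sparse: %.1f MB, dense: %.1f MB\n", sparse_mb, dense_mb))
```

Symmetrising with make.sym.nb() is a choice rather than a requirement; it guarantees every point at least six neighbours.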

Use one of the approaches described in the tutorial and you may be OK, but you should not trust the outcome of Moran's I on residuals without using an appropriate variant. If you can represent your GAM as a linear model with, say, spline terms, you can use Moran's I for regression residuals. Keep the average number of neighbours small (6-10); then large numbers of observations should not be a problem.
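
The regression-residual variant of Moran's I is implemented in spdep as lm.morantest(), which takes a fitted lm object and a listw object. A sketch, assuming the GAM smooth can be approximated by a natural-spline basis from splines::ns(); the data frame dat, its covariate x and the response y are synthetic placeholders:

```r
library(spdep)
library(splines)

set.seed(1)
n <- 2000
# Synthetic stand-in for the real data (hypothetical columns)
dat <- data.frame(longitude = runif(n, -180, 180),
                  latitude  = runif(n, 30, 80),
                  x         = runif(n, 0, 10))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

# Linear model whose natural-spline terms play the role of the GAM smooth s(x)
fit_lm <- lm(y ~ ns(x, df = 6), data = dat)

# Sparse, symmetric 6-nearest-neighbour weights on great-circle distances
coords <- cbind(dat$longitude, dat$latitude)
nb <- make.sym.nb(knn2nb(knearneigh(coords, k = 6, longlat = TRUE)))
lw <- nb2listw(nb, style = "W")

# Moran's I for regression residuals
res_test <- lm.morantest(fit_lm, lw)
print(res_test)
```

Unlike moran.test() applied to extracted residuals, lm.morantest() derives the expectation and variance of I from the model's design matrix.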

A larger problem is that Moran's I (including its variant for residuals) also responds to mis-specifications other than spatial autocorrelation, in particular missing variables and spatial processes operating at a different scale from the chosen units of observation.

> So, two questions: (1) Is there a memory-efficient way to check for spatial autocorrelation using Moran's I in large data sets? Or (2) is there another way to check for spatial autocorrelation (besides Moran's I) that won't have such memory problems?


1) Yes, see above, do not use dense matrices

2) Consider a higher-level MRF (Markov random field) term in your GAM for aggregates of your observations, if such aggregation makes sense for your data.
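
In mgcv such a term is written s(region, bs = "mrf", xt = list(nb = nb)), where nb is a list named by the factor levels whose elements give the indices of each region's neighbours. A sketch, assuming a hypothetical aggregation of the points into 30 x 10 degree grid cells with rook adjacency (not wrapping across the dateline, for simplicity) and synthetic data:

```r
library(mgcv)

set.seed(1)
n <- 3000
dat <- data.frame(longitude = runif(n, -180, 180),
                  latitude  = runif(n, 30, 80),
                  x         = runif(n, 0, 10))

# Hypothetical aggregation: 12 columns x 5 rows of 30 x 10 degree cells
nx <- 12; ny <- 5
dat$i <- pmin(floor((dat$longitude + 180) / 30), nx - 1)
dat$j <- pmin(floor((dat$latitude - 30) / 10), ny - 1)
cells    <- expand.grid(i = 0:(nx - 1), j = 0:(ny - 1))
cell_ids <- paste0("c", cells$i, "_", cells$j)

# Rook neighbours on the grid; indices refer to positions in nb itself
nb <- lapply(seq_len(nrow(cells)), function(k) {
  which(abs(cells$i - cells$i[k]) + abs(cells$j - cells$j[k]) == 1)
})
names(nb) <- cell_ids  # names must match the factor levels

dat$cell <- factor(paste0("c", dat$i, "_", dat$j), levels = cell_ids)

# Synthetic response: smooth effect of x plus a spatially varying cell effect
cell_effect <- 0.5 * sin(cells$i / 2) + 0.2 * cells$j
dat$y <- sin(dat$x) + cell_effect[as.integer(dat$cell)] + rnorm(n, sd = 0.3)

fit_mrf <- bam(y ~ s(x) + s(cell, bs = "mrf", xt = list(nb = nb)), data = dat)
summary(fit_mrf)
```

The mrf penalty shrinks differences between neighbouring cells, so the cell-level effect is estimated as spatially smooth at the aggregate scale rather than at the pixel scale.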


Hope this clarifies,

Roger


> Thanks in advance,
>
> Elizabeth

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en