Thank you, Roger for your help. A quick follow-up: What do you mean when you say "Use one of the approaches described in the tutorial and you may be OK, but you should not trust the outcome of Moran's I on residuals without using an appropriate variant." ? Or in other words, what is an appropriate variant in this context?
Elizabeth _______________________________________From: Roger Bivand <roger.biv...@nhh.no> Sent: Tuesday, August 20, 2019 4:43 PM To: Elizabeth Webb Cc: r-sig-geo@r-project.org Subject: Re: [R-sig-Geo] spatial autocorrelation in GAM residuals for large data set On Tue, 20 Aug 2019, Elizabeth Webb wrote: > Hello, > > I have a large data set (~100k rows) containing observations at points > (MODIS pixels) across the northern hemisphere. I have created a GAM > using the bam command in mgcv and I would like to check the model > residuals for spatial autocorrelation. > > One idea is to use the DHARMa package > (https://urldefense.proofpoint.com/v2/url?u=https-3A__cran.r-2Dproject.org_web_packages_DHARMa_vignettes_DHARMa.html-23spatial-2Dautocorrelation&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=lD9g96_TN9t-znQvV1M9V8CH2tgKAYHWKcTS8osjBSc&e= > ). > The code looks something like this: > > simulationOutput <- simulateResiduals(fittedModel = mymodel) # point > at which R runs into memory problems > testSpatialAutocorrelation(simulationOutput = simulationOutput, x = > data$latitude, y= data$longitude) > > However, this runs into memory problems. > > Another idea is to use the following code, after this tutorial > (https://urldefense.proofpoint.com/v2/url?u=http-3A__www.flutterbys.com.au_stats_tut_tut8.4a.html&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=v9Op69bwvOVRi1ujar_P8-LHsJFDAN_aE25i1_m22U4&e= > ): > library(ape) > library(fields) > coords = cbind(data$longitude, data$latitude) > w = rdist(coords) # point at which R runs into memory problems > Moran.I(x = residuals(mymodel), w = w) > > But this also runs into memory problems. I have tried increasing the > amount of memory allotted to R, but that just means R works for longer > before timing out. I do hope that you read https://urldefense.proofpoint.com/v2/url?u=https-3A__cran.r-2Dproject.org_web_packages_ape_vignettes_MoranI.pdf&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=tp6IoNz-8y-MBnLZldMpb3wUT_faHdoGMFszkZQxYBU&e= first, because the approach used in ape has been revised. The main problem is that ape uses by default a square matrix, and it is uncertain whether sparse matrices are accepted. This means that completely unneeded computations are carried out - dense matrices should never be used unless there is a convincing scientific argument (see https://urldefense.proofpoint.com/v2/url?u=https-3A__edzer.github.io_UseR2019_part2.html-23exercise-2Dreview-2D1&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=PIyJQgsz9qD81VCZsyfJQGdO-Gh6iJNgF2xH9jATbhI&e= for a development on why distances are wasteful when edge counts on a graph do the same thing sparsely). Use one of the approaches described in the tutorial and you may be OK, but you should not trust the outcome of Moran's I on residuals without using an appropriate variant. Say you can represent your GAM with a linear model with say spline terms, you can use Moran's I for regression residuals. Take care that the average number of neighbours is very small (6-10), and large numbers of observations should not be a problem. A larger problem is that Moran's I (also for residuals) also responds to other mis-specifications than spatial autocorrelation, in particular missing variables and spatial processes with a different scale from the units of observation chosen. > > So, two questions: (1) Is there a memory efficient way to check for > spatial autocorrelation using Moran's I in large data sets? or (2) Is > there another way to check for spatial autocorrelation (besides Moran's > I) that won't have such memory problems? 1) Yes, see above, do not use dense matrices 2) Consider a higher level MRF term in your GAM for aggregates of your observations if such aggregation makes sense for your data. Hope this clarifies, Roger > > Thanks in advance, > > Elizabeth > > > > > > > > > _______________________________________________ > R-sig-Geo mailing list > R-sig-Geo@r-project.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dsig-2Dgeo&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=qq-Vo4xAxTCARWawQQYjCYvxm_Hg_iYYW1_n-Xphyg4&e= > -- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no https://urldefense.proofpoint.com/v2/url?u=https-3A__orcid.org_0000-2D0003-2D2392-2D6140&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=nyvB8TRse_NA-lALeTG3k_KOIVVzaNLMuDAqPo3dyGI&e= https://urldefense.proofpoint.com/v2/url?u=https-3A__scholar.google.no_citations-3Fuser-3DAWeghB0AAAAJ-26hl-3Den&d=DwIFAw&c=sJ6xIWYx-zLMB3EPkvcnVg&r=fIXSZTeOvV9o221vPRYLSw&m=IMILOSwwjZJkpsYiuj66ywrnluSI19Bn8ozD5p-NZks&s=UBVoMNMtGxwGxOGDcIJHGAuFm8gqb8X3kqNkGpPVKS4&e= [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo