In an off-list description of the motivation for your question, you wrote: The motivation is to test for spatial autocorrelation in residuals from a fitted linear model on a plant breeding multi-environment trials (MET) dataset. Generally, there are thousands of testing plots, hundreds of filed locations, and less than 20 regions (represented by multipolygons). One region has multiple field locations, and one field location has multiple testing plots. The objective is to use testing plots as datapoints to run a linear model, then check whether the residuals have spatial autocorrelations.
The following doesn't answer your practical question, but I think that your sample design is intended in itself to remove both the risk of omitted covariates as a source of spatial autocorrelation, and any residual spatial autocorrelation when using linear mixed models with individual or group random effects. I am sure that in a mixed effects model setting, using Moran's I will not give reliable guidance, either ignoring covariates (moran.test) or including covariates in a linear model (lm.morantest). You should rather attempt to fit models with spatially structured random effects, and pick up any (grouped) residual autocorrelation in the model. The absence of important contributions from such SSRE would suggest that the model as fitted did not show residual autocorrelation. See for example model fitting functions in the mgcv and hglm packages for examples. In those cases, you also need to formalise the group membership structures. I'm thinking that each testing plot is sown with the same variety, but that plots within a field might exhibit spillover, but that fields in a region are distant from each other. You could also consult the spmodel package. Hope this helps, Roger -- Roger Bivand Emeritus Professor Norwegian School of Economics Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway [email protected] ________________________________________ From: Hongyun Wang <[email protected]> Sent: 21 December 2024 05:03 To: Roger Bivand; [email protected] Subject: Re: Spatial autocorrelation test in a dataset with multiple observations per region Prof. Bivand, Thank you for your response. I changed data source and data manipulation methods and copied only codes to this email. I also checked that the modified columbus is still a sf/data.frame object. My motivation using modified columbus dataset that has multiple observations per region/polygon is to find correct ways to complete following two tasks: 1. Creation of neighbors list and weights list. 2. Moran’s I test for spatial autocorrelation for: 1) a primary variable, 2) residuals from an estimated linear model. I looked at nb2blocknb function’s details and examples and thought this function can be used to create blocked up neighbors list in my case, because this modified dataset is like the post codes example described in function’s details. On the other hand, as stated in the documentation: “observations belonging to each unique location are observation neighbours”, and observation ids will be added to the neighbors list as attribute region.id. I think modifying the neighbors lists and weights list is not necessary, because two observations within a unique location can be “observation neighbours”. In the codes below, I also compared two weights list objects: listw.block and listw0. They are not identical, but they have the same neighbors list and weights list. And when they were used in Moran’s I tests, the tests returned same z-scores/p-values. Is listw.block object created correctly? Can listw0 replace listw.block? library(spdep) # read dataset and modify it: only 7 regions, variable observations per region. columbus <- st_read(system.file("shapes/columbus.gpkg", package="spData")[1], quiet=TRUE) columbus$regionId <- rep(as.character(1:7), 4:10) columbus$geom <- rep(columbus$geom[1:7], 4:10) # check class of object columbus class(columbus) columbus_ <- st_make_valid(st_as_sf(columbus, wkt = "geom")) class(columbus_) identical(columbus, columbus_) # create neighbor list using 7 unique regions/polygons nb7 <- poly2nb(as(unique(columbus[, "geom"]), "Spatial")) # block up neighbor list for location-less observations if (all(sort(as.character(attr(nb7, "region.id"))) == sort(unique(as.character(columbus$regionId))))) { block.nb <- nb2blocknb(nb7, columbus$regionId) } else { stop("ID tags don't match!") } # create spatial weights for neighbor list (listw.block <- nb2listw(block.nb, zero.policy=T, style="B")) # check number of neighbors of each observation card(listw.block$neighbours) # the neighbors and weights of the first observation card(listw.block$neighbours)[[1]] listw.block$neighbours[[1]] listw.block$weights[[1]] # Moran's I test for spatial autocorrelation in a primary variable CRIME (m.test <- moran.test(columbus$CRIME, listw.block, randomisation=FALSE, alternative="greater")) # Moran's I test for spatial autocorrelation in residuals from an estimated linear model crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus) (m.test_ <- lm.morantest(crime.lm, listw.block, alternative="greater")) ######################################################################################## # create neighbor list directly from full columbus dataset nblist <- poly2nb(as(columbus, "Spatial")) listw0 <- nb2listw(nblist, zero.policy=T, style="B") # Moran's I test for spatial autocorrelation in a primary variable CRIME (m.test <- moran.test(columbus$CRIME, listw0, randomisation=FALSE, alternative="greater")) # Moran's I test for spatial autocorrelation in residuals from an estimated linear model crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus) (m.test_ <- lm.morantest(crime.lm, listw0, alternative="greater")) # list.block and list0 have same neighbors and weights, although they are not identical all(card(listw0$neighbours) == card(listw.block$neighbours)) do.call(all, lapply(seq(length(listw0$neighbours)), function(x) all(listw0$neighbours[[x]] == listw.block$neighbours[[x]]))) do.call(all, lapply(seq(length(listw0$neighbours)), function(x) all(listw0$weights[[x]] == listw.block$weights[[x]]))) identical(listw.block, listw0) Thank you for your help. Hongyun From: Roger Bivand <[email protected]> Date: Thursday, December 19, 2024 at 3:22 AM To: Hongyun Wang <[email protected]>, [email protected] <[email protected]> Subject: Re: Spatial autocorrelation test in a dataset with multiple observations per region [You don't often get email from [email protected]. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] This is not easy to disambiguate. First, you have copied from the output to your email, so that replicating your code is contorted - provide the code, either instead of, or in addition to the output. Second, shapes/columbus.shp is deprecated, soon to be defunct, use shapes/columbus.gpkg. Third, you use pipes and dplyr, so that examining intermediate objects is not possible; I would be most grateful if you avoided dplyr::mutate, as sf objects are spatial, not general. I do not think that you have created the object you describe. I also fail to grasp your motivation, do you have multiple observations with unknown position within larger enclosing polygons, as in ?nb2blocknb? Roger -- Roger Bivand Emeritus Professor Norwegian School of Economics Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway [email protected] ________________________________________ From: R-sig-Geo <[email protected]> on behalf of Hongyun Wang <[email protected]> Sent: 18 December 2024 23:26 To: [email protected] Subject: [R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region [You don't often get email from [email protected]. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] Hello Everyone, I have one question on the spatial autoregression tests on primary variables or residuals of linear regression models of a dataset that has multiple observations per region. Here I use columbus dataset and moran.test() on a primary variable CRIME as illustration. I modified the dataset to have only 7 regions, not 49 regions in original dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to create neighbors and weights list of each region. But after examining the list carefully, I found that two observations within the same region have weight 1. This looks not right. I think poly2nb function treats each observation (a row in the dataset) as a region. Then I modified the neighbors and weights list to have weight 0 for two observations that are in the same region. The moran.test results from original listw object (listw0) and modified listw object (listw) have different Z-scores and p-values. The which moran.test is correct, or are both incorrect? > library(dplyr) > library(spdep) > > # read dataset and modify it: only 7 regions, 7 observations per region. > columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1], > quiet=TRUE) %>% + mutate(regionId = rep(LETTERS[1:7], each = 7), + geometry = rep(.$geometry[1:7], each = 7)) > columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>% > dim() [1] 7 2 > > # neighbor list and weights > nblist <- poly2nb(as(columbus, "Spatial")) > listw0 <- listw <- nb2listw(nblist, zero.policy=T, style="B") > > # 1.Moran I test on primary variable CRIME using original weights list > (m.test <- moran.test(columbus$CRIME, listw0, + randomisation=FALSE, alternative="greater")) Moran I test under normality data: columbus$CRIME weights: listw0 Moran I statistic standard deviate = 2.6121, p-value = 0.004499 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.0497371251 -0.0208333333 0.0007298841 > > # change weights in listw: two observations within the same region have > weight 0. > columbus$ID = seq(nrow(columbus)) > for (i in seq_along(listw$neighbours)){ + listw$neighbours[[i]] = setdiff(listw$neighbours[[i]], subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i, "regionId"])$ID) + listw$weights[[i]] = rep(1, length(listw$neighbours[[i]])) + + if (length(listw$neighbours[[i]]) == 0) { + listw$neighbours[[i]] = 0L + listw$weights[i] = list(NULL) + } + } > > # 2.Moran I test on primary variable CRIME using modified weights list > (m.test <- moran.test(columbus$CRIME, listw, + randomisation=FALSE, alternative="greater")) Moran I test under normality data: columbus$CRIME weights: listw n reduced by no-neighbour observations Moran I statistic standard deviate = 1.1726, p-value = 0.1205 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.014619659 -0.024390244 0.001106761 > > # the neighbors and weights of the first observation > ## original listw > card(listw0$neighbours)[[1]] [1] 20 > listw0$neighbours[[1]] [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 > listw0$weights[[1]] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > ## modified listw > card(listw$neighbours)[[1]] [1] 14 > listw$neighbours[[1]] [1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21 > listw$weights[[1]] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > # overview of listw0 and listw > listw0 Characteristics of weights list object: Neighbour list object: Number of regions: 49 Number of nonzero links: 1078 Percentage nonzero weights: 44.89796 Average number of links: 22 2 disjoint connected subgraphs Weights style: B Weights constants summary: n nn S0 S1 S2 B 49 2401 1078 2156 110544 > listw Characteristics of weights list object: Neighbour list object: Number of regions: 49 Number of nonzero links: 784 Percentage nonzero weights: 32.65306 Average number of links: 16 7 regions with no links: 43 44 45 46 47 48 49 8 disjoint connected subgraphs Weights style: B Weights constants summary: n nn S0 S1 S2 B 42 1764 784 1568 65856 Thank you in advance and sorry for the inconvenience. Hongyun Wang Sr. Data Scientist � Contracted Bayer AG [email protected] ________________________________ The information contained in this e-mail is for the excl...{{dropped:14}} ________________________________ The information contained in this e-mail is for the exclusive use of the intended recipient(s) and may be confidential, proprietary, and/or legally privileged. Inadvertent disclosure of this message does not constitute a waiver of any privilege. If you receive this message in error, please do not directly or indirectly use, print, copy, forward, or disclose any part of this message. Please also delete this e-mail and all copies and notify the sender. Thank you. ________________________________ _______________________________________________ R-sig-Geo mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-geo
