Prof. Bivand,
Thank you for your response. I changed data source and data manipulation
methods and copied only codes to this email. I also checked that the modified
columbus is still a sf/data.frame object.
My motivation using modified columbus dataset that has multiple observations
per region/polygon is to find correct ways to complete following two tasks:
1. Creation of neighbors list and weights list.
2. Moran’s I test for spatial autocorrelation for: 1) a primary variable, 2)
residuals from an estimated linear model.
I looked at nb2blocknb function’s details and examples and thought this
function can be used to create blocked up neighbors list in my case, because
this modified dataset is like the post codes example described in function’s
details. On the other hand, as stated in the documentation: “observations
belonging to each unique location are observation neighbours”, and observation
ids will be added to the neighbors list as attribute region.id. I think
modifying the neighbors lists and weights list is not necessary, because two
observations within a unique location can be “observation neighbours”.
In the codes below, I also compared two weights list objects: listw.block and
listw0. They are not identical, but they have the same neighbors list and
weights list. And when they were used in Moran’s I tests, the tests returned
same z-scores/p-values. Is listw.block object created correctly? Can listw0
replace listw.block?
library(spdep)
# read dataset and modify it: only 7 regions, variable observations per region.
columbus <- st_read(system.file("shapes/columbus.gpkg", package="spData")[1],
quiet=TRUE)
columbus$regionId <- rep(as.character(1:7), 4:10)
columbus$geom <- rep(columbus$geom[1:7], 4:10)
# check class of object columbus
class(columbus)
columbus_ <- st_make_valid(st_as_sf(columbus, wkt = "geom"))
class(columbus_)
identical(columbus, columbus_)
# create neighbor list using 7 unique regions/polygons
nb7 <- poly2nb(as(unique(columbus[, "geom"]), "Spatial"))
# block up neighbor list for location-less observations
if (all(sort(as.character(attr(nb7, "region.id"))) ==
sort(unique(as.character(columbus$regionId))))) {
block.nb <- nb2blocknb(nb7, columbus$regionId)
} else {
stop("ID tags don't match!")
}
# create spatial weights for neighbor list
(listw.block <- nb2listw(block.nb, zero.policy=T, style="B"))
# check number of neighbors of each observation
card(listw.block$neighbours)
# the neighbors and weights of the first observation
card(listw.block$neighbours)[[1]]
listw.block$neighbours[[1]]
listw.block$weights[[1]]
# Moran's I test for spatial autocorrelation in a primary variable CRIME
(m.test <- moran.test(columbus$CRIME, listw.block,
randomisation=FALSE, alternative="greater"))
# Moran's I test for spatial autocorrelation in residuals from an estimated
linear model
crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus)
(m.test_ <- lm.morantest(crime.lm, listw.block, alternative="greater"))
########################################################################################
# create neighbor list directly from full columbus dataset
nblist <- poly2nb(as(columbus, "Spatial"))
listw0 <- nb2listw(nblist, zero.policy=T, style="B")
# Moran's I test for spatial autocorrelation in a primary variable CRIME
(m.test <- moran.test(columbus$CRIME, listw0,
randomisation=FALSE, alternative="greater"))
# Moran's I test for spatial autocorrelation in residuals from an estimated
linear model
crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus)
(m.test_ <- lm.morantest(crime.lm, listw0, alternative="greater"))
# list.block and list0 have same neighbors and weights, although they are not
identical
all(card(listw0$neighbours) == card(listw.block$neighbours))
do.call(all, lapply(seq(length(listw0$neighbours)), function(x)
all(listw0$neighbours[[x]] == listw.block$neighbours[[x]])))
do.call(all, lapply(seq(length(listw0$neighbours)), function(x)
all(listw0$weights[[x]] == listw.block$weights[[x]])))
identical(listw.block, listw0)
Thank you for your help.
Hongyun
From: Roger Bivand <[email protected]>
Date: Thursday, December 19, 2024 at 3:22 AM
To: Hongyun Wang <[email protected]>, [email protected]
<[email protected]>
Subject: Re: Spatial autocorrelation test in a dataset with multiple
observations per region
[You don't often get email from [email protected]. Learn why this is
important at https://aka.ms/LearnAboutSenderIdentification ]
This is not easy to disambiguate.
First, you have copied from the output to your email, so that replicating your
code is contorted - provide the code, either instead of, or in addition to the
output.
Second, shapes/columbus.shp is deprecated, soon to be defunct, use
shapes/columbus.gpkg.
Third, you use pipes and dplyr, so that examining intermediate objects is not
possible; I would be most grateful if you avoided dplyr::mutate, as sf objects
are spatial, not general. I do not think that you have created the object you
describe. I also fail to grasp your motivation, do you have multiple
observations with unknown position within larger enclosing polygons, as in
?nb2blocknb?
Roger
--
Roger Bivand
Emeritus Professor
Norwegian School of Economics
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway
[email protected]
________________________________________
From: R-sig-Geo <[email protected]> on behalf of Hongyun Wang
<[email protected]>
Sent: 18 December 2024 23:26
To: [email protected]
Subject: [R-sig-Geo] Spatial autocorrelation test in a dataset with multiple
observations per region
[You don't often get email from [email protected]. Learn why this is
important at https://aka.ms/LearnAboutSenderIdentification ]
Hello Everyone,
I have one question on the spatial autoregression tests on primary variables or
residuals of linear regression models of a dataset that has multiple
observations per region. Here I use columbus dataset and moran.test() on a
primary variable CRIME as illustration.
I modified the dataset to have only 7 regions, not 49 regions in original
dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to
create neighbors and weights list of each region. But after examining the list
carefully, I found that two observations within the same region have weight 1.
This looks not right. I think poly2nb function treats each observation (a row
in the dataset) as a region. Then I modified the neighbors and weights list to
have weight 0 for two observations that are in the same region. The moran.test
results from original listw object (listw0) and modified listw object (listw)
have different Z-scores and p-values. The which moran.test is correct, or are
both incorrect?
> library(dplyr)
> library(spdep)
>
> # read dataset and modify it: only 7 regions, 7 observations per region.
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1],
> quiet=TRUE) %>%
+ mutate(regionId = rep(LETTERS[1:7], each = 7),
+ geometry = rep(.$geometry[1:7], each = 7))
> columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>%
> dim()
[1] 7 2
>
> # neighbor list and weights
> nblist <- poly2nb(as(columbus, "Spatial"))
> listw0 <- listw <- nb2listw(nblist, zero.policy=T, style="B")
>
> # 1.Moran I test on primary variable CRIME using original weights list
> (m.test <- moran.test(columbus$CRIME, listw0,
+ randomisation=FALSE, alternative="greater"))
Moran I test under normality
data: columbus$CRIME
weights: listw0
Moran I statistic standard deviate = 2.6121, p-value = 0.004499
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.0497371251 -0.0208333333 0.0007298841
>
> # change weights in listw: two observations within the same region have
> weight 0.
> columbus$ID = seq(nrow(columbus))
> for (i in seq_along(listw$neighbours)){
+ listw$neighbours[[i]] = setdiff(listw$neighbours[[i]],
subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i,
"regionId"])$ID)
+ listw$weights[[i]] = rep(1, length(listw$neighbours[[i]]))
+
+ if (length(listw$neighbours[[i]]) == 0) {
+ listw$neighbours[[i]] = 0L
+ listw$weights[i] = list(NULL)
+ }
+ }
>
> # 2.Moran I test on primary variable CRIME using modified weights list
> (m.test <- moran.test(columbus$CRIME, listw,
+ randomisation=FALSE, alternative="greater"))
Moran I test under normality
data: columbus$CRIME
weights: listw n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.1726, p-value = 0.1205
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.014619659 -0.024390244 0.001106761
>
> # the neighbors and weights of the first observation
> ## original listw
> card(listw0$neighbours)[[1]]
[1] 20
> listw0$neighbours[[1]]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> listw0$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ## modified listw
> card(listw$neighbours)[[1]]
[1] 14
> listw$neighbours[[1]]
[1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> listw$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # overview of listw0 and listw
> listw0
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 1078
Percentage nonzero weights: 44.89796
Average number of links: 22
2 disjoint connected subgraphs
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 49 2401 1078 2156 110544
> listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 784
Percentage nonzero weights: 32.65306
Average number of links: 16
7 regions with no links:
43 44 45 46 47 48 49
8 disjoint connected subgraphs
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 42 1764 784 1568 65856
Thank you in advance and sorry for the inconvenience.
Hongyun Wang
Sr. Data Scientist � Contracted
Bayer AG
[email protected]
________________________________
The information contained in this e-mail is for the excl...{{dropped:14}}
________________________________
The information contained in this e-mail is for the exclusive use of the
intended recipient(s) and may be confidential, proprietary, and/or legally
privileged. Inadvertent disclosure of this message does not constitute a
waiver of any privilege. If you receive this message in error, please do not
directly or indirectly use, print, copy, forward, or disclose any part of this
message. Please also delete this e-mail and all copies and notify the sender.
Thank you.
________________________________
[[alternative HTML version deleted]]
_______________________________________________
R-sig-Geo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo