Hi Roger, Thank you for this clarification, it will certainly be useful for me and others.
Best, Vinicius Em sáb., 25 de jul. de 2020 às 08:41, Roger Bivand <roger.biv...@nhh.no> escreveu: > On Thu, 23 Jul 2020, Vinicius Maia wrote: > > > Dear Roger, > > > > Why ncf::correlog() should not be used for regression residuals in this > > case? > > > > Could you clarify this to me, please? > > I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's > I is the same, but the inferences will be affected by changes in the > expectation and variance (under Normality): > > library(spdep) > columbus <- st_read(system.file("shapes/columbus.shp", > package="spData")[1]) > col.gal.nb <- read.gal(system.file("weights/columbus.gal", > package="spData")[1]) > OLS <- lm(CRIME ~ INC + HOVAL, data=columbus) > lm.morantest(OLS, listw=nb2listw(col.gal.nb)) > moran.test(residuals(OLS), listw=nb2listw(col.gal.nb), > randomisation=FALSE) > > This is because standard regression assumptions affect how we view the > regression residuals, details in Cliff & Ord. See for example the code in > moran.test() for E(I) (var(I) is similar): > > EI <- (-1)/wc$n1 > > where wc$n1 is n - 1 from spweights.constants(). In the lm.morantest() > case: > > XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE]) > X <- model.matrix(terms(model), model.frame(model)) > if (length(nacoefs) > 0L) > X <- X[, -nacoefs] > if (!is.null(wts <- weights(model))) { > X <- drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) * > X[i, ]))) > } > Z <- lag.listw(listw.U, X, zero.policy = zero.policy) > C1 <- t(X) %*% Z > trA <- (sum(diag(XtXinv %*% C1))) > EI <- -((N * trA)/((N - p) * S0)) > > and S_0 is the sum of weights (N if row standardised), p is the number > columns of X, and A = (X'X)^{-1} X'WX. For the distance band case, you > also need to adjust N for the count of pairs of neighbours within that > band; in moran.test() you see n <- as.double(length(which(cards > 0))) by > default, where cards is a vector of neighbour counts; in lm.morantest() > the equivalent by default is > N <- as.double(length(which(card(listw$neighbours) > 0L))) > > My presumption is that ncf::correlog() is for input variables, not > regression residuals, as there is no argument to pass though the fitted > model object from which to extract the model matrix or (X'X)^{-1}. > Further, resampling regression residuals may be unsafe for similar > reasons; the draws should not be from the residuals, I think. > > Recall that the code is shown by typing the function name, and this best > documents what is going on. > > Again, refer to Cliff & Ord 1973 for the full development. > > Roger > > > > > > Thank you > > > > Best, > > > > Vinicius > > > > Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand <roger.biv...@nhh.no> > > escreveu: > > > >> On Wed, 22 Jul 2020, Peter B. Pearman wrote: > >> > >>> > >>> Dear Roger and list members, > >>> > >>> I have a ols regression and want to remove spatial autocorrelation > (SAC) > >>> from the residuals, in order to avoid its potential effects of SAC on > >>> the hypothesis tests (and the reviewers/editor). I have generated > >>> spatial eigenvectors with SpatialFiltering(), and added the generated > >>> vectors to the regression. Surprisingly, SAC appears to become more > >>> pronounced. I also tried ME(), but many more vectors are produced and > >>> SAC is also not removed. Isn't including the vectors from > >>> SpatialFiltering() supposed to reduce SAC? > >>> > >>> Can you please enlighten me as to what's going on, what I am doing > >>> wrong, or what I should try? > >> > >> You are using point support (your forests are at points not areas, so > that > >> the observations are not contiguous). The advice to consider this as a > >> mixed model problem may be relevant. It is certainly the case that > >> ncf::correlog() should not be used for regression residuals. > >> > >> Further, your data are in a band 1.5-4.3E, 43.0-43.4N, so using 0.7 > >> degrees in dnearneigh() was a risky choice, and squashes the data. If > you > >> coerce to an sf object and apply an appropriate CRS, you get many fewer > >> neighbours on average. > >> > >> library(sf) > >> data <- st_as_sf(data, coords=c("LONG", "LAT")) > >> (nbnear4 <- dnearneigh(data, 0, 0.7)) > >> st_crs(data) <- 4326 > >> (nbnear4a <- dnearneigh(data, 0, 27)) > >> # or project to a relevant planar UTM 31N spec. > >> data_utm31 <- st_transform(data, st_crs(32631)) > >> (nbd <- dnearneigh(data_utm31, 0, 27)) > >> > >> and so on. SpatialFiltering() and ME() test against global residual > >> autocorrelation in finding candidate eigenvectors, and running > >> lm.morantest() on the fitted model shows that there is no global > >> autocorrelation as intended (because positive and negative local > >> autocorrelation cancels out). I think that the underlying problems are > >> mixing approaches that are not asking the same questions, and in too > >> little care in choosing the weights. > >> > >> Hope this clarifies, > >> > >> Roger > >> > >>> > >>> Thanks in advance for you time. > >>> > >>> Peter > >>> > >>> The data are here: > >>> > >>> > >> > https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp= > >>> sharing > >>> > >>> The analysis goes like this: > >>> > >>> library(spatialreg) > >>> library(spdep) > >>> library(tidyverse) > >>> library(car) > >>> library(ncf) > >>> > >>> > >>> data <- read_csv("for_RAR.csv") > >>> set.seed(12345) > >>> x <- data$ses.mntd > >>> y <- log(data$RAR) > >>> ols_for_RAR <- lm(y ~ x) > >>> ## qqplot() and shapiro.test() show residuals are nicely distributed > >>> # x is significant and R-square about 0.2, demonstrated here > >>> car::Anova(ols_for_RAR,type="III") > >>> summary(ols_for_RAR) > >>> > >>> # the following appears to make an acceptable neighbor network > >>> c1<-c(data$LONG) > >>> c2<-c(data$LAT) > >>> cbindForests<-cbind(c1,c2) > >>> # a value of 0.7, below, is sufficient to join all the points. > >>> # qualitatively the results aren't affected, as far as I see by setting > >> this > >>> higher > >>> # However, the number of eigenvectors generated by SpatialFiltering() > >> varies > >>> a lot > >>> nbnear4 <- dnearneigh(cbindForests, 0, 0.7) > >>> plot(nbnear4, cbindForests, col = "red", pch = 20) > >>> > >>> # SAC appears significant at short distances (<10km), which is what I > >> want > >>> to remove > >>> cor_for <- correlog(c1, c2, residuals(ols_for_RAR), increment = 1, > >> resamp = > >>> 1000, latlon=TRUE, na.rm = TRUE) > >>> plot(cor_for$correlation[1:20],type="s") > >>> # p-values > >>> print(cor_for$p[which(cor_for$p < 0.05)]) > >>> # Moran's I values > >>> cor_for$correlation[which(cor_for$p < 0.05)] > >>> > >>> # Generate optimized spatial eigenvectors using SpatialFiltering() and > >> use > >>> them > >>> # Several vectors are generated depending on values in dnearneigh() > >>> spfilt_mntd_RAR<- spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style > = > >>> "W", tol=0.0001, ExactEV = TRUE) > >>> new_mod <- lm(y ~ x + fitted(spfilt_mntd_RAR)) > >>> car::Anova(new_mod, type="III") > >>> summary(new_mod) > >>> > >>> # Plot Moran's I at distances under 20km > >>> cor_for_1c <- correlog(c1, c2, residuals(new_mod), increment = 1, > resamp > >> = > >>> 1000, latlon=TRUE, na.rm = TRUE) > >>> plot(cor_for_1c$correlation[1:20],type="s") > >>> > >>> # Extract significant values of Moran's I > >>> cor_for_1c$p[which(cor_for_1c$p < 0.05)] > >>> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)] > >>> > >>> # The result is that Moran's I is significant at additional short > >> distances > >>> -- > >>> > >>> _+_+_+_+_+_+_+_+_ > >>> > >>> Peter B. Pearman > >>> Ikerbasque Research Professor > >>> > >>> Laboratory for Computational Ecology and Evolution > >>> Departamento de Biología Vegetal y Ecología > >>> Facultad de Ciencias y Tecnología > >>> Ap. 644 > >>> Universidad del País Vasco/ Euskal Herriko Unibertsitatea > >>> > >>> Barrio Sarriena s/n > >>> 48940 Leioa, Bizkaia > >>> > >>> SPAIN > >>> > >>> Tel. +34 94 601 8030 > >>> > >>> Fax +34 94 601 3500 > >>> > >>> www.ehu.eus/es/web/bgppermp > >>> > >>> > >>> > >>> When you believe in things that you don’t understand > >>> > >>> Then you suffer > >>> > >>> -- Stevie Wonder > >>> > >>> > >>> Download my public encryption key here: https://pgp.mit.edu/ > >>> > >>> and please use it when you send me e-mail. > >>> > >>> > >>> > >>> > >>> > >> > >> -- > >> Roger Bivand > >> Department of Economics, Norwegian School of Economics, > >> Helleveien 30, N-5045 Bergen, Norway. > >> voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no > >> https://orcid.org/0000-0003-2392-6140 > >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en > >> _______________________________________________ > >> R-sig-Geo mailing list > >> R-sig-Geo@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo > >> > > > > -- > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N-5045 Bergen, Norway. > voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no > https://orcid.org/0000-0003-2392-6140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en [[alternative HTML version deleted]] _______________________________________________ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo