[R] lme, lmer, gls, and spatial autocorrelation
Hello folks, I have some data where spatial autocorrelation seems to be a serious problem, and I'm unclear on how to deal with it in R. I've tried to do my homework - read through 'The R Book,' use the online help in R, search the internet, etc. - and I still have some unanswered questions. I'd greatly appreciate any help you could offer. The super-super short explanation is that I'd like to draw a straight line through my data, accounting for spatial autocorrelation and using Poisson errors (I have count data). There's a longer explanation at the end of this e-mail, I just didn't want to overdo it at the start. There are three R functions that do at least some of what I would like, but I'm unclear on some of their specifics. 1. lme - Maybe models spatial autocorrelation, but doesn't allow for Poisson errors. I get mixed messages from The R Book. On p. 647, there's an example that uses lme with temporal autocorrelation, so it seems that you can specify a correlation structure. On the other hand, on p.778, The R Book says, "the great advantage of the gls function is that the errors are allowed to be correlated". This suggests that only gls (not lme or lmer) allows specification of a corStruct class. Though it may also suggest that I have an incomplete understanding of these functions. 2. lmer - Allows specification of a Poisson error structure. However, it seems that lmer does not yet handle correlated errors. 3. gls - Surely works with spatial autocorrelation, but doesn't allow for Poisson errors. Does allow the spatial autocorrelation to be assessed independently for different groups (I have two groups, one at each of two different spatial scales). Since gls is what The R Book uses in the example of spatial autocorrelation, this seems like the best option. I'd rather have Poisson errors, but Gaussian would be OK. However, I'm still somewhat confused by these three functions. In particular, I'm unclear on the difference between lme and gls. I'd feel more confident in my results if I had a better understanding of these choices. I'd greatly appreciate advice on the matter More detailed explanation of the data/problem is below: The data: [1] A count of the number of plant species present on each of 96 plots that are 1m^2 in area. [2] A count of the number of plant species present on each of 24 plots that are 100m^2 in area. [3] X,Y coordinates for the centroid of all plots (both sizes). Goal: 1. A best fit straight-line relating log10(area) to #species. 2. The slope of that line, and the standard error of that slope. (I want to compare the slope of this line with the slope of another line) The problem: Spatial autocorrelation. Across our range of plot-separation-distances, Moran's I ranges from -.5 to +.25. Depending on the size of the distance-bins, about 1 out of 10 of these I values are statistically significant. Thus, there seems to be a significant degree of spatial autocorrelation. if I want 'good' values for my line parameters, I need to account for this somehow. Tim Handley Fire Effects Monitor Santa Monica Mountains National Recreation Area 401 W. Hillcrest Dr. Thousand Oaks, CA 91360 805-370-2347 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lme, lmer, gls, and spatial autocorrelation
Bert - I took a look at that page just now, and I'd classify my problem as spatial regression. Unfortunately, I don't think the spdep library fits my needs. Or at least, I can't figure out how to use it for this problem. The examples I have seen all use spdep with networks. They build a graph, connecting each location to something like the nearest N neighbors, attach some set of weights, and then do an analysis. The plots in my data have a very irregular, semi-random, yet somewhat clumped (several isolated islands), spatial distribution. Honestly, it's quite weird looking. I don't know how to cleanly turn this into a network, and even if I did, I don't know that I ought to. To me (and please feel free to disagree) it seems more natural to use a matrix of distances and associated correlations, which is what the gls function appears to do. In the ecological analysis section, it looks like both 'ade4' and 'vegan' may have helpful tools. I'll explore that some more. However, I still think that one of lme or gls already has the functionality I need, and I just need to learn how to use them properly. Tim Handley Fire Effects Monitor Santa Monica Mountains National Recreation Area 401 W. Hillcrest Dr. Thousand Oaks, CA 91360 805-370-2347 Bert Gunter To , 08/24/2009 11:43 AM cc Subject RE: [R] lme, lmer, gls, and spatial autocorrelation Have you looked at the "Spatial" task view on CRAN? That would seem to me the logical first place to go. Bert Gunter Genentech Nonclinical Biostatisics -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of timothy_hand...@nps.gov Sent: Monday, August 24, 2009 11:12 AM To: r-help@r-project.org Subject: [R] lme, lmer, gls, and spatial autocorrelation Hello folks, I have some data where spatial autocorrelation seems to be a serious problem, and I'm unclear on how to deal with it in R. I've tried to do my homework - read through 'The R Book,' use the online help in R, search the internet, etc. - and I still have some unanswered questions. I'd greatly appreciate any help you could offer. The super-super short explanation is that I'd like to draw a straight line through my data, accounting for spatial autocorrelation and using Poisson errors (I have count data). There's a longer explanation at the end of this e-mail, I just didn't want to overdo it at the start. There are three R functions that do at least some of what I would like, but I'm unclear on some of their specifics. 1. lme - Maybe models spatial autocorrelation, but doesn't allow for Poisson errors. I get mixed messages from The R Book. On p. 647, there's an example that uses lme with temporal autocorrelation, so it seems that you can specify a correlation structure. On the other hand, on p.778, The R Book says, "the great advantage of the gls function is that the errors are allowed to be correlated". This suggests that only gls (not lme or lmer) allows specification of a corStruct class. Though it may also suggest that I have an incomplete understanding of these functions. 2. lmer - Allows specification of a Poisson error structure. However, it seems that lmer does not yet handle correlated errors. 3. gls - Surely works with spatial autocorrelation, but doesn't allow for Poisson errors. Does allow the spatial autocorrelation to be assessed independently for different groups (I have two groups, one at each of two different spatial scales). Since gls is what The R Book uses in the example of spatial autocorrelation, this seems like the best option. I'd rather have Poisson errors, but Gaussian would be OK. However, I'm still somewhat confused by these three functions. In particular, I'm unclear on the difference between lme and gls. I'd feel more confident in my results if I had a better understanding of these choices. I'd greatly appreciate advice on the matter More detailed explanation of the data/problem
Re: [R] lme, lmer, gls, and spatial autocorrelation
Manuel, Thanks for the reference. I printed it out and read through it this morning. I think I'm going to take a gls approach. I've spent the last couple weeks reading about spatial autocorrelation, and found that the world of SAC is large, complex, and requires more time than I currently have. Using gls seems a reasonable compromise between statistical rigour, and the unfortunate but real constraint of my limited time to work on this project. According to Dorman et al, in their (admittedly limited) tests, GLS worked reasonably well with Poisson distributed synthetic data. Also, I've come to think that the ability to do model comparison would be useful. While I would like to be able to confidently choose a model for spatial autocorrelation a priori, based on biological knowledge, I don't have enough information to do this. Even after some data exploration, using variograms and plots of Moran's I, it still seems like there's insufficient information. Using a fitness score such as AIC, I could compare a small number of reasonable models to find the most appropriate error structure. Additionally, I could compare the SAC-informed and SAC-ignorant models to get a holistic assessment of the importance of SAC in my data. Tim Handley Fire Effects Monitor Santa Monica Mountains National Recreation Area 401 W. Hillcrest Dr. Thousand Oaks, CA 91360 805-370-2347 Manuel Morales To timothy_hand...@nps.gov 08/24/2009 05:31 cc PMBert Gunter , r-help@r-project.org Subject Re: [R] lme, lmer, gls, and spatial autocorrelation Hi Tim, I don't believe there is a satisfactory solution in R - at least yet - for non-normal models. Ultimately, this should be possible using lmer() but not in the near-term. One possibility is to use glmPQL as described in: Dormann, F. C., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W., Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R., Reineking, B., Schröder, B., Schurr, F. M. and Wilson, R. 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. – Ecography 30: 609–628. However, note the caution: "This is an inofficial abuse of a Generalized Linear Mixed Model function (glmmPQL {MASS}), which is a wrapper function for lme {nlme}, which in turn internally calls gls {nlme}." If all you need are parameter estimates, fine. If you want to do model comparison, though, no luck. Manuel On Mon, 2009-08-24 at 12:10 -0700, timothy_hand...@nps.gov wrote: > Bert - > > I took a look at that page just now, and I'd classify my problem as > spatial regression. Unfortunately, I don't think the spdep library fits my > needs. Or at least, I can't figure out how to use it for this problem. The > examples I have seen all use spdep with networks. They build a graph, > connecting each location to something like the nearest N neighbors, attach > some set of weights, and then do an analysis. The plots in my data have a > very irregular, semi-random, yet somewhat clumped (several isolated > islands), spatial distribution. Honestly, it's quite weird looking. I don't > know how to cleanly turn this into a network, and even if I did, I don't > know that I ought to. To me (and please feel free to disagree) it seems > more natural to use a matrix of distances and associated correlations, > which is what the gls function appears to do. > > In the ecological analysis section, it looks like both 'ade4' and 'vegan' > may have helpful tools. I'll explore that some more. However, I still think > that one of lme or gls already has the functionality I need, and I just > need to learn how to use them properly. > > Tim Handley > Fire Effects Monitor > Santa Monica Mountains National Recreation Area > 401 W. Hillcrest Dr. > Thousand Oaks, CA 91360 > 805-370-2347 > > > > Bert Gunter >n
[R] understanding the output from gls
I'd like to compare two models which were fitted using gls, however I'm having trouble interpreting the results of gls. If any of you could offer me some advice, I'd greatly appreciate it. Short explanation of models: These two models have the same fixed-effects structure (two independent, linear effects), and differ only in that the second model includes a corExp structure for spatial autocorrelation. (more detailed explanation of the models at end). Specific questions: 1. The second model estimates two additional parameters in the process of fitting the corSpatial object - the range and nugget of the spatial autocorrelation. Based on this, I would expect the second model to have two fewer residual degrees of freedom. However, the summary function reports that both models have the same number of residual degrees of freedom. Why is this? (Interestingly, the difference in AIC between the two models reflects this difference in the number of model parameters) 2. In the model summary, what is the meaning of the small correlation matrix under the heading "Correlation:"? At first, I thought that this was describing possible correlations among the predictor variables, but then I saw that it also included the model intercept. What do these correlation value mean? ##More detailed information ##function calls: sppl.i.xx = gls(all.all.rch~l10area+newx, data = gtemp, method="ML") sppl.i.ex = gls(all.all.rch~l10area+newx, data = gtemp, method="ML", correlation = corExp(c(20,.8), form=~x+y|area, nugget=TRUE)) ##model summaries > summary(sppl.i.xx) Generalized least squares fit by maximum likelihood Model: all.all.rch ~ l10area + newx Data: gtemp AIC BIClogLik 567.4893 578.572 -279.7446 Coefficients: Value Std.Error t-value p-value (Intercept) 6.891867 0.3295097 20.915522 0e+00 l10area 6.586182 0.3048870 21.602046 0e+00 newx0.047901 0.0117281 4.084307 1e-04 Correlation: (Intr) l10are l10area -0.364 newx 0.577 -0.007 Standardized residuals: Min Q1 Med Q3 Max -3.34307266 -0.57949890 -0.07214605 0.64309760 2.66409931 Residual standard error: 2.590313 Degrees of freedom: 118 total; 115 residual summary(sppl.i.ex) Generalized least squares fit by maximum likelihood Model: all.all.rch ~ l10area + newx Data: gtemp AIC BIClogLik 559.167 575.7911 -273.5835 Correlation Structure: Exponential spatial correlation Formula: ~x + y | area Parameter estimate(s): range nugget 15.4448835 0.3741476 Coefficients: Value Std.Error t-value p-value (Intercept) 7.621306 0.7648135 9.964921 0. l10area 6.400442 0.5588160 11.453576 0. newx0.066535 0.0204417 3.254857 0.0015 Correlation: (Intr) l10are l10area -0.592 newx 0.358 0.014 Standardized residuals: Min Q1Med Q3Max -3.0035983 -0.5990432 -0.2226852 0.5113270 2.263 Residual standard error: 2.820337 Degrees of freedom: 118 total; 115 residual Tim Handley Fire Effects Monitor Santa Monica Mountains National Recreation Area 401 W. Hillcrest Dr. Thousand Oaks, CA 91360 805-370-2347 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] understanding the output from gls
Kingsford, Thanks for the information. As you suggest, if I don't hear from anyone else about the degrees of freedom issue in a couple days, I'll try r-devel. Also, while I appreciate your explanation of the correlation matrix produced by summary.gls, I'm afriad I don't have the statistical background to fully understand it. If it wouldn't impose too much on your time, could you suggest a more intuitive way to interpret these numbers, or perhaps a reference with a more intuitive explanation? To me, the idea that fixed effects estimates could be correlated just sounds odd. I understand that the effects themselves could be correlated. While I have good biological reasons for thinking that my two fixed effects are independent, the data may prove me wrong. However, the fixed effects estimates are each just a single number for the data/model as a whole. How can one pair of two single numbers be correlated? Tim Handley Fire Effects Monitor Santa Monica Mountains National Recreation Area 401 W. Hillcrest Dr. Thousand Oaks, CA 91360 805-370-2347 Kingsford Jones To timothy_hand...@nps.gov 09/01/2009 05:53 cc PMR-help@r-project.org Subject Re: [R] understanding the output from gls Hi Tim, On Tue, Sep 1, 2009 at 2:00 PM, wrote: > > I'd like to compare two models which were fitted using gls, however I'm > having trouble interpreting the results of gls. If any of you could offer > me some advice, I'd greatly appreciate it. > > Short explanation of models: These two models have the same fixed-effects > structure (two independent, linear effects), Known to be independent? > and differ only in that the > second model includes a corExp structure for spatial autocorrelation. (more > detailed explanation of the models at end). > > Specific questions: > > 1. The second model estimates two additional parameters in the process of > fitting the corSpatial object - the range and nugget of the spatial > autocorrelation. Based on this, I would expect the second model to have two > fewer residual degrees of freedom. However, the summary function reports > that both models have the same number of residual degrees of freedom. Why > is this? (Interestingly, the difference in AIC between the two models > reflects this difference in the number of model parameters) > That's a good question. It does seem degrees of freedom would be spent in estimating the spatial parameters. The reason the functions using logLik get the number of parameters "right" is because logLik.gls includes: attr(val, "df") <- p + length(coef(object[["modelStruct"]])) + 1 where modelStruct holds the estimated parameters that structure residual variance and correlation, and I believe p is the number of columns in the design matrix, but I couldn't easily follow the code within gls that produces it. If nobody offers an explanation for the current result from print.summary.gls you could ask on r-devel if the results are intended. > 2. In the model summary, what is the meaning of the small correlation > matrix under the heading "Correlation:"? At first, I thought that this was > describing possible correlations among the predictor variables, but then I > saw that it also included the model intercept. What do these correlation > value mean? The GLS covariance of estimated fixed effects (including the intercept) is (X' \hat{\Sigma}^-1 X)^-1, where X is the model matrix and \Sigma is the response/error covariance matrix. This will generally have non-zero off-diagonals, implying the fixed effects estimates are correlated. The values you're inquiring about are the scaled estimates of those off-diagonals. hth, Kingsford Jones > > ##More detailed information > ##function calls: > sppl.i.xx = gls(all.all.rch~l10area+newx, data = gtemp, method="ML") > sppl.i.ex = gls(all.all.rch~l10area+newx, data = gtemp, method="ML", > correlation = corExp(c(20,.8), form=~x+y|area, nugget=TRUE)) > > ##model summaries > >> summary(sppl.i.xx) > Generalized least squares fit by m