I have been reading about autocorrelation in linear models over the last couple of days, and I have to say the more I read, the more confused I get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for guidance.

Most authors are mainly worried about autocorrelation in the residuals, but some authors are also worried about autocorrelation within Y and within X vectors before any model is fitted. Would you test for autocorrelation both in the data and in the residuals?

If we limit our worries to the residuals, it looks like we have a variety of tests for lag=1:

  r <- residuals(fm); n <- length(r)   # fm is the fitted lm object
  stats::cor.test(r[-n], r[-1])
  stats::Box.test(r)
  lmtest::dwtest(fm, alternative="two.sided")
  lmtest::bgtest(fm, type="F")

In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have significant _positive_ autocorrelation within Y and within both X vectors, but _negative_ autocorrelation in the residuals. The residual autocorrelation is not quite significant, with the p-values

  cor.test  0.070
  Box.test  0.064
  dwtest    0.125
  bgtest    0.077

from the tests above. I seem to remember some authors saying that the Durbin-Watson test has less power than some alternative tests, as reflected here. The difference in p-values is substantial, so choosing which test to use could in many cases make a big difference for the subsequent analysis and conclusions. Most of them (cor.test, Box.test, bgtest) can also test lags>1. Which test would you recommend? I imagine the basic cor.test is somehow inappropriate for this; the other tests wouldn't have been invented otherwise, right?
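For anyone who wants to reproduce the comparison, here is a minimal sketch of the lag-1 tests and their lag>1 counterparts. The data are simulated stand-ins (not my real series), the lag of 3 is an arbitrary choice for illustration, and the lmtest calls are skipped if the package is not installed.

```r
# Simulated stand-in for the real data (hypothetical values, n = 20)
set.seed(1)
n  <- 20
x1 <- cumsum(rnorm(n))   # random walks, so the predictors are autocorrelated
x2 <- cumsum(rnorm(n))
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
fm <- lm(y ~ x1 + x2)
r  <- residuals(fm)

# Lag-1 tests available in base R
p_cor <- cor.test(r[-n], r[-1])$p.value
p_box <- Box.test(r, lag = 1, type = "Box-Pierce")$p.value  # the default type
p_lb  <- Box.test(r, lag = 1, type = "Ljung-Box")$p.value   # small-sample variant

# Joint test of lags 1 to 3
p_lb3 <- Box.test(r, lag = 3, type = "Ljung-Box")$p.value

print(c(cor = p_cor, box = p_box, lb = p_lb, lb3 = p_lb3))

# Regression-based tests from lmtest, at lag 1 and jointly for lags 1 to 3
if (requireNamespace("lmtest", quietly = TRUE)) {
  p_dw  <- lmtest::dwtest(fm, alternative = "two.sided")$p.value
  p_bg1 <- lmtest::bgtest(fm, order = 1, type = "F")$p.value
  p_bg3 <- lmtest::bgtest(fm, order = 3, type = "F")$p.value
  print(c(dw = p_dw, bg1 = p_bg1, bg3 = p_bg3))
}
```

Note that Box.test(r) with no further arguments is the Box-Pierce version at lag 1; the Ljung-Box statistic rescales it for small samples, so its p-value is never larger at the same lag.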

The p-value from car::dwt(fm) fluctuates by a factor of 2 between runs unless I run a very long simulation, in which case it settles near the lmtest::dwtest p-value, at least in my case.
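A rough way to see how long the simulation needs to be: a simulated p-value is a proportion of replicates, so its Monte Carlo standard error is about sqrt(p*(1-p)/reps). The sketch below applies that to the dwtest p-value quoted above and then, if car is installed, reruns the bootstrap with more replicates. The data are simulated stand-ins, and the seed and the choice of 10000 replicates are arbitrary assumptions.

```r
# Monte Carlo standard error of a simulated p-value based on `reps` replicates
mc_se <- function(p, reps) sqrt(p * (1 - p) / reps)

reps <- c(1000, 10000, 100000)
se   <- mc_se(0.125, reps)   # 0.125 is the dwtest p-value quoted above
print(data.frame(reps = reps, se = se))

# Bootstrap Durbin-Watson test with more replicates than the default
if (requireNamespace("car", quietly = TRUE)) {
  set.seed(1)
  n  <- 20
  x1 <- cumsum(rnorm(n))     # hypothetical stand-in data
  x2 <- cumsum(rnorm(n))
  y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
  fm <- lm(y ~ x1 + x2)
  set.seed(42)               # fix the seed so repeated runs agree
  print(car::durbinWatsonTest(fm, reps = 10000))
}
```

Fixing the seed makes a single run repeatable, but only more replicates make the p-value itself more precise.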

Finally, one question regarding remedies. If there were significant _positive_ autocorrelation in the residuals, some authors suggest remedying this by deflating the df (fewer effective df in the data) and redoing the t-tests of the regression coefficients, thereby rejecting fewer null hypotheses. Does that mean that if the residuals are _negatively_ correlated, I should inflate the df (more effective df in the data) and reject more null hypotheses?
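To make the question concrete, here is a minimal sketch of the df adjustment as I understand it, assuming AR(1) errors and the commonly quoted approximation n_eff = n*(1-rho)/(1+rho). That formula is usually stated for the variance of a sample mean, so applying it to regression t-tests is my assumption, and the function name n_eff and the rho values of +/-0.4 are hypothetical.

```r
# Effective sample size under an AR(1) approximation (assumed formula)
n_eff <- function(n, rho) n * (1 - rho) / (1 + rho)

n <- 20
k <- 3                        # coefficients in lm(y ~ x1 + x2), incl. intercept

rho <- c(positive = 0.4, none = 0, negative = -0.4)  # hypothetical lag-1 values
ne  <- n_eff(n, rho)
df_adj <- ne - k              # adjusted residual df for the t-tests

# Two-sided 5% critical values for the regression t-tests
t_crit <- qt(0.975, df = df_adj)

print(data.frame(rho = rho, n_eff = ne, df = df_adj, t_crit = t_crit))
```

With this formula a negative rho does give n_eff > n and a smaller critical value, which is exactly the inflation I am unsure about.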

That's four question marks. I'd greatly appreciate guidance on any of them.

Thanks in advance,

Arni

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
