Stats Wolf <stats.wolf <at> gmail.com> writes: > > Dear all, > > Consider a completely randomized block design (let's use data(Oats) > irrespoctive of the split-plot design it was arranged in). Look: > > library(nlme) > fit <- lme(yield ~ nitro, Oats, random = ~1|Block, method="ML") > fit2 <- lm(yield ~ nitro + Block, Oats) > anova(fit, fit2) > gives this: > Model df AIC BIC logLik Test L.Ratio p-value > fit 1 4 624.3245 633.4312 -308.1623 > fit2 2 8 611.9309 630.1442 -297.9654 1 vs 2 20.39366 4e-04 > > Clearly, considering block a random term is worse than considering it > a fixed term.
I have technical concerns with this step (and larger philosophical concerns with the whole approach -- see below). It's not at all clear that anova() applied to these two models makes sense -- in fact I think it doesn't, because the two models aren't nested. (I guess technically I could take the limit as the random effects variance goes to infinity, or 1/variance -> 0, to get from the random to the fixed specification. Then the LRT would be suspect on the grounds that the null hypothesis (1/variance=0) is on the boundary of the allowable region, but that will typically "only" bias the p-value by a factor of 2 ...) Furthermore, I'm not 100% convinced that the likelihoods computed by lm() and lme() are comparable, constant terms are often left out. But let's suppose this comparison is OK ... Let's see if blocking should be included in the model at > all: > > fit3 <- lm(yield ~nitro, Oats) > anova(fit2,fit3) > > which gives a very small P value in favor of fit2, which suggests the > block term should be included. So, I go for the second model, with > block considered fixed. > > Is this indeed how I should generally proceed when choosing the > optimum model for a situation that calls for mixed effects? Of course, > the example above is overly simplistic, yet such situations can occur I think in general that you should decide on "random" vs "fixed" on philosophical and practical grounds, _a priori_ ... if you're going to be Bayesian (see various discussions by Andrew Gelman on the subject) then you don't have any philosophical problems at all, because "fixed" and "pooled" are just two ends of a spectrum (variance of the priors of the effects fixed at infinity and zero, respectively), and it's just a practical question of estimation. Doing a lot of hypothesis testing to decide on the best model starts down the road of data-dredging ... In general, questions like this might get more attention on r-sig-mixed-mod...@lists.r-project.org cheers Ben Bolker ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.