Hi Sean, Thanks for the data. Your first problem is that Age is a factor with 14 levels. Second, Year is a factor with 16 levels. Now consider what interactions of these will be---factors with a huge number of levels, exceeding what is reasonable to fit to your size of data. I included inline code below. First I convert these variables to numeric. I also do some plots.
While I get a model to run, the estimated variance of the random intercept is 0, which is a bit troublesome. I looked at the data, but I do not have a particularly good reason, other than presumably there is very little variability by fish after with age and lake in the model. Anyway, I go on to try a log transformation of Age because of its relationship with Increment. That seems to work pretty well, but the residuals are clearly heteroscedastic, so I also show how you can use the sandwich package to (at least partially) address this. Cheers, Josh require(lme4) ## shows that there are 224 levels in the Age by Year interaction!! with(odata1, interaction(Age, Year)) ## create numeric variables instead odata1 <- within(odata1, { numAge <- as.numeric(levels(Age))[Age] numYear <- as.numeric(levels(Year))[Year] }) ## look at a density plot of Increment plot(density(odata1$Increment)) rug(odata1$Increment) ## look at Increment versus Age (numeric) ## looks like a straight line is likely a poor fit xyplot(Increment ~ numAge, data = odata1) ## compare by lake (looks pretty similar) xyplot(Increment ~ numAge | lake, data = odata1) ## also note that it looks like there is more variability when age is log than hight ## we'll keep that in mind for later ## simple model m1 <- lmer(Increment ~ 0 + numAge*lake + (1 | FishID), data = odata1) ## check residuals against normal qqnorm(resid(m1)) ## residuals against age ## this looks pretty bad plot(odata1$numAge, resid(m1)) ## what about the random effects? plot(ranef(m1)) ## checking the model summary ## we see that the variance for the random intercepts is 0 ## this is also rather concerning summary(m1) ## lets look at some descriptives descriptives <- with(odata1, do.call("rbind", tapply(Increment, FishID, function(x) { c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x)) }))) descriptives <- descriptives[order(descriptives[, "M"]), ] descriptives # print ## looks like there are quite a few fish with only one observations ## also, those with only one tend to have a higher mean increment ## given the shape of the relationship, we might try a log transformation on Age ## also given that the random intercepts have 0 variance, we can drop it malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1) qqnorm(resid(malt)) ## looks decent enough ## not too bad, but clearly not homoscedastic residuals plot(odata1$numAge, resid(malt)) ## the residuals appear heteroscedastic, so we could ## use a sandwhich estimator of the covariance matrix ## the general form is: inv(X'X) X' W X inv(X'X) ## the correction comes in terms of the weight matrix, W which ## can be defined in various ways ## load two required packages require(lmtest) require(sandwich) ## the default coef(summary(malt)) ## using sandwhich estimator ## (okay, the results come out pretty similar, not parameter estimates will be unchaged ## only SEs change) coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE)) On Thu, Feb 16, 2012 at 11:44 PM, Sean Godwin <sean.god...@gmail.com> wrote: > Hi Josh, > > Thank you so much for your response. > > The point of the process was actually to find out whether there are > different age effects for each lake, so an interaction term between age and > lake is a necessary one. Taking out the other random effects to make the > model simpler would work perfectly to make this easier to tackle though, and > I can add the other terms on later as you say! > > So: m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|FishID),lakedata) > > But apparently I don't understand the Age*lake syntax... all I'm trying to > do with this is to have an interaction term between age and lake, but since > neither are random effects, I can't just write (1|Age:lake) as I could with > (1|Age:Year), right? It is the Age*lake term that is causing the error... > when I take it out, the model runs fine. > > There are 117 fish observations, so 117 unique values of fishID. The number > of observations per fish changes depending on how old the fish is ( a2 year > old fish = 2 observations), with a maximum fish age of 13 years (so max 13 > observations per fish and 13 years total in the dataset). > > I've attached my data (Josh_fulldata.csv) and the script used to rearrange > it into a usable format (Josh_model.R). For some reason, I wasn't able to > export the resulting dataframe without causing errors when re-inputting it, > so hopefully this works for you. Hopefully the inclusion of the full > dataset answers your other questions. Interestingly, when I used a 50 fish > subset, the error message didn't occur (I've attached that just in case: > Josh_data.csv). > > I can't thank you enough for your help. Already you've pointed me in a > better direction. All I'm trying to do is to include a non-random > interaction in here (age:lake), and it sure has me stumped! > > -Sean > > On 16 February 2012 23:06, Joshua Wiley <jwiley.ps...@gmail.com> wrote: >> >> Hi Sean, >> >> Can you fit a simpler model? Particularly I would start with just one >> random effect (say, (1|fishid)). I would not expect the age*lake >> interaction to be a problem per se. >> >> Also you may not be fitting the model you think you are. Writing age*lake >> in a formula expands to: >> >> Age + lake + age:lake >> >> That is the two main effects of age and lake plus their interaction. That >> makes your specification of the main effect of age redundant and means you >> will have a main effect of lake, which since you suppressed the intercept by >> adding the 0, will be the conditional expected values of the two lakes when >> age is 0 for the same random effects. >> >> In terms of getting around the error, some other information about your >> data would be helpful. How many observations do you have? How many unique >> values of fishid are there? How many observations per fish? How many >> years? Is it a numeric variable or factor? What about age? For factor >> variables what do the joint distributions look like? Do you really mean to >> have a random constant by year AND age:year? Within a fish, do age and fish >> have a perfect correlation? >> >> In terms of practical advice, I would start with simple models and make >> them more complex. Try to find the term that when you add it leads to the >> error. Start with an unconditional model only if you have to. As you >> estimate additional parameters in your model, what happens to the others? >> Are they stable? Do they change a lot? What about the standard errors? >> Does adding one term the model still fit but the SEs become huge? >> >> If you need more help, posting a sample of your data that reproduces the >> error so we can run it on our machines would help (a plaintext file or the >> output of dput() would work well...we do not need all of it, just enough >> that a model could work (say 50-100 observations) and that reproduces the >> same error. >> >> Cheers, >> >> Josh >> >> On Feb 16, 2012, at 21:39, Sean Godwin <sean.god...@gmail.com> wrote: >> >> > Hi all, >> > >> > I am fairly new to mixed effects models and lmer, so bear with me. >> > >> > Here is a subset of my data, which includes a binary variable (lake (TOM >> > or >> > JAN)), one other fixed factor (Age) and a random factor (Year). >> > lake FishID Age Increment Year >> > 1 TOM 1 1 0.304 2007 >> > 2 TOM 1 2 0.148 2008 >> > 3 TOM 1 3 0.119 2009 >> > 4 TOM 1 4 0.053 2010 >> > 5 JAN 2 1 0.352 2009 >> > 6 JAN 2 2 0.118 2010 >> > >> > The model I'm trying to fit is: >> > m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|Year) + (1|Year:Age) + >> > (1|FishID),lakedata) >> > >> > The error message I get is: *"Error in mer_finalize(ans) : Downdated X'X >> > is >> > not positive definite, 27."* >> > * >> > * >> >> From reading up on the subject, I think my problem is that I can't >> > incorporate the 'lake' variable in a fixed-effect interaction because it >> > is >> > only has one binary observation. But I don't know what to do to be able >> > to >> > fit this model. Any help would be greatly appreciated! >> > -Sean >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.