Hello,

On 01/06/2013, at 02:10 AM, ckellogg wrote:

> Hello,
> I am using the cca function in Vegan to examine the relationship between
> microbial community structure and a (large) suite of environmental
> variables. My constraining/environmental data matrix has a lot of holes in
> it so I have been exploring using the na.action argument.
> This is the command I am currently using:
> toolik250.cca<-cca(toolikotus250.ra~julianday+logwindspd_max_1dayprior+lograin_3dayprior+sqrtwindspd_1dayprior+windspd_3dayprior+days_since_thaw+days_since_iceout+days_btw_iceoutandthaw+toolikepitemp_24h+logtemp+conductivity+pH,
> toolikenv.s, na.action=na.omit)
>
> The CCA seems to run just fine, but when I attempt to do the posthoc tests
> such as anova.cca (anova(toolik250.cca,by='terms',perm=999)), I get an error
> message: "Error in anova.ccabyterm(object, step = step, ...) : number of
> rows has changed: remove missing values?" What exactly is occurring here to
> cause this error - I suspect it must be related to the fact that the
> environmental data matrix has a lot of missing data. I don't quite
> understand why it states that the number of rows has changed...changed from
> what?

The number of rows has changed from term to term. That is, you have different numbers of missing values in each term (= explanatory variable), and when rows with missing values are removed for the current model, the accepted observations change from term to term. I admit the error message is not the most obvious one. I must see where it comes from, and how to make it more informative. However, it does give a hint to "remove missing values", doesn't it?

If you want to have a term-wise test with missing values in terms, you must refit your model for complete.cases. Use argument 'subset' to select a subset with no missing values. Currently I don't know any nice shortcut to do this with the current model, but the following may work (untested), although it is not nice:

    keep <- rep(TRUE, nrow(toolikenv.s))
    keep[toolik250.cca$na.action] <- FALSE
    m2 <- update(toolik250.cca, subset = keep)
    anova(m2, by="terms", perm=999)

> Is there any way to get around having missing data when running the
> anovas as you can when running the CCA itself?
>
> One other question I have is when I try and run the CCA on all the data in
> my environmental data matrix (toolikenv.s), not just a subset of variables
> as I do above, using this command:
> toolik250.cca <-cca(toolikotus250.ra~., toolikenv.s, na.action=na.omit)
> I get the following error message. "Error in svd(Xbar, nu = 0, nv = 0) : a
> dimension is zero" What might be causing this error message to be thrown?
>

What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0? I suspect you have so many holes that nothing is left when you remove rows with any missing values. The message is about an attempt to analyse a zero-dimensional matrix.

> Thank you so much for your help. Maybe I will just have to filter out the
> samples with missing environmental data (or filter out some of the variables
> themselves if they have too much missing data), but I was just hoping to
> avoid having to do this.

The functions can handle missing values, but they handle them by removing the observation. Do you want to lose a huge number of rows? We won't invent values to replace missing data in cca(). Some people have suggested ways to do that, and that is not difficult: just search for imputation in R (for instance, package mice). However, the real problem is how to compare and summarize the multivariate results after imputation. Further, if you have a lot of missing values, nothing may be very reliable. It could be possible to collect together and combine permutation test results after multiple imputation, but better consult a statistician before trying to do this.

Cheers, Jari Oksanen

--
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
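
P.S. To see how much data survives the removal of missing values, a small base R sketch may help. It uses a toy data frame (the object `env` and its values are hypothetical, not the Toolik data) and only assumes that terms are added to the model one at a time, so that the set of complete rows can shrink with each added variable:

```r
# Toy environmental table (hypothetical values, not the Toolik data)
env <- data.frame(
  pH           = c(7.1, NA, 6.8, 7.0, 6.9, NA),
  conductivity = c(55, 60, NA, 58, 57, 61),
  logtemp      = c(1.2, 1.1, 1.3, NA, 1.0, 0.9)
)

# Rows left when every variable must be observed
n_all <- sum(complete.cases(env))

# Rows left as variables are added one at a time:
# pH, then pH + conductivity, then all three
n_by_term <- sapply(seq_along(env), function(i) {
  sum(complete.cases(env[, seq_len(i), drop = FALSE]))
})
names(n_by_term) <- names(env)

# Logical vector marking rows that are complete for all variables
keep <- complete.cases(env)

# Missing values per variable, to spot variables that cost many rows
n_missing <- colSums(is.na(env))

print(n_by_term)
print(n_missing)
```

If the row count drops with each added variable, the models fitted for successive terms do not share the same observations. A vector like `keep`, computed over only the variables used in the model formula, could be passed as `subset = keep` when refitting, provided the rows of the environmental table match the rows of the community matrix.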