Re: [R-sig-eco] anova.cca question / missing data in constraining matrix

Jari Oksanen Fri, 31 May 2013 19:23:18 -0700

Hello, 
On 01/06/2013, at 02:10 AM, ckellogg wrote:

> Hello,
> I am using the cca function in Vegan to examine the relationship between
> microbial community structure and a (large) suite of environmental
> variables.  My constraining/environmental data matrix has a lot of holes in
> it so I have been exploring using the na.action argument. 
> This is the command I am currently using:
> toolik250.cca<-cca(toolikotus250.ra~julianday+logwindspd_max_1dayprior+lograin_3dayprior+sqrtwindspd_1dayprior+windspd_3dayprior+days_since_thaw+days_since_iceout+days_btw_iceoutandthaw+toolikepitemp_24h+logtemp+conductivity+pH,
> toolikenv.s, na.action=na.omit)
> 
> The CCA seems to run just fine, but when I attempt to do the posthoc tests
> such as anova.cca (anova(toolik250.cca,by='terms',perm=999)), I get an error
> message: "Error in anova.ccabyterm(object, step = step, ...) : number of
> rows has changed: remove missing values?"  What exactly is occurring here to
> cause this error - I suspect it must be related to the fact that the
> environmental data matrix has a lot of missing data.  I don't quite
> understand why it states that the number of rows has changed...changed from
> what?


The number of rows has changed from term to term. That is, you have different 
numbers of missing values in each term (= explanatory variable), and when rows 
with missing values are removed for the current model, the accepted 
observations change from term to term. I admit the error message is not the 
most obvious one. I must see where it comes from, and how to make it more 
informative. However, it does give a hint to "remove missing values", doesn't 
it?
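
A minimal base R sketch of why the row count changes from term to term. The data frame below is a toy stand-in with made-up values (it is not the poster's toolikenv.s); it only counts how many rows stay complete as each term is added in sequence.

```r
# Toy environmental table (hypothetical values) with holes in different rows
env <- data.frame(
  julianday    = c(150, 160, 170, 180, 190, 200),
  conductivity = c(40, NA, 42, 45, NA, 47),
  pH           = c(7.1, 7.3, NA, 7.0, 7.2, 7.4)
)

terms <- names(env)

# Number of complete rows when the terms are added one at a time,
# which is what a sequential (by = "terms") test has to work with
n_kept <- sapply(seq_along(terms), function(i) {
  sum(complete.cases(env[, terms[seq_len(i)], drop = FALSE]))
})
names(n_kept) <- terms
print(n_kept)
```

Each added term removes a different set of rows, so the successive models would no longer be fitted to the same observations.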

If you want a term-wise test when the terms have missing values, you must 
refit your model using only complete cases (see complete.cases()).  Use the 
'subset' argument to select the rows with no missing values. I don't currently 
know of a nice shortcut for doing this with the current model, but the 
following may work (untested), although it is not pretty:

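# start by keeping every row of the environmental data
keep <- rep(TRUE, nrow(toolikenv.s))
# drop the rows that na.omit removed when the original model was fitted
keep[toolik250.cca$na.action] <- FALSE
# refit on the complete rows only, then run the term-wise test
m2 <- update(toolik250.cca, subset = keep)
anova(m2, by="terms", perm=999)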
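
An alternative sketch of the same idea, which filters both tables to complete cases before fitting instead of using 'subset'. It assumes the vegan package is installed and uses vegan's bundled varespec/varechem data with artificially introduced holes as a stand-in for the poster's data; the object names spec_cc and env_cc are made up for illustration, and the 'permutations' argument name is the one used by recent vegan versions.

```r
library(vegan)

data(varespec)
data(varechem)

# Stand-in environmental table with artificial missing values
env <- varechem[, c("N", "P", "K")]
env$P[c(2, 5)] <- NA
env$K[7] <- NA

# Rows with no missing value in any constraining variable
keep <- complete.cases(env)

# Filter community and environmental data with the same mask
spec_cc <- varespec[keep, ]
env_cc  <- env[keep, , drop = FALSE]

# Every term now sees the same observations, so the sequential test can run
m <- cca(spec_cc ~ N + P + K, data = env_cc)
set.seed(1)
res <- anova(m, by = "terms", permutations = 199)
print(res)
```

Filtering up front makes it explicit how many observations are sacrificed: nrow(env) - nrow(env_cc) rows are dropped.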

> Is there any way to get around having missing data when running the
> anovas as you can when running the CCA itself?
> 
> One other question I have is when I try and run the CCA on all the data in
> my environmental data matrix (toolikenv.s), not just a subset of variables
> as I do above, using this command:
> toolik250.cca <-cca(toolikotus250.ra~., toolikenv.s, na.action=na.omit)
> I get the following error message.  "Error in svd(Xbar, nu = 0, nv = 0) : a
> dimension is zero" What might be causing this error message to be thrown?
> 
What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0?

I suspect you have so many holes that nothing is left when you remove rows with 
any missing values. The message is about an attempt to analyse a matrix that 
has zero rows.
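
A small base R sketch of that check on a toy table with made-up values (not the poster's data). It counts the complete rows, shows how many values are missing per variable, and then tests what dropping the worst variable would recover. The column names are hypothetical.

```r
# Toy table in which every row has at least one hole
env <- data.frame(
  temp = c(1, NA, 3, 4),
  cond = c(NA, 2, 3, NA),
  ph   = c(1, 2, NA, 4)
)

# Rows that would survive na.omit when all variables are used
n_complete <- sum(complete.cases(env))

# Missing values per variable, to spot the worst offenders
na_per_var <- colSums(is.na(env))
print(na_per_var)

# Drop the variable(s) with the most holes and count complete rows again
env2 <- env[, na_per_var < max(na_per_var), drop = FALSE]
n_complete2 <- sum(complete.cases(env2))
print(c(all_vars = n_complete, reduced = n_complete2))
```

Here no row is complete with all three variables, while dropping the most incomplete one leaves two usable rows.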

> Thank you so much for your help.  Maybe I will just have to filter out the
> samples with missing environmental data (or filter out some of the variables
> themselves if they have too much missing data), but I was just hoping to
> avoid having to do this.


The functions can handle missing values, but they handle them by removing the 
observation. Do you want to lose a huge number of rows? We won't invent values 
to replace missing data in cca(). Some people have suggested ways to do that, 
and that is not difficult: just search for imputation in R (for instance, 
package mice). However, the real problem is how to compare and summarize the 
multivariate results after imputation. Further, if you have a lot of missing 
values, nothing may be very reliable. It might be possible to collect and 
combine permutation test results after multiple imputation, but you had better 
consult a statistician before trying to do this.

Cheers, Jari Oksanen

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
