Re: [R] NA in cca (vegan)

2009-09-08 Thread Jari Oksanen
Gavin Simpson <gavin.simpson at ucl.ac.uk> writes:

> 
> On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote:
> > Dear all,
> > I would like to calculate a cca (package vegan) with species and
> > environmental data. One of these environmental variables is
> > cos(EXPOSURE).
> > The problem: for flat releves there is no exposure. The value is
> > missing and I can't call it 0 as 0 stands for east and west.
> > The cca does not run with missing values. What can I do to make vegan's
> > cca ignore these missing values?
> > Thanks a lot,
> > Kim
> 
> 
> This is timely as Jari Oksanen (lead developer on vegan) has been
> looking into making this happen automatically in vegan ordination
> functions. The solution for something like cca is very simple but it
> gets more complicated when you might like to allow features like
> na.exclude, etc., and have all the functions that operate on objects of
> class "cca" work nicely.
> 
> For the moment, you should just process your data before it goes into
> cca. Here I assume that you have two data frames; i) Y is the species
> data, and ii) X the environmental data. Further I assume that only one
> variable in X has missing values; let's call this Exposure:
> 
Kim,

A test version of NA handling in cca is now in the development version of vegan
at http://vegan.r-forge.r-project.org/. You can get the current source code or
slightly stale packages from that address (at the time of writing, the packages
are two to three days behind the current devel version). Instructions for
downloading the working version of vegan can be found on the same web site.

Basically the development version does exactly the same thing as Gavin showed
you in his response: it does a "listwise" elimination of missing values. Indeed,
it may be better to do that manually and knowingly than to rely on the perhaps
surprising automatic handling of missing values within the function.
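Gavin's code further down in this thread assumes that only Exposure has missing values. When several environmental variables may contain NAs, the same listwise elimination can be sketched in base R with complete.cases(), which keeps a site only if all of its environmental values are observed. The data and variable names below are hypothetical, made up for illustration:

```r
## hypothetical data: 6 sites, 3 species, 2 env variables with NAs
Y <- data.frame(sp1 = c(1, 0, 3, 2, 5, 1),
                sp2 = c(0, 2, 1, 4, 0, 3),
                sp3 = c(2, 2, 0, 1, 1, 0))
X <- data.frame(pH       = c(5.1, NA, 6.0, 5.5, 4.9, 6.2),
                Exposure = c(0.7, -0.2, NA, 1.0, 0.3, -0.9))

## listwise elimination: TRUE only for sites with no NA in any env variable
keep <- complete.cases(X)

## drop the same rows from both tables so sites stay aligned
Y2 <- Y[keep, , drop = FALSE]
X2 <- X[keep, , drop = FALSE]

## sites that were dropped (here sites 2 and 3)
which(!keep)
```

Y2 and X2 could then be passed on as in Gavin's example, e.g. cca(Y2 ~ ., data = X2) with vegan loaded.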

Your missing values are somewhat weird, as they are not really missing values
(= unknown and unobserved): you just decided to use a coding system that does
not cope with your well-known, measured values. I would prefer to find a coding
that puts flat ground together with exposures giving similar conditions. In no
case should they be regarded as NA, since they are available and known, and
censoring them from your data may distort your analysis. Perhaps having a new
variable (hasExposure, TRUE/FALSE) and coding flat ground as east/west (= 0) in
Exposure could make more sense. Indeed, the model term hasExposure*Exposure
would make sense, as this would separate flat ground from slopes of different
Exposures. The interaction term and aliasing would take care of keeping flat
ground, with its known values, separate from exposed slopes.
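A minimal base-R sketch of that coding, assuming aspect is recorded in degrees with NA for flat plots (the data are made up, and the names Aspect, hasExposure and Exposure are illustrative). It builds the design matrix for hasExposure*Exposure and shows that, because flat plots get Exposure = 0, the interaction column is identical to Exposure, so one of the two is aliased:

```r
## hypothetical aspect in degrees for 8 plots; NA marks flat ground
X <- data.frame(Aspect = c(0, 90, 180, 270, NA, 45, NA, 135))

## indicator: does the plot have an exposure at all?
X$hasExposure <- !is.na(X$Aspect)

## cos(aspect): north = 1, south = -1, east/west = 0; flat ground coded 0
X$Exposure <- ifelse(X$hasExposure, cos(X$Aspect * pi / 180), 0)

## design matrix for the model term hasExposure*Exposure
MM <- model.matrix(~ hasExposure * Exposure, data = X)
colnames(MM)

## 4 columns but rank 3: the interaction duplicates Exposure (aliased)
qr(MM)$rank   # 3
```

With vegan loaded, the same term could be used as a constraint, e.g. cca(Y ~ hasExposure * Exposure, data = X) for a species data frame Y with matching rows; vegan should drop the collinear constraint and report it as aliased.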

Cheers, Jari Oksanen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NA in cca (vegan)

2009-09-04 Thread Gavin Simpson
On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote:
> Dear all,
> I would like to calculate a cca (package vegan) with species and
> environmental data. One of these environmental variables is
> cos(EXPOSURE).
> The problem: for flat releves there is no exposure. The value is
> missing and I can't call it 0 as 0 stands for east and west.
> The cca does not run with missing values. What can I do to make vegan's
> cca ignore these missing values?
> Thanks a lot,
> Kim

Hi Kim,

This is timely as Jari Oksanen (lead developer on vegan) has been
looking into making this happen automatically in vegan ordination
functions. The solution for something like cca is very simple but it
gets more complicated when you might like to allow features like
na.exclude, etc., and have all the functions that operate on objects of
class "cca" work nicely.
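To illustrate why na.exclude is the harder case: with na.omit, residuals and fitted values cover only the complete rows, whereas with na.exclude they are padded with NA back to the original rows, so every extractor method has to know which rows were dropped. This is a base-R sketch using lm() rather than vegan, with made-up data:

```r
set.seed(1)
## hypothetical data: 20 observations, 3 missing values in x
d <- data.frame(x = rnorm(20), y = rnorm(20))
d$x[c(4, 9, 15)] <- NA

## same model, two different na.action settings
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))   # 17: dropped rows are simply gone
length(residuals(fit_excl))   # 20: dropped rows come back as NA
```

The coefficients are the same in both fits; only the shape of the extracted results differs, which is what makes na.exclude more work to support across all methods for a model class.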

For the moment, you should just process your data before it goes into
cca. Here I assume that you have two data frames; i) Y is the species
data, and ii) X the environmental data. Further I assume that only one
variable in X has missing values; let's call this Exposure:

## load vegan, which provides cca()
library(vegan)

## dummy data
set.seed(1234)
## 20 samples of 10 species (Poisson counts)
Y <- data.frame(matrix(rpois(20*10, 2), ncol = 10))
## 20 samples and 5 env variables
X <- data.frame(matrix(rnorm(20*5), ncol = 5))
names(X) <- c(paste("Var", 1:4, sep = ""), "Exposure")
## simulate some NAs in Exposure
X$Exposure[sample(1:20, 3)] <- NA
## show X
X

## Now create a logical vector indicating which samples lack Exposure
miss <- with(X, is.na(Exposure))

## now create new X and Y omitting these rows (same rows from both)
Y2 <- Y[!miss, ]
X2 <- X[!miss, ]

## Now submit to CCA, using all variables in X2 as constraints
mod <- cca(Y2 ~ ., data = X2)
mod

## plot it
plot(mod, display = c("sites","bp"), scaling = 3)

## It'd be nice to get predictions for the 3 samples we missed out;
## with type = "wa", newdata is community data, so pass their species rows
pred <- predict(mod, newdata = Y[miss, ], type = "wa", scaling = 3)

## add these points to the ordination:
points(pred[, 1:2], col = "red", cex = 1.5)

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
