Milton Cezar Ribeiro <milton_ruser <at> yahoo.com.br> writes:

> Dear R-gurus
>
> I have a data.frame with abundance data for species and sites which looks
> like:
>
> mydf<-data.frame(
>   sp1=sample(0:10,5,replace=TRUE),
>   sp2=sample(0:20,5,replace=TRUE),
>   sp3=sample(0:4,5,replace=TRUE),
>   sp4=sample(0:2,5,replace=TRUE))
> rownames(mydf)<-paste("sites",1:5,sep="")
>
> I would like to make an ordination analysis of these data, and my worry is
> about the "zeros" (absence of species) in the matrix. As far as I have read
> (Gotelli, A Primer of Ecological Statistics, 2004), when I have abundance
> data I can't compute Euclidean distances, because the zeros mean absence of
> the species and not a count of zero. Gotelli suggests running a "principal
> coordinates analysis". I would like to hear what you think about this, and
> which packages and functions are best for computing my distance matrices and
> doing my ordination analysis. Can I consider zeros as NA in my data.frame?
> Is there a good PDF book about multivariate analysis for abundance data
> available on the web?

Other people already suggested what to do with these data and where to find pdf texts. I only comment on some points raised in this original question.

Firstly, Euclidean distance is quite OK with zeros, or at least as good as any other normal dissimilarity index is with zeros. Euclidean distance on non-transformed data is poor for other reasons: it takes squared differences, which emphasizes large abundances, and even when two sites have nothing in common, the Euclidean distance varies with total abundances. Using Principal Co-ordinates analysis does not change this, since it can also be run with Euclidean distances. However, there are many packages providing "better" dissimilarity indices, or transformations that make Euclidean distances more useful (such as the Hellinger transformation).

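As a minimal sketch of the point about Euclidean distances (not part of the original exchange): the abundance values below are made up for illustration, and the Hellinger transformation is written out by hand in base R as the square root of row-wise relative abundances. Packages such as vegan offer the same transformation ready-made (for example decostand(x, "hellinger")), but nothing beyond base R is assumed here.

```r
# Hypothetical abundance table in the same shape as the poster's mydf.
# sites1 and sites2 have the same species proportions but different totals;
# sites3 and sites4 share no species with sites1 or sites2.
mydf <- data.frame(
  sp1 = c(10, 1, 0, 0, 5),
  sp2 = c(20, 2, 0, 0, 3),
  sp3 = c( 0, 0, 4, 1, 2),
  sp4 = c( 0, 0, 2, 1, 1))
rownames(mydf) <- paste("sites", 1:5, sep = "")

x <- as.matrix(mydf)

# Euclidean distance on raw counts: sites with nothing in common still
# get different distances, depending only on their total abundances.
d_raw <- dist(x)
as.matrix(d_raw)["sites1", "sites3"]   # sqrt(520), about 22.8
as.matrix(d_raw)["sites2", "sites3"]   # 5

# Hellinger transformation: square root of row-wise proportions.
# (Assumes no site is completely empty, otherwise this divides by zero.)
hel <- sqrt(x / rowSums(x))

# Euclidean distance on the transformed data: sites with nothing in
# common are all at the same maximum distance, sqrt(2).
d_hel <- dist(hel)
as.matrix(d_hel)["sites1", "sites3"]   # sqrt(2)
as.matrix(d_hel)["sites2", "sites3"]   # sqrt(2)

# Principal co-ordinates analysis runs on either distance matrix;
# the choice of dissimilarity is what matters, not the method.
pcoa <- cmdscale(d_hel, k = 2, eig = TRUE)
pcoa$points
```

With these made-up numbers, sites1 and sites2 end up at distance zero after the transformation, because they differ only in total abundance, whereas on the raw counts they are far apart.
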
Another question is more abstract: indeed, you may regard most zeros as missing data. The species probably could occur in your sample site, more or less, but it was too scarce to be observed. How to do this in practice is the tricky issue. You cannot simply change zeros to NA, since then the dissimilarities (if they don't fail) will really give a special significance to these cells. Regarding them as zeros certainly makes more sense than removing *pairs* of data where a species is NA in one site and present in another. There are ways of handling zeros as something like missing values of various degrees(!), but my decency prohibits me from writing about these methods.

cheers, jari oksanen

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.