Re: [R-sig-eco] How are the species scores of PCA from the vegan package computed?
Dear Jari,

You're right, I missed the Design decisions vignette; sorry about that. Thank you so much for your reply.

All the best,
Claire Della Vedova

-----Original Message-----
From: Jari Oksanen [mailto:jari.oksa...@oulu.fi]
Sent: Tuesday 27 May 2014 10:43
To: claire della vedova; r-sig-ecology@r-project.org
Subject: RE: [R-sig-eco] How are the species scores of PCA from the vegan package computed?

Dear Claire Della Vedova,

It seems that you have searched in many places, except in the vegan documentation. Look at the vignette on Design decisions, section "Scaling in redundancy analysis".

Cheers,
Jari Oksanen

From: r-sig-ecology-boun...@r-project.org on behalf of claire della vedova
Sent: 27 May 2014 11:17
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] How are the species scores of PCA from the vegan package computed?

[...]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] How are the species scores of PCA from the vegan package computed?
Hi everybody,

I'm working on a PCA approach and comparing outputs from the ade4 and vegan packages. I'm fine with the normalization of the variable coordinates in the ade4 outputs ($co: coordinates are scaled to the eigenvalues; $c1: coordinates are scaled to 1). But I have difficulty understanding how the species scores in vegan's output are computed with the scaling 1 or 2 options, and what the message about scaling means, especially the 'General scaling constant of scores'. For example:

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177

I've searched the archives of R-sig-ecology, Cross Validated and Stack Overflow, and found nothing that helped me. If somebody has some information about it, I would greatly appreciate some help.

All the best.
Claire Della Vedova

Here are some parts of my code:

library(ade4)
library(vegan)
doubs.env <- read.csv('http://www.sci.muni.cz/botany/zeleny/wiki/anadat-r/data-download/DoubsEnv.csv', row.names = 1)

## with ade4 ##
pca.ad <- dudi.pca(doubs.env, scale = TRUE, center = TRUE, scannf = FALSE, nf = 3)
# eigenvalue of the first eigenvector
pca.ad$eig[1]
[1] 5.968749
# variable coordinates on the first eigenvector
pca.ad$co[, 1]
 [1]  0.85280863 -0.81918008 -0.4528      0.75214647 -0.04996375  0.70722171
 [7]  0.83048310  0.90260821  0.79011263 -0.76485397  0.76373149
# check the normalization of the loadings
sum(pca.ad$co[, 1]^2)
[1] 5.968749
# => coordinates scaled to the eigenvalue
# normed variable scores on the first eigenvector
pca.ad$c1[, 1]
 [1]  0.34906791 -0.33530322 -0.18535177  0.30786532 -0.02045094  0.28947691
 [7]  0.33992972  0.36945166  0.32340546 -0.31306670  0.31260725
# check the normalization
sum(pca.ad$c1[, 1]^2)
[1] 1
# => coordinates scaled to 1

## with vegan ##
pca.veg <- rda(doubs.env, scale = TRUE)
# species scores on the first eigenvector, with scaling 1
summary(pca.veg, scaling = 1)
       das        alt        pen        deb         pH        dur        pho
 1.4752228 -1.4170507 -0.7833294  1.3010933 -0.0864293  1.2233806  1.4366031
       nit        amm        oxy        dbo
 1.5613681  1.3667687 -1.3230752  1.3211335

Scaling 1 for species and site scores
* Sites are scaled proportional to eigenvalues
* Species are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177

summary(pca.veg, scaling = 2)[1][[1]][, 1]
        das         alt         pen         deb          pH         dur
 1.08668311 -1.04383225 -0.57701847  0.95841533 -0.06366582  0.90117039
        pho         nit         amm         oxy         dbo
 1.05823501  1.15013974  1.00679334 -0.97460773  0.97317743

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177

--
View this message in context: http://r-sig-ecology.471788.n2.nabble.com/how-are-compuetd-the-species-scores-of-pca-from-vegan-package-tp7578918.html
Sent from the r-sig-ecology mailing list archive at Nabble.com.
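The numbers printed in the post are enough to reconstruct vegan's species scores from ade4's normed scores. The following is a minimal base R sketch under three stated assumptions: (i) the Doubs environmental table has n = 30 sites, (ii) a PCA on 11 standardized variables has total inertia 11, and (iii) vegan's general scaling constant is ((n - 1) * total inertia)^(1/4). Under these assumptions the constant is 319^(1/4), about 4.226177, which matches the printed value. Scaling 1 species scores are then the constant times $c1, and scaling 2 scores carry an extra factor sqrt(eigenvalue / total inertia). In practice vegan's summary() and scores() produce these values; the sketch only checks the arithmetic.

```r
# Sketch: relate ade4's normed scores ($c1, $co) to vegan's species scores
# on axis 1, using only numbers printed in the post (base R only).
# Assumptions: n = 30 sites in the Doubs table, and vegan's "general scaling
# constant" equals ((n - 1) * total inertia)^(1/4).
n       <- 30          # number of sites (assumed)
tot     <- 11          # total inertia: 11 standardized variables, variance 1 each
lambda1 <- 5.968749    # first eigenvalue, pca.ad$eig[1]

# ade4 normed scores for axis 1 (pca.ad$c1[, 1]); their squares sum to 1
c1 <- c(das = 0.34906791, alt = -0.33530322, pen = -0.18535177,
        deb = 0.30786532, pH = -0.02045094, dur = 0.28947691,
        pho = 0.33992972, nit = 0.36945166, amm = 0.32340546,
        oxy = -0.31306670, dbo = 0.31260725)

# ade4 $co: the normed scores stretched by sqrt(eigenvalue)
co1 <- c1 * sqrt(lambda1)

# Assumed general scaling constant
const <- ((n - 1) * tot)^(1/4)

# Scaling 1: species scores = constant * unit-norm eigenvector
sp1 <- const * c1

# Scaling 2: additionally multiplied by sqrt(eigenvalue / total inertia)
sp2 <- const * sqrt(lambda1 / tot) * c1

# Equivalent view: scaling 2 equals ade4's $co times ((n - 1) / total inertia)^(1/4)
sp2_from_co <- co1 * ((n - 1) / tot)^(1/4)

print(round(const, 6))
print(round(rbind(scaling1 = sp1, scaling2 = sp2), 4))
print(all.equal(sp2, sp2_from_co))
```

Because the factor ((n - 1) / total inertia)^(1/4), about 1.274 here, does not depend on the axis, the scaling 2 species scores differ from ade4's $co by the same constant on every axis, and the scaling 1 scores differ from $c1 by the general constant alone.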
[R-sig-eco] troubles with global test of rda from vegan
Hi everybody,

I'm having trouble with the results I obtained using the rda function of the vegan package, and I would greatly appreciate some help.

I did an RDA to assess whether my matrix of species abundances (18 sites and 34 species) can be explained by my environmental matrix (18 sites and 5 variables). Abundances were Hellinger-transformed.

First I did an RDA with all my environmental variables, and then ran the overall test. It was not significant.

myrda1 <- rda(decostand(abund, "hellinger") ~ ., VarEnv)
anova(myrda1)

Permutation test for rda under reduced model

Model: rda(formula = decostand(abund, "hellinger") ~ VAR1 + VAR2 + Var3 + Var4 + VAR5, data = VarEnv)
         Df      Var     F N.Perm Pr(>F)
Model     5 0.062863 1.025     99   0.43
Residual 12 0.147195

I also did the test by margin (no p-value was significant) and by axis (the first axis was significant):

anova(myrda1, by = "axis")

Model: rda(formula = decostand(abund, "hellinger") ~ VAR1 + VAR2 + Var3 + Var4 + VAR5, data = VarEnv)
         Df      Var      F N.Perm Pr(>F)
RDA1      1 0.030016 2.4470    199   0.01 **
RDA2      1 0.013816 1.1263     99   0.29
RDA3      1 0.009770 0.7965     99   0.68
RDA4      1 0.006273 0.5114     99   0.84
RDA5      1 0.002989 0.2437     99   1.00
Residual 12 0.147195

On the plot, the first axis is explained by Var1 and Var4.

Since I was surprised by the result of the global test, I tried forward selection. Only Var4 was kept in the final model, and the test was now significant. I also did backward selection; there, Var1 was kept in the final model, and the test was significant too.

So my question is: why is the global test of the RDA with all the environmental variables not significant, while the test by axis is significant for the first axis (explained by Var1 and Var4), and while model selection leads to a significant test for either Var1 or Var4?

I checked the VIFs of the full model, and all were lower than 3:

vif.cca(myrda1)
    VAR1     VAR2     Var3     Var4     VAR5
2.573506 2.949139 2.209569 2.023914 1.854133

Thanks in advance for your help.

All the best,
Claire Della Vedova

--
View this message in context: http://r-sig-ecology.471788.n2.nabble.com/troubles-with-global-test-of-rda-from-vegan-tp7578754.html
Sent from the r-sig-ecology mailing list archive at Nabble.com.
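To make explicit what the global test compares, here is a minimal base R sketch of a permutation F-test for an RDA without covariables. It is not vegan's implementation (anova.cca() and permutest.cca() also handle conditioning terms, strata and other permutation designs); it only illustrates the logic. The pseudo-F is the constrained inertia per model degree of freedom divided by the residual inertia per residual degree of freedom; with the global table in the post, F = (0.062863/5)/(0.147195/12), about 1.025, as printed. Because the observed statistic is counted among the permutations, the smallest attainable p-value is 1/(N.Perm + 1), for example 0.01 with 99 permutations.

```r
# Minimal sketch of a global permutation F-test for RDA (base R only).
# Not vegan's implementation: no covariables, no strata, free permutation
# of whole sites. It only illustrates the logic behind anova(myrda1).
rda_global_test <- function(Y, X, nperm = 999) {
  Y <- scale(as.matrix(Y), center = TRUE, scale = FALSE)  # centre species
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # centre predictors
  n  <- nrow(Y)
  qx <- qr(X)
  q  <- qx$rank                       # model degrees of freedom
  pseudo_F <- function(Yp) {
    fit      <- qr.fitted(qx, Yp)     # multivariate least-squares fit
    ss_model <- sum(fit^2)            # constrained sum of squares
    ss_resid <- sum((Yp - fit)^2)     # residual sum of squares
    # dividing both by (n - 1) gives inertias; the ratio is unchanged
    (ss_model / q) / (ss_resid / (n - q - 1))
  }
  F_obs  <- pseudo_F(Y)
  # permute whole rows (sites), keeping each site's species profile intact
  F_perm <- replicate(nperm, pseudo_F(Y[sample(n), , drop = FALSE]))
  # the observed statistic counts as one permutation, so p >= 1 / (nperm + 1)
  p <- (sum(F_perm >= F_obs) + 1) / (nperm + 1)
  list(F = F_obs, p = p, nperm = nperm)
}

# Usage with simulated placeholder data (same dimensions as in the post)
set.seed(1)
env    <- matrix(rnorm(18 * 5), nrow = 18, ncol = 5)      # 18 sites, 5 variables
counts <- matrix(rpois(18 * 34, 3), nrow = 18, ncol = 34) # 34 species
hell   <- sqrt(counts / rowSums(counts))                  # Hellinger transformation
res    <- rda_global_test(hell, env, nperm = 99)
res$F
res$p
```

The simulated matrices are placeholders, not the poster's data; the point is that the global test pools all five constraints into one statistic, whereas the by-axis and selection procedures look at parts of the model.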
[R-sig-eco] PCA or NMDS (with which normalization and distance) for abundance data?
Dear all,

I'm a biostatistician working for a French institute involved in environmental risk assessment, and I need help understanding the results I obtained from several ordination analyses.

I have a dataset of 25 sites. For these 25 sites I have abundance data for 38 species and also measurements of 5 environmental variables.

Here is an extract of my abundance data for the first 5 sites:

Anguinidae.ditylenchus Aphelenchidae Aphelenchoididae Aporcelaimidae
1218 184 0 014 154 0 45 0 101 6 20 0 148 0 0 0 118 0

Here are the environmental data for the first 5 sites:

ExtPond   moist    Corg    pH    DV50
  0.946   9.086   4.269  5.24  171.33
  0.682  27.139  23.813  3.82   75.45
  2.480  14.322   7.191  4.48  230.90
  3.069  18.380  11.404  3.58  211.19
  2.615  16.693   7.128  4.12  224.45

My aim was to study how the distribution of species is linked to the environmental data.

First, I did a PCA (with the vegan library) using a Hellinger transformation, with commands like this:

acp1 <- rda(decostand(myDataSpec[, c(25:62)], "hellinger"))

The first axis represents 19.5% of the variance, the second one 16.3%. A colleague of mine said this is not so bad for abundance data, but it seems quite poor to me. What do you think?

Then I fitted environmental vectors with the envfit function (of the vegan library), with commands like this:

physCInd.fit3 <- envfit(acp1, MyDataEnv[, c(13, 18, 20, 21, 23)], permutations = 4999, na.rm = TRUE)

It appeared that the pH variable is significantly linked with the ordination, and the p-value of ExtPond is 0.1.

Next I did an RDA, which is not significant.

Finally, I did two NMDS. For the first one I used the Hellinger transformation and the Bray-Curtis distance. The stress value obtained is 0.22, the non-metric fit R² is 0.952 and the linear fit R² is 0.777. When I fitted the environmental vectors, ExtPond was correlated with the ordination (p-value = 0.02) and the p-value of pH was 0.23.

But then I read in Numerical Ecology (page 449) that it is better to standardize the data by dividing each value by the maximum abundance of the species and then use the Kulczynski distance.
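The two NMDS pipelines being compared differ in both the data transformation and the dissimilarity. The base R sketch below writes out the textbook definitions of the Hellinger transformation, standardization by species maximum, Bray-Curtis and the quantitative Kulczynski index, so the two pipelines (Hellinger + Bray-Curtis, maximum + Kulczynski) can be inspected side by side on a toy matrix. It is not the source of vegan's decostand() or vegdist(); the function names are hypothetical, and it assumes no empty sites and no species that are absent everywhere.

```r
# Base R sketch of the transformations and dissimilarities discussed above.
# Textbook definitions only; not vegan's decostand()/vegdist() source.

# Hellinger: square root of relative abundance within each site (row).
# Assumes no empty sites (rows summing to zero).
hellinger <- function(Y) {
  Y <- as.matrix(Y)
  sqrt(Y / rowSums(Y))
}

# Divide each species (column) by its maximum abundance.
# Assumes every species occurs at least once (column max > 0).
max_standardize <- function(Y) {
  Y <- as.matrix(Y)
  sweep(Y, 2, apply(Y, 2, max), "/")
}

# Pairwise dissimilarity between sites, returned as a "dist" object
pairwise <- function(Y, fun) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  D <- matrix(0, n, n, dimnames = list(rownames(Y), rownames(Y)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      D[i, j] <- D[j, i] <- fun(Y[i, ], Y[j, ])
    }
  }
  as.dist(D)
}

# Bray-Curtis: sum |x - y| / sum (x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Quantitative Kulczynski: 1 - (A / sum(x) + A / sum(y)) / 2, A = sum(pmin(x, y))
kulczynski <- function(x, y) {
  A <- sum(pmin(x, y))
  1 - 0.5 * (A / sum(x) + A / sum(y))
}

# Usage on two hypothetical sites and three species
toy <- rbind(site1 = c(1, 0, 4), site2 = c(2, 2, 0))
pairwise(toy, bray_curtis)                  # 7/9, about 0.778
pairwise(toy, kulczynski)                   # 0.775
pairwise(hellinger(toy), bray_curtis)       # pipeline 1: Hellinger + Bray-Curtis
pairwise(max_standardize(toy), kulczynski)  # pipeline 2: maximum + Kulczynski
```

Since each pipeline changes the weight given to abundant versus rare species in a different way, the two ordinations can legitimately rank sites differently, which is one reason the envfit results need not agree.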
The stress value was 0.23, the non-metric fit R² was 0.948 and the linear fit R² was 0.69. These values are a little worse than those of the first NMDS, but the stressplot seems more homogeneous to me. Nevertheless, the results I obtained are very different... When I fitted the environmental data, ExtPond was not correlated with this ordination (p-value = 0.82) and the p-value of pH was 0.06. And of course ExtPond is the most important variable for us ;-)

With all these results I'm quite confused, and I don't know what to think. So, if someone can help me, I would appreciate it very much. All comments are welcome.

To summarize, my questions are:
a) Which ordination method would be better for my data: PCA, knowing that the represented inertia is 35.62%, or NMDS with a stress value of about 0.22?
b) If NMDS is more suitable, which one is better: with the Hellinger transformation and the Bray-Curtis distance, or with the standardization recommended by Legendre and Legendre and the Kulczynski distance?
c) Are there other methods to apply? I'm going to try co-inertia with the ade4 package.

Thanks in advance.

Cheers,
Claire Della Vedova
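A small arithmetic check on the NMDS diagnostics quoted above, assuming that vegan's stressplot() reports the non-metric fit as R² = 1 - S², where S is the stress reported by metaMDS(). Under that assumption the non-metric R² values carry the same information as the stress values: 0.952 corresponds to a stress of about 0.219 and 0.948 to about 0.228, which are the 0.22 and 0.23 quoted once rounded. The linear fit R² (0.777 versus 0.69) is the figure that is not implied by the stress.

```r
# Assumption: stressplot() reports the non-metric fit as R^2 = 1 - S^2,
# where S is the metaMDS() stress. All input values come from the post.
stress <- c(hellinger_bray = 0.22, max_kulczynski = 0.23)
nonmetric_r2 <- 1 - stress^2
round(nonmetric_r2, 3)          # 0.952 and 0.947

# Going the other way: stress implied by the reported non-metric R^2 values
r2_reported <- c(hellinger_bray = 0.952, max_kulczynski = 0.948)
implied_stress <- sqrt(1 - r2_reported)
round(implied_stress, 3)        # 0.219 and 0.228
```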