Hi Jay, What you describe is similar to research conducted at UCL by myself and colleagues and also as part of an EU project that finished a few years ago now.
Since then, my thoughts on this have expanded a little and Sarah's point about path analysis was where I would have gone next if I was continuing this line of investigation. In the work I did at UCL, I went with a time series approach. I decomposed the species data into a set of ordination axes (I used PCA, but on Hellinger transformed data to account for non-linear responses in the species which PCA doesn't handle well). Then I fitted an additive model with, say, PCA axis 1 (PC1) as the response variable and one or more covariates entering as smooth functions as the predictors. The nice things about the additive model is that the terms come together additively and the non-linear effects allow the effect of a variable to change (i.e. cold temperatures only induce an "effect" on the response). As these were time series data I controlled for autocorrelation by using a continuous time AR(1) process for the residuals. Here are some refs on these approaches: http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html http://dx.doi.org/10.1111/j.1365-2427.2012.02860.x http://dx.doi.org/10.1111/j.1365-2427.2011.02651.x http://dx.doi.org/10.1111/j.1365-2427.2011.02670.x and there is more in the Special Issue section of FWBiol: http://onlinelibrary.wiley.com/doi/10.1111/fwb.2012.57.issue-10/issuetoc Now this doesn't look at cascades of effects; one key aspect of all of the above was the construction of appropriate time series for the environmental data. Ideally, I'd take the variable that is most closely related to diatom physiology as my predictor. However those variables do not always exist or are not available. Instead surrogates can be used; amount of agricultural land-use in catchments or fertilizer-use historical records will be highly correlated with nutrient loading from a catchment to the lake/reservoir, so these could be used instead. Of course, you do need to wary of spurious correlations with time series data. As these models are using smooth functions, you are only going to be able to include one or a few covariates unless you have *lots* of samples; and anyway, I would always advise to think first from the biological or ecological viewpoint and formulate an hypothesis there and then fit that with the stats rather than throwing lots of variables into an analysis to see what pops out (which seems to be what a lot of palaeoecologists do!) As regards envfit(), it isn't symmetric in the variables at least as far as I see it; it fits a model of varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon in other words it uses the 2d "axis" scores (PC1 and PC2, or nMDS1 + nMDS2 coordinates) to predict the values of the response using a linear model. As each environmental variable is modelled separately (individually), one is not favouring a set of variables etc. Perhaps this is not what you meant but worth pointing out. Also, envfit presumes a linear relationship between the variables and the ordination coordinates. If that is too strong an assumption, see ordisurf() which fits GAM-based surfaces rather than linear-regression surfaces. A fairly standard way of looking at this sort of data might be to group variables that are related and then decompose the variance in the data in that which can be explained by each group of variables uniquely, that which can be explained by two or more groups, and the unexplained variance. The vegan package has a function varpart() which can do this all for you if you are willing to use RDA to analyse the data (unbiased estimates of the variance explained are not available for CCA and nMDS is not a constrained technique) - note you can use principal coordinates analysis to embed your original dissimilarity matrix into a metric space and then take the PCoA axis scores as the input "data" for the RDA so that the RDA is in the dissimilarity data of your choice and not linear in the original data. Steve Juggins has adapted the hierarchical partitioning approach from package hier.part to the multivariate multiple regression setting of RDA (possibly CCA too?) which is related to but somewhat different to the variance partitioning described above. I don't believe Steve has released this code yet, so if interested I'd emailing him for it; he is the author of the rioja package so contact details can be found on CRAN. Neither variance partitioning or hierarchical partitioning directly do exactly what you ask and model the directed dependence or pathways of effects. They are however far simpler methods which would have familiarity within the applied community that will see these results/papers etc. In writing this I have pondered on whether you and/or the ecologists are making it too complex? As you have all the variables of interest, I might model the variables that physiologically affect the diatoms and their effect on diatom composition (using constrained ordination or the additive model time series approach I used). Then I would model the relationship between the higher-level factor (land-use, etc) that I hypothesise to be driving changes in the lower-level, physiologically-relevant, variable. Whilst path analysis might be ideal tool here, I suspect you'll have plenty of complications to address, not least the compositional aspects (though Sarah points you to some ideas there), but also the temporal autocorrelation prevalent in the palaeo data which would need to be accounted for to yield appropriate p-values. If you do address these issues and use path analysis, I for one would be very interested to here how you got on as this would be a very useful contribution to the palaeolimnological literature! Wow, this got long! Anyway, hopefully some of the above will be of use and interest, and should you wish to chat about this off-list some more (I'm certainly very interested if you use path analysis for this), then please do so. (Note I am still sending via my old UCL address as I have moved to Canada and U Regina, contact details in my email signature.) All the best, Gavin On Wed, 2013-03-06 at 22:12 -0500, Jay Kerns wrote: > Hello, > > I'm posting to this list because I believe it's the best place to > go. My question is R related only inasmuch as all the work I've > done so far has been with R and I expect any answers I get from > here will lead me to more R work. > > I'm consulting with an ecologist and an engineer on a project > related to a reservoir nearby. They've collected data on diatoms > in the reservoir via core samples; they have sections of data over > the past 100yrs. They are looking at the community structure > plus other environmental factors over the same time period. > > We've done a ton of work already and there's no point trying to > hash all of that out here. Short story: we did an NMDS, it fits > OK (stress 0.17), there are obvious clusters in the ordination > which correspond to a-priori clusters from ecological > considerations (and which match an independent cluster analysis), > we're really quite pleased overall. We checked for relationships > with =envfit=, most environmental variables are *highly* > significant, yet there are a couple which aren't significant at > all. Here comes my question: > > The ecologist pointed out to me that our environmental variables > don't have equal status (ecologically speaking); some variables > lead to others. For instance, there are so-called ultimate > factors (population, percentage farmland) which contribute to > intermediate factors (suspended solids, total phosphorous) which > in turn contribute to direct factors (AREA, pH,...) which then in > turn contribute to diatom structure. > > We have measured data on all the above and several more. The > model we are fitting with =envfit= is symmetric in those n > environmental variables, but the ecology of the situation isn't > symmetric, it's a directed top-down kind of relationship. He > asked me, "How can we quantify that? How can we demonstrate > that? Can we quantify/demonstrate that?" I don't know. > > There are ecologists on this list: what am I looking for, here? > What methods do ecologists use to answer this (or related) > question(s)? Feel free to direct me to papers, literature, > textbooks, whatever. I'm trying to help answer this question > and (this not being my subject specialty) I'm at a bit of a loss. > > If there are relevant R packages/vignettes/manuals you can point > me to, that'd be cool too. > > Thanks for reading all the way down to here. > > Jay > > P.S. If it hadn't been for the archives of this list containing > lengthy and poignant answers to *several* questions I've had > already then I couldn't even have made it this far. Thank you! > > > -- Gavin Simpson, PhD [t] +1 306 337 8863 Institute of Environmental Change & Society [f] +1 306 337 2410 523 Research and Innovation Centre [e] gavin.simp...@uregina.ca University of Regina [tw] @ucfagls Regina, SK S4S 0A2, Canada _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology