Dear Marika,

I'm really glad that more people are starting to use SAMs and other
model-based methods.

Here are some answers to your questions (as best I can).

The model-selection procedure outlined in Leaper et al. (2014) is just a variant of the much-loved, much-hated, and often-used backwards elimination method. The variation is that the number of species-archetypes is selected first (so that the number of potential models is not unmanageably huge).
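
As an illustration only, the generic backward-elimination idea can be sketched as below (in Python rather than R, so that it is self-contained). This is not the exact procedure of Leaper et al. (2014) and it is not SpeciesMix code: backward_eliminate, toy_score and the covariate names are hypothetical. In a real analysis the score would be the BIC of a SAM refitted for each candidate set of covariates, with the number of archetypes already fixed.

```python
def backward_eliminate(terms, score):
    """Drop one term at a time while the score (lower is better) improves."""
    current = list(terms)
    best = score(current)
    while len(current) > 1:
        # Score every model that has exactly one term removed.
        trials = [(score([t for t in current if t != drop]), drop)
                  for drop in current]
        trial_score, drop = min(trials)
        if trial_score >= best:
            break  # no single deletion improves the score
        current.remove(drop)
        best = trial_score
    return current, best


def toy_score(terms):
    # Hypothetical stand-in for BIC: a penalty of 2 per term kept, plus a
    # large cost for every genuinely useful covariate that is missing.
    useful = {"depth", "sediment"}
    return 2 * len(terms) + 10 * len(useful - set(terms))


kept, final_score = backward_eliminate(
    ["depth", "sediment", "temperature", "slope"], toy_score)
print(kept, final_score)
```

The loop stops as soon as no single deletion lowers the score, so the result depends on the order in which terms happen to be dropped; that is one of the reasons the method is much-hated as well as much-loved.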

I presume that you get two different BIC values for two different instances of maximising the likelihood? That is, from two different calls to SpeciesMix()? This can occur, and often does, as the process of maximising the (log-)likelihood can get stuck in local maxima -- this is a 'feature' of any type of model that has random/latent factors/variables but seems to be quite acute in mixture models. The remedy for this, as outlined in the estimation section of Dunstan et al. (2011) and buried in the application section of Dunstan et al. (2013), is to perform multiple starts -- the more the merrier. The model that you should use for comparisons is the one that comes closest to the global maximum, which in practice is the fit with the highest likelihood (equivalently, for a fixed model, the lowest BIC) that you observe. Performing multiple starts will increase computation time, but it will also reduce the possibility of making inference from a sub-optimal model. You should perform multiple starts.
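
To make the multiple-starts idea concrete, here is a minimal sketch in Python (not R, and not SpeciesMix code). It assumes a deliberately simplified toy model: each species is summarised by the number of sites at which it was present, and each archetype has a single prevalence with no covariates. The names fit_once, multi_start and bic, the toy counts, and the choice of the number of species as the sample size in the BIC are all assumptions made for illustration.

```python
import math
import random


def e_step(counts, n_sites, pi, p):
    """Posterior memberships (tau) and log-likelihood at the current values.

    The binomial coefficient is omitted; it is constant across fits.
    """
    tau, loglik = [], 0.0
    for y in counts:
        logw = [math.log(pi[g]) + y * math.log(p[g])
                + (n_sites - y) * math.log(1.0 - p[g])
                for g in range(len(pi))]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        tau.append([v / total for v in w])
        loglik += top + math.log(total)
    return tau, loglik


def fit_once(counts, n_sites, n_groups, rng, n_iter=200, eps=1e-9):
    """One EM run from a random starting point."""
    p = [rng.uniform(0.05, 0.95) for _ in range(n_groups)]
    pi = [1.0 / n_groups] * n_groups
    for _ in range(n_iter):
        tau = e_step(counts, n_sites, pi, p)[0]
        for g in range(n_groups):
            weight = max(sum(row[g] for row in tau), eps)
            pi[g] = weight / len(counts)
            hits = sum(row[g] * y for row, y in zip(tau, counts))
            p[g] = min(max(hits / (n_sites * weight), eps), 1.0 - eps)
    tau, loglik = e_step(counts, n_sites, pi, p)
    return {"loglik": loglik, "pi": pi, "p": p, "tau": tau}


def multi_start(counts, n_sites, n_groups, n_starts=20, seed=1):
    """Run EM from several random starts; keep the highest log-likelihood."""
    rng = random.Random(seed)
    fits = [fit_once(counts, n_sites, n_groups, rng) for _ in range(n_starts)]
    return max(fits, key=lambda fit: fit["loglik"]), fits


def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)


# Toy data: presences out of 20 sites for 11 hypothetical species.
counts = [2, 3, 1, 2, 18, 17, 19, 16, 9, 10, 11]
best, fits = multi_start(counts, n_sites=20, n_groups=3)
n_params = (3 - 1) + 3  # free mixing proportions plus one prevalence per group
print(sorted(round(bic(f["loglik"], n_params, len(counts)), 2) for f in fits))
```

If the sorted BIC values are not all the same, some starts have stopped at a poorer solution, and the fit to carry forward is the one stored in best.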

I would also recommend, as you allude to, that you have a look at the values of tau -- the (posterior) probability of each species belonging to each archetype group -- and pi, the probability of any new species (with no data) belonging to each group. This can help remove a certain type of 'misfit', which is likely to be a singularity in the (log-)likelihood surface.
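
A rough way to automate that look at tau and pi is sketched below, again in Python and not as SpeciesMix code. It rests on an assumption about what such a misfit looks like: an archetype whose pi is close to zero, or to which no species is assigned with highest posterior probability. The function name flag_suspect_groups, the 0.01 cut-off and the example numbers are hypothetical.

```python
def flag_suspect_groups(tau, pi, min_pi=0.01):
    """Return indices of archetypes that look empty or nearly empty.

    tau: one row per species, one posterior probability per archetype.
    pi:  one mixing proportion per archetype.
    """
    n_groups = len(pi)
    members = [0] * n_groups
    for row in tau:
        # Hard-assign each species to its most probable archetype.
        members[row.index(max(row))] += 1
    return [g for g in range(n_groups)
            if pi[g] < min_pi or members[g] == 0]


# Hypothetical output from a three-archetype fit to three species.
tau = [[0.98, 0.02, 0.00],
       [0.95, 0.05, 0.00],
       [0.10, 0.90, 0.00]]
pi = [0.650, 0.349, 0.001]
suspect = flag_suspect_groups(tau, pi)
print(suspect)
```

A flagged group is a prompt to inspect that fit by eye (and probably to prefer a different start or fewer archetypes), not an automatic rejection rule.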

A colleague has been busy trying to make model selection more robust. In particular, he has looked at alternatives to using BIC in SAMs and related models (Hui et al. 2015a), work that aims to remove one of the question marks in SAMs by providing a decent criterion for choosing between models. He has also looked at automated methods based on regularisation/shrinkage (Hui et al. 2015b). Both are great additions to the arsenal for SAMs. However, neither is (yet?) incorporated in the SpeciesMix R-package.

Lastly (and I hope that this is not just my opinion), model selection for any analysis is difficult irrespective of how complex the modelling framework is. Automating the process can make it seem objective, but in truth there will always be assumptions made and personal preferences will come through. To my mind, the modelling process is enhanced by context-specific information that is only available from experts (generally the people who obtained the data). Such things as "polychaete assemblages are highly likely to vary with sediment size" are just the beginning... However, formalising this process is difficult and it is even more difficult to convince editors/reviewers/readers that you have done an excellent job without resorting to well-established algorithms for model selection.

I hope that this answers your questions.  Let me know if you have any more.

Cheers,

Scott (SpeciesMix contributor)


Dunstan, Foster and Darnell (2011) Model based grouping of species across environmental gradients. Ecological Modelling. 222: 955-963. DOI: 10.1016/j.ecolmodel.2010.11.030

Dunstan, Foster, Hui and Warton (2013) Finite mixture of regression modeling for high-dimensional count and biomass data in ecology. Journal of Agricultural, Biological and Environmental Statistics. 18: 357-375. DOI: 10.1007/s13253-013-0146-x

Hui, Warton and Foster (2015a) Order selection in finite mixture models: complete or observed likelihood information criteria? Biometrika. 102: 724-730. DOI: 10.1093/biomet/asv027

Hui, Warton and Foster (2015b) Multi-species distribution modeling using penalized mixture of regressions. Annals of Applied Statistics. 9: 866-882. DOI: 10.1214/15-AOAS813

Leaper, Dunstan, Foster, Barrett and Edgar (2014) Do communities exist? Complex patterns of overlapping marine species distributions. Ecology. 95: 2016-2025. DOI: 10.1890/13-0789.1





On 22/01/16 22:34, Marika Galanidi wrote:
Dear all,

I have been using Species Archetype Models (Dunstan et al 2011) to model
the distribution of benthic polychaete assemblages (presence/absence data).
I perform model selection as described in Leaper et al. (2014). However,
when using a particular sub-set of predictor variables, SpeciesMix returns
two different BIC values accompanied by different model parameters for the
exact same model (same predictor variables) at various stages in the
parameter selection process. Looking at the pi and tau values and the SEs
of the coefficients, one can draw certain conclusions but is there a more
rigorous way to proceed with model selection in this case?

Many thanks




Marika Galanidi
Post-Doctoral Researcher
Institute of Marine Science and Technology
Dokuz Eylul University, Izmir


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--
Scott Foster
CSIRO
E scott.fos...@csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au
