Re: [R-sig-eco] SAMs parameter selection

Scott Foster Sun, 24 Jan 2016 17:56:17 -0800

Dear Marika,

I'm really glad that more people are starting to use SAMs and other
model-based methods.


Here are some answers to your questions (as best I can give them).

The model-selection procedure outlined in Leaper et al. (2014) is just a variant of the much-loved, much-hated, and often-used backwards elimination method. The variation is that the number of species archetypes is selected first, so that the number of candidate models does not become unmanageably large.
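A generic sketch of this two-step idea in Python, assuming a hypothetical callable fit_bic(covariates, k) that fits a SAM with the given covariates and k archetypes and returns its BIC (lower is better). The exact sequence of steps in Leaper et al. (2014) may differ, and none of these functions are part of SpeciesMix; toy_bic is a made-up scoring function used only so the example runs.

```python
def select_archetypes(fit_bic, covariates, k_values):
    """Step 1: with the full covariate set, choose the number of archetypes K by lowest BIC."""
    return min(k_values, key=lambda k: fit_bic(covariates, k))


def backward_eliminate(fit_bic, covariates, k):
    """Step 2: with K fixed, drop one covariate at a time while BIC keeps falling."""
    current = list(covariates)
    best = fit_bic(current, k)
    while current:
        # Score every model that drops exactly one of the remaining covariates.
        trials = [(fit_bic([c for c in current if c != drop], k), drop) for drop in current]
        score, drop = min(trials)
        if score >= best:
            break  # no single deletion lowers BIC, so stop
        best = score
        current.remove(drop)
    return current, best


# Hypothetical stand-in for "fit a SAM and return its BIC". A real version would
# fit the model (ideally with multiple starts) and return the BIC of the best start.
def toy_bic(covs, k):
    return 100 - 10 * ("sand" in covs) - 8 * ("depth" in covs) + 2 * len(covs) + 5 * abs(k - 3)


covs = ["sand", "depth", "temp", "salinity"]
k = select_archetypes(toy_bic, covs, range(1, 6))
print(k, backward_eliminate(toy_bic, covs, k))  # → 3 (['sand', 'depth'], 86)
```

Fixing K before eliminating covariates is what keeps the search small: each elimination round refits only one model per remaining covariate, rather than one per (covariate set, K) pair.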

I presume that you get two different BIC values from two different attempts at maximising the likelihood, that is, from two different calls to SpeciesMix()? This can occur, and often does, because the process of maximising the (log-)likelihood can get stuck in local maxima. This is a 'feature' of any type of model that has random/latent factors or variables, but it seems to be quite acute in mixture models. The remedy, as outlined in the estimation section of Dunstan et al. (2011) and buried in the application section of Dunstan et al. (2013), is to perform multiple starts: the more the merrier. The model that you should use for comparisons is the one that attains the global maximum (the highest likelihood, i.e. the lowest BIC, that you observe). Performing multiple starts will increase computation time, but it will also reduce the chance of making inference from a sub-optimal model. You should perform multiple starts.
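To illustrate why multiple starts matter, here is a minimal Python sketch (not SpeciesMix, and not R) that fits a mixture of Bernoulli "archetypes" over species by EM from several random starts and keeps the start with the highest log-likelihood. As a simplifying assumption it has no covariates: each archetype has its own probability of presence at each site, whereas a SAM links each archetype's probabilities to environmental covariates through a regression. All function names are hypothetical. Because every start fits the same model, the BIC penalty is identical across starts, so the lowest BIC is simply the highest log-likelihood.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def e_step(Y, pi, theta):
    """Return posterior memberships tau[i][k] and the observed-data log-likelihood."""
    tau, loglik = [], 0.0
    for row in Y:
        logs = []
        for k in range(len(pi)):
            ll_k = sum(math.log(p) if y else math.log(1.0 - p) for y, p in zip(row, theta[k]))
            logs.append(math.log(pi[k]) + ll_k)
        norm = logsumexp(logs)
        loglik += norm
        tau.append([math.exp(v - norm) for v in logs])
    return tau, loglik


def em_fit(Y, K, rng, max_iter=500, tol=1e-8, eps=1e-6):
    """One EM run from a random start; returns (loglik, pi, theta, tau)."""
    n_species, n_sites = len(Y), len(Y[0])
    pi = [1.0 / K] * K
    # Random starting probabilities: different starts can climb to different local maxima.
    theta = [[rng.uniform(0.1, 0.9) for _ in range(n_sites)] for _ in range(K)]
    prev = -math.inf
    for _ in range(max_iter):
        tau, loglik = e_step(Y, pi, theta)
        if loglik - prev < tol:
            break  # converged (EM never decreases the likelihood)
        prev = loglik
        # M-step: tau-weighted proportions, kept away from exactly 0 and 1.
        for k in range(K):
            w = sum(t[k] for t in tau)
            pi[k] = max(w / n_species, eps)
            for j in range(n_sites):
                p = sum(t[k] * row[j] for t, row in zip(tau, Y)) / max(w, eps)
                theta[k][j] = min(max(p, eps), 1.0 - eps)
        total = sum(pi)
        pi = [p / total for p in pi]
    tau, loglik = e_step(Y, pi, theta)
    return loglik, pi, theta, tau


def multi_start(Y, K, n_starts=20, seed=1):
    """Run EM from several random starts; keep the fit with the highest log-likelihood."""
    rng = random.Random(seed)
    fits = [em_fit(Y, K, rng) for _ in range(n_starts)]
    best = max(fits, key=lambda fit: fit[0])
    return best, [fit[0] for fit in fits]


# Toy data: rows are species, columns are sites (1 = present).
Y = [[1, 1, 1, 0, 0, 0] for _ in range(4)] + [[0, 0, 0, 1, 1, 1] for _ in range(4)]
best, logliks = multi_start(Y, K=2, n_starts=10, seed=1)
print("log-likelihood per start:", [round(v, 2) for v in logliks])
print("best log-likelihood:", round(best[0], 2))
```

Printing the log-likelihood of every start, not just the best one, is a cheap check: if many starts land well below the maximum, the likelihood surface is bumpy and more starts are warranted.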

I would also recommend, as you allude to, that you have a look at the values of tau (the posterior probability of each species belonging to each archetype group) and pi (the probability of any new species, with no data, belonging to each group). This can help remove a certain type of 'misfit', which is likely to be a singularity in the (log-)likelihood surface.
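A rough heuristic version of this check, sketched in Python: flag archetypes that have a very small pi, or that end up with very few species whose highest tau falls in that group, since a group collapsing onto one or two species is one common symptom of a degenerate fit. The function name and the thresholds min_pi and min_members are illustrative assumptions, not SpeciesMix options or recommended values; the pi and tau numbers are toy values.

```python
def flag_suspect_groups(pi, tau, min_pi=0.02, min_members=2):
    """Flag archetype groups that look degenerate.

    pi:  list of K mixing proportions (probability a new species belongs to each group).
    tau: one list of K posterior membership probabilities per species.
    The thresholds are illustrative only.
    """
    K = len(pi)
    # Hard-assign each species to the group with its largest tau.
    members = [0] * K
    for row in tau:
        members[max(range(K), key=lambda k: row[k])] += 1
    flags = {}
    for k in range(K):
        reasons = []
        if pi[k] < min_pi:
            reasons.append("tiny pi")
        if members[k] < min_members:
            reasons.append(f"only {members[k]} species assigned")
        if reasons:
            flags[k] = reasons
    return flags


pi = [0.6, 0.39, 0.01]
tau = [[0.9, 0.1, 0.0] for _ in range(3)] + [[0.05, 0.95, 0.0], [0.0, 0.0, 1.0]]
print(flag_suspect_groups(pi, tau))
# → {1: ['only 1 species assigned'], 2: ['tiny pi', 'only 1 species assigned']}
```

A flagged group is a prompt to look more closely (for example, refit with more starts or fewer archetypes), not an automatic reason to discard the model.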

A colleague has been busy trying to make model selection more robust. In particular, he has looked at alternatives to using BIC in SAMs and related models (Hui et al. 2015a), with the aim of removing one of the question marks over SAMs by providing a decent criterion for choosing between models. He has also looked at automated methods based on regularisation (shrinkage) (Hui et al. 2015b). Both are great additions to the arsenal for SAMs. However, neither is (yet?) incorporated in the SpeciesMix R package.
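For orientation, the standard forms of an observed-likelihood criterion (BIC) and a complete-likelihood criterion (ICL, in its common BIC-based approximation) are sketched below. Here the first term of BIC is minus twice the maximised log-likelihood, d is the number of estimated parameters, n the sample size, and tau_ik the posterior probability that species i belongs to archetype k. These are textbook forms, not necessarily the exact criteria compared by Hui et al. (2015a), and what to count as n in a SAM is itself a modelling choice.

```latex
\mathrm{BIC} = -2\,\ell(\hat{\theta}) + d \log n,
\qquad
\mathrm{ICL} = \mathrm{BIC} - 2 \sum_{i} \sum_{k} \tau_{ik} \log \tau_{ik}
```

Since 0 <= tau_ik <= 1, the extra term in ICL is non-negative, so ICL additionally penalises models in which species are not cleanly assigned to archetypes.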

Lastly (and I hope that this is not just my opinion), model selection for any analysis is difficult, irrespective of how complex the modelling framework is. Automating the process can make it seem objective, but in truth there will always be assumptions made, and personal preferences will come through. To my mind, the modelling process is enhanced by context-specific information that is only available from experts (generally the people who obtained the data). Statements such as "polychaete assemblages are highly likely to vary with sediment size" are just the beginning. However, formalising this process is difficult, and it is even more difficult to convince editors, reviewers and readers that you have done an excellent job without resorting to well-established algorithms for model selection.


I hope that this answers your questions.  Let me know if you have any more.

Cheers,

Scott (SpeciesMix contributor)

Dunstan, Foster and Darnell (2011) Model based grouping of species across environmental gradients. Ecological Modelling. 222: 955-963. DOI: 10.1016/j.ecolmodel.2010.11.030
Dunstan, Foster, Hui and Warton (2013) Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology. Journal of Agricultural, Biological and Environmental Statistics. 18: 357-375. DOI: 10.1007/s13253-013-0146-x
Hui, Warton and Foster (2015a) Order selection in finite mixture models: complete or observed likelihood information criteria? Biometrika. 102: 724-730. DOI: 10.1093/biomet/asv027
Hui, Warton and Foster (2015b) Multi-Species Distribution Modeling Using Penalized Mixture of Regressions. Annals of Applied Statistics. 9: 866-882. DOI: 10.1214/15-AOAS813
Leaper, Dunstan, Foster, Barrett and Edgar (2014) Do communities exist? Complex patterns of overlapping marine species distributions. Ecology. 95: 2016-2025. DOI: 10.1890/13-0789.1






On 22/01/16 22:34, Marika Galanidi wrote:

Dear all,

I have been using Species Archetype Models (Dunstan et al 2011) to model
the distribution of benthic polychaete assemblages (presence/absence data).
I perform model selection as described in Leaper et al. (2014). However,
when using a particular sub-set of predictor variables, SpeciesMix returns
two different BIC values accompanied by different model parameters for the
exact same model (same predictor variables) at various stages in the
parameter selection process. Looking at the pi and tau values and the SEs
of the coefficients, one can draw certain conclusions but is there a more
rigorous way to proceed with model selection in this case?

Many thanks




Marika Galanidi
Post-Doctoral Researcher
Institute of Marine Science and Technology
Dokuz Eylul University, Izmir


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--
Scott Foster
CSIRO
E scott.fos...@csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au
