Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models

Masahito Tsuboi Wed, 04 Apr 2018 23:39:07 -0700

Dear R-sig-phylo contributors,

Thanks a lot for sharing your illuminating thoughts on the list.


I agree with points raised by others, and I'd like to add one suggestion. As 
already mentioned by several, alpha and sigma-squared can be hard to estimate 
with certainty. However, in some cases, the stationary variance 
(sigma-squared/2 alpha) converges well while sigma-squared and alpha separately 
do not. What such case means is that the expected amount of variance at an OU 
equilibria (= stationary variances) can be estimated reliably, but how long 
adaptation takes to get there (= alpha, or half life, log(2)/alpha) is highly 
uncertain. I thus would suggest Rafael to compute stationary variances from 
estimated sigma-squared and alpha, and see if you can make a better sense. 

Good luck. Thanks again for putting an interesting question on the table and 
for sharing thoughts.

Best,
Masahito

> On 5 Apr 2018, at 00:41, Marguerite Butler <mbutler...@gmail.com> wrote:
> 
> Aloha Rafael,
> 
> I am very glad that you are thinking about power of your analyses and 
> confidence in your parameter estimates.  Folks have chimed in with many 
> useful responses, however, I believe your original question was whether 
> uncertainty in parameter estimates indicate bad models and limits of 
> interpretation?
> 
> It’s important to note that model fit and parameter estimates are two 
> different aspects of assessment. It is perfectly possible to have robust 
> model selection (decent power) but have parameters that have a lot of 
> uncertainty and are hard to identify with any precision (large confidence 
> intervals).  That was pretty well demonstrated in Cressler, Butler, King 
> 2015. The specific outcome is data and model dependent, but as a 
> generalization, model selection is more robust than parameter estimation, and 
> within the parameters, the position of the thetas are a little bit easier to 
> narrow down than the alphas and sigmas. 
> 
> This is not too surprising when you think about the information content of 
> the data. We are really trying to fit some relatively complex models with few 
> data (one point per species, with phylogenetic structure to boot), and the 
> intuitive interpretation is that there are many ways to the same outcome.  If 
> the outcome is the particular pattern of phenotypes along the tree, it might 
> be possible to get that outcome with strong selection and little 
> stochasticity, or weak selection toward optima very far apart (and the 
> observed phenotypes are essentially “on the way” toward the optima), or very 
> weak selection and strong stochasticity (they were just knocked about that 
> way by drift), etc.  While this last scenario is most different than the 
> first two (the first two are more OU-type, whereas the last is more BM-like 
> in being dominated by stochasticity), my point is that there may not 
> necessarily be a unique set of parameters that explain the data. 
> 
> Nevertheless, it may still be possible to say something biologically useful. 
> First of all, I noticed that you are doing SIMMAP which is great for 
> exploring phylogenetic uncertainty, but this is a bit arms-length with regard 
> to OU parameter and model uncertainty. A bit apples and oranges. I would 
> strongly recommend that you try some parametric bootstrapping on your OU 
> models for both model selection and parameter estimation. In OUCH, there are 
> built-in facilities to make this really easy, probably there are in OUwie as 
> well (sorry I only stick with OUCH :). The key for the model selection 
> bootstrap is that you can simulate the data using the best-fit model, but 
> then fit all of the alternative models to see how often your best model 
> performs. I like to point folks to my former grad student’s paper (Scales et 
> al 2009).  If you want to be even more stastically thorough you can do the 
> Botteinger, Coop, and Ralph 2012 approach which basically does parametric 
> simulations on both the best and alternative models.   The parametric 
> bootstraps on the parameter estimates is much easier. and you can just run 
> the simulations and fit only the best model and draw your CI’s from that. 
> Maybe you’ve done this already. 
> 
> With regard to what can be said, as Liam and Brian and others have said, many 
> times there are large CIs because we encounter flat or ridged likelihood 
> surfaces.  So that may limit your conclusions somewhat to saying something 
> like we don’t know what alpha is, precisely, but we know it’s greater than 
> zero.  Etc.  In your specific case, what you want to know is whether theta 1 
> > theta 2, or something like that, right? So without needing to know the 
> precise location of the theta’s I would bet that you can look through your 
> parametric simulations and see how often that condition is satisfied.  You 
> may find that you don’t know what they are exactly, but one is always greater 
> than the other. 
> 
> I love your study question by the way. I too study colorful organisms in 
> bright and dark habitats. Best of luck!
> 
> Hth,
> Marguerite
> 
> Cressler C., Butler M.A., and King A. A. (2015) Detecting adaptive evolution 
> in phylogenetic comparative analysis using the Ornstein-Uhlenbeck model.  
> Sys. Bio. 64(6):953-968. DOI: 10.1093/sysbio/syv043
> 
> Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or 
> running for your dinner: What drives fiber-type evolution in lizard locomotor 
> muscles? Am. Nat.173:543-553. DOI: 10.1086/597613
> 
> Boettinger C., Coop G., and P. Ralph (2012) Is your phylogeny informative? 
> Measuring the power of comparative methods. Evolution 66-7: 2240-51. DOI: 
> 10.1111/j.1558-5646.2011.01574.x
> 
>> On Apr 4, 2018, at 10:27 AM, Brian O'Meara <omeara.br...@gmail.com> wrote:
>> 
>> Apparently the plot did not come through correctly. I'm attaching a PDF of 
>> it.
>> 
>> Best,
>> Brian
>> 
>> _______________________________________________________________________
>> Brian O'Meara, http://www.brianomeara.info <http://www.brianomeara.info/>, 
>> especially Calendar <http://brianomeara.info/calendars/omeara/>, CV 
>> <http://brianomeara.info/cv/>, and Feedback 
>> <http://brianomeara.info/teaching/feedback/>
>> 
>> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
>> Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
>> Associate Director for Postdoctoral Activities, National Institute for 
>> Mathematical & Biological Synthesis <http://www.nimbios.org/> (NIMBioS)
>> 
>> 
>> On Wed, Apr 4, 2018 at 4:24 PM, Brian O'Meara <omeara.br...@gmail.com 
>> <mailto:omeara.br...@gmail.com>> wrote:
>> Hi, Rafael et al.
>> 
>> ?OUwie has some information on the standard errors:
>> 
>> The Hessian matrix is used as a means to estimate the approximate standard 
>> errors of the model parameters and to assess whether they are the maximum 
>> likelihood estimates. The variance-covariance matrix of the estimated values 
>> of alpha and sigma^2 are computed as the inverse of the Hessian matrix and 
>> the standard errors are the square roots of the diagonals of this matrix. 
>> The Hessian is a matrix of second-order derivatives and is approximated in 
>> the R package numDeriv. So, if changes in the value of a parameter results 
>> in sharp changes in the slope around the maximum of the log-likelihood 
>> function, the second-order derivative will be large, the standard error will 
>> be small, and the parameter estimate is considered stable. On the other 
>> hand, if the second-order derivative is nearly zero, then the change in the 
>> slope around the maximum is also nearly zero, indicating that the parameter 
>> value can be moved in any direction without greatly affecting the 
>> log-likelihood. In such situations, the standard error of the parameter will 
>> be large.
>> 
>> For models that allow alpha and sigma^2 to vary (i.e., OUMV, OUMA, and 
>> OUMVA), the complexity of the model can often times be greater than the 
>> information that is contained within the data. As a result one or many 
>> parameters are poorly estimated, which can cause the function to return a 
>> log-likelihood that is suboptimal. This has great potential for poor model 
>> choice and incorrect biological interpretations. An eigendecomposition of 
>> the Hessian can provide an indication of whether the search returned the 
>> maximum likelihood estimates. If all the eigenvalues of the Hessian are 
>> positive, then the Hessian is positive definite, and all parameter estimates 
>> are considered reliable. However, if there are both positive and negative 
>> eigenvalues, then the objective function is at a saddlepoint and one or 
>> several parameters cannot be estimated adequately. One solution is to just 
>> fit a simpler model. Another is to actually identify the offending 
>> parameters. This can be done through the examination of the eigenvectors. 
>> The row order corresponds to the entries in index.matrix, the columns 
>> correspond to the order of values in eigval, and the larger the value of the 
>> row entry the greater the association between the corresponding parameter 
>> and the eigenvalue. Thus, the largest values in the columns associated with 
>> negative eigenvalues are the parameters that are causing the objective 
>> function to be at a saddlepoint.
>> 
>> You can get better (by which I mean more accurate) estimates by parametric 
>> bootstrapping (simulation). 
>> 
>> Doing a quick summary of your OUM results (code at end):
>> 
>> Trait: contr
>> nonfor: 0.12 (-0.12, 0.36)
>> inter: 0.12 (-0.12, 0.36)
>> fores: 0.13 (-0.13, 0.39)
>> Phylogenetic half life: 0.06
>> Tree height: 28.34
>> 
>> Trait: cshd
>> nonfor: 0.24 (-0.23, 0.72)
>> inter: 0.33 (-0.32, 0.98)
>> fores: 0.2 (-0.19, 0.6)
>> Phylogenetic half life: 6.35
>> Tree height: 28.34
>> 
>> Trait: dors
>> nonfor: 0.04 (-0.04, 0.12)
>> inter: 0.15 (-0.14, 0.44)
>> fores: 0 (0, 0.01)
>> Phylogenetic half life: 3.68
>> Tree height: 28.34
>> 
>> Trait: vent
>> nonfor: 0.05 (-0.05, 0.14)
>> inter: 0.32 (-0.3, 0.93)
>> fores: -0.03 (0.03, -0.08)
>> Phylogenetic half life: 5.43
>> Tree height: 28.34
>> 
>> It does seem that you don't have a lot of ability to tell optima apart. The 
>> alpha value is pretty low, though not tremendously (the half life is a good 
>> way to look at that). But the raw data also suggest it's not three very 
>> clear clumps (beeswarm plot, regimes on the x axis, trait 1 value on the y). 
>> It's not phylogenetically corrected, but still a good gut check:
>> 
>> 
>> 
>> I agree that for your data, it may be that simpler models work better. It 
>> could also explain why you're getting crazy theta values for models with V 
>> or A allowed to vary: without strong attraction, it's hard to figure out 
>> what the attraction is towards, and especially for the third regime, with 
>> only 11 data points, there probably isn't a lot of info to estimate many 
>> parameters for just that regime.
>> 
>> Best,
>> Brian
>> 
>> library(OUwie)
>> library(ape)
>> library(beeswarm)
>> 
>> data <- read.csv("~/Downloads/OUM_PARAMS_ant_f.csv", row.names=1)
>> load("~/Downloads/examplemodel.Rdata")
>> 
>> traits <- colnames(data)
>> 
>> regimes <- c("nonfor", "inter", "fores")
>> 
>> observations <- list()
>> 
>> for (trait.index in sequence(length(traits))) {
>>    cat(paste("\n\nTrait:", traits[trait.index]))
>>    for (regime.index in sequence(length(regimes))) {
>>        point.estimate <- data[paste0("theta.",regimes[regime.index]), 
>> traits[trait.index]]
>>        se <- data[paste0("theta.",regimes[regime.index]), 
>> traits[trait.index]]
>>        result.vector <- round(c(point.estimate,point.estimate-1.96*se, 
>> point.estimate+1.96*se),2)
>>        cat(paste0("\n",regimes[regime.index], ": ", result.vector[1], " (", 
>> result.vector[2], ", ", result.vector[3], ")"))
>>        if (trait.index==1) {
>>            observations[[regime.index]] <- 
>> bla$data[which(bla$data[,1]==regime.index),2]   
>>        }
>>    }
>>    cat(paste0("\nPhylogenetic half life: ", round(log(2)/data["alpha", 
>> traits[trait.index]], 2)))
>>    cat(paste0("\nTree height: ", 
>> round(max(ape::branching.times(bla$phy)),2)))
>> }
>> 
>> points.to.plot <- bla$data[,1:2]
>> colnames(points.to.plot) <- c("regime", "trait1")
>> beeswarm(trait1~regime, data=points.to.plot, col="black", pch=20)
>> 
>> 
>> 
>> 
>> _______________________________________________________________________
>> Brian O'Meara, http://www.brianomeara.info <http://www.brianomeara.info/>, 
>> especially Calendar <http://brianomeara.info/calendars/omeara/>, CV 
>> <http://brianomeara.info/cv/>, and Feedback 
>> <http://brianomeara.info/teaching/feedback/>
>> 
>> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
>> Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
>> Associate Director for Postdoctoral Activities, National Institute for 
>> Mathematical & Biological Synthesis <http://www.nimbios.org/> (NIMBioS)
>> 
>> 
>> On Wed, Apr 4, 2018 at 4:07 PM, Jacob Berv <jakeberv.r.sig.ph...@gmail.com 
>> <mailto:jakeberv.r.sig.ph...@gmail.com>> wrote:
>> Even if those models may fit the data better, they may not necessarily help 
>> Rafael determine whether or not the parameter estimates of interest are 
>> different across regimes (though perhaps BMS might be informative).
>> 
>> Rafael, maybe you could try fixing the ancestral regimes to match most 
>> likely states for each node from your SIMMAP summary? I wonder if that might 
>> decrease your ‘uncertainty’ in parameter estimates.
>> 
>> I don’t have a great answer for your main question though, which is a good 
>> one.
>> 
>> Jake
>> 
>>> On Apr 4, 2018, at 8:59 PM, William Gearty <wgea...@stanford.edu 
>>> <mailto:wgea...@stanford.edu>> wrote:
>>> 
>>> Hi Rafael,
>>> 
>>> Have you tried running the BM1, BMS, or OU1 models? If so, how do the AICc
>>> values for those compare to those of the OUM model? If not, you should make
>>> sure you run those.
>>> If you find that the these models fit your data better, this could indicate
>>> that there isn't a large different across regimes and an OUM model isn't
>>> necessarily suitable.
>>> 
>>> Best,
>>> Will
>>> 
>>> On Wed, Apr 4, 2018 at 12:30 PM, Rafael S Marcondes 
>>> <raf.marcon...@gmail.com <mailto:raf.marcon...@gmail.com>
>>>> wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> I'm writing (again!) to ask for help interpreting standard errors of
>>>> parameter estimates in OUwie models.
>>>> 
>>>> I'm using OUwie to examine how the evolution of bird plumage color varies
>>>> across habitat types (my selective regimes) in a tree of 229 tips. I was
>>>> hoping to be able to make inferences based on OUMV and OUMVA models, but I
>>>> was getting nonsensical theta estimates from those. So I've basically given
>>>> up on them for now.
>>>> 
>>>> But even looking at theta estimates from OUM models, I'm getting really
>>>> large standard errors, often overlapping the estimates from other selective
>>>> regimes. So I was wondering what that means exactly. How are these erros
>>>> calculated? How much do high errors it limit the biological inferences I
>>>> can make? I'm more interested in the relative thetas across regimes than on
>>>> the exact values (testing the prediction that birds in darker habitats tend
>>>> to adapt to darker plumages than birds in more illuminated habitats).
>>>> 
>>>> I have attached a table averaging parameter estimates and errors from
>>>> models fitted across a posterior distribution of 100 simmaps for four
>>>> traits; and one exemplar fitted model from one trait in one of those
>>>> simmaps.
>>>> 
>>>> Thanks a lot for any help,
>>>> 
>>>> 
>>>> *--*
>>>> *Rafael Sobral Marcondes*
>>>> PhD Candidate (Systematics, Ecology and Evolution/Ornithology)
>>>> 
>>>> Museum of Natural Science <http://sites01.lsu.edu/wp/mns/ 
>>>> <http://sites01.lsu.edu/wp/mns/>>
>>>> Louisiana State University
>>>> 119 Foster Hall
>>>> Baton Rouge, LA 70803, USA
>>>> 
>>>> Twitter: @brown_birds <https://twitter.com/brown_birds 
>>>> <https://twitter.com/brown_birds>>
>>>> 
>>>> 
>>>> _______________________________________________
>>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>>>> <mailto:R-sig-phylo@r-project.org>
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>>>> Searchable archive at http://www.mail-archive.com/r- 
>>>> <http://www.mail-archive.com/r->
>>>> sig-ph...@r-project.org/ <http://sig-ph...@r-project.org/>
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> William Gearty
>>> PhD Candidate, Paleobiology
>>> Department of Geological Sciences
>>> Stanford School of Earth, Energy & Environmental Sciences
>>> williamgearty.com <http://williamgearty.com/>
>>> 
>>>      [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>>> <mailto:R-sig-phylo@r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>>> Searchable archive at 
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>> 
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>> <mailto:R-sig-phylo@r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>> 
>> 
>> <swarm.pdf>_______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>> <mailto:R-sig-phylo@r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> ____________________________________________
> Marguerite A. Butler
> Associate Professor
> 
> Department of Biology 
> 2538 McCarthy Mall, Edmondson Hall 216 
> Honolulu, HI 96822
> 
> Office: 808-956-4713
> Dept: 808-956-8617
> Lab:  808-956-5867
> FAX:   808-956-4745
> http://butlerlab.org
> http://manoa.hawaii.edu/biology/people/marguerite-butler
> http://www2.hawaii.edu/~mbutler
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>       [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models

Reply via email to