Hi folks-

I still think there is a difference between (i) parameter identifiability, 
which may or may not be a problem here, and (ii) strong support for the wrong 
model, which clearly appears to be occurring here (e.g., Type I error rates > 
0.75). I don't think non-identifiability of a parameter implies that you'll get 
massively inflated Type I error rates for models that include the parameter. 

Also, the root state (and assumptions regarding it) doesn't seem to drive the 
pattern: you can set the root to 0 under the threshold model (i.e., both 
states equiprobable at the root) and you recover the same strong bias. Jarrod's 
threshold example set the root to -1, but you can verify that the problem holds 
with equiprobable root states. At this point I guess I'd be more concerned 
about model misspecification and its diagnosis (when modeling discrete 
characters) than about identifiability per se, though perhaps I am not thinking 
about this correctly. Distinguishing MK/ARD from the threshold model may be 
difficult, but there is clearly signal of these processes in the data: when you 
fit MK/ARD to data simulated under a threshold model, you get strong support 
for ARD - but not when you fit those same models to data generated under MK. So 
there is clearly something in the data that is informative enough about the 
generating process to produce these high error rates. 
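
If anyone wants to re-run the MK control themselves, a rough sketch along these 
lines should do it (untested as written; it uses ape's rTraitDisc to simulate 
under a symmetric Mk process and fits the same mk2 / constrained pair as in the 
script further down the thread; the rate value and tree size are arbitrary):

## Control: the same LRT on data simulated under a symmetric Mk process
library(ape);
library(geiger);
library(diversitree);

tree <- birthdeath.tree(b=1, d=0, taxa.stop=100);
sim <- rTraitDisc(tree, model="ER", k=2, rates=0.5); # symmetric binary character
states <- as.numeric(sim) - 1;                       # recode factor levels to 0/1
names(states) <- tree$tip.label;

lfx <- make.mk2(tree, states);             # asymmetric (ARD) likelihood function
lfxcon <- constrain(lfx, q01 ~ q10);       # symmetric (MK) constraint
l2 <- find.mle(lfx, runif(2, 0, 5))$lnLik;
l1 <- find.mle(lfxcon, runif(1, 0, 5))$lnLik;
1 - pchisq(-2*l1 + 2*l2, df=1);            # should be < 0.05 only ~5% of the time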

Mark, thanks for pointing out the relationship between the threshold model and 
the single-site covarion model. 

Cheers,
~Dan






On Aug 17, 2012, at 6:31 AM, Jarrod Hadfield wrote:

> Hi,
> 
> Thanks for the Allman & Rhodes paper; it is very nice. For me at least it 
> confirms my suspicions, but it also made me realise that claims of asymmetric 
> transition rates are only suspicious if you are unprepared to make some 
> (strong?) assumptions. If anyone disagrees with what I have written below, 
> please tell me and I will try again to understand this stuff:
> 
> Identifiability is achieved because the pdf for the root state is the 
> stationary distribution (denoted by sigma in Allman & Rhodes: see example 1). 
>  This is, I believe, the default in newer versions of Mesquite, although in 
> older versions the distribution is 0.5/0.5 as in ace.
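> 
> (To make that concrete: for a two-state chain with rates q01 and q10, the 
> stationary distribution is just the normalised opposing rates, e.g. in R, 
> with illustrative values:)
> 
> q01 <- 0.3; q10 <- 0.1                  # illustrative rates for 0->1 and 1->0
> pi.root <- c(q10, q01) / (q01 + q10)    # P(root = 0), P(root = 1) at stationarity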
> 
> If the pdf of the root state is defined by an additional parameter, this 
> leaves a single parameter to describe the rate of transitions, and 
> asymmetrical transition rates are non-identifiable.  It seems to me there is 
> a choice to be made between a) assuming that the same process that operates 
> after the root also held before the root, and talking about asymmetric 
> transition rates, or b) not making this assumption, and then admitting that 
> the rates of transition from 0->1 and 1->0 are not separable. I don't think 
> the data can be used to distinguish between these viewpoints, so it's a 
> matter of personal choice which interpretation/model is used.
> 
> Cheers,
> 
> Jarrod
> 
> 
> 
> 
> 
> 
> 
> 
> Quoting Mark Holder <mthol...@ku.edu> on Thu, 16 Aug 2012 23:41:45 -0500:
> 
>> Hi,
>> 
>> I agree that model testing between ARD vs MK models is going to be 
>> misleading when the process is really described by a threshold model (and 
>> sorry for ignoring that set of simulations by Jarrod previously; somehow I 
>> misfiled that email and didn't see it).
>> 
>> The threshold model has nice ways of dealing with correlations among 
>> characters.  However, when it is applied as the underlying model for a 
>> single binary character (as in Jarrod's sims), the threshold model is 
>> similar to the single-site version of the covarion model (Tuffley and 
>> Steel's version).
>> 
>> I don't think the models are identical, but they are quite similar. I 
>> suspect that if you generated a data set under one of the models, it would 
>> be quite hard to determine which was the generating model. Instead of just 
>> having an "on" and an "off" state (as in the covarion model), the threshold 
>> model has a continuum (the further the underlying continuous trait is from 
>> the boundary, the more "off" the observable binary trait is).  Allman and 
>> Rhodes (2009, ref below) proved some results on the identifiability of 
>> generalizations of covarion processes. They considered models with more 
>> hidden rate categories (not just a rate of zero and a single rate of 
>> evolution when in the "on" state). I believe that their result was that the 
>> number of hidden rate categories that you can identify cannot exceed the 
>> number of observable states. So it may be hard to get much richer than the 
>> Tuffley+Steel covarion when you have a binary character.
>> 
>> Which is a long way of saying that it might be worth looking at covarion 
>> model variants for the types of data that Jarrod is interested in.  
>> Implementations of the covarion model for two states are quite fast and 
>> tractable. Testing Mk+covarion vs ARD+covarion may indeed be a more robust 
>> way of detecting asymmetry in rates of character transitions than Mk vs ARD.
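>> 
>> (To sketch what I mean - and assuming I have the parameterisation right - the 
>> Tuffley-Steel covarion for a binary character can be written as a 4-state 
>> rate matrix over {0-off, 1-off, 0-on, 1-on}; the Mk+covarion vs ARD+covarion 
>> comparison is then just whether the two "on" substitution rates are 
>> constrained to be equal. Values below are arbitrary.)
>> 
>> # illustrative Tuffley-Steel-style covarion rate matrix for a binary trait
>> s1  <- 0.5    # switching rate: off -> on
>> s2  <- 0.5    # switching rate: on  -> off
>> q01 <- 1.0    # 0 -> 1 rate while "on"
>> q10 <- 1.0    # 1 -> 0 rate while "on" (Mk+covarion forces q01 == q10)
>> st <- c("0off", "1off", "0on", "1on");
>> Q <- matrix(0, 4, 4, dimnames=list(st, st));
>> Q["0off","0on"] <- s1;  Q["1off","1on"] <- s1;   # switch on
>> Q["0on","0off"] <- s2;  Q["1on","1off"] <- s2;   # switch off
>> Q["0on","1on"]  <- q01; Q["1on","0on"]  <- q10;  # state change, only while "on"
>> diag(Q) <- -rowSums(Q);                          # rows sum to zero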
>> 
>> 
>> Thanks for pointing out the Boettiger et al paper, Matt.
>> 
>> 
>> all the best,
>> Mark
>> 
>> 
>> [1]  E. S. Allman and J. A. Rhodes, “The Identifiability of Covarion Models 
>> in Phylogenetics,” IEEE/ACM Transactions on Computational Biology and 
>> Bioinformatics, vol. 6, no. 1, pp. 76–88, Jan. 2009.
>> 
>> 
>> 
>> 
>> On Aug 16, 2012, at 10:07 PM, Matt Pennell wrote:
>> 
>>> correction: the last sentence should have read
>>> 
>>> I wonder how that would work in this case. I think these are important 
>>> questions going forward.
>>> 
>>> On Thu, Aug 16, 2012 at 11:00 PM, Matt Pennell <mwpenn...@gmail.com> wrote:
>>> Hey all,
>>> 
>>> This has been a really fantastic discussion. Mark, you make some really 
>>> excellent points in response to my earlier comments. I think you are 
>>> correct in this.
>>> 
>>> The question that arises out of Jarrod and Dan's simulations (which I have 
>>> just run) is whether a model selection criterion would be able to 
>>> distinguish MK from the threshold model that Felsenstein (and Wright before 
>>> him) put forth. And how do we best assess model adequacy? Carl Boettiger 
>>> and company (2012: Evolution) suggested a Phylogenetic Monte Carlo approach 
>>> for continuous characters. I wonder how that would work in this case. I 
>>> think these are important questions going forward.
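>>> 
>>> (One rough, illustrative sketch of how that could look here - essentially a 
>>> parametric bootstrap: fit the constrained symmetric model to the observed 
>>> data, simulate many data sets from that fit, refit both models to each, and 
>>> compare the observed LRT to the simulated null rather than the chi-square. 
>>> 'tree' and 'states' below are just placeholders for the observed tree and 
>>> 0/1 tip states, the 200 replicates are arbitrary, and this is untested as 
>>> written.)
>>> 
>>> library(ape); library(diversitree);
>>> lfx    <- make.mk2(tree, states);
>>> lfxcon <- constrain(lfx, q01 ~ q10);
>>> fit1 <- find.mle(lfxcon, 0.5);                 # symmetric (null) fit
>>> fit2 <- find.mle(lfx, c(0.5, 0.5));            # asymmetric fit
>>> obs  <- 2*(fit2$lnLik - fit1$lnLik);           # observed LRT statistic
>>> 
>>> null.lrt <- replicate(200, {
>>>   s <- rTraitDisc(tree, model="ER", k=2, rates=fit1$par[[1]]);
>>>   s <- setNames(as.numeric(s) - 1, tree$tip.label);
>>>   f <- make.mk2(tree, s);
>>>   2*(find.mle(f, c(0.5, 0.5))$lnLik -
>>>      find.mle(constrain(f, q01 ~ q10), 0.5)$lnLik);
>>> });
>>> mean(null.lrt >= obs);                         # bootstrap p-value for asymmetry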
>>> 
>>> thanks again,
>>> matt
>>> 
>>> 
>>> 
>>> On Thu, Aug 16, 2012 at 10:43 PM, Dan Rabosky <drabo...@umich.edu> wrote:
>>> 
>>> Hi all-
>>> 
>>> A couple of points. I am actually less concerned about the Type I error 
>>> rates I gave in that previous message for the equal-rates Markov process, 
>>> even though I think they are real (e.g., I can corroborate them using 
>>> diversitree). I don't think it is an issue of ascertainment bias, but I 
>>> think Mark may be right that the LRT is inappropriate with few events on 
>>> the tree, and this may well explain the matter. This is probably worth 
>>> exploring further.
>>> 
>>> However, I am much more concerned about Jarrod's second model (with 
>>> underlying continuous latent variable). This seems to be a serious problem, 
>>> and if you simulate under the latent model, I think he is right that Type I 
>>> error rates are really, really high. The model is reasonable: there is a 
>>> continuous trait that influences the probability of observing a particular 
>>> tip state. In a practical sense, this probably means the following: (i) 
>>> some clades in the tree will essentially be fixed for the character in 
>>> question. (ii) Other clades will appear to have high lability of the 
>>> character. The clades that are fixed will be those clades where the 
>>> underlying threshold character (e.g., mean clade value) drifts towards -Inf 
>>> (or +Inf). Regardless of whether we take this latent model seriously, it at 
>>> least leads to an interesting - and probably quite relevant - form of model 
>>> misspecification. The model essentially induces some extra heterogeneity in 
>>> rates, such that some clades will appear to be switching quickly and others 
>>> slowly. However, it is still a symmetric model of sorts.
>>> 
>>> You can simulate data easily under this model and verify (using whatever 
>>> software) that it is a problem. Code that does this is below. You can play 
>>> around with 3 parameters: (i) the number of taxa in the analysis (set to 
>>> 100); (ii) the expected variance of the continuous latent factor (from root 
>>> to tips); and (iii) the root state. These parameters are NTAXA, tipvar, and 
>>> root in the code below.
>>> 
>>> I'm keen to see what others think, but it looks to me like you can simulate 
>>> very reasonable-looking datasets and obtain extremely strong support for an 
>>> asymmetric model - even though the model is quasi-symmetric. So, if these 
>>> hold, then I think this is a serious issue - nothing we routinely do in the 
>>> analysis of discrete characters is designed to detect this sort of model 
>>> misspecification.
>>> 
>>> ## A single simulation
>>> 
>>> library(diversitree);
>>> library(geiger);
>>> library(mvtnorm);
>>> 
>>> NTAXA <- 101;
>>> 
>>> # Generate the tree:
>>> x <- birthdeath.tree(b=1, d=0, taxa.stop=NTAXA);
>>> x <- drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);
>>> 
>>> 
>>> vv <- vcv.phylo(x); # get phylogenetic vcv matrix
>>> 
>>> # Now we set the expected variance at the tips:
>>> #       e.g., the value we want for the diagonal of the vcv matrix
>>> #       If this is = 1, you'll have a "phylogenetic" standard normal
>>> #       distribution
>>> #
>>> 
>>> tipvar <- 2;
>>> sf <- tipvar/max(vv); #get scale factor for vcv matrix
>>> 
>>> vmat <- vv*sf; # scale matrix
>>> root <- 0; # root state: this assumes the root is equally likely to give either state
>>> 
>>> mu <- rep(root, nrow(vmat));  #vector of means
>>> 
>>> 
>>> # Simulate continuous, and then discrete, chars from
>>> #               the corresponding mvn and binomial distributions
>>> chars <- rmvnorm(1, mean=mu, sigma=vmat);
>>> states <- rbinom(length(chars), 1, prob=plogis(chars));
>>> names(states) <- x$tip.label;
>>> 
>>> # Look at the data...
>>> plot(x, show.tip.label=F);
>>> tiplabels(pch=21, bg = c('black', 'white')[(states+1)], col='black', cex=1);
>>> 
>>> #### Using diversitree for model fitting:
>>> lfx <- make.mk2(x, states); # The asymmetric likelihood function
>>> lfxcon <- constrain(lfx, formulae = list(q01 ~ q10)); # constraining q01 ~ q10
>>> 
>>> # Estimation...
>>> l2 <- find.mle(lfx, runif(2, 0, 5))$lnLik;
>>> l1 <- find.mle(lfxcon, runif(1, 0, 5))$lnLik;
>>> 
>>> #likelihood ratio test
>>> lrt <- -2*l1 + 2*l2;
>>> 1 - pchisq(lrt, df=1);
>>> 
>>> 
>>> ### End sim
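>>> 
>>> (If you want the error rates rather than a single p-value, wrap the 
>>> simulation in a function and replicate it - a rough, untested sketch, 
>>> essentially the block above repackaged; the 100 replicates are arbitrary:)
>>> 
>>> ## Sketch: estimate the Type I error rate by repeating the simulation above
>>> sim.once <- function(NTAXA=100, tipvar=2, root=0) {
>>>   x <- birthdeath.tree(b=1, d=0, taxa.stop=NTAXA + 1);
>>>   x <- drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);
>>>   vv <- vcv.phylo(x);
>>>   vmat <- vv * (tipvar/max(vv));
>>>   chars <- rmvnorm(1, mean=rep(root, nrow(vmat)), sigma=vmat);
>>>   states <- rbinom(length(chars), 1, prob=plogis(chars));
>>>   names(states) <- x$tip.label;
>>>   lfx <- make.mk2(x, states);
>>>   lfxcon <- constrain(lfx, q01 ~ q10);
>>>   l2 <- find.mle(lfx, runif(2, 0, 5))$lnLik;
>>>   l1 <- find.mle(lfxcon, runif(1, 0, 5))$lnLik;
>>>   1 - pchisq(-2*l1 + 2*l2, df=1);
>>> }
>>> pvals <- replicate(100, sim.once());
>>> mean(pvals < 0.05);   # proportion "significant" at 0.05 = Type I error rate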
>>> 
>>> Cheers,
>>> ~Dan
>>> 
>>> 
>>> 
>>> 
>>> On Aug 16, 2012, at 9:22 PM, Mark Holder wrote:
>>> 
>>> > Hi all,
>>> >       <apologies for the long email>
>>> >
>>> > I'm a bit more concerned with Dan's elevated Type-1 error rates than 
>>> > Jarrod's example.
>>> >
>>> > With respect to Jarrod's simulations, I have a few thoughts:
>>> >       1. I don't understand the claim (in the original email) that "its 
>>> > fairly straightforward to prove that asymmetric transition rates cannot 
>>> > be identified using data collected on the tips of a phylogeny". It seems 
>>> > like this is something that is routinely done in phylogenetics, and 
>>> > proofs of identifiability of GTR exist (demonstrating that this is indeed 
>>> > feasible and not some computational artifact).
>>> >
>>> >       2. I think that Dan's original response is correct. We should not 
>>> > expect to reject the null only 5% of the time if we simulate a bias in 
>>> > the states of the tips. Simulating data without respect to a phylogeny 
>>> > and then analyzing under Mk or "ARD" should be like a simulation in which 
>>> > the rate of character evolution is essentially infinite. The Mk model 
>>> > predicts the states to be equally frequent, and so it is not problematic 
>>> > for a deviation from 50:50 to favor the ARD model over the Mk model. In 
>>> > fact, the simulation model is equivalent to the ARD model with a rate of 
>>> > evolution approaching infinity, so we should prefer the ARD model.
>>> >
>>> >       3. (in reference to Matt's post) In the ARD model, it is possible 
>>> > for the MLE of the equilibrium state frequency of the less frequent state 
>>> > to be >0.5. Presumably, this is a rare occurrence, but I don't agree with 
>>> > the characterization of the ratio of parameters converging to the ratio 
>>> > of states.
>>> >
>>> >       Consider a clade with lots of tips in state 1 but tiny branch 
>>> > lengths. If this clade is found in the context of a tree with a few other 
>>> > branches that are long and lead to tips with state 0, then you can get an 
>>> > MLE of the state frequency for state 0 being > 0.5. Most of the tips will 
>>> > have state 1, but because they are easily explained by one transition to 
>>> > 1 you can still infer that the less frequent state has a higher 
>>> > equilibrium frequency.
>>> >
>>> >       Perhaps I'm misreading what Matt is referring to when he 
>>> > discusses an analysis of a tree with "a lot of tips (ie. approaching the 
>>> > limit)." I do agree that if you simulate a very large tree under ARD 
>>> > (with the frequencies not equal to 0.5), then the frequency of the states 
>>> > at the tips will converge to the equilibrium state frequencies.
>>> >
>>> >
>>> >
>>> > With respect to Dan's results:
>>> >
>>> > The Type I error rate of 0.12 troubles me.  Have you tried exporting the 
>>> > data and seeing if other software agrees with the likelihood ratios 
>>> > returned by ace()?  I looked at the code for ace and nothing looked 
>>> > amiss to me (though my R skills are virtually non-existent).
>>> >
>>> > If the result is corroborated by other software, then my best guesses 
>>> > would be:
>>> >       1. ascertainment bias (the simulation model clearly excludes 
>>> > constant patterns, but I don't believe that the inference model in ace 
>>> > does any correction for this), and/or
>>> >       2. the assumption that you can use the chi-square as the null 
>>> > distribution for the LRT probably breaks down when you have very few 
>>> > events on the tree. In some sense the number of events is our measure of 
>>> > the amount of data, and when we have very few events on the tree the 
>>> > asymptotic behavior of the LRT under the null is probably not going to 
>>> > help us.
>>> >
>>> > In the limiting case, when rates of character change are so low that you 
>>> > never see homoplasy, then I think the likelihood ratio of the two models 
>>> > should get close to 1 on virtually any realization (conditional on 
>>> > starting in state 1 and having exactly 1 change on the tree, both models 
>>> > make the same predictions about the data; so in this realm the data 
>>> > should not prefer one model over the other).  So, I'm not sure how the 
>>> > small-data explanation would explain your observation of an excess of 
>>> > large LRT statistics.
>>> >
>>> >
>>> > those are my 2 cents.
>>> >
>>> > all the best,
>>> > Mark
>>> >
>>> >
>>> 
>>> _____________________
>>> Dan Rabosky
>>> Assistant Professor
>>> Dept of Ecology and Evolutionary Biology
>>> & Museum of Zoology
>>> University of Michigan
>>> drabo...@umich.edu
>>> 
>>> 
>>> 
>> 
>> Mark Holder
>> 
>> mthol...@ku.edu
>> http://phylo.bio.ku.edu/mark-holder
>> 
>> ==============================================
>> Department of Ecology and Evolutionary Biology
>> University of Kansas
>> 6031 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> 
>> lab phone:  785.864.5789
>> 
>> fax (shared): 785.864.5860
>> ==============================================
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
> 

_____________________
Dan Rabosky
Assistant Professor
Dept of Ecology and Evolutionary Biology
& Museum of Zoology
University of Michigan
drabo...@umich.edu



_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
