Hi All- This is an interesting discussion. I think there is clearly something going on. I do not get catastrophic Type I error rates from this exercise (only 'elevated') with discrete char simulations (an equal rates markov process) - see below. However, Jarrod's latent model seems reasonable and probably a decent approximation of many real phenomena, so it is extremely worrying that Type I error rates would be so high.
Also, I don't see why the BiSSE framework would be immune to this problem. If the "true" process of character change (regardless of its relationship to speciation and extinction) occurs with a latent/liability model, it seems that BiSSE would also be affected. This is something to think (and worry) about. #Simulations using pure birth tree with 100 tips; birthrate = 1.0. # pure birth used as these branch lengths are more consistent with those in a typical interspecific dataset #Rate = 0.01 Mean frequency of root state in pure birth tree of 100 tips: 0.930 Type I error rate: 0.12 # Rate = 0.05. Frequency of root state: 0.86 Type I error rate: 0.08 # Rate = 0.1. Frequency of root state: 0.77 Type I error rate: 0.08 #Rate = 1.0 Frequency of root state: 0.501 Type I error rate: 0.03 #Rate = 10 Frequency of root state: 0.49 Type I error rate: 0.04 #Simulation code library(geiger); # quick fix to deal with tree simulations in Geiger, eg birthdeath.tree # stops at exactly N taxa, but this gives a pair of 0 length branches: x <- birthdeath.tree(b=1, d=0, taxa.stop=101); x <- drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]); # We drop one of the tips with 0 length branches... ## Set up rate matrix: rate1 <-1; qmat <- list(rbind(c(-rate1, rate1), c(rate1, -rate1))); #FAST rate matrix ## PLot a tree with character states, just to get a feel for phylogenetic ## signal in the character state data: sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1); cvec <- c('black', 'white'); plot.phylo(x, show.tip.label=F); tiplabels(pch=21, col='black', bg=cvec[sim], cex=0.9); NSIMS <- 100; fdif <- numeric(NSIMS); # vector for character frequencies pvec <- numeric(NSIMS); # vector for pvalues for (i in 1:NSIMS){ cat(i, '\n'); sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1); # make sure at least both states are observed in dataset: while (sum(sim==1) == length(sim)){ sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1); } tx <-table(sim); fdif[i] <- tx['1'] / sum(tx); m1 <- ace(sim, x, type = 'd', model ='SYM'); m2 <- ace(sim, x, type = 'd', model = 'ARD'); lrt <- -2*m1$loglik + 2*m2$loglik; pvec[i] <- 1 - pchisq(lrt, 1) } mean(fdif); sum(pvec <= 0.05); On Aug 16, 2012, at 11:04 AM, Matt Pennell wrote: > Jarrod and Dan, > > While I see what Dan is saying and I agree that evaluating this with > non-phylogenetic data is not entirely useful, I think you have stumbled upon > a known issue but one that is not widely appreciated. > > While the MK model is a useful model for discrete characters in many ways, it > may be inappropriate for such a test. If you assume that the root is equally > likely to be in either state with a lot of tips (i.e. approaching the limit), > the ML estimate for the ratio of q01 (transition rate from 0 to 1) to q10 > will converge on the ratio between number of tips in state 1 to number of > tips in state 0. So if you have a tree where most of the tips are in state 1 > then you will get support for the asymmetric change model as this best > explains the data. It is a very weird problem and perhaps I am incorrect in > this. > > Regardless, this result does not hold if the discrete character is modeled > simulataneously with the branching process of the phylogeny (i.e. BiSSE; > Maddison et al. 2007). > > So, in summation, I mostly agree with you. But this is not shocking behavior > of the model. If you are interested, a Bayesian implementation of the Mk > model can be found in the package diversitree in the function make.mk. > > again, i could be off base here. curious to see what others think? > > matt > > On Thu, Aug 16, 2012 at 10:58 AM, Dan Rabosky <drabo...@umich.edu> wrote: > > HI Jarrod- > > It isn't immediately obvious to me why the exercise below reflects something > problematic. In the first scenario, you have a random binary state but with > strong differences in frequency. Because there is effectively no phylogenetic > signal (as data are simply random), this suggests a high rate of transition > between states. I think that such asymmetry in frequencies would be highly > unlikely under a symmetric model of character change regardless of the root > state. I think it is useful to think about whether a symmetric process could > have given rise to a truly random distribution of tip data with strong > frequency differences - my intuition is that this is highly unlikely. > However, I would be happy to know what others think. > > Cheers, > ~Dan > > > On Aug 16, 2012, at 10:09 AM, Jarrod Hadfield wrote: > > > Hi, > > > > I have had a few replies off-list which have made me try and clarify what I > > mean. I think the distinction needs to be made between two types of > > probability: the probability that an outcome is 0 or 1 Pr(y| \theta) and > > the probability density of \theta, Pr(\theta | \gamma), indexed by > > parameter(s) \gamma. It seems to me that in order to make the problem > > identifiable an informative prior (or an assumption) is required for the > > root state. It seems to me that the strong prior Pr(\theta=0.5|\gamma) =1 > > is used, and then justified because int Pr(y=0| \theta)Pr(theta)dtheta=int > > Pr(y=1| \theta)Pr(theta)dtheta=0.5. A non-informative prior distribution > > for \theta could be U(0,1). This also induces a prior on y of the same > > form, int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| > > \theta)Pr(theta)dtheta=0.5, but that is not the main motivation for > > choosing U(0,1). > > > > For example, is this not worrying: > > > > library(ape) > > n<-100 > > tree<-rcoal(n) # random tree > > y<-rbinom(n, 1, 0.2) # random data unconnected to the tree > > m1<-ace(y, tree, type = "d", model="SYM") > > m2<-ace(y, tree, type = "d", model="ARD") > > anova(m1, m2) # asymmetric evolutionary transition rates strongly > > supported > > > > y<-rbinom(n, 1, 0.5) # random data unconnected to the tree but p=0.5 > > m1<-ace(y, tree, type = "d", model="SYM") > > m2<-ace(y, tree, type = "d", model="ARD") > > anova(m1, m2) # asymmetric evolutionary transition not supported > > > > Cheers, > > > > Jarrod > > > > > > > > > > > > > > Quoting Jarrod Hadfield <j.hadfi...@ed.ac.uk> on Thu, 16 Aug 2012 12:30:30 > > +0100: > > > >> Hi, > >> > >> I have been helping someone with some analyses and came across some > >> routines to estimate asymmetric transition rates between discrete > >> characters. This surprised me because its fairly straightforward to prove > >> that asymmetric transition rates cannot be identified using data collected > >> on the tips of a phylogeny. However when I run these routines (e.g. ace) > >> likelihood ratio tests often suggest strong support for asymmetric rates. > >> This seems to arise because there is an implicit (and unjustified) prior > >> point mass on the ancestral state *probability*. If anyone could point me > >> to reference that states this that would be great? I really don't want to > >> be saying we have support for processes that we need a fossil record, not > >> just a phylogeny, to understand. > >> > >> Cheers, > >> > >> Jarrod > >> > >> > >> > >> -- > >> The University of Edinburgh is a charitable body, registered in > >> Scotland, with registration number SC005336. > >> > >> _______________________________________________ > >> R-sig-phylo mailing list > >> R-sig-phylo@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> > >> > > > > > > > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > > > _______________________________________________ > > R-sig-phylo mailing list > > R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-phylo mailing list > R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo