Hi All-

This is an interesting discussion. I think there is clearly something going on. 
I do not get catastrophic Type I error rates from this exercise (only 
'elevated') with discrete char simulations (an equal rates markov process) - 
see below. However, Jarrod's latent model seems reasonable and probably a 
decent approximation of many real phenomena, so it is extremely worrying that 
Type I error rates would be so high. 

Also, I don't see why the BiSSE framework would be immune to this problem. If 
the "true" process of character change (regardless of its relationship to 
speciation and extinction) occurs with a latent/liability model, it seems that 
BiSSE would also be affected.  

This is something to think (and worry) about.

#Simulations using pure birth tree with 100 tips; birthrate = 1.0. 
# pure birth used as these branch lengths are more consistent with those in a 
typical interspecific dataset

#Rate = 0.01 
Mean frequency of root state in pure birth tree of 100 tips: 0.930
Type I error rate: 0.12
 
# Rate = 0.05.
Frequency of root state: 0.86
Type I error rate: 0.08

# Rate = 0.1.
Frequency of root state: 0.77
Type I error rate: 0.08

#Rate = 1.0
Frequency of root state: 0.501
Type I error rate: 0.03

#Rate = 10
Frequency of root state: 0.49
Type I error rate: 0.04

 

#Simulation code

 library(geiger);

# quick fix to deal with tree simulations in Geiger, eg birthdeath.tree
#       stops at exactly N taxa, but this gives a pair of 0 length branches:

x <- birthdeath.tree(b=1, d=0, taxa.stop=101);
x <- drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);

# We drop one of the tips with 0 length branches...
 

## Set up rate matrix:
rate1 <-1;
qmat <- list(rbind(c(-rate1, rate1), c(rate1, -rate1))); #FAST rate matrix

## PLot a tree with character states, just to get a feel for phylogenetic 
##      signal in the character state data:

sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1);
cvec <- c('black', 'white');
plot.phylo(x, show.tip.label=F);
tiplabels(pch=21, col='black', bg=cvec[sim], cex=0.9); 
 
NSIMS <- 100;
fdif <- numeric(NSIMS); # vector for character frequencies
pvec <- numeric(NSIMS); # vector for pvalues

for (i in 1:NSIMS){
        cat(i, '\n');
        sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1);
        
        # make sure at least both states are observed in dataset:
        while (sum(sim==1) == length(sim)){
                sim <-sim.char(x, qmat, model="discrete", n=1, root.state=1);
        }
        
        tx <-table(sim);
        fdif[i] <- tx['1'] / sum(tx);
        
        m1 <- ace(sim, x, type = 'd', model ='SYM');
        m2 <- ace(sim, x, type = 'd', model = 'ARD');
        lrt <- -2*m1$loglik + 2*m2$loglik;
        pvec[i] <- 1 - pchisq(lrt, 1)
        
}

mean(fdif);
sum(pvec <= 0.05);



On Aug 16, 2012, at 11:04 AM, Matt Pennell wrote:

> Jarrod and Dan,
> 
> While I see what Dan is saying and I agree that evaluating this with 
> non-phylogenetic data is not entirely useful, I think you have stumbled upon 
> a known issue but one that is not widely appreciated.
> 
> While the MK model is a useful model for discrete characters in many ways, it 
> may be inappropriate for such a test. If you assume that the root is equally 
> likely to be in either state with a lot of tips (i.e. approaching the limit), 
> the ML estimate for the ratio of q01 (transition rate from 0 to 1) to q10 
> will converge on the ratio between number of tips in state 1 to number of 
> tips in state 0. So if you have a tree where most of the tips are in state 1 
> then you will get support for the asymmetric change model as this best 
> explains the data. It is a very weird problem and perhaps I am incorrect in 
> this. 
> 
> Regardless, this result does not hold if the discrete character is modeled 
> simulataneously with the branching process of the phylogeny (i.e. BiSSE; 
> Maddison et al. 2007).
> 
> So, in summation, I mostly agree with you. But this is not shocking behavior 
> of the model. If you are interested, a Bayesian implementation of the Mk 
> model can be found in the package diversitree in the function make.mk.
> 
> again, i could be off base here. curious to see what others think?
> 
> matt
> 
> On Thu, Aug 16, 2012 at 10:58 AM, Dan Rabosky <drabo...@umich.edu> wrote:
> 
> HI Jarrod-
> 
> It isn't immediately obvious to me why the exercise below reflects something 
> problematic. In the first scenario, you have a random binary state but with 
> strong differences in frequency. Because there is effectively no phylogenetic 
> signal (as data are simply random), this suggests a high rate of transition 
> between states. I think that such asymmetry in frequencies would be highly 
> unlikely under a symmetric model of character change regardless of the root 
> state. I think it is useful to think about whether a symmetric process could 
> have given rise to a truly random distribution of tip data with strong 
> frequency differences - my intuition is that this is highly unlikely. 
> However, I would be happy to know what others think.
> 
> Cheers,
> ~Dan
> 
> 
> On Aug 16, 2012, at 10:09 AM, Jarrod Hadfield wrote:
> 
> > Hi,
> >
> > I have had a few replies off-list which have made me try and clarify what I 
> > mean.  I think the distinction needs to be made between two types of 
> > probability: the probability  that an outcome is 0 or 1 Pr(y| \theta) and 
> > the probability density of \theta, Pr(\theta | \gamma), indexed by 
> > parameter(s) \gamma.  It seems to me that in order to make the problem 
> > identifiable an informative prior (or an assumption) is required for the 
> > root state.  It seems to me that the strong prior Pr(\theta=0.5|\gamma) =1 
> > is used, and then justified because int Pr(y=0| \theta)Pr(theta)dtheta=int 
> > Pr(y=1| \theta)Pr(theta)dtheta=0.5. A non-informative prior distribution 
> > for \theta could be U(0,1). This also induces a prior on y of the same 
> > form, int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| 
> > \theta)Pr(theta)dtheta=0.5, but that is not the main motivation for 
> > choosing U(0,1).
> >
> > For example, is this not worrying:
> >
> > library(ape)
> > n<-100
> > tree<-rcoal(n)             # random tree
> > y<-rbinom(n, 1, 0.2)  # random data unconnected to the tree
> > m1<-ace(y, tree, type = "d", model="SYM")
> > m2<-ace(y, tree, type = "d", model="ARD")
> > anova(m1, m2)     # asymmetric evolutionary transition rates strongly 
> > supported
> >
> > y<-rbinom(n, 1, 0.5)  # random data unconnected to the tree but p=0.5
> > m1<-ace(y, tree, type = "d", model="SYM")
> > m2<-ace(y, tree, type = "d", model="ARD")
> > anova(m1, m2)     # asymmetric evolutionary transition not supported
> >
> > Cheers,
> >
> > Jarrod
> >
> >
> >
> >
> >
> >
> > Quoting Jarrod Hadfield <j.hadfi...@ed.ac.uk> on Thu, 16 Aug 2012 12:30:30 
> > +0100:
> >
> >> Hi,
> >>
> >> I have been helping someone with some analyses and came across some 
> >> routines to estimate asymmetric transition rates between discrete 
> >> characters. This surprised me because its fairly straightforward to prove 
> >> that asymmetric transition rates cannot be identified using data collected 
> >> on the tips of a phylogeny. However when I run these routines (e.g. ace) 
> >> likelihood ratio tests often suggest strong support for asymmetric rates. 
> >> This seems to arise because there is an implicit (and unjustified) prior 
> >> point mass on the ancestral state *probability*. If anyone could point me 
> >> to reference that states this that would be great? I really don't want to 
> >> be saying we have support for processes that we need a fossil record, not 
> >> just a phylogeny, to understand.
> >>
> >> Cheers,
> >>
> >> Jarrod
> >>
> >>
> >>
> >> --
> >> The University of Edinburgh is a charitable body, registered in
> >> Scotland, with registration number SC005336.
> >>
> >> _______________________________________________
> >> R-sig-phylo mailing list
> >> R-sig-phylo@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >>
> >>
> >
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
> > _______________________________________________
> > R-sig-phylo mailing list
> > R-sig-phylo@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >
> >
> 
> 
> 
> 
>         [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> 

 
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Reply via email to