Emmanuel wrote: > > There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et > al. developed a procedure to calculate a distance (also in Yang's 2006 > book). An example is given below with the woodmouse: > > matlog <- function(x) { > tmp <- eigen(X) > V <- tmp$vectors > U <- diag(log(tmp$values)) > V %*% U %*% solve(V) > } > > tr <- function(x) sum(diag(x)) > > data(woodmouse) > > PI <- diag(base.freq(woodmouse[1:2, ])) > Ft <- Ftab(woodmouse[1:2, ]) > F <- Ft/sum(Ft) > X <- solve(PI) %*% F > -tr(PI %*% matlog(X)) > > You have to call this code for each pair of sequences (or wrap it in a > function). > > For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol). >
Strictly speaking there *is* a formula for GTR, which I think may be equivalent to the above. The formula is given in my book "Inferring Phylogenies" on page 209 as equation 13.24. Note that for both of these, the equilibrium frequencies of the bases are estimated from the empirical frequencies in the two sequences. That means that for each distance, the equilibrium frequencies are somewhat different, as different sequences are being used to estimate the base frequencies. We are not, in the use of these formulas, estimating one overall set of equilibrium frequencies and then using that for all the the distances in our data set. The Rzhetsky-Nei 1994 distance formula is, I believe, an approximation, though probably a very good one. It is not quite the same as the estimate you would get by maximizing the likelihood. Distance formulas that involve empirically estimated base frequencies, which all of these do, have in common the problem that one either estimates those separately for each pair of sequences, or jointly estimates them from the whole dataset, without using a tree in the process. J.F. ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/