Hi Andy, Emmanuel,
The underflow problem can be solved by normalising the probabilities
at each node so that they never get too small. At each node, compute
the new probability vector v:
v = exp(Q t.l) * v.l * exp(Q t.r) * v.r
where t.l and t.r are the branch lengths for the left and right
daughters, and v.l and v.r are the probability vectors for these
branches. Normalise v by dividing by sum(v); this will become the
next initial condition. Now, just remember what you divided by, and
multiply these together as the final step for the likelihood (really,
sum the log of these to avoid the same issues). This is the algorithm
used by Mesquite for several of the likelihood calculations.
Replacing the dev() function within ace with the following function
agrees with the calculations from the original ace function and should
avoid underflow.
comp <- numeric(nb.tip + nb.node) # Storage...
dev <- function(p, output.liks = FALSE) {
Q[] <- c(p, 0)[rate]
diag(Q) <- -rowSums(Q)
for (i in seq(from = 1, by = 2, length.out = nb.node)) {
j <- i + 1
anc <- phy$edge[i, 1]
des1 <- phy$edge[i, 2]
des2 <- phy$edge[j, 2]
v.l <- expm(Q * phy$edge.length[i]) %*% liks[des1,]
v.r <- expm(Q * phy$edge.length[j]) %*% liks[des2,]
v <- v.l * v.r
comp[anc] <- sum(v)
liks[anc,] <- v / comp[anc]
}
if (output.liks)
liks[-(1:nb.tip), ]
else
-2 * sum(log(comp[-(1:nb.tip)]))
}
I separated out the matrix exponentiation, which can be done with
expm <- function(Q) {
tmp <- eigen(Q, symmetric = FALSE)
tmp$vectors %*% diag(exp(tmp$values)) %*% solve(tmp$vectors)
}
as in the original code, or
expm <- function(...) expm:::expm(..., method="Ward77")
which is faster, and possibly more stable, especially for large
matrices. It comes from the package expm, which is available here:
http://r-forge.r-project.org/projects/expm/
and can be installed by
install.packages("expm", repos="http://R-Forge.R-project.org")
This will still hit problems if there are zero branch lengths with
incompatible states on either side, though.
Cheers,
Rich
On 9-Jun-09, at 1:24 AM, Emmanuel Paradis wrote:
Hi Andy,
Your diagnostic seems correct: this is due to the convergence of the
likelihoods towards 0, then underflow occurs. Log-transformation
doesn't seem a solution here, and I don't see an obvious solution.
If anybody has a suggestion, that will be welcome. I'll try to
explore some possibilities.
EP
Le 06.06.2009 01:14, Andy Raduski a écrit :
Hi Everyone,
I have a problem with the ace function in the ape package. The same
problem appears whether or not I use Emmanuel Paradis' "patch"
posted on this discussion group recently.
Specifically, the problem seems to be related to the size of the
phylogeny. My data set consists of ~1100 taxa with four discrete
character states data (though I encounter the same problem with
binary character analyses, as well). When I run ace, I receive this
error:
"Error in eigen(Q * phy$edge.length[i], symmetric = FALSE) :
infinite or missing values in 'x'"
The error can be replicated on simulated binary character data,
although the above error is occasionally substituted by the
following, instead:
"Warning message: In sqrt(diag(solve(h))) : NaNs produced"
but only when the number of simulated tips is drastically reduced
(<500).
Over many simulated datasets I got both previously mentioned
errors, the one below, as well as a list of warnings (which usually
seem to result from flat likelihood surfaces):
"In nlminb(rep(ip, length.out = np), function(p)
dev(p), ... :imaginary parts discarded in coercion."
I am not entirely sure why these errors occur but it seems to be an
underflow issue with calculating likelihoods, because as the number
of tips in simulated data decrease, the error rate is reduced,
although simulations with fewer than 100 tips still occasionally
fail, and my dataset of 1100 species always fails. Incidentally, a
strikingly similar problem occurs in BayesTraits, which also cannot
be used to analyze large datasets.
I would greatly appreciate any help or advice that anybody can offer.
Thanks in advance,
Andy
Andy Raduski
Graduate Student, Igic Lab
University of Illinois at Chicago
840 W Taylor St., MC067
Chicago, IL 60607
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
--
Emmanuel Paradis
IRD, Montpellier, France
ph: +33 (0)4 67 16 64 47
fax: +33 (0)4 67 16 64 40
http://ape.mpl.ird.fr/
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo