[R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023
A reminder: applications due March 31. The Evolutionary Quantitative Genetics Workshop for 2023 will take place at Friday Harbor Laboratories of the University of Washington from June 4-9. The workshop will review the basics of theory in the field of evolutionary quantitative genetics and its connections to evolution observed at various time scales. One aim of the workshop is to build a bridge between the traditionally separate disciplines of quantitative genetics and phylogenetic comparative methods. The workshop involves lectures, discussions and in-class computer exercises. This workshop has been given yearly since 2011, and at Friday Harbor Laboratories since 2017. It was canceled in 2020, and given as an online workshop in 2021 and 2022. It is intended for graduate students, postdocs, and junior faculty. Information on the subject area, lecturers, costs and application form will be found at: https://fhl.uw.edu/courses/course-descriptions/course/evolutionary-quantitative-genetics-workshop-2023/ and more information, including details of the Workshop in recent years, will be found at https://eqgw.github.io Joe Felsenstein and Steve Arnold -- Joe Felsensteinfelse...@gmail.com, felse...@uw.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA - PS: please do not use j...@gs.washington.edu, which is an alias that some mail systems now mistake as indicating spam. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023
K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP20eBi7Dw$ > > Introduction to Bayesian modelling with INLA (BMIN02) > May 22nd - 26th > https://urldefense.com/v3/__https://www.prstatistics.com/course/introduction-to-bayesian-modelling-with-inla-bmin02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP09ie8kTQ$ > > Reproducible and collaborative data analysis with R (RACR02) > June 13th - 15th > https://urldefense.com/v3/__https://www.prstatistics.com/course/reproducible-and-collaborative-data-analysis-with-r-racr02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP2JO-LW_Q$ > > Species Distribution Modelling With Bayesian Statistics Using R (SDMB05) > September 4th - 8th > https://urldefense.com/v3/__https://www.prstatistics.com/course/online-course-species-distribution-modelling-with-bayesian-statistics-using-r-sdmb05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP0ZmkHHkg$ > > Multivariate Analysis Of Ecological Communities Using R With The VEGAN > package (VGNR05) > September 18th - 22nd > https:// > https://urldefense.com/v3/__http://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-using-r-with-the-vegan-package-vgnr05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP3z5_1e7g$ > > -- > > Oliver Hooker PhD. > PR statistics > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1_L29Apw$ > Searchable archive at > https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1XIWsucQ$ -- Joe Felsensteinfelse...@gmail.com, felse...@uw.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA - PS: please do not use j...@gs.washington.edu, which is an alias that some mail systems now mistake as indicating spam. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylogenetic correlation analysis
Whether one gets them from PGLS directly or from contrasts, one can get correlations by just inferring the covariance matrix and then calculating r(x,y) = Cov(x,y) / (Var(x) Var(y))^(1/2) where of course Var(x) is also Cov(x,x), and so on. You would not need a separate run to get correlations. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?
Gabriel Ferreira explained: > I will try to better explain my problem, and I really appreciate your time > to help me with this issue. > My study is a conventional ecomorph with linear and univariate > measurements > > So... Some of my traits are linear measures that can and must be > "corrected" by body size, such as tail length. I usually conduct such body > size corrections with phylogenetic regressions using *gls *func. from > *nlme*: > lots of details ... > So I could use the residuals of this regression in the phy ANOVAs ... Does > it make sense? > Well, I am afraid I am lost. Perhaps someone else here could explain the issues to me ... Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?
If wants residuals of values of a trait in each species, taking into account within-species variation and phylogeny, what does it mean if those residuals correlate with those of some other character, or with an environmental variable? Just asking which R machinery to use might wait until it is clear what the intended task is and why that makes sense. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Marguerite asked: > First - Joe - what do you mean by log(grams) has no units? The units of grams > is a unit, so log(mass) will have units of log-gm. As log is not the same as > 1/gm, log(gm) cannot be unit-free. I looked it up on Wikipedia, and was assured by it that Marguerite is right, log(weight) has the same units as weight, except you're supposed to call them log-gm. I do think that unless there is a calibration with time, the tree branch lengths are not in units of time but in units of base substitutions per site. And I remain confused on what the units are, if you compute a linear combination such as 2 log(wt) - 3 log(height). Which princip[al, le] components machinery does. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Ted Garland asked: OK, Joe, that's for one trait at a time. > Would you please continue your discourse, but extend to multiple traits > and their covariances > OK, assuming that’s not a joke which it seems it was. If all characters are log of something, their variances all have units of sites squared per square substitution. But if you have different units in different characters each variance would have units. (CharXunit) squared times sites-squared per square substitution. A covariance would be (CharXunit times CharYunit) times sites squared per square substitution. If you had a principal component (usually misnamed a “principle” component) it is in terms of a linear combination of characters, and I am deeply puzzled how to give their units as they mix them. Joe -- Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Folks -- If the trait is log(grams) then the trait is unit-free. The "time" is probably branch length from a phylogeny. That in turn (from DNA data) is usually DNA substitutions per site. So the units of the standard deviation aresites per substitution. But this is not the standard deviation, it is its square. So (wait for it ...)square sites per square substitution Now that is pretty weird. But years ago people pointed out to me that quantitative geneticists were accustomed to inferring variance components of crop yield. The yield might be in bushels per acre. So the units of its variance was: square bushels per square acre. Don't even try to think about how you square a bushel, or how many dimensions you have to go into to square an acre. Actually, you can think about them: a bushel is three-dimensional volume, and an acre is two dimensional area. So crop yield has units of meters, and variance of crop yield should have units of square meters. That way lies madness ... Joe - Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] collapsing sets of nodes based on label values
Karla Shikev -- > I just want to take a tree with node labels (e.g. > bootstrap values from a RAxML analysis) and collapse all nodes with support > values below, say, 90%. People here have given you various ways of doing this in R. I am not very familiar with the R packages, but just wanted to add one more way. If you have the original set of trees that were used to get the bootstrap values, you can do a version of the bootstrap that makes a tree which shows only the groups that are present in 90% or more of the input trees. This is called an M(0.90) consensus tree (the 0.90 is in a subscript which I can't typeset in an email). It is available in the program Consense in my PHYLIP package, and also in some other phylogeny packages. If you prefer R you can use Liam Revell's Rphylip package, together with installing a copy of PHYLIP, as Rphylip serves as a front-end for PHYLIP within R. J.F. ----- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Multivariate ASR with Discrete Characters
William Gearty -- > > Thank you Don and Julien for your suggestions. I was able to run threshml > using Rthreshml from the Rphylip package. However, now I'm not really sure > what to do with the results. Can I use the output to perform ancestral > state reconstruction/estimation for the continuous (3) and discrete (1) > characters? I see that phytools has ancThresh, but that only seems to work > with a single character. I don't know what are the limitations of Rphylip's interface with Threshml, but I can address Threshml itself -- as I wrote it. In principle the liabilities sampled at the interior nodes of the tree could be used to do ancestral reconstructions at those nodes, in a very straight- forward way. Just consider them, not the rest of the tree, and make a histogram of them at the node. However I never bothered to put in any interface to do that. However note that what Threshml does is to transform the liabilities to independent variables, using the current best estimate of the covariance matrix of liabilities. The Gibbs sampling (and at the tips, the Metropolis/Hastings sampling) occurs in the independent characters. Then you have to transform back to the liabilities before you have samples for those at the interior nodes. So matters are not simple, but in principle it can be done. Note that for continuous traits Threshml can also be used to sample values at the interior nodes. Of course this is a lot more computation than just using likelihood computations -- I've checked and the sampling does infer the same covariances in that case. In this all-continuous case the same issue of transforming to independent characters also comes up. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic varying slopes and intercepts model
Oscar Inostraza -- Is the variable x also evolving on the tree? If so you need to use standard phylogenetically-informed comparative methods to estimate the variances and covariances of changes in both characters. You may not be able to assume that y responds instantly to x. J.F. Joe Felsenstein, j...@gs.washington.edu Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA. 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Nantucket Phylogenetics DevelopeR Workshop
> > We are very happy to announce the 2nd edition of a graduate-level > workshop on phylogenetic method development R. The course will be four > days in length and take place at the University of Massachusetts > Boston's Nantucket Field Station from the 5th to the 8th of November, > 2019. Between this and the Workshop on Molecular Evolution (Woods Hole), the Applied Phylogenetics Workshop (Bodega Bay, California), and the Evolutionary Quantitative Genetics Workshop (Friday Harbor Laboratories), there seems to be a major competition to see who can have their course in the most picturesque marine laboratory. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Link between Mahalanobis distance and PGLS
Guillaume Louvel asked: > > So my first question is: can we directly apply the Mahalanobis distance > to measure a kind of "phylogeny-corrected" distance between 2 vectors of > trait values for a list of species? Since we assume a brownian motion, > we know these vectors should be drawn from a multivariate normal > distribution with known covariance matrix. Therefore the Mahalanobis > distance seems perfectly appropriate to me, is it the case? It is appropriate. In fact this is in effect what regressing contrasts in trait Y on contrasts in trait X is doing. One can alternatively use a multivariate regression appoach, which is what Alan Grafen (1989) did, and the results are the same either way (in the simplest cases). Note that although the contrasts can be treated as independent observations, that is not true for the tip species values -- the Grafen "Phylogenetic Regression" does not treat the tip values as independent, and for the same reason pairwise distances between tips are not independent. > > I don't want to do a statistical test per se, I am rather interested in > ranking many traits according to their distance to a pattern of reference. I am unclear about what that means. > > My second subsidiary question is: can I apply this Mahalanobis distance > if my traits are binary (e.g. presence-absence of some sequence in the > genomes). In that case I know that my trait is not multivariate normal, > but considering that I have millions of traits, shouldn't I expect the > whole set to have some normal characteristics? Basically no. Although people have approximated binary traits by Gaussian variables (I think Paul Harvey and Mark Pagel did in their 1991 book), it is much more appropriate to use a threshold model. See my 2012 paper in American Naturalist or the earlier 2005 sketch of the method in Proc. Royal Society of London series B. A good paper agonizing about all this is: Maddison WP & FitzJohn RG. 2015. The unsolved challenge to phylogenetic correlation tests for categorical characters. Systematic Biology 64: 127–136 though I'd say that the problem is not as "unsolved" as they think. > Finally, if none of the approach above is justified, is there a > multivariate phylogenetic method for discrete/binary traits? Some kind > of adapted phylogenetic PCA ? See above. It does require MCMC, and cannot simply be done with distances. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Extended Majority Consensus Topology (Allcompat Summary) in R? And some observations on ape's consensus() function
David Bapst asked: > > I was interested if anyone was familiar with R code that can estimate an > extended majority consensus tree (referred to as an 'allcompat' tree by the > sumt command in MrBayes)? This is a fully bifurcating summary of a tree > posterior, where each clade is maximally resolved by the split that is most > abundant in the considered post-burn-in posterior (i.e., that split which > has the plurality, if not the majority - the highest posterior probability > of any other competing, conflicting splits recovered within the posterior. > So, I guess one could also call these plurality consensus trees...). You can also get the Extended Majority Rule consensus tree from the Rconsense function in Liam Revell's package Rphylip, which is calls programs from my PHYLIP package. Consense in PHYLIP does output, in addition to the consensus tree itself, a list of partitions found in the input trees, and the frequency of each. Rconsense may be able to do that too. Speaking as the one who defined the EMR method, I need to make a warning: EMR makes a tree by ordering the partitions (splits) in order of their frequency. To make Margush and McMorris's Majority Rule consensus tree one simply goes down this list and takes all the partitions that occur more than 50% of the time. EMR continues further, in order to resolve the tree fully. It keep accepting partitions as long as they don't conflict with anything already accepted. But this means that two partitions could be found (say) 45% of the time each. They could both be compatible with the partitions in the majority-rule tree, but in conflict with each other. Which then gets included then depends only on which one is encountered first as one goes down the list of most-frequent partitions. And that will just be a matter of things like the order in which the tree containing them occurs among the input trees. That is one of the limitations of the EMR method. Note also that EMR is subtly different from finding the largest set of split (partitions) that are all mutually compatible. That will often be the same, but not always. The latter is called the Nelson Consensus Tree. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Estimating marginal ancestral states under a non-reversible model of evolution.
Jordan Gault asked: > I would like to do a worked example for an unequal-rates model as well but my > understanding is that the re-rooting method is inappropriate for > non-reversible models of trait evolution. How are marginal ancestral states > estimated for non-reversible models of trait evolution? Why do "unequal rates" make the model nonreversible? I assume that by unequal rates you mean not rates different in different sites, but forward and backward rates different at a single site. If that is the case, the equilibrium frequencies of the two states become correspondingly different, and the model is still reversible. The tree can be rerooted at any interior node and the marginal ancestral states at that node found by the usual likelihood "pruning" algorithm. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] estimate ancestral state with OUwie models
Let me add more warnings to Marguerite and Thomas's excellent responses. People may be tempted to infer ancestral states and then treat those inferences as data (and also to infer ancestral environments and then treat those inferences as data). In fact, I wonder whether that is not the main use people make of these inferences. But not only are those inferences very noisy, they are correlated with each other. So if you infer the ancestral state for the clade (Old World Monkeys, Apes) and also the ancestral state for the clade (New World Monkeys, (Old World Monkeys, Apes)) the two will typically not only be error-prone, but will also typically be subject to strongly correlated errors. Using them as data for further inferences is very dubious. It is better to figure out what your hypothesis is and then test it on the data from the tips of the tree, without the intermediate step of taking ancestral state inferences as observations. The popular science press in particular demands a fly-on-the-wall account of what happened in evolution, and giving them the ancestral state inferences as if they were known precisely is a mistake. Joe ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Detection association between traits
Rafa Braga asked > > Does anybody know how to detect association between two traits while removing > the influence of the third one? I have three traits which are named a b and > c. I'm wanting to test the correlation of a and b while controling c as a > covariate. In other words, the correlation between trait a and b can be a > result of the correlation between b and c and I'm wanting to remove the > influence of c on b. This goes all the way back 100 years. Once you have inferred the variances and covariances of the three traits, you can compute the partial correlation between a and b. This is defined as the correlation between the residual of a when predicted by c, and the residual of b when predicted by c. When there is only one variable c, and when r(a,b) is the correlation of a with b, then the partial correlation is [ r(a,b) - r(a,c) r(b,c)] / [(sqrt(1 - r(a,c)^2) sqrt(1 - r(b,c)^2)] For likelihood ratio testing in a linear model, one could compare the likelihoods of models that did or did not have direct connection of a and b. RA Fisher also derived a distribution of the partial correlation coefficient. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA > > > > Yours, > Rafa > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic PCA and measurement error
Rafael and Liam -- > > So far as I know, there is currently no way to explicitly take into account > sampling error in computing principal components while also accounting for > the phylogeny. However, it is relatively straightforward to compute scores > for individuals from a PCA conducted on species means. > > This would look as follows (in which Xm is a matrix containing values for > species for each trait, and Xi is a matrix with the same number of columns > but containing values for individuals): > > pca<-phyl.pca(tree,Xm) > Si<-Xi%*%pca$Evec > > Then, if you have a separate vector containing species ID as a factor, you > could compute means and variances for each component by species. There are methods (not all implemented in R) for taking into account the within-species phenotypic covariation among individuals and also the evolutionary covariances between species (on a phylogeny). These include the method of Ives, Midford, and Garland (2007) and my own method (2008). The former assumes you know the within-species covariance matrix, the latter estimates it from the sampled individuals for each species. Both of these assume that the true within-species phenotypic covariance matrices are the same for all species. For my method, you can use Liam's package Rphylip to call my program Contrast if you also have PHYLIP installed. Once you infer these covariance matrices, those are the relevant sufficient statistics (if the distributions are multivariate normal). The PCA axes for either covariance matrix can then be computed from those, or the canonical variates axes, which are the principal components of the between-species covariation relative to the within-species covariation. You don't need to use the PCA machinery until after you estimate these two covariance matrices. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylogenetic signal with sample sizes but no standard errors
There is also my C program Contrast, which implements a method from a 2008 paper I wrote: Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: contrasts revisited and revised. American Naturalist 171: 713-725. This estimates the within-species covariances and the between-species evolutionary variances.It is not an R program but can be accessed through Liam Revell's package Rphylip, if you also have my (non-R) package PHYLIP installed. A pretty good (but not ML) estimate of the within-species phenotypic variance can be gotten by pooling the within-species sampling error. The harder part is using that to correct one's estimate of the covariances of the between-species change, which using ordinary methods will have some within-species variation mixed in. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylip file compression ratios
Jacob Berv noted: > > I noticed today that the compression ratio for an interleaved phylip file > (zip compressed) was about 84:1, (390MB uncompressed —> 4.6MB compressed) > whereas the compression ratio for the same data non-interleaved was a much > worse 3.4:1 (390 MB uncompressed —> 113.9 MB). Not knowing much about how zip > compression actually works - I thought this might be an interesting > observation for the group… Interleaved sequences have blocks of (say) 50 bases. Successive lines may repeat a whole block or nearly repeat it. I wonder whether that makes the interleaved format easier to compress. I would guess that the compressibility of interleaved sequences would be highest when the sequences are closely related. In that case there would be 50-base blocks of nearly identical sequences. With less closely related sequences the compressibility should be much lower. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Average Aminoacid Identity tree with bootstrap support. Is it possible?
Salvador Espada Hinojosa -- The problem, of course, is that a random substitution on an interior branch of a tree increases or decreases the size of more than one distance in the distance matrix. The distances aren't independent in their statistical noise. So you can't just sample distances after the distance matrix is computed. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] HKY GTR distances
Emmanuel wrote: > > There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et > al. developed a procedure to calculate a distance (also in Yang's 2006 > book). An example is given below with the woodmouse: > > matlog <- function(x) { > tmp <- eigen(X) > V <- tmp$vectors > U <- diag(log(tmp$values)) > V %*% U %*% solve(V) > } > > tr <- function(x) sum(diag(x)) > > data(woodmouse) > > PI <- diag(base.freq(woodmouse[1:2, ])) > Ft <- Ftab(woodmouse[1:2, ]) > F <- Ft/sum(Ft) > X <- solve(PI) %*% F > -tr(PI %*% matlog(X)) > > You have to call this code for each pair of sequences (or wrap it in a > function). > > For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol). > Strictly speaking there *is* a formula for GTR, which I think may be equivalent to the above. The formula is given in my book "Inferring Phylogenies" on page 209 as equation 13.24. Note that for both of these, the equilibrium frequencies of the bases are estimated from the empirical frequencies in the two sequences. That means that for each distance, the equilibrium frequencies are somewhat different, as different sequences are being used to estimate the base frequencies. We are not, in the use of these formulas, estimating one overall set of equilibrium frequencies and then using that for all the the distances in our data set. The Rzhetsky-Nei 1994 distance formula is, I believe, an approximation, though probably a very good one. It is not quite the same as the estimate you would get by maximizing the likelihood. Distance formulas that involve empirically estimated base frequencies, which all of these do, have in common the problem that one either estimates those separately for each pair of sequences, or jointly estimates them from the whole dataset, without using a tree in the process. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Fwd: Workshop: UWashington.EvolQuantGenetics.Jun5-9
Evolutionary Quantitative Genetics Workshop Friday Harbor Laboratories, University of Washington, 5-9 June 2017 Non-credit workshop. Participants arrive at Friday Harbor Labs on Sunday, June 4, lectures and exercises occur June 5-9, and participants depart on Saturday, June 10, 2017. Application deadline March 1, 2017. Application forms and details here: http://tinyurl.com/EQG2017 Web page: http://depts.washington.edu/fhl/studentSummer2017.html#SumB-genetics Instructors: Dr. Joe Felsenstein Department of Genome Sciences University of Washington, Seattle j...@gs.washington.edu Dr. Stevan J. Arnold Department of Integrative Biology Oregon State University, Corvallis stevan.arn...@oregonstate.edu Previous versions of this 5-day workshop were given at the National Evolutionary Synthesis Center (NESCENT) at Duke University in Durham, North Carolina from 2011-2013, and then at the National Institute for Mathematical and Biological Synthesis (NIMBioS) at the University of Tennessee in Knoxville from 2014-2016. Like past versions, the Friday Harbor version will review the basics of theory in the field of evolutionary quantitative genetics and its connections to evolution observed at various time scales. The aim of the workshop is to build a bridge between the traditionally separate disciplines of quantitative genetics and comparative methods. Quantitative genetic theory for natural populations was developed considerably in the period from 1970 to 1990 and up to the present, and it has been applied to a wide range of phenomena including the evolution of differences between the sexes, sexual preferences, life history traits, plasticity of traits, as well as the evolution of body size and other morphological measurements. Textbooks have not kept pace with these developments, and currently few universities offer courses in this subject aimed at evolutionary biologists. Evolutionary biologists need to understand this field because of the ability to collect large amounts of data by computer, the development of statistical methods for changes of traits on evolutionary trees and for changes in a single species through time, and the realization that quantitative characters will not soon be fully explained by genomics. This workshop aims to fill this need by reviewing basic aspects of theory and illustrating how that theory can be tested with data, both from single species and with multiple-species phylogenies. Participants will use R, an open-source statistical programming language, to build and test evolutionary models. The workshop involves lectures and in-class computer exercises. You can consult the 2016 tutorial website for examples, it will be found at http://www.nimbios.org/tutorials/TT_eqg2016 The intended participants for this tutorial are graduate students, post-docs, and junior faculty members in evolutionary biology. The workshop can accommodate up to 35 participants. Guest instructors will include: * Marguerite Butler, Biology, Univ. Hawai'i, Mānoa * Patrick Carter, Evolutionary Physiology, Washington State University, Pullman * Adam Jones, Biology, Texas A&M University, College Station * Brian O'Meara, Ecology & Evolutionary Biology, Univ. of Tennessee, Knoxville * Josef Uyeda, Biological Sciences, Virginia Tech, Blacksburg Cost: $1000 to be paid to Friday Harbor Laboratories. This fee will cover housing and meals at FHL and all other workshop expenses, except travel. Participants who have been admitted to attend will make their payment prior to arrival at FHL. Details of payment by credit card or check will be provided once the applicant has been admitted to attend. J.F. ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistical tests for phylogenetic independent contrasts
Vivek -- > What we are confused about is how to analyze independent contrasts when we > analyze "one trait at a time". > > In our previous works, we have done phylogenetic correction using 'ape' > package in R (Paradis et al., 2004) for computing correlation between pairs > of traits. But in present case, we are not sure how to proceed for > statistical tests if we obtain independent contrasts (from means for each > species?) for a given trait? > You have 20 traits, but you insist on analyzing them one at a time, so that you won't know about any covariation between traits? Not knowing that is a positive objective of your study? The vectors of the individual standardized contrasts are independent multivariate quantities. The contrasts for different traits covary, but the multivariate vectors are i.i.d. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA -- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] simulating continuous data
Bryan McLean -- > I’m working to simulate multiple continuous characters on a known > phylogeny (using several of the standard models), and I want to compare > properties of the simulated datasets to an empirical dataset. My question > is: what is the standard method for ensuring that those datasets > (simulated, empirical) are actually directly comparable, i.e. scaled > similarly? Does this involve specifying a sensible root state (e.g. > ancestral reconstruction) OR just rescaling one or the other datasets > before or after the analysis? Forgive me if this is a bit of a naive > question, just trying to get a sense of standard practices. > It would seem to depend on what you consider to be "scaled similarly". When I simulate multiple characters with correlated Brownian Motion, I specify a covariance matrix for the evolutionary changes, as well as a starting vector of means. Using a matrix square root of the covariance matrix, one can transform the characters so that the covariance matrix of the new characters is an identity matrix. Those are easy to simulate up the tree, and then one transforms back to the original characters. I do this with my own C programs, but it can be done in R too. But what does it mean to be "scaled similarly" ? Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?
Ted commented: > Nice shot across the bow! > Well, thanks but I have been shooting and shooting and shooting, with regard to issues like regressing one character on another and then analyzing only the residuals phylogenetically, and in this case issues like not thinking about how the environments changed along the tree. Not much effect so far so you can expect more shots across the bow. Joe ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?
Brian Gill -- > U between a discrete binary predictor > (Latitude: Colorado/Ecuador) and a continuous response (species richness). > > e at the influence of a discrete > binary trait on richness, but I'm not sure if my tree is suitable or if > there is a better approach. > > 1) Diversitree package BiSSE > 2) Caper package using MacroCAIC > > Any suggestions would be greatly appreciated- And what do these methods assume about how that "discrete binary predictor" evolves along the tree? Joe - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.
Liam -- Thanks, I had a "senior moment" and have been corrected: Rphylip, not Phytools. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.
Sean -- ... or, if you want to do it *really correctly*, you can use the threshold model of Sewall Wright for the discrete character and use the MCMC approach that I proposed in 2012: Felsenstein, J. 2012. A comparative method for both discrete and continuous characters using the threshold model. American Naturalist 179: 145-156. which is implemented in my program Threshml which can be called from Liam Revell's Phytools R package. It also works for multiple threshold characters and multiple continuous characters. Joe ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Determining Order of Trait Evolution
Gavin Leighton asked: > > > > > I have 500 trees of 80 species downloaded from birdtree.org and am > > primarily interested in two traits. I have used PGLS to determine the > > traits are related but would ideally like to test if there is an order to > > trait evolution. To complicate matters one trait (Trait A) is continuous > > while the second (Trait B) is presence/absence. I was hoping someone > could > > direct me to methods (assuming they exist) that would allow me to > determine > > the estimated value of Trait A before a gain of Trait B evolves in a > > lineage. > > > > A superior model (if I do say so myself) is the threshold model of Sewall Wright (1934). I have described in a paper in American Naturalist in 2012 how to use it to model discrete traits on a phylogeny. It allows the case of one trait continuous and another discrete. It is implemented in my program Threshml, which should be callable from Liam Revell's "phytools" R package. However it does not precisely answer the question you posed, but just asks whether the two traits evolve in a correlated fashion. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] model II regression statistics PAST
A few days ago, Jim Rohlf said, in response to Patrick Arnold's query about how to test the significance of major axis "regression" > > I don't know about how to do it in PAST, but note that because the slope of > the standard major axis "regression" line is just the signed ratio of two > standard deviations, its square follows the F-distribution if one assumes > that the two variances are based on random samples from two normally > distributed populations. Thus a test for a slope = 1 corresponds to a > 2-tailed test for the equality of variances. One can easily look up > probabilities in an F-table or compute confidence limits using standard > methods. Note that if the samples are across different populations or > species, as in many (most?) morphometric applications, then these assumptions > do not hold. This problem was brought to my attention by Paul Harvey in the late 1970s. I suggested that he look for a rotation angle (theta) that would maximize the likelihood under a model in which the two (new) variables are independent Gaussian variables. This also allows a likelihood ratio test of the assertion that theta is zero. The estimate of theta provides an estimate of the angle of the major axis. These can be easily generalized to multiple populations, even when their variances are unequal. The likelihood ratio test is done with a test for equality of correlation matrices which will be found in a multivariate statistics book by Morrison. I am not sure where Paul published this, but I think he did in an appendix to a paper of his in some multiauthor volume. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] summary stats for comparative methods p-values
If the 100 trees are trees sampled from a Bayesian posterior, or else trees from bootstrap samples of your data, then you might just take the estimates from each tree (say estimates of a regression coefficient). Consider their distribution and ask whether the null hypothesis value (such as having slope 0) is in the tail of that histogram. The overall P value will be the proportion of estimated slopes that are below zero (unless you want to do a two-tailed test). Under the null hypothesis this will have the proper rejection probability. As you take more and more trees the power increases some, but the type I error rate stays the same. If the 100 trees are something else, such as the personal opinions of 100 of your friends, then there is no statistical justification for this. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Pairwise Distances
Pedro -- > Well, for now I just want to estimate the raw distances. But thinking about coalescents is certainly interesting. Would you have a suggestion for either ways of thinking? Well, you must be wanting to use those raw distances to infer something. Rates of migration? There are extensive discussions in the book by John Wakeley. Also in the elementary population genetics text by Nielsen and Slatkin. J.F. ---- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Pairwise Distances
Pedro Taucce -- > Is there a way to estimate pairwise distances between and within groups of > sequences (in my case, each group is a species with lots of individuals)? I > used to do it with MEGA, but now I use Linux only and MEGA isn't getting > along with it. > > The closest way I figured out is the function dist.dna from the ape > package. But I think it does not estimate distances between groups. > You want to use distances between groups? But you don't want to think about coalescents? J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data
I have received phishing spam through the R-sig-phylo mailing list (pretending I had expressed interest in renting something from them) from anya_phill...@casetotours.xyz disguised as a reply to my comment. So that address should immediately be removed from the list. Joe - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data
I do not know offhand whether there is an R implementation, but how about Mark Pagel's 1994 method for testing whether two 0/1 characters changing along a ohylogeny are changing independently? J.F. - j...@gs.washington.edu Joe Felsenstein, Department of Genome Sciences and Department of Biology Box 355065, University of Washington, Seattle, WA 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for variation in rates of evolution among traits
Warning: You'd have to ensure that the traits for which you are comparing rates are evolving independently, so that they do not covary in their evolutionary changes. I assume Dean Adams's paper involves some way of coping with this. The issue of log-transforms that Ted raised is very important, otherwise big measurements will tend have higher rates of evolution. Joe j...@gs.washington.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)
Alberto Gallano asked: > > Would this also be the case in the situation where n is small enough (~ <15) > to enumerate all possible unique permutations? I was under the impression > that such an 'exact' test provided the true p-value without error. In a case that small, one might be able to evaluate all permutations. (There are 1.3 x 10^12 permutations, but maybe you need combinations, which are fewer). Yes, you would then have an exact P value. But of course that is not the same as an infinitely powerful test -- after all an ordinary t-test gets you an exact P value when its assumptions are met. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)
A number of people have suggested that P values should stabilize after a number of samples (in a permutation test) that depends on the data set. I suspect that these were unintended misstatements. As Dennis Slice has mentioned, one can regard each permutation in the permutation test as a random sample from a distribution. Comparing a test statistic X to its value in the data (say, Y), each permutation draws from a distribution in which there is a probability P that X exceeds Y. So each permutation is (to good approximation) a coin toss with probability P of Heads. There obviously no number of tosses beyond which the fraction of Heads "stabilizes". The fraction of heads after N tosses will depart from the true value P by an amount which has expectation 0, and variance P(1-P)/N. This is a fairly slow approach of the fraction of Heads to the true value. So to get twice as close to the true P value, one needs 4 times as many permutations. And this need for more and more samples continues indefinitely. There is no sudden change as one reaches a threshold number of permutations. But that's what you really meant, right? Joe --- Joe Felsenstein j...@gs.washington.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ancestor vs. change plots
Milton Tan asked: > > I have a question that is perhaps esoteric, since it's on a method I don't > see used often. I am looking at the dynamics of body size evolution, and have > come upon ancestor-vs-change plots described in Alroy 2000 ("Understanding > the dynamics of evolutionary trends", Paleobiology). This is interesting > because it will allow me to see if rate of body size change depends on body > size. I haven't seen this method widely used, so for anyone unaware how this > works: for each branch, you plot the ancestral state vs. the amount/rate of > change along the branch. If the question were just whether the rate of change of the character (body size) depended on its value, then another way would be to look for a transformation of body size that made the instantaneous variance of change constant. i.e., does change in log(size) or in sqrt(size) have constant variance? There are parameterized transformation such as y = (x^p - x^{-p})/(2p) or elsey = ( x^p - 1) / p + 1 for which you could estimate the parameter p by ML. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sister-clade analysis of discrete characters
I mentioned earlier the alternative of having two-state discrete characters each modeled by an underlying quantitative character with a developmental threshold. This can in principle be extended to three or more states. In fact, Sewall Wright's original example in 1934 had three states. But there are two difficulties: 1. How far away from each other the two thresholds would be becomes something that needs to be estimated. If there are only two states, then since the underlying scale is not directly observed, one can place the threshold anywhere on the underlying "liability" scale, most conveniently at zero. But how far from that the second, third, (etc.), thresholds should be placed then needs estimation as parameters of the model. 2. It is also not obvious whether they third state uses the same underlying scale. For three states, one could have two underlying "liability" scales, with each state determined by some region in this two-dimensional space. Inferring that would be a continuous-state counterpart to Mary Mickevich's "transformation series analysis" of 1982 (see my book, page 81 for references). So thus far the threshold model implementations do not allow three or more states. But the possibilty should be mentioned here. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sister-clade analysis of discrete characters
This is not a sister-clade method but ... If you are happy with a model in which there are underlying polygenic quantitative characters, covarying ones whose evolution is modeled by Brownian Motion model, and some two-state discrete characters arise by imposing a developmental threshold on each quantitative character, you might look at my 2012 paper in American Naturalist: http://www.jstor.org/stable/10.1086/663681 The method is designed for use with a known phylogeny and infers the evolutionary covariances of the underlying quantitative characters (the "liabilities"). My C program Threshml analysis this model by MCMC sampling. It can, I believe, be called by Liam Revell's R package phytools. This threshold model, a phylogenetic version of one which originated in 1934 with Sewall Wright, is mentioned in the Maddison and FitzJohn paper that William Gearty cited. It is just briefly mentioned by them, I gather because the authors forgot about it until the last minute when writing the review. Nevertheless it is getting increasing use. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] multipage pdf of a huge tree
Jonas Eberle -- > thank you for your answer! Actually, I used to make large size pdfs with a > tiny font size in these cases. It is then even possible to print this > single-page-pdf on multiple pages in Acrobat Reader. The problem was that > my current tree is really huge - with about 23000 tips. This took me to the > limits of this approach, since pdf size is limited to 200 inch by Adobe and > there also seems to be a lower limit for font size in pdfs (at least during > export form R?). > > I didn't knew that Drawtree is able to split the tree over multiple pages. > I guess this is an option in the command line version? I will check that > out. In the mean time I found a way to do the job in R (see my post before) > that seems to produce reasonable results. > > PHYLIP programs do not enter settings on the command-line. They have a menu. The Drawgram and Drawtree programs have a choice that gives a submenu in which multiple pages can be selected. This submenu is not available in the GUI (Java front end) version of those programs, but is available in the character-mode menu that appears if the program is run from the command line. It is the selection made by typing the character "#". J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] multipage pdf of a huge tree
Jonas Eberle wrote: > thanks a lot! I didn't knew splitplotTree yet. Great function! However, my > tree has several thousands of tips (yes, it's a bit crazy but unfortunately > necessary...) and I guess it's only possible to split it on two pages with > splitplotTree. Or am I missing something? > It's not in R (unless available through Liam Revell's "phytools" package), but in PHYLIP the tree-drawing programs Drawgram and Drawtree have the capability of splitting a plot into a rectangular array of plots, and putting these out onto separate files (not PDFs, but Postscript is possible). This was intended to help people make large posters using printers that can only do a single page. However ... I do not see why this is necessary. Most tree-drawing programs should be able to write a file that has the large tree plotted on it. If you don't want to print the resulting tree on paper, it would then be possible to view the tree in an application such as Adobe Acrobat Reader and zoom in on it and see the tiny branches and their labels. Making multiple plots for one tree would probably confuse the matter. Or is there something I am missing here? J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic analysis with sequencing error
Brian O'Meara added: > Something like that should work. Unlike the case of ambiguity, where if you > see, say, a Y at a site you'd give the probability of a C or a T at that > site each 1, not 0.5 (Felsenstein's book has a good discussion of this) I > think you're right that you'd want the probabilities to add up to one > across bases: with sequencing error, the probability of it being A in the > sequence given that it truly is A in the species is 99%. I poked around > phangorn and didn't see an easy way to do this, but it should be possible > (might require going into the C code to do this). > To emphasize what Brian said: the probabilities at a site, in this computation, are *not* the probabilities of alternative outcomes. They are the probabilities of one outcome given four different underlying situations. Thus they don't have to add up to 1. For the ambiguity case, in my book I warned: "You may be tempted to make the quantities (1/4, 1/4. 1/4, 1/4), but you should resist this temptation." For simple models of sequencing error the four numbers can add up to 1. Can, but don't have to. If an A is more likely to be misread than a G (say), then you get numbers that don't add up to 1, and that's OK, you should not worry about that, because they aren't the probabilities of four different outcomes. If we made up a 4 x 4 table where the first row shows the probabilities of the four observations given that the base really is A, and second row is the probabilities given that it really is G, etc., then the four quantities you want at one site at one tip of the tree are the four numbers in a column. (Which column depends on what you observed). The rows have to add to 1; the columns don't have to. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic analysis with sequencing error
George Shireff asked: > > I was wondering if there is a way to incorporate sequencing error in > phylogenetic analysis, if I know there is sequencing error but it's > randomly distributed. > > I imagine this could be as simple as setting the probability of the > observed base at each tip to 0.99 rather than 1, with a probability of > 0.01/3 rather than 0 for each of the other tips (for a sequencing error of > 1%). I was hoping that something like pml (phangorn) can be modified to do > that, but I haven't been able to figure it out myself yet. How to do this in likelhood computations (and thus in either likelihood or Bayesian methods) is explained in my book, on pages 255-266 in the subsection of Chapter 16 called "Handling ambiguity and error". I think that this is the first published treatment of that. My colleagues Mary Kuhner and Jim McGill published a paper in 2014 on this, with simulations showing that it helps to model the sequencing error: Kuhner MK, McGill J Correcting for sequencing error in maximum likelihood phylogeny inference. G3 (Bethesda). 2014 Nov 4;4(12):2545-52. doi: 10.1534/g3.114.014365. I believe that one major phylogeny program has very recently added code to model this. We have a version of Dnaml for PHYLIP that can do it, and hope to release it soon. As George Shireff saw, it is actually easy to do, and the computations are not any slower. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term
This question was asked by Peter Smits and by Edwin Lebrija Trejos. In Peter Smit's question it was this: > I have a similar question to Edwin. I too am working with a hierarchical > Bayesian model, though I've implemented it in stan. I've included a > phylogenetic random effect term, which is modeled as being distributed as > multivariate normal with known covariance matrix up to a constant, > sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al. '04 Am Nat > by drawing on the similarity with the "animal model" from quantitative > genetics. > > My question is about the scaling of the covariance matrix: is it necessary > to have the the diagonal terms satisfy x <= 1 and all the off diagonals to > be 0 <= x < 1? I have a time scaled phylogeny from which I have my > covariance matrix, so currently all elements of the matrix are not scaled > so that the greatest distance from the root to a tip is 1. Currently, the > elements of the matrix are just the sum of the shared branch lengths. Is > this appropriate? Why or why not? > > I hope people will correct me on this, but my take is: 1. To infer a variance term for the "phylogenetic random term" one has to scale that term somehow, and then the variance will specify how many of those scaling units are in this term. If you had the tree depth be 2 instead of 1, the variance inferred would then be half as great, because it would be saying how many multiples of 2 rather than how many multiples of 1. 2. I have not had time to read through all of Edwin's code to see exactly what the model is. However, note that there is a distinction between "individual" effects that are on a whole species, and "individual" effects that are on a single sampled individual. The latter are taking into account that the mean phenotype of each species is only known from a finite sample of individuals. So it is taking into account within-species phhenotypic variation and finiteness of sample sizes. The relevant methods there are by Ives, Midford and Garland (2007) and by me (2008). Ives et al. assume within-species phenotypic variances and covariances are known, I give methods for inferring them. The methods of Lynch and Housworth are in effect either assuming a sample size of 1 for each species, or are considering their "individual" effect to be on the whole species. Within-species phenotypic variation can be a substantial problem, as Ricklefs and Starck (1996) noted. In their example, the largest contrasts were between closely-related species. They suspected that this was due to within-species phenotypic variation (which shows up as variance due to sampling of the individual specimens measured). It causes trouble because the small branch lengths between closely-related species are an inadequate predictor of how different the species means will be. In effect, the model is wrong so some of the changes attributed to between-species evolution are actually within-species sampling variation (phenotypic variance). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Cross-validation with independent contrasts
Diego Bilski -- > I'm wondering if it would be statistically/philosophically correct to use > n-fold cross-validation to evaluate a linear regression with independent > contrasts. My doubt comes from the fact that when simply dividing the IC > dataset in, lets say, 10 folds, some folds will remove the contrasts of > internal nodes without necessarily removing an entire clade above that > point, producing what can be viewed as two independent clades (a graphical > example would be, in Felsenstein's seminal paper, fig. 8, remove the > contrast at node 13, while keeping those at nodes 9 and/or 10). and Ted Garland wrote: Couldn't you also just do this back at the level of the original tree and > tip data, creating subsets by pruning the tree before you compute contrasts? > Under the model of multivariate normality with Brownian Motion change along the phylogeny, the contrasts are i.i.d. so of course one can use them as points for cross-validation. But of course, unless the regression is nonlinear, there is already a parametric framework for distributions of regression coefficients (and other associated phenomena) in that i.i.d. MVN framework. The issue of what entities should be sampled in cross-validation depends on how, at what level, you expect the model to depart from multivariate normality with Brownian Motion. Diego and Ted seem to have some such expectation but I can't see what that alternative model would be. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Midpoint rooting routine?
Liam said: > Joe's correct that this is not yet in our project to create an R interface > for PHYLIP (Rphylip, https://github.com/liamrevell/Rphylip). As soon as I > get a chance to work on this project today, I will add it. That may be hard to do, as Retree is interactive and has two levels of menus. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Midpoint rooting routine?
Todd -- > Does anyone know of an existing midpoint rooting routine? I am > displaying trees and would like to show them as midpoint rooted. > I've been using the Phangorn package, which does the midpoint rooting > perfectly, but it has some dependencies that make it unstable on the linux > machines I've been using. When I looked last, Phangorn was the only package > I could find with midpoint routing. Retree in PHYLIP can do that. Option M once the tree is read in. You can write a script to feed the menu settings to the program. I am not sure Liam Revell has Retree in his Rphylip project (that makes it easy to call PHYLIP programs from R). But you can automate the whole process yourself without too much difficulty -- see the section on "Running PHYLIP programs from a command file" in the main PHYLIP documentation file main.html, and see of course also retree.html. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Sampling from all bifurcating and multifurcating trees
Eduardo Ascarrunz wrote: > Also, I figured out I could work with unrooted trees. I suspect the procedure > would be similar to your rooted method, wouldn't it? There is a 1-1 correspondence between n-species rooted tree topologies and (n+1)-species unrooted tree topologies (this is discussed in Chapter 3 of my book). So you can simply generate a bifurcating rooted tree, or a multifurcating rooted tree, of n-1 tips, and then unroot it, making the previous root now be species n. Then you get a randomly sampled unrooted tree. This works for tree topologies but not, I think, for labeled histories. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] R help generating a heat map
One can sample from all possible rooted labeled histories by randomly bifurcating lineages, and at the end assigning the names randomly to the tips. Note that a "labeled history" is not the same as a tree topology. One can sample randomly from all topologies of rooted bifurcating trees with labeled tips by adding the species one by one to a one-species tree, each time choosing to split them off from one of the branches. The enumeration of all such trees is based on counting all possible ways of doing this, so choosing one such sequence should choose one of them at random. Jim Rohlf has a paper with an algorithm equivalent to this in (I think) Systematic Zoology but I cannot locate it at the moment. For rooted multifurcating trees a similar method could be used. In my 1978 paper in Systematic Zoology I showed that each such tree corresponded to way adding the species 1 to n in order, where at each step we choose equiprobably from among all branches and all interior nodes (so that if there are 13 branches and 6 interior nodes we split off one of these 19 at random). The enumeration algorithm for which this is based is also discussed in detail in Chapter 3 of my book. Joe On Mon, Dec 23, 2013 at 8:57 PM, Eduardo Ascarrunz wrote: > Hi Liam, > > Thank you! That looks clever. How does this method bias the sampling? I > think it could be useful for test running my code anyway. > > Looking forward your findings, and all the best, > > E. > On 24 Dec 2013 04:49, "Liam J. Revell" wrote: > > > Hi Eduardo. > > > > You could try something like this: > > > > randomFurcTrees<-function(n,N=1){ > > foo<-function(n){ > > t<-di2multi(rtree(n,br=sample(c(0,1), > > size=2*(n-2),replace=TRUE),rooted=FALSE)) > > t$edge.length<-NULL > > t > > } > > trees<-if(N>1) replicate(N,foo(12),simplify=FALSE) else foo(n) > > if(N>1) class(trees)<-"multiPhylo" > > return(trees) > > } > > > > What this does is it generates random bifurcating topologies (using rtree > > in ape) with branch lengths that can be 0 or 1; then it collapses all > zero > > length branches to polytomies. This is (demonstrably) *not* the same as > > picking trees at random from the set of all bi- and multifurcating > > topologies. It's not immediately obvious how you could do that, but I'll > > think about it. > > > > All the best, Liam > > > > Liam J. Revell, Assistant Professor of Biology > > University of Massachusetts Boston > > web: http://faculty.umb.edu/liam.revell/ > > email: liam.rev...@umb.edu > > blog: http://blog.phytools.org > > > > On 12/23/2013 10:15 PM, Eduardo Ascarrunz wrote: > > > >> Hello everyone, > >> > >> Newbie here. I'm looking for a way to generate random trees of N tips, > >> allowing multifurcations. My N would be >12, so it wouldn't be practical > >> (nor possible) to work with all the possible trees (cf. allFurcTrees). > I'd > >> be happy with a set of 1000 trees, sampled equiprobably. I'd much > >> appreciate any help. > >> > >> Best to all, > >> > >> E. > >> > >> [[alternative HTML version deleted]] > >> > >> ___ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> Searchable archive at http://www.mail-archive.com/r- > >> sig-ph...@r-project.org/ > >> > >> > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Transforming data for OU model
Ted Garland wrote: Our programs are in DOS (What's DOS? The last operating system that worked ...). However, I think you can also do this in R somewhere, maybe phytools? Liam Revell, want to jump in here? Just to remind people: DOS (also called MSDOS) executables programs can be run in Windows using the Command Prompt tool, which you will find in the Accessories menu that is found in the menu opened by the All Programs tab in the Start Menu. Joe ---- Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] FW: PGLS vs lm
Luke Matthews -- > If there is specific concern about one trait consistently lagging behind > another (Joe's latest complaint on pgls), then there are techniques specific > for this. Charlie Nunn has published one such technique for evolutionary > lag. More generally, Joe's complaint that pgls assumes traits are at > adaptive equilibrium points to infer evolutionary adaptation is generic to > most evolutionary biology (including most applications of independent > contrasts, optimal foraging, sexual selection studies, etc.) and in no way > specific to pgls or comparative methods. I'm not advocating traits always > are at equilibrium. Indeed, some of my work on capuchin sexual behavior is a > bit heterodox because I've proposed what is going on is not at an equilibriu! > m state. But assuming that most of the time most traits are at something > close to adaptive equilibrium is generic to evolutionary biology. I guess I'm not an evolutionary biologist then. If an adaptive peak is moving around, I am unwilling to assume that the population always succeeds in staying on top of it. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
Tom Schoenemann asked me: > With respect to your crankiness, is this the paper by Hansen that you are > referring to?: > > Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. > (2012). A phylogenetic comparative method for studying multivariate > adaptation. Journal of Theoretical Biology, 314(0), 204-215. > > I wrote Bartoszek to see if I could get his R code to try the method > mentioned in there. If I can figure out how to apply it to my data, that will > be great. I agree that it is clearly a mistake to assume one variable is > responding evolutionarily only to the current value of the other (predictor > variables). I'm glad to hear that *somebody* here thinks it is a mistake (because it really is). I keep mentioning it here, and Hansen has published extensively on it, but everyone keeps saying "Well, my friend used it, and he got tenure, so it must be OK". The paper I saw was this one: Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary regression: The interplay between observational and biological errors in phylogenetic comparative studies. Systematic Biology 61 (3): 413-425. ISSN 1063-5157. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
If the "regressions" are being done in a model which implies that the two variables are multivariate normal, then we can simply estimate the parameters of that joint distribution, which are of course the two means and the three elements of the covariance matrix. If we then test whether Cov(X,Y) is different from zero, that should be equivalent to a test of significance of either regression. /* crankiness on */ Note of course that most "phylogenetic" regressions are being done wrong: if they assume that Y responds to the current value of X, but when the value of Y may actually be the result of optimum selection which is affected by past values of X which we do not observe directly. I've complained about this here in the past, to no avail, Thomas Hansen, in a recent paper, made the same point, with evidence too. /* crankiness off */ J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)
In addition to the references to papers by Hansen and Bartoszek, and by Ives, Midford and Garland, I would biasedly suggest this paper: Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: contrasts revisited and revised. American Naturalist 171: 713-725. The method estimates the within-species phenotypic variation (which, when you are analysing species means is the relevamt "measurement error" and also includes actual measurement error) and corrects for it. The software announced there is not in R, but I believe that Liam Revell's phytools package can call our program. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] dist.dna -- inconsistent behavior with ambiguous sequences
Emmanuel Paradis wrote: > The current behaviour of dist.dna() is to ignore ambiguous nucleotides. The > difficulty is that comparisons with ambiguous bases may or may not be > informative depending on the model. For instance, the comparison "R-Y" is > clearly a transversion, while the example you give ("A-S") may be either a > transition or a transversion, so it is informative for the JC69 model but not > for K80. Maybe some weighting scheme could be used for this problem. If > someone is aware of some systematic treatment of this issue, please let me > know. > > With real data, it is wise to check the quality of the alignment, for > instance with base.freq(, all = TRUE) or image(). If you view the two-species distance as an ML estimate of the length of a two-species tree, then one can make use of RY ambiguities (and others). That is the best and most fundamental method. I used it in Dnadist in PHYLIP. That approach does (effectively) drop any site that has a total ambiguity such as an N nucleotide, but all other ambiguity codes do affect the distance. It is slower but worth it. It does *not* mean that when one species has A and the other has A/G that you just count a match of 1/2, as for a small distance most of the likelihood is contributed by the A possibility. For longer branches the contributions of the two terms are more nearly equal. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] phylogenetic multiple correspondence analysis
Danny Rojas wrote: > Anybody know if there is an alternative to the phylogenetic principal > component analysis of Revell (2009; Evolution 63-12: 3258–3268), let's say > a phylogenetic multiple correspondence analysis, to analyze non-continuous > variables? I would appreciate any suggestion. Use my program Threshml (sorry, it's not written in R) which models 0/1 variables using Sewall Wright's (1934) "threshold model", and can infer principal components of the underlying continuous variables. It can also mix continuous and 0/1 variables. Felsenstein, J. 2012. A comparative method for both discrete and continuous characters using the threshold model. American Naturalist 179: 145-156 General packages such a MCMCGlimm can (probably) be made to do the equivalent, with some torture. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about analysing trait differences
I wrote: Thus if we find (chimp, bonobo) to be the "cherry" we remove them, leaving a tree such as (macacque, (gibbon, (orang, gorilla))); so now (orang, gorilla) is a cherry. but should have written ...leaving a tree such as ((macacque, (gibbon, (orang, (gorilla, human; so now (gorilla, human) is a cherry. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about analysing trait differences
Brian O'Meara wrote: The problem is reconstructing overlap down the tree (it's possible to do, but whether it's possible to do well is another question). One thing you could do to avoid this is to use the *other* method from Felsenstein's 1985 independent contrasts paper, taking pairs of species independent of other pairs. A way to do this: 1) Get a clade of size two (aka a "cherry" sensu Semple and Steel). Every tree has at least one. 2) Difference in overlap and diference in mass for the pair of taxa making up this clade becomes one point (perhaps standardize for time, and positivize it such that you subtract them in the order that leaves the mass difference positive). 3) Prune these two taxa off the tree. 4) Go back to step 1. When you're done, assuming no polytomies, you'll have zero or one species left. [it's possible to do with polytomies, too: if assumed to be hard, then just do a pair of taxa from a terminal polytomy and leave the rest for the next step. If assumed to be soft, then just take two from a terminal polytomy and discard the remaining taxa] Contrasts gives you Ntax - 1 contrasts. This gives you floor(Ntax/ 2) but avoids having to do reconstructions. I have code around to do this but can't find it at the moment, but it's not hard to write -- post again if you need help with it and no better ideas have been proposed. Note that this method (which goes all the way back to Salisbury in 1942 as mentioned in my 2004 book) requires that in step 3 "prune" does not mean using the "pruning" algorithm of likelihood evaluation. It means removing these two tips (and their shared most recent common ancestor), leaving behind nothing. So not calculating and adding to the tree any phenotypes for the most recent common ancestor. Thus if we find (chimp, bonobo) to be the "cherry" we remove them, leaving a tree such as (macacque, (gibbon, (orang, gorilla))); so now (orang, gorilla) is a cherry. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] statistically test whether two characters evolve dependently
New Bio wrote: > Thanks, Liam. good to know. I think extending the tools to analyze traits > of ordered/unordered multistates can be very useful. There are many > interesting traits such as oxygent requirement (anaerobic, facultative, > aerobic) of microbes, which is ordered multistates, and habitats (water, > air, soil), which is unordered. For approaches like the threshold model, it is not simple to see how to handle multistate characters. Should we assume one scale? If so, how far from the 0/1 threshold should we place the 1/2 threshold? That becomes an additional parameter to be estimated. Or should we have an additional axis, so that 0/1 is on the x axis, while [01]/2 is on another axis? And then what do we do about state 3 if it also exists? That way lies madness ... or perhaps a good Ph.D. thesis topic. (This is in some way related to Transformation Series Analysis, which was an issue with parsimony methods. One imagined one's states arranged in a character "transformation series" which could even be a tree, the Character State Tree. Then one wanted to infer the phylogeny and at the same time also the CST, using parsimony as the criterion. In a sense what I am raising is the likelihood and modeling equivalent problem.) Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Seeking list of nucleotide substitution models
Emmanuel Paradis answered Daniel Barker's inquiry: >> Is there a review or list of ~every specific nucleotide substitution model >> that has been proposed or used in the literature (with references)? > Hi Daniel, > > Have you looked at the help page of dist.dna? The list of references there is > focused on calculating distances but some are general. There are a few more > references in my book too. I suggest you also look at "Inferring Phylogenies". where you will find, in the section on "Other distances" in Chapter 13 that Andrey Zharkikh had a very good paper in 1994 covering many of the distances that had been invented up to that date, and explaining relationships between them: Zharkikh, A. 1994. Estimation of evolutionary distances between nucleotide sequences. Journal of Molecular Evolution 39: 315.329 Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] combining p-values in PCMs
John D asked: > Given that things have been quiet in the list lately, I think this > could be a good time for me to ask your opinion about this issue. > > Imagine that I run a PIC analysis on two traits using 1000 > post-burn-in trees. What would be the best way to summarize these > results? Average p-values across all analyses? Perhaps a specific > method to combine the resulting probabilities e.g. Fisher's test? If you want to know about a regression coefficient, or other parameter, the 1000 inferred values in the 1000 trees can be taken as a posterior distribution if the trees come from a Bayesian inference (I am assuming from your description that they do). If you want to test whether, say, the regression coefficient is positive, you can take this posterior distribution and simply ask whether the credible interval for the parameter includes zero. In this case that interval would be the upper 95% of the sample values. Bringing in P values or Fisher's test just mixes Bayesian with frequentist or likelihood methods, which will get two groups of people annoyed at you, whereas using only one of these methods has the advantage of irritating only one group of us. I note that John's email address is "dobzhanski". Good to hear from you again, Professor Dobzhansky. You may recall that you gave me a tour of your lab at Rockefeller University in the summer of 1963 when Dick Lewontin sent me from Rochester to pick up some data paperwork for him for a joint project the two of you were working on. (Okay, the last part is "Poisson d'Avrile") Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Ancestral state estimates of continuous traits
Emmanuel Paradis wrote: > The new default for ace will be to use REML for continuous > characters and ML for discrete ones. after Ted Garland wrote: >> Yes, we have always used REML for this! (Emmanuel wrote: ) >>> You can also try method="REML" which gives much better >>> estimates of sigma^2 than ML. I think I'll make it the >>> default for continuous characters. ** bragging mode on ** Glad to see you have all caught up to this paper: Felsenstein, J. 1973. Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics 25: 471-492. (which can be accessed at that journal free of charge) ** bragging mode off ** Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] testing for correlates of rates of evolution
Rob Lanfear wrote: > In particular, Liam's using squared contrasts in y, so that's > asking whether the absolute size of changes in y depends on > x at the ancestral node. I might have missed something here, > but that sounds very similar in principle to Freckleton's > test of whether the variance of trait differences is > unrelated to their absolute values [1], except that the > latter looks for correlations between the absolute value of > differences in x versus x at the ancestral node. It might be > useful to consult that paper [1] to get some more ideas for > how to interpret those kinds of results. If there is proportionality between standard deviation of change and the value of the character, that is essentially saying that the log(x) changes at rate independent of its value. Similarly, if the variance of change is proportional to the value of the character, that is essentially the same as saying that the square root of the character changes at a rate independent of its value. See this Wikipedia page: http://en.wikipedia.org/wiki/Variance-stabilizing_transformation Perhaps the whole test can be done by considering different transforms of the data (there are parameterized families of them) and use the likelihood to test values of the transform parameters (with appropriate correction of the likelihood by having a Jacobian term). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] From ClustalW2 Tree to Heat Map in R
Klaus Schliep wrote: > There is quite some irony that in phylogenetic reconstruction often > non-ultrametric methods are preferred, even though the time to the > last common ancestor (LCA) should be for each extend species the same. > However other fields use heavily ultrametric methods (hclust) even > there exist no evident equivalent property like the LCA. The irony, such as it is, is due to the fact that we can only see amounts of difference, not amounts of time, and, darn it, different organisms have different biologies so they change at different rates. Which makes biology more interesting but makes molecular change less clocklike. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Why no branch lengths on consensus trees?
Emmanuel Paradis wrote: > If you are Bayesian, the trees sampled from an MCMC are here for estimation > including of the branch lengths, so you use them to compute some sort of > consensus topology as well as its branch lengths. So it makes sense that > MrBayes can do a consensus tree with branch lengths. I endorse the rest of Emmanuel's advice but let me quibble with this one. The posterior on trees may not consist mostly of trees varying around a single consensus. If the posterior had, for example, two modes, each centered around a different tree, a single consensus tree might not be appropriate, and branch lengths computed by averaging lengths over the two modes might not be a good guide to what the trees in the posterior looked like. I don't know enough about MrBayes features to know whether they have some way around this. There is a similar issue with parsimony methods -- the set of most parsimonious trees may have a consensus, which may well not be a most parsimonious tree. People who see the consensus of most parsimonious trees may not realize that the particular tree they are looking at is not most parsimonious. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA (from 1 October 2012 to 10 December 2012 on sabbatical leave at) Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94710 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Why no branch lengths on consensus trees?
Daniel Barker wrote: > What should branch lengths on a consensus tree represent? > > Scott Chamberlain had written: > >> When making a consensus tree using ape::consensus the branch lengths are >> lost. Is there a way to not lose the branch lengths? Or to add them >> somehow >> to the consensus tree after making it. The issue of what branch lengths ought to be on a consensus tree is not simple. If we have three rooted trees: ((A:1,B:1):1,C:2); ((A:1,B:1):1,C:2); (A:2,(B:1,C:1):1); the consensus tree should be the first tree, but what branch length should be used for (say) the branch ancestral to the AB clade? 1? 0.667? The minute you open this can of worms it becomes clear that the answer depends on what you want that number to convey and what interpretations your audience will tend to draw from the number. There is no "obvious" answer. So this is not a mere technical computing question. By the way, in my 2004 book, you will find me agonizing about this on page 526, coming down on the side of 0.667, but not overwhelmingly convincingly. You could argue that a branch length should be set 0 when the branch is not there, and all the resulting values averaged, or you could argue that the average should only be taken over those trees for which that branch is present. One possible way to solve the problem is to take the consensus tree as if it were a user-defined tree, use your whole data set, and infer branch lengths on that tree. Daniel has already expressed his legitimate concerns in such a case as to whether it takes (for example) trifurcations as if they were real rather than an expression of our uncertainty. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA (from 1 October 2012 to 10 December 2012 on sabbatical leave at) Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94710 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Cluster tips into OTUs
Miguel Verdu -- > I want to perform a mothur-like OTU-based approach, but based on phylogenetic > instead of DNA sequence distances. Is there any way to cluster tips of a tree > into OTUs determined by a threshold of phylogenetic distances?. In other > words, I want to collapse all the tips connected with distances less than X > into the same OTUs. I don't know which R program will do this (probably just a one-line "if" command) but the obvious method would be to take the distance matrix and reduce all the elements of it that are less than X to 0.000. Then run any distance method. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA (from 1 October 2012 to 10 December 2012 on sabbatical leave at) Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94710 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative analyse and heritability
Tristan Lefebure wrote: > We are struggling with a comparative analysis that involves 2 traits: > - trait #1 is heritable (genome size) and clearly has a strong phylogenetic > signal > - trait #2 is non heritable (time of colonisation of a new habitat, that > convergently happened on the terminal branches), calculating contrasts on > this > trait would be meaningless. > > If one wants to test the link between these two variables, how would you do > this? > > We thought about using two different phylogenies in a phylogenetic > independent > contrast analysis: the species tree phylogeny for trait 1, and a star tree > for > trait 2. But how do you deal with a unique internal node? The time it takes to colonize a new habitat seems to me to be a function of various underlying variables. Those may have changed along the tree, and not necessarily very quickly. So it seems that an ordinary comparative methods analysis is justified. It is very similar to having a character such as time to escape a predator. The fact that it does not take millions of years to escape one attack is really irrelevant. Joe` Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] testing binomial characters
Theodore Garland Jr wrote: > Check this: > > Ives, A. R., and T. Garland, Jr. 2010. Phylogenetic logistic regression for > binary dependent variables. Systematic Biology 59:9-26. In addition, check this: Felsenstein, J. 2012. A comparative method for both discrete and continuous characters using the threshold model. American Naturalist 179: 145-156. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] asymmetric transitions
Daniel Barker wrote: > On ancestral state reconstructions. I've recently started using Ziheng > Yang's terminology - of referring to reconstructions, derived from ML > transition rates and equilibrium frequencies, as 'empirical Bayes' > reconstructions. I believe this to be the most useful way to describe > these methods. > > My question. Were does empirical Bayes stand, w.r.t. the Likelihood > Principle? > > Empirical Bayes seems beyond the scope of the Likelihood Principle, or in > violation of it. The biological hypotheses (here, of ancestral state) are > not expressed as parameters of the model, so the relative support is not > judged by a likelihood ratio. On the other hand, by only using the data at > hand, empirical Bayes does comply with 'the irrelevance of outcomes not > actually observed'. > > If anyone can provide or point to some thoughts on this, I'd be very > grateful - for ancestral states or in general. It has long been recognized -- since Anthony Edwards's 1970 paper and Elizabeth Thompson's brilliant 1975 thesis and book -- that the interior node reconstructions are not, strictly speaking, MLEs. They are maximum posterior probability estimates instead. The root ancestral states can be considered MLEs if that state is one of the parameters of the model. If instead (as often done for DNA and protein data) the root ancestral state is considered to be drawn from the equilibrium distribution of the stochastic process, then they too are MPP estimates. I would only call them empirical Bayes estimators if one made an ML estimate of some stuff (the whole tree topology, which sounds like an extreme case) and then, assuming the correctness of that estimate, the ancestral states are inferred by being Bayesian. In that case probably the only thing that would be taken to have a prior on it would be the ancestral state. Then you could take the MPP as the modal estimate from the posterior. So, if I have understood his point correctly, Ziheng is formally correct, but it is a case where one is not being very Bayesian. For my money, you are a Bayesian not just if you use a prior, but if you are willing to use a controversial prior. And in this case the prior is pretty uncontroversial. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] asymmetric transitions
Mark Holder wrote: With respect to Jarrod's simulations, I have a few thoughts: 1. I don't understand the claim (in the original email) that "its fairly straightforward to prove that asymmetric transition rates cannot be identified using data collected on the tips of a phylogeny" It seems like this is something that is routinely done in phylogenetics, and proofs of identifiability of GTR exist (demonstrating that this indeed feasible and not some computational artifact). I agree. This whole discussion may have started from a "fact" that is *not* "fairly straightforward to prove". In which case it is not necessary to bring speciation rates or priors into it. A reversible two-state model should be able to have its parameters estimated on a given tree, clocklike or not. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] simulate traits evolution in correlated with body mass
Liam Revell wrote: > library(phytools) > y<-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2)) > This should give you the correlation r on average and the y|x should be > Brownian. (If x is Brownian, then both y & x & y|x will be too.) If y is evolving in response to x, and x is changing too, then wouldn't the expectation of y be a value that is predicted, not by the current x, but to some extent also from past x's? One would want some much more specific model. *** whining on *** This is part of my stream of oft-repeated, widely-ignored complaints that one ought not regress present-day y's on present-day x's. Lots of people are doing it -- and they're all wrong. *** whining off *** Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LL ratio test
Ben Bolker -- >> Now we can apply Fisher's Likelihood Ratio Test of Fisher as well >> as the AIC. The LRT tells us that the expectation > > ... under the null hypothesis where M0 (the simpler model) is true? Yup. I'm considering that case to see how the AIC fits in with the LRT. Of course the AIC is proposed for a much wider range of cases. >> of >> 2 log(L1) - 2 log(L0) is d1 - d0 (because it is distributed as a >> Chi-Square with that number of degrees of freedom). >> >> But the AIC tells us that the expectation is 2(d1 - d0). > > Maybe I'm missing something, but I don't see how the AIC tells us > something about the expectation of 2 log(L1) - 2 log(L0) ? It gives us > the expectation of the Kullback-Leibler distance, which is something > like sum(p(i) log(p(i)/q(i)) where p(i) is the true distribution of > outcomes and q(i) is the predicted distribution of outcomes ... so it's > something more like a marginal log-likelihood difference rather than a > maximum log-likelihood difference ... Well, the AIC ends up with comparing -2 log(L) + 2d for the two hypotheses. The difference of these for models M1 and M0 is just (the negative of) 2 log(L1/L0) - 2(d1-d0).Or have I missed something here? So the expectation of the difference is log likelihood *is* described by the AIC, right? And isn't it (in view of Fisher's distribution) wrong too? That is what disturbs me and makes me feel there is something I don't understand about the AIC argument. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LL ratio test
Carl Boettiger wrote: > Others on the list can weigh in with more authority, but perhaps this will > get the discussion started. Yes, it's important to know whether the parameters are nested, and the issue of being at the end of a parameter range is serious. > Recall that AIC values are a fequentist statistic: and they obey the very > same same distribution as the likelihood ratio, (recall it is a difference > log likelihoods, just shifted by the difference in the number of parameters > (e.g. -2 [ log L1 - log L0 - (k1 - k0)]). Recall that the maximum > likelihood estimate (MLE) is a biased estimate of the likelihood of your > data and that AIC penalty is simply creating an asymptotically unbiased** > estimator of the true model likelihood, which is a frequentist concept to > begin with. Why we report confidence intervals/p-values in the case of one > of these statistics but not the other is not obvious to me either. I will confess my relative ignorance of AIC issues (my phylogeny book has a simple, elegant, and clear explanation -- which I wrote in a hurry while excited that I finally understood this, and which turns out to make no sense whatsoever and should be firmly ignored by all). But I do know this: If we have the likelihood ratio R = L(p')/L(p) where p' is the ML parameter values and p is the true parameter values, and where p is in the interior of the set of possible parameters, then RA Fisher showed about 1922 that asymptotically with large amounts of data: 2 log(R) is distributed as chi-square with D degrees of freedom, where D is the difference of the number of parameters being estimated in p' and the number of parameters being estimated in p. Now we know that the expectation of that chi-squared variable is D. So to correct the bias in R we should subtract D. That sounds like what Carl is explaining too. It sounds like a very simple and clear explanation of the AIC. Unfortunately that subtraction is *not* what AIC does. It subtracts 2D. The reason it does so is unclear to me. It involves some kind of prior on models, I think. As far as I am concerned it is "like the peace of god", in that it "passeth human understanding". Maybe the experts here can give me a simple explanation. Otherwise maybe we should honor Fisher (not me) and only subtract D, and call the result the FIC, But that works only for nested hypotheses, and the main point of the AIC is to deal with non-nested hypotheses. To make matters worse, in my field the AIC has the reputation of too easily favoring the most complex hypothesis, so maybe we should be subtracting more than 2D, not less. Clueless in Seattle. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] are partition of a tree statistically independent?
in response to Inti Pedroso's original question: >> Let say we have a phylogenetic tree and we generate a partition on the tree, >> i.e., split the in two by "cutting" at an internal node. Is the evolutionary >> process in the two partition independent? If that is the case it means that >> p(T_1,T_2) = p(T_1)*p(T_2) with p = probability and T_1 and T_2 the tree >> partitions 1 and 2. Arne Mooer wrote: >> It depends on what process you are considering, but this is usually an >> assumption folks make for, e.g. simple trait evolution or diversification. >> For any autocorrelated process (e.g. some common relaxed clock models), it >> will depend on whether your splits are or are not nested on the directed >> (rooted) tree. I disagree. If we have the assumption of a Markov process, when we divide the tree into two partitions at a point p0, then the likelihood of the full tree is *not* P(X|T) = P(X1|T1) P(X2|T2). What is true is that if we knew the state of the process at p0, and call that X0, then if the Markov process is reversible, P(X|T) = P(X0) P(X1|X0, T1) P(X2|X0, T2) where P(X0) is the prior probability of the process being in state X0. We typically do not know X0 so one must sum over all values of X0: P(X|T) = \sum_{X0} P(X0) P(X1|X0, T1) P(X2|X0, T2) where \sum_(X0} is summation over all X0. This conditional independence given X0 is a basic assumption used in the statistical field of graphical model, and phylogeny models are graphical models.. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Using AIC/AICc/BIC to select trait a model of trait evolution
Matt Pennell wrote: > My question is: how many observations do we have when we compare trait > evolutionary models? People tend to use the number of tips of taxa for > which we have trait values. However, this may not be technically > accurate. > First, of course, both the branch lengths and the tip values factor > into > the likelihood equations so it seems sensible that these are both > somehow > included as observations. Second, the trait values we observe are > of course > not independent (that is the whole reason we are using a phylogeny > in the > first place!!). It is unclear whether/how this fact should factor > into our > calculation of the n. I know that it phylogenetics, when people do > model > selection for the model of sequence evolution, they use the number > of sites > in the alignment though i am not sure there is a clear > justification for > this either. On just one of these points: number of sites in a molecular alignment is completely justified as the number to use in computing the number of data points, if the evolution at the sites is independent (i.e. independent, given the true phylogeny). In that case each column of the data matrix is drawn i.i.d. from a distribution. Of course, even correlations of rates of evolution among neighboring sites violates this. Whether and how number of species comes in is trickier. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Liam Revell wrote: > > Poaching Intensity = beta0 + beta1*Body Size + e > > I think it depends on how the residual error in the model is > distributed (esp. correlated) among species. It seems possible to > invent hypothetical scenarios (as I did in my previous email) about > how the residual error in poaching intensity given body size could > be phylogenetically autocorrelated, but this is fundamentally an > empirical question. If the residual error of poaching intensity > given body size is phylogenetically correlated and we ignore this > then we risk overestimating the predictive value of our model. > > In addition, the residual error is likely/guaranteed to be non- > Brownian if the response variable is binary (e.g., extant v. > extinct). For these type of data the tree should not be ignored, > but simple GLS regression is probably not appropriate. One option > might be the phylogenetic logistic regression of Ives & Garland > (2009), but I'm not too familiar with this method. The real problem would come if the characters evolved to respond to the poaching intensity, and that evolution was inherited along the tree. Then our uncertainty about the poaching intensity in the past would be a big problem. But if poaching is only a present-day phenomenon, it would (presumably) respond to only today's characters, and they would not yet have responded to it, so there would be no problem. But I do think it is a Big Mistake (and apparently a frequent one), when people measure the residual of a character regressed on environmental variables, then casually assume that it is undergoing Brownian Motion, when the environmental variables may have been different in the past. That's probably not an issue here. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
Liam wrote: > I'm not sure I entirely agree that we need to assume that the environmental > trait is evolving on the tree by Brownian motion. I believe that so long as > Y|X (in David's example, growth rate given habitat degradation) is evolving > by Brownian motion, we should be OK to use PIC regression. I worry. Why are we to assume that current phenotypes are distributed around optima that are based on *current* environmental variables? Joe --- Joe Felsenstein, j...@gs.washington.edu Department of Genome Sciences and Department of Biology University of Washington Box 355065 Seattle WA 98195-5065 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits
David Bapst wrote -- > Let's say that there exists a worker who is measuring several different > traits across a number of species and then testing for correlations among > these traits. The first test is body size versus growth rate and they use > independent contrasts or PGLS to test for a the correlation, accounting for > phylogeny. Both of these traits are inherited, evolving variables. Now > let's say they'd like to test for the relationship between growth rate and > some metric of the anthropogenic degradation of that species' habitat. Now > what? It is even valid to apply PIC to the habitat degradation metric even > though it is not an inherited, evolving trait? It's unclear to me. ... > One explanation I know of is that when we apply phylogenetic comparative > methods to these quasi-traits to consider their relationship to another > trait, we are assuming that these variables are actually the result of some > underlying, unobserved set of traits which are evolving along the > phylogeny. I don't have an easy answer to this: it is a serious issue. One is assuming that the environmental variable is evolving along the phylogeny according to a Browian motion process. This may well not be a reasonable assumption. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?
Anthony Ives wrote: > Just a quick comment on your response to Dan. You can use REML to compare > models, but only to compare the "variance" components and only if they have > the same "mean" components (e.g., contain the same predictor variables). If > there are differences in the mean components, you need to use ML, because > (very loosely speaking) REML is based on estimating the variances conditional > on the mean. I have come across examples where the results using ML and REML > (accidentally) are quite different, with ML being correct. This is a good point. But in many cases a REML analysis drops only the grand mean. In effect it is ML, after passing the data through a "filter" which replaces it by the differences of each point from the grand mean for that character. If the grand mean is not of interest, REML should work as well as ML. If we are trying to test some hypothesis about the value of the grand mean, of course REML is inappropriate. An example would be simple one-way ANOVA which is actually a REML analysis, and it does test means of the groups. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?
Dan Rabosky wrote: > A quick addendum to this thread (and I think Ted and others have > discussed this previously) is that, to compare models in this > fashion, you want to make sure you are fitting the models with > maximum likelihood and not restricted maximum likelihood (REML). > THis is not the default for the gls function from nlme, so to > obtain comparable likelihoods, you want > > myModel <- gls(, method = 'ML') > > or something similar. I haven't been following the details of this discussion, but let me just add that (as a fan of REML) that for most tests you can do ML for two models and compare the likelihoods, or you can do REML for two models and compare those likelihoods. The results will usually not be very different. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Show Informative Sites?
Leandro Jones wrote: I just wanted to clarify this so that the outbursts of probabilistic fundamentalism don't confuse the boys that started this post. On my part I was just wanting to ensure that parsimony fundamentalism did not confuse people and get in the way of a reasonable statistical analysis. Joe ---- Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Show Informative Sites?
Leandro Jones -- > Nick's rules won't allways work in a Parsimony context. For example, a > position like this one: > 1 A > 2 A > 3 T > 4 C > would be "informative" under rules (a) and (b), but it is in reality > uninformative, as any of the possible trees have a length of 2. Thus, > this character tells us nothing about _phylogeny_. I disagree. If the only way you can interpret anything is by parsimony, sure. But for statistical phylogenetics, it has information. It works against any phylogeny that has all its branch lengths short, for example. Eliminating that character, not telling the statistical method you did that, and then going ahead with the analysis is a Big Mistake. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Show Informative Sites?
Nick Matzke wrote: > All sites are informative under likelihood, ... and under Bayesian methods, and under distance matrix methods too. If you leave out "uninformative" sites without compensating for doing that by an explicit statistical correction, you will have some very unpleasant surprises. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Geiger's phy.manova returns same p-value for different data
Luke Harmon wrote: That represents the p-value you get when the test statistic for the data is more extreme than any of the 1000 (default) simulations. It's exactly 1/1001. You can probably get a lower p-value if you run more simulations! A good argument for making the default 999 simulations. It was pointed out to me years ago by the late Zygmund "Bill" Birnbaum, who wrote a paper on it, that in a simulation test, under the null hypothesis, the data too comes from the null hypothesis and therefore when you want to have 1000 reps you take the data plus 999 simulated cases. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: Re: R: Re: R: ancestral state reconstruction for tips
Pas said: > What I', trying to do now is writing a R routine to back-calculate > the "expected" branch lengths for the "unusual" critters, given the > fitted ancestral values and tip values of the phenotypes, and > assuming BM, in order to compare the actual branch lengths to the > expected. The ratio of these lengths, if I'm not delusional and > definitely lucky, is a per-lineage rate of phenotypic evolution. > The bottom line is answering the question: how long should the > branch leading to that particular species be if it evolved at the > same rate of its sister species? Good way to approach it. If you can calculate the likelihood of trees, one way would be to not bother fitting any ancestral values: just try different lengths for the branch that connects the fossil to the tree, and see which one maximizes the likelihood. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: Re: R: ancestral state reconstruction for tips
Folks -- I was intending my most recent message to be apologetic -- that I was perhaps overreactive. Certainly Pas has not raised unreasonable objections or been obstructive with my grants! (Others have). Let me raise an issue so I understand him more clearly: Pas, are you saying that you see phenotypes in the fossils that seem incompatible with the Brownian Motion assumption? Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips
David Bapst wrote: > I think Joe is right that realizing a model is an inaccurate or > imprecise description of reality should impel us to develop better > models of the world around us, because this partly how science moves > forward. However, I don't think pointing out that a model is deficient > requires that that person must themselves develop an alternative. > After all, an alternative model that capture a more realistic level of > complexity may not be possible in some situations (it is certainly > possible in trait evolution models, however.) Requiring such a thing > would put too much pressure on scientific whistle-blowers, who play a > very important role in reminding the rest of us that the world is more > than the models we use to understand it and make our predictions. As a theoretician, I am perhaps oversensitive to the folks who, after a lecture in which I present a simple model, triumphantly declare "but you didn't include predator satiation". Then they walk away looking very pleased with themselves. There is a similar problem with the quibblers who inhabit grant review panels, and are always asking me to do much more complicated models that are impossibly hard (and they are not aware how hard they are). Just understand, when you raise legitimate concerns, that us model-analyzers are also used to getting a lot of these unreasonable demands too, and may be grumpy as a result. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips
Pasquale Raia said: > Of course Ted is right, but my problem with this computation, or > with the > simple exercise I was proposing is well another: as a > paleontologist I often > come across pretty exceptional phenotypes (dwarf hippos and > elephants, huge > flightless birds, to make a few examples). When you use methods > like this (I > mean Garland and Ives') and compare the output with those > phenotypes, as I did, > you immediately realize what the the bottom line is: no matter if > they are > nodes or tips, by using the expected (under BM) covariance the > estimated > phenotypes are dull, perfectly reasonable but very different from > anything > exceptional you may find yourself to work with. This is why I feel > it is > difficult to rely on those (unobserved) values to begin with. I think that what is being said is that Brownian Motion is too sedate a process and does not predict some of the large changes actually seen in the fossil record. That's a legitimate point but does put the onus on the maker of the point to propose some other stochastic process that is tractable and has these large changes (and that fits with known Mendelian and Darwinian mechanisms). Just complaining that the Brownian stochastic process is no good is insufficient. If we want to add the fossils to the calculation, then they will of course pressure the Brownian Motion process to change more in their vicinity, which may help some. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] comparative methods at subspecies level
Paolo Piras wrote: > I write you in order to ask a phylosophical opinion in > applying phylogenetic comparative methods to > subspecies belonging to the same species for which a > phylogenetic/phylogeographic scenario is known on the > basis of molecular studies. I have a number (>10) of > population samples belonging to two species at > continental scale. Some OTUs are surely NOT panmittic > with the others while for some others this is not > sure. Someone could argument against the use of > comparative methods when a gene flow is still present > among OTUs. However, I think that, if possible, AT > LEAST the covariance due to populations relationships > should be removed. You dont think that it is always > better to take in account all possible relatedness > betwen OTUs? In this paper > > D. C. BLACKBURN,G. J. MEASEY. 2009.Dispersal to or > from an African biodiversity hotspot? Molecular > Ecology; Volume 18, Issue 9, pages 19041915, May > 2009. > > the authors apply common comparative methods to a > phylogeography of populations belonging to ONE > species. I found their strategy appropriate. I wrote > to ask your opinion about this issue. Using standard tree-based methods to treat a single species which has populations that exchange migrants is inappropriate. In a 2002 paper I discussed how to do this properly using contrasts which are derived from the eigenvectors and eigenvalues of (an estimate of) the migration matrix between populations. Such migration matrices can now be inferred using programs LAMARC and Migrate. The paper is here: Felsenstein, J. 2002. Contrasts for a within-species comparative method. pp. 118-129 in "Modern Developments in Theoretical Population Genetics", ed. M. Slatkin and M. Veuille. Oxford University Press, Oxford. Here http://evolution.gs.washington.edu/papers/spectrum/ is where you can get a PDF of a preprint version of it. The method is univariate at the moment but a multivariate version is under development -- the univariate version can be used to see whether a character has significant covariation with an environmental measurement. The method is also described in a recent paper by Stone, Nee and me: Stone, G. N., S. Nee, and J. Felsenstein. 2011. Controlling for non-independence in comparative analysis of patterns across populations within species. Philosophical Transactions of the Royal Society B 366 (1569): 1410-1424. which discusses the general issue. The (other) authors preferred, for greater comprehensibility, to describe my method slightly incorrectly (I got outvoted). Anyone considering this issue should consider these two papers carefully. Of course mixing within- and between-species analyses is more difficult yet. I hope to release an R package later this year to do the one-character analysis (it is not too hard to put one together yourself in the meantime). Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] comparative analysis using multiple regression of contrasts?
Not only can't I understand the underlying model here, I can't type correctly either: Julien Claude, not Clause. If it's any excuse, S is next to D on my keyboard. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] comparative analysis using multiple regression of contrasts?
Folks -- Julien Clause wrote: for type II sums of squares you are right, however, when there are multiple factors pics and gls usually still provide different F values and p-values even if you set the marginality stuff in the analysis of variance. (they are not much different, but i still wonder why results are the same with one explanatory variable but different when you consider several). As I see it, contrast analyses through the origin are not a so usual regressions since no intercept is estimated: they however result in similar output when only one explanatory variable is included. Although I did not investigate type I and type II error rate when the response was continuous and the explanatory variable was dummy, I still guess that there are still something to do for modifying ancestral character state reconstruction in the contrast analysis for the dummy variable and computing their contrast: it is hard to believe that the brownian model will apply to that dummy variable because the expected variance of such a character can not be properly gaussian, but certainly more following something that has to see with the binomial law and logit family link. I wonder if someone has worked in this direction for contrasts...if not, this is probably something interesting i would try to investigate. I'm trying to understand what sort of model everyone is talking about here (not the details of how to do it in R but what process is assumed). So the "independent" variables are all characters of these species, and they change along the tree by a process of multivariate Brownian motion (with, of course, covariation of their change due to genetic correlations and selective correlation? If you have not yet heard of "selective correlation" please check chapter 24 of my book or review articles of mine going back to 1988). OK. Now what about the response variable (or variables)? Is it assumed that they are also characters that covary with the other variables? In which case, why treat them specially? Why not just infer the mutual covariance matrix of phylogenetic change of all variables and use that? Because in that model the means and covariances of the full set of characters are the sufficient statistics, and anything else interesting that you want to estimate must just be functions of them. Or is it assumed that the response variable is not a character of the species? That the error of its regression on the other variables is i.i.d.? Regressing on reconstructed ancestral states seems not correct in any of these cases, as those are not observations and are a linear function of the observed characters at the tip of the tree, in which case one is just making life harder by regressing on that reconstruction rather than on those tip phenotypes. But even that seems wrong if the response variable is a character of the species, as I just noted. Someone please educate me. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Simulating BM data on phylogeny
Ted wrote -- > Based on zillions of BM simulations we have done with DOS PDSIMUL, > you should never see a shift in mean value (versus starting value > at root of tree), unless you are intentionally modeling a trend. > Something must be wrong. in response to Dean Adams: >> For these simulations I generate a tree (in this > case a >> perfectly-balanced tree) and simulate 100 data sets on the same >> phylogeny using a particular initial BM rate parameter (sigma). ... >> However, the most curious finding is that for all methods, as sigma >> increases, so too does the mean trait value across the tips (and > the >> converse occurs as sigma decreases). This observation is curious to > me, >> as one should not see a predictable shift in the mean under > Brownian >> motion. Is it possible that the simulation is doing Brownian Motion on some other scale, such as a log scale? If one does BM on the logs and then looks at the original phenotype scale, you *would* expect that as the variance among species increases, so does the mean of the species means. Joe Joe Felsenstein j...@gs.washington.edu Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle Wa 98195-5065 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples
Emmanuel wrote: Is it a problem with ties or with identical sequences? I guess you can solve the latter easily (eg, using the haplotype function in pegas), and this will solve the vast majority of ties. Other cases of ties will certainly not result in such high bootstrap values (that's my intuition). My intuition disagrees with this. If several sequences are nearly identical, but each differs from their consensus at K sites, them if the tree-making algorithm does not randomise addition order of species (or otherwise somehow randomize the resolution of ties), there are likely to be artificially high bootstrap values even with nonzero branch lenghs. 1. Dropping identical sequences is a good thing to do, especially with a distance-based method. One issue is that after bootstrap sampling, some sequences that were not identical may become identical. So the tree-making method ought to be able to handle identical sequences. 2. A high bootstrap support (or another form of support) associated with a zero-length branch is an indication that something's wrong there. Again, it can be a problem even when the branch lengths are nonzero. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples
Klaus Schliep wrote -- > it is not that surprising. NJ normally does not produce poytomies, > just edge weights of length 0. How these are broken may depends from > the input order (from labels in the distance matrix like in this > implementation) or could be broken randomly. I added some code below > to highlight it. When there are ties in the distance matrix, NJ and UPGMA can lead to a situation where the ties get resolved in an arbitrary way dependent only on the input order of species. This can lead to excessive support for group AB in a situation where there are three species ABC and their further resolution is arbitrary. See Backeljau et al. MBE 1996. Farris et al. (Cladistics, 1996) pointed out that this can lead to strong support for AB that is illusory. (I commented on this on pages 168-169 of my phylogeny book). In PHYLIP we have put code in that, when multiple bootstrap replicates are being analyzed, automatically defaults to randomizing the order of species in the data set. That solves the problem, and is perhaps what PAUP* does too. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] simulating trees
Back on April 27 in the discussion of how to simulate a birth-death process that starts T units of time ago (where that is not the first split time) Luke Harmon wrote: > That seems correct to me. The probability that the tree died out before the > first speciation event (which is excluded, as Liam says) is d/(b+d). in reaction to Liam's statement that > > In both cases, you get only trees that did not go extinct along the root > edge (that is, you exclude trees that went extinct before the first > speciation). I think that any program doing the simulation necessarily conditions on not losing all lineages before the present. But that is different from conditioning on the original lineage dying before it gets a chance to speciate. While that is d/(b+d), we need to also exclude the more complicated case that also allows for the possibility that some speciations occurred, but that all of those lineages then died before the present. Kendall (Annals of Mathematical Statistics, 1948), in the original solution of the birth-death process gave that probability (see my phylogeny book, page 565 which gives the opposite, the probability of the descendants of one species *not* dying out): P(n = 0 | t) = (d*exp((b-d)*t) - d) / (b*exp((b-d)*t) - d) Interestingly, Kendall (1948, p. 5) actually cites some earlier people who got these formulae. I cited only Kendall, but should have looked at his paper rather than secondary sources, which cite only him. Joe Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo