[R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023

2023-03-06 Thread Joe Felsenstein
A reminder: applications due March 31.

The Evolutionary Quantitative Genetics Workshop for 2023 will take
place at Friday Harbor Laboratories of the University of Washington
from June 4-9.  The workshop will review the basics of theory in the
field of evolutionary quantitative genetics and its connections to
evolution observed at various time scales. One aim of the workshop is
to build a bridge between the traditionally separate disciplines of
quantitative genetics and phylogenetic comparative methods. The
workshop involves lectures, discussions and in-class computer
exercises.  This workshop has been given yearly since 2011, and at
Friday Harbor Laboratories since 2017. It was canceled in 2020, and
given as an online workshop in 2021 and 2022.  It is intended for
graduate students, postdocs, and junior faculty.  Information on the
subject area, lecturers, costs and application form will be found at:

https://fhl.uw.edu/courses/course-descriptions/course/evolutionary-quantitative-genetics-workshop-2023/

and more information, including details of the Workshop in recent
years, will be found at

https://eqgw.github.io

Joe Felsenstein and Steve Arnold
-- 
Joe Felsensteinfelse...@gmail.com,   felse...@uw.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
-
 PS: please do not use  j...@gs.washington.edu, which is an alias
 that some mail systems now mistake as indicating spam.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Evolutionary Quantitative Genetics Workshop 2023

2023-01-27 Thread Joe Felsenstein
K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP20eBi7Dw$
>
> Introduction to Bayesian modelling with INLA (BMIN02)
> May 22nd - 26th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/introduction-to-bayesian-modelling-with-inla-bmin02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP09ie8kTQ$
>
> Reproducible and collaborative data analysis with R (RACR02)
> June 13th - 15th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/reproducible-and-collaborative-data-analysis-with-r-racr02/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP2JO-LW_Q$
>
> Species Distribution Modelling With Bayesian Statistics Using R (SDMB05)
> September 4th - 8th
> https://urldefense.com/v3/__https://www.prstatistics.com/course/online-course-species-distribution-modelling-with-bayesian-statistics-using-r-sdmb05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP0ZmkHHkg$
>
> Multivariate Analysis Of Ecological Communities Using R With The VEGAN
> package (VGNR05)
> September 18th - 22nd
> https://
> https://urldefense.com/v3/__http://www.prstatistics.com/course/multivariate-analysis-of-ecological-communities-using-r-with-the-vegan-package-vgnr05/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP3z5_1e7g$
>
> --
>
> Oliver Hooker PhD.
> PR statistics
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1_L29Apw$
> Searchable archive at 
> https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!K-Hz7m0Vt54!lAc-_tyBKyUS0BwdXHfS_bddA8mSgcsrMp6Nly1GJM-UkNMs09pABZP0gIb89WvlGmsVldrV5B-5tMdJeYl2P7XvDP1XIWsucQ$



-- 
Joe Felsensteinfelse...@gmail.com,   felse...@uw.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
-
 PS: please do not use  j...@gs.washington.edu, which is an alias
 that some mail systems now mistake as indicating spam.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic correlation analysis

2021-06-13 Thread Joe Felsenstein
Whether one gets them from PGLS directly or from contrasts, one can
get correlations by just inferring the covariance matrix and then
calculating
r(x,y) =  Cov(x,y) / (Var(x) Var(y))^(1/2)
where of course Var(x) is also  Cov(x,x), and so on.
You would not need a separate run to get correlations.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?

2021-05-27 Thread Joe Felsenstein
Gabriel Ferreira explained:


> I will try to better explain my problem, and I really appreciate your time
> to help me with this issue.
> My study is a conventional ecomorph with linear and univariate
> measurements
>

> So... Some of my traits are linear measures that can and must be
> "corrected" by body size, such as tail length. I usually conduct such body
> size corrections with phylogenetic regressions using *gls *func. from
> *nlme*:
>

 lots of details ...


> So I could use the residuals of this regression in the phy ANOVAs ... Does
> it make sense?
>


Well, I am afraid I am lost.  Perhaps someone else here could explain the
issues to me ...


Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?

2021-05-27 Thread Joe Felsenstein
If wants residuals of values of a trait in each species, taking into
account within-species variation and phylogeny, what does it mean if those
residuals correlate with those of some other character, or with an
environmental variable?

Just asking which R machinery to use might wait until it is clear what the
intended task is and why that makes sense.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Marguerite asked:

> First - Joe - what do you mean by log(grams) has no units? The units of grams 
> is a unit, so log(mass) will have units of log-gm.  As log is not the same as 
> 1/gm, log(gm) cannot be unit-free.

I looked it up on Wikipedia, and was assured by it that Marguerite is
right, log(weight) has the same units as weight, except you're
supposed to call them log-gm.


I do think that unless there is a calibration with time, the tree
branch lengths are not in units of time but in units of base
substitutions per site.

And I remain confused on what the units are, if you compute a linear
combination such as  2 log(wt) - 3 log(height). Which princip[al, le]
components machinery does.

Joe
--
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Ted Garland asked:

OK, Joe, that's for one trait at a time.
> Would you please continue your discourse, but extend to multiple traits
> and their covariances
>

OK, assuming that’s not a joke which it seems it was.  If all characters
are log of something, their variances all have units of sites squared per
square substitution.


But if you have different units in different characters each variance would
have units. (CharXunit) squared times sites-squared per square
substitution.  A covariance would be (CharXunit times CharYunit) times
sites squared per square substitution.

If you had a principal component (usually misnamed a “principle” component)
it is in terms of a linear combination of characters, and I am deeply
puzzled how to give their units as they mix them.

Joe

-- 

Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Joe Felsenstein
Folks --

If the trait is  log(grams)  then the trait is unit-free.   The "time"
is probably branch length from a phylogeny.  That in turn (from DNA
data) is usually   DNA substitutions per site.

So the units of the standard deviation aresites per substitution.
But this is not the standard deviation, it is its square.

So (wait for it ...)square sites per square substitution

Now that is pretty weird.  But years ago people pointed out to me that
quantitative geneticists were accustomed to inferring variance
components of crop yield.  The yield might be in  bushels per acre.
So the units of its variance was:  square bushels per square acre.
Don't even try to think about how you square a bushel, or how many
dimensions you have to go into to square an acre.   Actually, you can
think about them: a bushel is three-dimensional volume, and an acre is
two dimensional area.  So crop yield has units of  meters, and
variance of crop yield should have units of  square meters.

That way lies madness ...

Joe
-
Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] collapsing sets of nodes based on label values

2020-06-02 Thread Joe Felsenstein
Karla Shikev --

> I just want to take a tree with node labels (e.g.
> bootstrap values from a RAxML analysis) and collapse all nodes with support
> values below, say, 90%.


People here have given you various ways of doing this in R.  I am not
very familiar with the R packages, but just wanted to add one more
way.  If you have the original set of trees that were used to get the
bootstrap values, you can do a version of the bootstrap that makes a
tree which shows only the groups that are present in 90% or more of
the input trees.  This is called an M(0.90) consensus tree (the 0.90
is in a subscript which I can't typeset in an email).  It is available
in the program Consense in my PHYLIP package, and also in some other
phylogeny packages.   If you prefer R you can use Liam Revell's
Rphylip package, together with installing a copy of PHYLIP, as Rphylip
serves as a front-end for PHYLIP within R.

J.F.
-----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Multivariate ASR with Discrete Characters

2020-03-04 Thread Joe Felsenstein
William Gearty --

>
> Thank you Don and Julien for your suggestions. I was able to run threshml
> using Rthreshml from the Rphylip package. However, now I'm not really sure
> what to do with the results. Can I use the output to perform ancestral
> state reconstruction/estimation for the continuous (3) and discrete (1)
> characters? I see that phytools has ancThresh, but that only seems to work
> with a single character.

I don't know what are the limitations of Rphylip's interface
with Threshml, but I can address Threshml itself -- as I
wrote it.

In principle the liabilities sampled at the interior
nodes of the tree could be used to do ancestral
reconstructions at those nodes, in a very straight-
forward way.  Just consider them, not the rest of
the tree, and make a histogram of them at the node.

However I never bothered to put in any interface
to do that.  However note that what Threshml does
is to transform the liabilities to independent variables,
using the current best estimate of the covariance
matrix of liabilities.  The Gibbs sampling (and at the
tips, the Metropolis/Hastings sampling) occurs in
the independent characters.  Then you have to
transform back to the liabilities before you have
samples for those at the interior nodes.

So matters are not simple, but in principle it can
be done.  Note that for continuous traits Threshml
can also be used to sample values at the interior
nodes.  Of course this is a lot more computation
than just using likelihood computations -- I've
checked and the sampling does infer the same
covariances in that case.  In this all-continuous
case the same issue of transforming to independent
characters also comes up.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic varying slopes and intercepts model

2019-09-10 Thread Joe Felsenstein
Oscar Inostraza --

Is the variable  x also evolving on the tree?  If so you need to use
standard phylogenetically-informed comparative methods to estimate the
variances and covariances of changes in both characters.

You may not be able to assume that  y   responds instantly to  x.

J.F.

Joe Felsenstein, j...@gs.washington.edu
Department of Genome Sciences, University of Washington, Box 355065,
Seattle, WA. 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Nantucket Phylogenetics DevelopeR Workshop

2019-08-08 Thread Joe Felsenstein
>
> We are very happy to announce the 2nd edition of a graduate-level
> workshop on phylogenetic method development R. The course will be four
> days in length and take place at the University of Massachusetts
> Boston's Nantucket Field Station from the 5th to the 8th of November,
> 2019.

Between this and the Workshop on Molecular Evolution (Woods Hole), the
Applied Phylogenetics Workshop (Bodega Bay, California), and the
Evolutionary Quantitative Genetics Workshop (Friday Harbor
Laboratories), there seems to be a major competition to see who can
have their course in the most picturesque marine laboratory.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Link between Mahalanobis distance and PGLS

2019-01-04 Thread Joe Felsenstein
Guillaume Louvel asked:

>
> So my first question is: can we directly apply the Mahalanobis distance
> to measure a kind of "phylogeny-corrected" distance between 2 vectors of
> trait values for a list of species? Since we assume a brownian motion,
> we know these vectors should be drawn from a multivariate normal
> distribution with known covariance matrix. Therefore the Mahalanobis
> distance seems perfectly appropriate to me, is it the case?

It is appropriate.  In fact this is in effect what
regressing contrasts in trait Y on contrasts in
trait X is doing.  One can alternatively use a
multivariate regression appoach, which is what
Alan Grafen (1989) did, and the results are
the same either way (in the simplest cases).

Note that although the contrasts can be
treated as independent observations, that is
not true for the tip species values  -- the
Grafen "Phylogenetic Regression" does not
treat the tip values as independent, and for
the same reason pairwise distances between
tips are not independent.


>
> I don't want to do a statistical test per se, I am rather interested in
> ranking many traits according to their distance to a pattern of reference.

I am unclear about what that means.

>
> My second subsidiary question is: can I apply this Mahalanobis distance
> if my traits are binary (e.g. presence-absence of some sequence in the
> genomes). In that case I know that my trait is not multivariate normal,
> but considering that I have millions of traits, shouldn't I expect the
> whole set to have some normal characteristics?

Basically no.  Although people have approximated
binary traits by Gaussian variables (I think Paul
Harvey and Mark Pagel did in their 1991 book),
it is much more appropriate to use a threshold
model.  See my 2012 paper in American Naturalist
or the earlier 2005 sketch of the method in
Proc. Royal Society of London series B.

A  good paper agonizing about all this is:

Maddison WP & FitzJohn RG. 2015. The unsolved challenge to
phylogenetic correlation tests for categorical characters. Systematic
Biology 64: 127–136

though I'd say that the problem is not as "unsolved" as they think.



> Finally, if none of the approach above is justified, is there a
> multivariate phylogenetic method for discrete/binary traits? Some kind
> of adapted phylogenetic PCA ?

See above.  It does require MCMC, and cannot
simply be done with distances.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Extended Majority Consensus Topology (Allcompat Summary) in R? And some observations on ape's consensus() function

2018-10-27 Thread Joe Felsenstein
David Bapst asked:

>
> I was interested if anyone was familiar with R code that can estimate an
> extended majority consensus tree (referred to as an 'allcompat' tree by the
> sumt command in MrBayes)? This is a fully bifurcating summary of a tree
> posterior, where each clade is maximally resolved by the split that is most
> abundant in the considered post-burn-in posterior (i.e., that split which
> has the plurality, if not the majority - the highest posterior probability
> of any other competing, conflicting splits recovered within the posterior.
> So, I guess one could also call these plurality consensus trees...).

You can also get the Extended Majority Rule consensus tree from the
Rconsense function in Liam Revell's package Rphylip, which is calls
programs from my PHYLIP package.  Consense in PHYLIP does output, in
addition to the consensus tree itself, a list of partitions found in
the input trees, and the frequency of each.  Rconsense may be able to
do that too.

Speaking as the one who defined the EMR method, I need to make a
warning:  EMR makes a tree by ordering the partitions (splits) in
order of their frequency.  To make Margush and McMorris's Majority
Rule consensus tree one simply goes down this list and takes all the
partitions that occur more than 50% of the time.  EMR continues
further, in order to resolve the tree fully.  It keep accepting
partitions as long as they don't conflict with anything already
accepted.

But this means that two partitions could be found (say) 45% of the
time each.   They could both be compatible with the partitions in the
majority-rule tree, but in conflict with each other.  Which then gets
included then depends only on which one is encountered first as one
goes down the list of most-frequent partitions.  And that will just be
a matter of things like the order in which the tree containing them
occurs among the input trees.  That is one of the limitations of the
EMR method.

Note also that EMR is subtly different from finding the largest set of
split (partitions) that are all mutually compatible.  That will often
be the same, but not always.  The latter is called the Nelson
Consensus Tree.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Estimating marginal ancestral states under a non-reversible model of evolution.

2018-09-18 Thread Joe Felsenstein
Jordan Gault asked:

> I would like to do a worked example for an unequal-rates model as well but my 
> understanding is that the re-rooting method is inappropriate for 
> non-reversible models of trait evolution. How are marginal ancestral states 
> estimated for non-reversible models of trait evolution?

Why do "unequal rates" make the model nonreversible?

I assume that by unequal rates you mean not
rates different in different sites, but forward and
backward rates different at a single site.

If that is the case, the equilibrium frequencies
of the two states become correspondingly
different, and the model is still reversible.  The
tree can be rerooted at any interior node
and the marginal ancestral states at that node
found by the usual likelihood "pruning"
algorithm.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] estimate ancestral state with OUwie models

2018-06-12 Thread Joe Felsenstein
Let me add more warnings to Marguerite and Thomas's excellent
responses.   People may be tempted to infer ancestral states and then
treat those inferences as data (and also to infer ancestral
environments and then treat those inferences as data).  In fact, I
wonder whether that is not the main use people make of these
inferences.

But not only are those inferences very noisy, they are correlated with
each other.  So if you infer the ancestral state for the clade (Old
World Monkeys, Apes) and also the ancestral state for the clade (New
World Monkeys, (Old World Monkeys, Apes)) the two will typically not
only be error-prone, but will also typically be subject to strongly
correlated errors.  Using them as data for further inferences is very
dubious.  It is better to figure out what your hypothesis is and then
test it on the data from the tips of the tree, without the
intermediate step of taking ancestral state inferences as
observations.

The popular science press in particular demands a fly-on-the-wall
account of what happened in evolution, and giving them the ancestral
state inferences as if they were known precisely is a mistake.

Joe
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Detection association between traits

2018-03-28 Thread Joe Felsenstein
Rafa Braga asked

>
> Does anybody know how to detect association between two traits while removing 
> the influence of the third one? I have three traits which are named a b and 
> c. I'm wanting to test the correlation of a and b while controling c as a 
> covariate. In other words, the correlation between trait a and b can be a 
> result of the correlation between b and c and I'm wanting to remove the 
> influence of c on b.


This goes all the way back 100 years.

Once you have inferred the variances and covariances of
the three traits, you can compute the partial correlation
between  a  and  b.

This is defined as the correlation between the residual
of  a  when predicted by  c,  and the residual of  b  when
predicted by  c.

When there is only one variable  c,   and when
r(a,b)  is the correlation of  a  with  b, then the
partial correlation is

[ r(a,b) - r(a,c) r(b,c)] / [(sqrt(1 - r(a,c)^2) sqrt(1 - r(b,c)^2)]

For likelihood ratio testing in a linear model,
one could compare the likelihoods of models
that did or did not have direct connection of
a  and  b.  RA Fisher also derived a distribution
of the partial correlation coefficient.


Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA


>
>
>
> Yours,
> Rafa
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



-- 

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic PCA and measurement error

2018-03-12 Thread Joe Felsenstein
Rafael and Liam --

>
> So far as I know, there is currently no way to explicitly take into account
> sampling error in computing principal components while also accounting for
> the phylogeny. However, it is relatively straightforward to compute scores
> for individuals from a PCA conducted on species means.
>
> This would look as follows (in which Xm is a matrix containing values for
> species for each trait, and Xi is a matrix with the same number of columns
> but containing values for individuals):
>
> pca<-phyl.pca(tree,Xm)
> Si<-Xi%*%pca$Evec
>
> Then, if you have a separate vector containing species ID as a factor, you
> could compute means and variances for each component by species.


There are methods (not all implemented in R) for taking into
account the within-species phenotypic covariation among
individuals and also the evolutionary covariances between
species (on a phylogeny).  These include the method of
Ives, Midford, and Garland (2007) and my own method
(2008).  The former assumes you know the within-species
covariance matrix, the latter estimates it from the sampled
individuals for each species.  Both of these assume that the
true within-species phenotypic covariance matrices are the
same for all species.

For my method, you can use Liam's package Rphylip to
call my program Contrast if you also have PHYLIP installed.

Once you infer these covariance matrices, those are the
relevant sufficient statistics (if the distributions are
multivariate normal).  The PCA axes for either covariance
matrix can then be computed from those, or the
canonical variates axes, which are the principal components
of the between-species covariation relative to the
within-species covariation.

You don't need to use the PCA machinery until after you
estimate these two covariance matrices.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic signal with sample sizes but no standard errors

2017-08-04 Thread Joe Felsenstein
There is also my C program Contrast, which implements a method from a
2008 paper I wrote:

Felsenstein, J.  2008.  Comparative methods with sampling error and
within-species variation: contrasts revisited and revised. American
Naturalist 171: 713-725.

This estimates the within-species covariances and the between-species
evolutionary variances.It is not an R program but can be accessed
through Liam Revell's package Rphylip, if you also have my (non-R)
package PHYLIP installed.

A pretty good (but not ML) estimate of the within-species phenotypic
variance can be gotten by pooling the within-species sampling error.
The harder part is using that to correct one's estimate of the
covariances of the between-species change, which using ordinary
methods will have some within-species variation mixed in.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylip file compression ratios

2017-04-18 Thread Joe Felsenstein
Jacob Berv noted:

>
> I noticed today that the compression ratio for an interleaved phylip file 
> (zip compressed) was about 84:1, (390MB uncompressed —> 4.6MB compressed) 
> whereas the compression ratio for the same data non-interleaved was a much 
> worse 3.4:1 (390 MB uncompressed —> 113.9 MB). Not knowing much about how zip 
> compression actually works - I thought this might be an interesting 
> observation for the group…


Interleaved sequences have blocks of (say) 50 bases.  Successive lines
may repeat a whole block or nearly repeat it.  I wonder whether that
makes the interleaved format easier to compress.

I would guess that the compressibility of interleaved sequences would
be highest when the sequences are closely related.  In that case there
would be 50-base blocks of nearly identical sequences.  With less
closely related sequences the compressibility should be much lower.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Average Aminoacid Identity tree with bootstrap support. Is it possible?

2017-04-07 Thread Joe Felsenstein
Salvador Espada Hinojosa --

The problem, of course, is that a random substitution on an interior
branch of a tree increases or decreases the size of more than one
distance in the distance matrix.  The distances aren't independent in
their statistical noise.  So you can't just sample distances after the
distance matrix is computed.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] HKY GTR distances

2017-02-03 Thread Joe Felsenstein
Emmanuel wrote:

>
> There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez et
> al. developed a procedure to calculate a distance (also in Yang's 2006
> book). An example is given below with the woodmouse:
>
> matlog <- function(x) {
> tmp <- eigen(X)
> V <- tmp$vectors
> U <- diag(log(tmp$values))
> V %*% U %*% solve(V)
> }
>
> tr <- function(x) sum(diag(x))
>
> data(woodmouse)
>
> PI <- diag(base.freq(woodmouse[1:2, ]))
> Ft <- Ftab(woodmouse[1:2, ])
> F <- Ft/sum(Ft)
> X <- solve(PI) %*% F
> -tr(PI %*% matlog(X))
>
> You have to call this code for each pair of sequences (or wrap it in a
> function).
>
> For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).
>

Strictly speaking there *is* a formula for GTR, which I think may be
equivalent to the above.  The formula is given in my book "Inferring
Phylogenies" on page 209 as equation 13.24.  Note that for both of
these, the equilibrium frequencies of the bases are estimated from the
empirical frequencies in the two sequences.  That means that for each
distance, the equilibrium frequencies are somewhat different, as
different sequences are being used to estimate the base frequencies.
We are not, in the use of these formulas, estimating one overall set
of equilibrium frequencies and then using that for all the the
distances in our data set.

The Rzhetsky-Nei 1994 distance formula is, I believe, an
approximation, though probably a very good one.  It is not quite the
same as the estimate you would get by maximizing the likelihood.

Distance formulas that involve empirically estimated base frequencies,
which all of these do, have in common the problem that one either
estimates those separately for each pair of sequences, or jointly
estimates them from the whole dataset, without using a tree in the
process.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] Fwd: Workshop: UWashington.EvolQuantGenetics.Jun5-9

2017-01-31 Thread Joe Felsenstein
Evolutionary Quantitative Genetics Workshop

Friday Harbor Laboratories, University of Washington, 5-9 June 2017


Non-credit workshop.
Participants arrive at Friday Harbor Labs on Sunday, June 4, lectures and
exercises occur June 5-9, and participants depart on Saturday, June 10, 2017.

Application deadline March 1, 2017.
Application forms and details here:
  http://tinyurl.com/EQG2017
Web page:
  http://depts.washington.edu/fhl/studentSummer2017.html#SumB-genetics

Instructors:

Dr. Joe Felsenstein
Department of Genome Sciences
University of Washington, Seattle
j...@gs.washington.edu

Dr. Stevan J. Arnold
Department of Integrative Biology
Oregon State University, Corvallis
stevan.arn...@oregonstate.edu


Previous versions of this 5-day workshop were given at the National
Evolutionary Synthesis Center (NESCENT) at Duke University in Durham, North
Carolina from 2011-2013, and then at the National Institute for Mathematical
and Biological Synthesis (NIMBioS) at the University of Tennessee in Knoxville
from 2014-2016. Like past versions, the Friday Harbor version will review the
basics of theory in the field of evolutionary quantitative genetics and its
connections to evolution observed at various time scales. The aim of the
workshop is to build a bridge between the traditionally separate disciplines of
quantitative genetics and comparative methods.

Quantitative genetic theory for natural populations was developed considerably
in the period from 1970 to 1990 and up to the present, and it has been applied
to a wide range of phenomena including the evolution of differences between the
sexes, sexual preferences, life history traits, plasticity of traits, as well
as the evolution of body size and other morphological measurements. Textbooks
have not kept pace with these developments, and currently few universities
offer courses in this subject aimed at evolutionary biologists. Evolutionary
biologists need to understand this field because of the ability to collect
large amounts of data by computer, the development of statistical methods for
changes of traits on evolutionary trees and for changes in a single species
through time, and the realization that quantitative characters will not soon be
fully explained by genomics. This workshop aims to fill this need by reviewing
basic aspects of theory and illustrating how that theory can be tested with
data, both from single species and with multiple-species phylogenies.
Participants will use R, an open-source statistical programming language, to
build and test evolutionary models.

The workshop involves lectures and in-class computer exercises. You can
consult the 2016 tutorial website for examples, it will be found at
  http://www.nimbios.org/tutorials/TT_eqg2016
The intended participants for this
tutorial are graduate students, post-docs, and junior faculty members in
evolutionary biology. The workshop can accommodate up to 35 participants. Guest
instructors will include:

* Marguerite Butler, Biology, Univ. Hawai'i, Mānoa
* Patrick Carter, Evolutionary Physiology, Washington State University, Pullman
* Adam Jones, Biology, Texas A&M University, College Station
* Brian O'Meara, Ecology & Evolutionary Biology, Univ. of Tennessee, Knoxville
* Josef Uyeda, Biological Sciences, Virginia Tech, Blacksburg

Cost: $1000 to be paid to Friday Harbor Laboratories. This fee will cover
housing and meals at FHL and all other workshop expenses, except travel.
Participants who have been admitted to attend will make their payment prior to
arrival at FHL. Details of payment by credit card or check will be provided
once the applicant has been admitted to attend.


J.F.
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] statistical tests for phylogenetic independent contrasts

2016-08-03 Thread Joe Felsenstein
Vivek --


>
What we are confused about is how to analyze independent contrasts when we
> analyze "one trait at a time".
>
> In our previous works, we have done phylogenetic correction using 'ape'
> package in R (Paradis et al., 2004) for computing correlation between pairs
> of traits. But in present case, we are not sure how to proceed for
> statistical tests if we obtain independent contrasts (from means for each
> species?) for a given trait?
>


You have 20 traits, but you insist on analyzing them one at a time, so that
you won't know about any covariation between traits?   Not knowing that is
a positive objective of your study?

The vectors of  the individual standardized contrasts are independent
multivariate quantities.  The contrasts for different traits covary, but
the multivariate vectors are i.i.d.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA


-- 

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Joe Felsenstein
Bryan McLean --


> I’m working to simulate multiple continuous characters on a known
> phylogeny (using several of the standard models), and I want to compare
> properties of the simulated datasets to an empirical dataset. My question
> is: what is the standard method for ensuring that those datasets
> (simulated, empirical) are actually directly comparable, i.e. scaled
> similarly? Does this involve specifying a sensible root state (e.g.
> ancestral reconstruction) OR just rescaling one or the other datasets
> before or after the analysis? Forgive me if this is a bit of a naive
> question, just trying to get a sense of standard practices.
>

It would seem to depend on what you consider to be "scaled similarly".

When I simulate multiple characters with correlated Brownian Motion, I
specify a covariance matrix for the evolutionary changes, as well as a
starting vector of means.  Using a matrix square root of the covariance
matrix, one can transform the characters so that the covariance matrix of
the new characters is an identity matrix.  Those are easy to simulate up
the tree, and then one transforms back to the original characters.

I do this with my own C programs, but it can be done in R too.

But what does it mean to be "scaled similarly" ?

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?

2016-04-14 Thread Joe Felsenstein
Ted commented:


> Nice shot across the bow!
>

Well, thanks but I have been shooting and shooting and shooting, with
regard to issues like regressing one character on another and then
analyzing only the residuals phylogenetically, and in this case issues like
not thinking about how the environments changed along the tree.   Not much
effect so far so you can expect more shots across the bow.

Joe
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] BiSSE/MacroCAIC on Non-Ultrametric Tree with Polytomies?

2016-04-14 Thread Joe Felsenstein
Brian Gill --

>
U between a discrete binary predictor
> (Latitude: Colorado/Ecuador) and a continuous response (species richness).
>
> e at the influence of a discrete
> binary trait on richness, but I'm not sure if my tree is suitable or if
> there is a better approach.
>
> 1) Diversitree package BiSSE
> 2) Caper package using MacroCAIC
>
> Any suggestions would be greatly appreciated-

And what do these methods assume about how that "discrete binary predictor"
evolves along the tree?

Joe
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.

2016-04-09 Thread Joe Felsenstein
Liam --

Thanks, I had a "senior moment" and have been corrected:  Rphylip, not
Phytools.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Testing for relationship between one categorical and one continuous variable in a phylogenetic framework.

2016-04-08 Thread Joe Felsenstein
Sean --

... or, if you want to do it *really correctly*, you can use the threshold
model of Sewall Wright for the discrete character and use the MCMC approach
that I proposed in 2012:

Felsenstein, J.  2012.  A comparative method for both discrete and
continuous characters using the threshold model. American Naturalist 179:
145-156.

which is implemented in my program Threshml which can be called from Liam
Revell's Phytools R package.  It also works for multiple threshold
characters and multiple continuous characters.

Joe
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Determining Order of Trait Evolution

2016-04-04 Thread Joe Felsenstein
Gavin Leighton asked:

> >
> > > I have 500 trees of 80 species downloaded from birdtree.org and am
> > primarily interested in two traits. I have used PGLS to determine the
> > traits are related but would ideally like to test if there is an order to
> > trait evolution. To complicate matters one trait (Trait A) is continuous
> > while the second (Trait B) is presence/absence. I was hoping someone
> could
> > direct me to methods (assuming they exist) that would allow me to
> determine
> > the estimated value of Trait A before a gain of Trait B evolves in a
> > lineage.
> > >
>

A superior model (if I do say so myself) is the threshold model of Sewall
Wright (1934).  I have described in a paper in American Naturalist in 2012
how to use it to model discrete traits on a phylogeny.  It allows the case
of one trait continuous and another discrete.

It is implemented in my program Threshml, which should be callable from
Liam Revell's "phytools" R package.

However it does not precisely answer the question you posed, but just asks
whether the two traits evolve in a correlated fashion.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] model II regression statistics PAST

2016-03-18 Thread Joe Felsenstein

A few days ago, Jim Rohlf said, in response to Patrick Arnold's query about how 
to test the significance of major axis "regression"

>
> I don't know about how to do it in PAST, but note that because the slope of 
> the standard major axis "regression" line is just the signed ratio of two 
> standard deviations, its square follows the F-distribution if one assumes 
> that the two variances are based on random samples from two normally 
> distributed populations. Thus a test for a slope = 1 corresponds to a 
> 2-tailed test for the equality of variances.  One can easily look up 
> probabilities in an F-table or compute confidence limits using standard 
> methods. Note that if the samples are across different populations or 
> species, as in many (most?) morphometric applications, then these assumptions 
> do not hold.


This problem was brought to my attention by Paul Harvey in the late 1970s.
I suggested that he look for a rotation angle (theta) that would maximize the
likelihood under a model in which the two (new) variables are independent
Gaussian variables.  This also allows a likelihood ratio test of the assertion
that theta is zero.  The estimate of theta provides an estimate of the angle of
the major axis.  These can be easily generalized to multiple populations, even
when their variances are unequal.

The likelihood ratio test is done with a test for equality of correlation
matrices which will be found in a multivariate statistics book by Morrison.

I am not sure where Paul published this, but I think he did in an appendix to a
paper of his in some multiauthor volume.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] summary stats for comparative methods p-values

2016-03-10 Thread Joe Felsenstein
If the 100 trees are trees sampled from a Bayesian posterior, or else trees
from bootstrap samples of your data, then you might just take the estimates
from each tree (say estimates of a regression coefficient).  Consider their
distribution and ask whether the null hypothesis value (such as having
slope 0) is in the tail of that histogram.

The overall P value will be the proportion of estimated slopes that are
below zero (unless you want to do a two-tailed test).

Under the null hypothesis this will have the proper rejection probability.
As you take more and more trees the power increases some, but the type I
error rate stays the same.

If the 100 trees are something else, such as the personal opinions of 100
of your friends, then there is no statistical justification for this.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Pairwise Distances

2016-02-18 Thread Joe Felsenstein
Pedro --

> Well, for now I just want to estimate the raw distances. But thinking
about coalescents is certainly interesting. Would you have a suggestion for
either ways of thinking?

Well, you must be wanting to use those raw distances to infer something.
Rates of migration?

There are extensive discussions in the book by John Wakeley.  Also in the
elementary population genetics text by Nielsen and Slatkin.

J.F.
----
Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Pairwise Distances

2016-02-18 Thread Joe Felsenstein
Pedro Taucce --


> Is there a way to estimate pairwise distances between and within groups of
> sequences (in my case, each group is a species with lots of individuals)? I
> used to do it with MEGA, but now I use Linux only and MEGA isn't getting
> along with it.
>
> The closest way I figured out is the function dist.dna from the ape
> package. But I think it does not estimate distances between groups.
>

You want to use distances between groups?  But you don't want to think
about coalescents?

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

2016-02-10 Thread Joe Felsenstein
I have received phishing spam through the R-sig-phylo mailing list
(pretending I had expressed interest in renting something from them) from

anya_phill...@casetotours.xyz

disguised as a reply to my comment.  So that address should immediately be
removed from the list.

Joe
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

2016-02-10 Thread Joe Felsenstein
I do not know offhand whether there is an R implementation, but how about
Mark Pagel's 1994 method for testing whether two 0/1 characters changing
along a ohylogeny are changing independently?

J.F.
-
j...@gs.washington.edu
Joe Felsenstein, Department of Genome Sciences and Department of Biology
Box 355065, University of Washington, Seattle, WA  98195-5065

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] testing for variation in rates of evolution among traits

2015-07-17 Thread Joe Felsenstein
Warning: You'd have to ensure that the traits for which you are comparing
rates are evolving independently, so that they do not covary in their
evolutionary changes.

I assume Dean Adams's paper involves some way of coping with this.  The
issue of log-transforms that Ted raised is very important, otherwise big
measurements will tend have higher rates of evolution.

Joe

j...@gs.washington.edu

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)

2015-06-08 Thread Joe Felsenstein
Alberto Gallano asked:

>
> Would this also be the case in the situation where n is small enough (~ <15)
> to enumerate all possible unique permutations? I was under the impression
> that such an 'exact' test provided the true p-value without error.

In a case that small, one might be able to evaluate all permutations.
(There are 1.3 x 10^12 permutations, but maybe you need combinations,
which are fewer).

Yes, you would then have an exact P value.  But of course that is not
the same as an infinitely powerful test -- after all an ordinary
t-test gets you an exact P value when its assumptions are met.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] [MORPHMET] Re: Stability of p-values (physignal and testing for morphological integration)

2015-06-07 Thread Joe Felsenstein
A number of people have suggested that P values should stabilize after a
number of samples (in a permutation test) that depends on the data set.

I suspect that these were unintended misstatements.  As Dennis Slice has
mentioned, one can regard each permutation in the permutation test as a
random sample from a distribution.  Comparing a test statistic X to its
value in the data (say, Y), each permutation draws from a distribution in
which there is a probability P that X exceeds Y.

So each permutation is (to good approximation) a coin toss with probability
P of Heads.  There obviously no number of tosses beyond which the fraction
of Heads "stabilizes".  The fraction of heads after N tosses will depart
from the true value P by an amount which has expectation 0, and variance
P(1-P)/N.  This is a fairly slow approach of the fraction of Heads to the
true value.

So to get twice as close to the true P value, one needs 4 times as many
permutations.  And this need for more and more samples continues
indefinitely.  There is no sudden change as one reaches a threshold number
of permutations.

But that's what you really meant, right?

Joe
---
Joe Felsenstein  j...@gs.washington.edu

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] ancestor vs. change plots

2015-04-24 Thread Joe Felsenstein
Milton Tan asked:

>
> I have a question that is perhaps esoteric, since it's on a method I don't 
> see used often. I am looking at the dynamics of body size evolution, and have 
> come upon ancestor-vs-change plots described in Alroy 2000 ("Understanding 
> the dynamics of evolutionary trends", Paleobiology). This is interesting 
> because it will allow me to see if rate of body size change depends on body 
> size. I haven't seen this method widely used, so for anyone unaware how this 
> works: for each branch, you plot the ancestral state vs. the amount/rate of 
> change along the branch.

If the question were just whether the rate of change of the character
(body size) depended on its value, then another way would be to look
for a transformation of body size that made the instantaneous variance
of change constant.  i.e., does change in  log(size)  or in
sqrt(size)  have constant variance?  There are parameterized
transformation such as   y = (x^p - x^{-p})/(2p)  or elsey = ( x^p
- 1) / p  + 1  for which you could estimate the parameter  p  by ML.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sister-clade analysis of discrete characters

2015-04-16 Thread Joe Felsenstein
I mentioned earlier the alternative of having two-state discrete
characters each modeled by an underlying quantitative character with a
developmental threshold.

This can in principle be extended to three or more states.  In fact,
Sewall Wright's original example in 1934 had three states.  But there
are two difficulties:

1. How far away from each other the two thresholds would be becomes
something that needs to be estimated.  If there are only two states,
then since the underlying scale is not directly observed, one can
place the threshold anywhere on the underlying "liability" scale, most
conveniently at zero.  But how far from that the second, third,
(etc.), thresholds should be placed then needs estimation as
parameters of the model.

2. It is also not obvious whether they third state uses the same
underlying scale.  For three states, one could have two underlying
"liability" scales, with each state determined by some region in this
two-dimensional space.  Inferring that would be a continuous-state
counterpart to Mary Mickevich's "transformation series analysis" of
1982 (see my book, page 81 for references).

So thus far the threshold model implementations do not allow three or
more states.  But the possibilty should be mentioned here.


Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sister-clade analysis of discrete characters

2015-03-23 Thread Joe Felsenstein
This is not a sister-clade method but ...

If you are happy with a model in which there are underlying polygenic
quantitative characters, covarying ones whose evolution is modeled by
Brownian Motion model, and some two-state discrete characters arise by
imposing a developmental threshold on each quantitative character, you
might look at my 2012 paper in American Naturalist:
http://www.jstor.org/stable/10.1086/663681
The method is designed for use with a known phylogeny and infers the
evolutionary covariances of the underlying quantitative characters (the
"liabilities").

My C program Threshml analysis this model by MCMC sampling.  It can, I
believe, be called by Liam Revell's R package  phytools.

This threshold model, a phylogenetic version of one which originated in
1934 with Sewall Wright, is mentioned in the Maddison and FitzJohn paper
that William Gearty cited.  It is just briefly mentioned by them, I gather
because the authors forgot about it until the last minute when writing the
review. Nevertheless it is getting increasing use.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] multipage pdf of a huge tree

2015-03-05 Thread Joe Felsenstein
Jonas Eberle --


> thank you for your answer! Actually, I used to make large size pdfs with a
> tiny font size in these cases. It is then even possible to print this
> single-page-pdf on multiple pages in Acrobat Reader. The problem was that
> my current tree is really huge - with about 23000 tips. This took me to the
> limits of this approach, since pdf size is limited to 200 inch by Adobe and
> there also seems to be a lower limit for font size in pdfs (at least during
> export form R?).
>
> I didn't knew that Drawtree is able to split the tree over multiple pages.
> I guess this is an option in the command line version? I will check that
> out. In the mean time I found a way to do the job in R (see my post before)
> that seems to produce reasonable results.
>
>
PHYLIP programs do not enter settings on the command-line.  They have a
menu.  The Drawgram and Drawtree programs have a choice that gives a
submenu in which multiple pages can be selected. This submenu is not
available in the GUI (Java front end) version of those programs, but is
available in the character-mode menu that appears if the program is run
from the command line.  It is the selection made by typing the character
 "#".

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] multipage pdf of a huge tree

2015-03-05 Thread Joe Felsenstein
Jonas Eberle wrote:


> thanks a lot! I didn't knew splitplotTree yet. Great function! However, my
> tree has several thousands of tips (yes, it's a bit crazy but unfortunately
> necessary...) and I guess it's only possible to split it on two pages with
> splitplotTree. Or am I missing something?
>

It's not in R (unless available through Liam Revell's "phytools" package),
but in PHYLIP the tree-drawing programs Drawgram and Drawtree have the
capability of splitting a plot into a rectangular array of plots, and
putting these out onto separate files (not PDFs, but Postscript is
possible).

This was intended to help people make large posters using printers that can
only do a single page.

However ...

I do not see why this is necessary.  Most tree-drawing programs should be
able to write a file that has the large tree plotted on it.  If you don't
want to print the resulting tree on paper, it would then be possible to
view the tree in an application such as Adobe Acrobat Reader and zoom in on
it and see the tiny branches and their labels.  Making multiple plots for
one tree would probably confuse the matter.

Or is there something I am missing here?

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic analysis with sequencing error

2015-01-23 Thread Joe Felsenstein
Brian O'Meara added:


> Something like that should work. Unlike the case of ambiguity, where if you
> see, say, a Y at a site you'd give the probability of a C or a T at that
> site each 1, not 0.5 (Felsenstein's book has a good discussion of this) I
> think you're right that you'd want the probabilities to add up to one
> across bases: with sequencing error, the probability of it being A in the
> sequence given that it truly is A in the species is 99%. I poked around
> phangorn and didn't see an easy way to do this, but it should be possible
> (might require going into the C code to do this).
>

To emphasize what Brian said: the probabilities at a site, in this
computation, are *not* the probabilities of alternative outcomes.  They are
the probabilities of one outcome given four different underlying
situations.  Thus they don't have to add up to 1.  For the ambiguity case,
in my book I warned:  "You may be tempted to make the quantities (1/4, 1/4.
1/4, 1/4), but you should resist this temptation."

For simple models of sequencing error the four numbers can add up to 1.
Can, but don't have to.  If an A is more likely to be misread than a G
(say), then you get numbers that don't add up to 1, and that's OK, you
should not worry about that, because they aren't the probabilities of four
different outcomes.  If we made up a 4 x 4 table where the first row shows
the probabilities of the four observations given that the base really is A,
and second row is the probabilities given that it really is G, etc., then
the four quantities you want at one site at one tip of the tree are the
four numbers in a column.  (Which column depends on what you observed).
The rows have to add to 1; the columns don't have to.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Phylogenetic analysis with sequencing error

2015-01-23 Thread Joe Felsenstein
George Shireff asked:

>
> I was wondering if there is a way to incorporate sequencing error in
> phylogenetic analysis, if I know there is sequencing error but it's
> randomly distributed.
>
> I imagine this could be as simple as setting the probability of the
> observed base at each tip to 0.99 rather than 1, with a probability of
> 0.01/3 rather than 0 for each of the other tips (for a sequencing error of
> 1%). I was hoping that something like pml (phangorn) can be modified to do
> that, but I haven't been able to figure it out myself yet.


How to do this in likelhood computations (and thus in either likelihood or
Bayesian methods) is explained in my book, on pages 255-266 in the
subsection of Chapter 16 called "Handling ambiguity and error".  I think
that this is the first published treatment of that.

My colleagues Mary Kuhner and Jim McGill published a paper in 2014 on this,
with simulations showing that it helps to model the sequencing error:

Kuhner MK, McGill J
Correcting for sequencing error in maximum likelihood phylogeny inference.
G3 (Bethesda). 2014 Nov 4;4(12):2545-52. doi: 10.1534/g3.114.014365.

I believe that one major phylogeny program has very recently added code to
model this.  We have a version of Dnaml for PHYLIP that can do it, and hope
to release it soon.

As George Shireff saw, it is actually easy to do, and the computations are
not any slower.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term

2014-12-15 Thread Joe Felsenstein
This question was asked by Peter Smits and by Edwin Lebrija Trejos.  In
Peter Smit's question it was this:


> I have a similar question to Edwin. I too am working with a hierarchical
> Bayesian model, though I've implemented it in stan. I've included a
> phylogenetic random effect term, which is modeled as being distributed as
> multivariate normal with known covariance matrix up to a constant,
> sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al. '04 Am Nat
> by drawing on the similarity with the "animal model" from quantitative
> genetics.
>
> My question is about the scaling of the covariance matrix: is it necessary
> to have the the diagonal terms satisfy x <= 1 and all the off diagonals to
> be 0 <= x < 1? I have a time scaled phylogeny from which I have my
> covariance matrix, so currently all elements of the matrix are not scaled
> so that the greatest distance from the root to a tip is 1. Currently, the
> elements of the matrix are just the sum of the shared branch lengths. Is
> this appropriate? Why or why not?
>
>
I hope people will correct me on this, but my take is:

1. To infer a variance term for the "phylogenetic random term"
one has to scale that term somehow, and then the variance
will specify how many of those scaling units are in this term.  If
you had the tree depth be 2 instead of 1, the variance inferred
would then be half as great, because it would be saying
how many multiples of 2 rather than how many multiples of 1.

2. I have not had time to read through all of Edwin's code to
see exactly what the model is.  However, note that there is a
distinction between "individual" effects that are on a whole
species, and "individual" effects that are on a single sampled
individual.  The latter are taking into account that the mean
phenotype of each species is only known from a finite sample
of individuals.  So it is taking into account within-species
phhenotypic variation and finiteness of sample sizes.  The
relevant methods there are by Ives, Midford and Garland
(2007) and by me (2008).  Ives et al. assume within-species
phenotypic variances and covariances are known, I give
methods for inferring them.

The methods of Lynch and Housworth are in effect
either assuming a sample size of 1 for each species,
or are considering their "individual" effect to be on the
whole species.

Within-species phenotypic variation can be a substantial
problem, as Ricklefs and Starck (1996) noted.  In their
example, the largest contrasts were between closely-related
species.  They suspected that this was due to within-species
phenotypic variation (which shows up as variance due to
sampling of the individual specimens measured).  It
causes trouble because the small branch lengths between
closely-related species are an inadequate predictor of
how different the species means will be.  In effect, the
model is wrong so some of the changes attributed to
between-species evolution are actually within-species
sampling variation (phenotypic variance).

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Cross-validation with independent contrasts

2014-05-16 Thread Joe Felsenstein
Diego Bilski --

> I'm wondering if it would be statistically/philosophically correct to use
> n-fold cross-validation to evaluate a linear regression with independent
> contrasts. My doubt comes from the fact that when simply dividing the IC
> dataset in, lets say, 10 folds, some folds will remove the contrasts of
> internal nodes without necessarily removing an entire clade above that
> point, producing what can be viewed as two independent clades (a graphical
> example would be, in Felsenstein's seminal paper, fig. 8, remove the
> contrast at node 13, while keeping those at nodes 9 and/or 10).

and Ted Garland wrote:

Couldn't you also just do this back at the level of the original tree and
> tip data, creating subsets by pruning the tree before you compute contrasts?
>

Under the model of multivariate normality with Brownian Motion
change along the phylogeny, the contrasts are i.i.d. so of course
one can use them as points for cross-validation.  But of course,
unless the regression is nonlinear, there is already a parametric
framework for distributions of regression coefficients (and other
associated phenomena) in that i.i.d. MVN framework.

The issue of what entities should be sampled in cross-validation
depends on how, at what level, you expect the model to depart from
multivariate normality with Brownian Motion.  Diego and Ted seem to
have some such expectation but I can't see what that alternative
model would be.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Midpoint rooting routine?

2014-01-30 Thread Joe Felsenstein
Liam said:

> Joe's correct that this is not yet in our project to create an R interface
> for PHYLIP (Rphylip, https://github.com/liamrevell/Rphylip). As soon as I
> get a chance to work on this project today, I will add it.

That may be hard to do, as Retree is interactive and
has two levels of menus.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Midpoint rooting routine?

2014-01-30 Thread Joe Felsenstein
Todd --

> Does anyone know of an existing midpoint rooting routine? I am
> displaying trees and would like to show them as midpoint rooted.
> I've been using the Phangorn package, which does the midpoint rooting
> perfectly, but it has some dependencies that make it unstable on the linux
> machines I've been using. When I looked last, Phangorn was the only package
> I could find with midpoint routing.

Retree in PHYLIP can do that.  Option M once the tree is read in.  You
can write a script to feed the menu settings to the program.  I am not
sure Liam Revell has Retree in his Rphylip project (that makes it easy
to call PHYLIP programs from R).  But you can automate the whole
process yourself without too much difficulty -- see the section on
"Running PHYLIP programs from a command file" in the main PHYLIP
documentation file  main.html,  and see of course also  retree.html.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Sampling from all bifurcating and multifurcating trees

2013-12-24 Thread Joe Felsenstein
Eduardo Ascarrunz  wrote:

> Also, I figured out I could work with unrooted trees. I suspect the procedure 
> would be similar to your rooted method, wouldn't it?

There is a 1-1 correspondence between n-species rooted tree topologies
and (n+1)-species unrooted tree topologies (this is discussed in
Chapter 3 of my book).  So you can simply generate a bifurcating
rooted tree, or a multifurcating rooted tree, of  n-1  tips, and then
unroot it, making the previous root now be species  n.   Then you get
a randomly sampled unrooted tree.

This works for tree topologies but not, I think, for labeled histories.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] R help generating a heat map

2013-12-23 Thread Joe Felsenstein
One can sample from all possible rooted labeled histories by randomly
bifurcating lineages, and at the end assigning the names randomly to the
tips.  Note that a "labeled history" is not the same as a tree topology.

One can sample randomly from all topologies of rooted bifurcating trees
with labeled tips by adding the species one by one to a one-species tree,
each time choosing to split them off from one of the branches.  The
enumeration of all such trees is based on counting all possible ways of
doing this, so choosing one such sequence should choose one of them at
random. Jim Rohlf has a paper with an algorithm equivalent to this in (I
think) Systematic Zoology but I cannot locate it at the moment.

For rooted multifurcating trees a similar method could be used.  In my 1978
paper in Systematic Zoology I showed that each such tree corresponded to
way adding the species 1 to n in order, where at each step we choose
equiprobably from among all branches and all interior nodes (so that if
there are 13 branches and 6 interior nodes we split off one of these 19 at
random). The enumeration algorithm for which this is based is also
discussed in detail in Chapter 3 of my book.

Joe


On Mon, Dec 23, 2013 at 8:57 PM, Eduardo Ascarrunz  wrote:

> Hi Liam,
>
> Thank you! That looks clever. How does this method bias the sampling? I
> think it could be useful for test running my code anyway.
>
> Looking forward your findings, and all the best,
>
> E.
> On 24 Dec 2013 04:49, "Liam J. Revell"  wrote:
>
> > Hi Eduardo.
> >
> > You could try something like this:
> >
> > randomFurcTrees<-function(n,N=1){
> >   foo<-function(n){
> > t<-di2multi(rtree(n,br=sample(c(0,1),
> >   size=2*(n-2),replace=TRUE),rooted=FALSE))
> > t$edge.length<-NULL
> > t
> >   }
> >   trees<-if(N>1) replicate(N,foo(12),simplify=FALSE) else foo(n)
> >   if(N>1) class(trees)<-"multiPhylo"
> >   return(trees)
> > }
> >
> > What this does is it generates random bifurcating topologies (using rtree
> > in ape) with branch lengths that can be 0 or 1; then it collapses all
> zero
> > length branches to polytomies. This is (demonstrably) *not* the same as
> > picking trees at random from the set of all bi- and multifurcating
> > topologies. It's not immediately obvious how you could do that, but I'll
> > think about it.
> >
> > All the best, Liam
> >
> > Liam J. Revell, Assistant Professor of Biology
> > University of Massachusetts Boston
> > web: http://faculty.umb.edu/liam.revell/
> > email: liam.rev...@umb.edu
> > blog: http://blog.phytools.org
> >
> > On 12/23/2013 10:15 PM, Eduardo Ascarrunz wrote:
> >
> >> Hello everyone,
> >>
> >> Newbie here. I'm looking for a way to generate random trees of N tips,
> >> allowing multifurcations. My N would be >12, so it wouldn't be practical
> >> (nor possible) to work with all the possible trees (cf. allFurcTrees).
> I'd
> >> be happy with a set of 1000 trees, sampled equiprobably. I'd much
> >> appreciate any help.
> >>
> >> Best to all,
> >>
> >> E.
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ___
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> Searchable archive at http://www.mail-archive.com/r-
> >> sig-ph...@r-project.org/
> >>
> >>
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>



-- 

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Transforming data for OU model

2013-11-28 Thread Joe Felsenstein


Ted Garland wrote:

Our programs are in DOS (What's DOS?  The last operating system  
that worked ...).
However, I think you can also do this in R somewhere, maybe  
phytools?  Liam Revell, want to jump in here?


Just to remind people: DOS (also called MSDOS) executables programs  
can be run in Windows using the Command Prompt tool, which you will  
find in the Accessories menu that is found in the menu

opened by the All Programs tab in the Start Menu.

Joe
----
Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] FW: PGLS vs lm

2013-07-30 Thread Joe Felsenstein

Luke Matthews --

> If there is specific concern about one trait consistently lagging behind 
> another (Joe's latest complaint on pgls), then there are techniques specific 
> for this.  Charlie Nunn has published one such technique for evolutionary 
> lag.  More generally, Joe's complaint that pgls assumes traits are at 
> adaptive equilibrium points to infer evolutionary adaptation is generic to 
> most evolutionary biology (including most applications of independent 
> contrasts, optimal foraging, sexual selection studies, etc.) and in no way 
> specific to pgls or comparative methods.  I'm not advocating traits always 
> are at equilibrium.  Indeed, some of my work on capuchin sexual behavior is a 
> bit heterodox because I've proposed what is going on is not at an equilibriu!
> m state.  But assuming that most of the time most traits are at something 
> close to adaptive equilibrium is generic to evolutionary biology.

I guess I'm not an evolutionary biologist then.  If an adaptive peak is moving 
around, I am unwilling to assume that the population always succeeds in staying 
on top of it.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Joe Felsenstein

Tom Schoenemann asked me:

> With respect to your crankiness, is this the paper by Hansen that you are 
> referring to?:
> 
> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
> (2012). A phylogenetic comparative method for studying multivariate 
> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
> 
> I wrote Bartoszek to see if I could get his R code to try the method 
> mentioned in there. If I can figure out how to apply it to my data, that will 
> be great. I agree that it is clearly a mistake to assume one variable is 
> responding evolutionarily only to the current value of the other (predictor 
> variables). 

I'm glad to hear that *somebody* here thinks it is a mistake (because it really 
is).  I keep mentioning it here, and Hansen has published extensively on it, 
but everyone keeps saying "Well, my friend used it, and he got tenure, so it 
must be OK". 

The paper I saw was this one:

Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
regression: The interplay between observational and biological errors in 
phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
1063-5157.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Joe Felsenstein

If the "regressions" are being done in a model which implies 
that the two variables are multivariate normal, then we can 
simply estimate the parameters of that joint distribution, 
which are of course the two means and the three elements of the 
covariance matrix.

If we then test whether  Cov(X,Y) is different from zero, that 
should be equivalent to a test of significance of either 
regression.

/* crankiness on */
Note of course that most "phylogenetic" regressions are being 
done wrong: if they assume that Y responds to the current value 
of X, but when the value of Y may actually be the result of 
optimum selection which is affected by past values of X which 
we do not observe directly.

I've complained about this here in the past, to no avail,  
Thomas Hansen, in a recent paper, made the same point, with 
evidence too.
/* crankiness off */

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)

2013-07-08 Thread Joe Felsenstein

In addition to the references to papers by Hansen and Bartoszek, and by Ives, 
Midford and Garland, I would biasedly suggest this paper:

Felsenstein, J. 2008. Comparative methods with sampling error and 
within-species variation: contrasts revisited and revised. American Naturalist 
171: 713-725.

The method estimates the within-species phenotypic variation (which, when you 
are analysing species means is the relevamt "measurement error" and also 
includes actual measurement error) and corrects for it.

The software announced there is not in R, but I believe that Liam Revell's  
phytools  package can call our program.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] dist.dna -- inconsistent behavior with ambiguous sequences

2013-05-02 Thread Joe Felsenstein

Emmanuel Paradis wrote:

> The current behaviour of dist.dna() is to ignore ambiguous nucleotides. The 
> difficulty is that comparisons with ambiguous bases may or may not be 
> informative depending on the model. For instance, the comparison "R-Y" is 
> clearly a transversion, while the example you give ("A-S") may be either a 
> transition or a transversion, so it is informative for the JC69 model but not 
> for K80. Maybe some weighting scheme could be used for this problem. If 
> someone is aware of some systematic treatment of this issue, please let me 
> know.
> 
> With real data, it is wise to check the quality of the alignment, for 
> instance with base.freq(, all = TRUE) or image().

If you view the two-species distance as an ML estimate of the length of a 
two-species tree, then one can make use of RY ambiguities (and others).  That 
is the best and most fundamental method.  I used it in Dnadist in PHYLIP.  That 
approach does (effectively) drop any site that has a total ambiguity such as an 
N nucleotide, but all other ambiguity codes do affect the distance.

It is slower but worth it.  It does *not* mean that when one species has A and 
the other has A/G that you just count a match of 1/2, as for a small distance 
most of the likelihood is contributed by the A possibility.  For longer 
branches the contributions of the two terms are more nearly equal.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic multiple correspondence analysis

2013-04-25 Thread Joe Felsenstein

Danny Rojas wrote:

> Anybody know if there is an alternative to the phylogenetic principal
> component analysis of Revell (2009; Evolution 63-12: 3258–3268), let's say
> a phylogenetic multiple correspondence analysis, to analyze non-continuous
> variables? I would appreciate any suggestion.

Use my program Threshml (sorry, it's not written in R) which models 0/1
variables using Sewall Wright's (1934) "threshold model", and can infer 
principal
components of the underlying continuous variables. It can also mix
continuous and 0/1 variables.

Felsenstein, J. 2012. A comparative method for both discrete and continuous
characters using the threshold model. American Naturalist 179: 145-156

General packages such a MCMCGlimm can (probably) be made to do
the equivalent, with some torture.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about analysing trait differences

2013-04-22 Thread Joe Felsenstein


I wrote:


Thus if we find (chimp, bonobo) to be the "cherry" we
remove them, leaving a tree such as

(macacque, (gibbon, (orang, gorilla)));

so now (orang, gorilla)  is a cherry.


but should have written

...leaving a tree such as

((macacque, (gibbon, (orang, (gorilla, human;

so now (gorilla, human) is a cherry.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about analysing trait differences

2013-04-22 Thread Joe Felsenstein


Brian O'Meara wrote:

The problem is reconstructing overlap down the tree (it's possible  
to do,
but whether it's possible to do well is another question). One  
thing you
could do to avoid this is to use the *other* method from  
Felsenstein's 1985
independent contrasts paper, taking pairs of species independent of  
other

pairs. A way to do this:

1) Get a clade of size two (aka a "cherry" sensu Semple and Steel).  
Every

tree has at least one.
2) Difference in overlap and diference in mass for the pair of taxa  
making

up this clade becomes one point (perhaps standardize for time, and
positivize it such that you subtract them in the order that leaves  
the mass

difference positive).
3) Prune these two taxa off the tree.
4) Go back to step 1. When you're done, assuming no polytomies,  
you'll have
zero or one species left. [it's possible to do with polytomies,  
too: if
assumed to be hard, then just do a pair of taxa from a terminal  
polytomy
and leave the rest for the next step. If assumed to be soft, then  
just take

two from a terminal polytomy and discard the remaining taxa]

Contrasts gives you Ntax - 1 contrasts. This gives you floor(Ntax/ 
2) but

avoids having to do reconstructions. I have code around to do this but
can't find it at the moment, but it's not hard to write -- post  
again if

you need help with it and no better ideas have been proposed.


Note that this method (which goes all the way back to Salisbury
in 1942 as mentioned in my 2004 book) requires that in
step 3 "prune" does not mean using the "pruning" algorithm
of likelihood evaluation.  It means removing these two tips
(and their shared most recent common ancestor), leaving
behind nothing. So not calculating and adding to the
tree any phenotypes for the most recent common ancestor.

Thus if we find (chimp, bonobo) to be the "cherry" we
remove them, leaving a tree such as

(macacque, (gibbon, (orang, gorilla)));

so now (orang, gorilla)  is a cherry.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] statistically test whether two characters evolve dependently

2013-04-09 Thread Joe Felsenstein

New Bio wrote:

> Thanks, Liam. good to know. I think extending the tools to analyze traits
> of ordered/unordered multistates can be very useful. There are many
> interesting traits such as oxygent requirement (anaerobic, facultative,
> aerobic) of microbes, which is ordered multistates, and habitats (water,
> air, soil), which is unordered.


For approaches like the threshold model, it is not simple to see how to
handle multistate characters. Should we assume one scale? If so, how
far from the 0/1 threshold should we place the 1/2 threshold? That
becomes an additional parameter to be estimated.

Or should we have an additional axis, so that 0/1 is on the  x  axis,
while  [01]/2 is on another axis?  And then what do we do about
state 3 if it also exists? That way lies madness ... or perhaps a
good Ph.D. thesis topic.

(This is in some way related to Transformation Series Analysis,
which was an issue with parsimony methods. One imagined one's
states arranged in a character "transformation series" which could
even be a tree, the Character State Tree. Then one wanted to
infer the phylogeny and at the same time also the CST, using
parsimony as the criterion.  In a sense what I am raising is the
likelihood and modeling equivalent problem.)

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Seeking list of nucleotide substitution models

2013-04-06 Thread Joe Felsenstein

Emmanuel Paradis answered Daniel Barker's inquiry:

>> Is there a review or list of ~every specific nucleotide substitution model
>> that has been proposed or used in the literature (with references)?

> Hi Daniel,
> 
> Have you looked at the help page of dist.dna? The list of references there is 
> focused on calculating distances but some are general. There are a few more 
> references in my book too. I suggest you also look at "Inferring Phylogenies".

where you will find, in the section on "Other distances" in Chapter 13  that 
Andrey Zharkikh had a very good paper in 1994 covering many of the distances 
that had been invented up to that date, and explaining relationships between 
them:

Zharkikh, A. 1994. Estimation of evolutionary
distances between nucleotide sequences.
Journal of Molecular Evolution 39: 315.329

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] combining p-values in PCMs

2013-04-01 Thread Joe Felsenstein

John D asked:

> Given that things have been quiet in the list lately, I think this
> could be a good time for me to ask your opinion about this issue.
> 
> Imagine that I run a PIC analysis on two traits using 1000
> post-burn-in trees. What would be the best way to summarize these
> results? Average p-values across all analyses? Perhaps a specific
> method to combine the resulting probabilities e.g. Fisher's test?

If you want to know about a regression coefficient, or other 
parameter, the 1000 inferred values in the 1000 trees can be 
taken as a posterior distribution if the trees come from a 
Bayesian inference (I am assuming from your description that 
they do).

If you want to test whether, say, the regression coefficient is 
positive, you can take this posterior distribution and simply 
ask whether the credible interval for the parameter includes 
zero. In this case that interval would be the upper 95% of the 
sample values.

Bringing in P values or Fisher's test just mixes Bayesian with 
frequentist or likelihood methods, which will get two groups of 
people annoyed at you, whereas using only one of these methods 
has the advantage of irritating only one group of us.


I note that John's email address is "dobzhanski". Good to hear 
from you again, Professor Dobzhansky. You may recall that you 
gave me a tour of your lab at Rockefeller University in the 
summer of 1963 when Dick Lewontin sent me from Rochester to 
pick up some data paperwork for him for a joint project the two 
of you were working on.

(Okay, the last part is "Poisson d'Avrile")


Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Ancestral state estimates of continuous traits

2013-03-18 Thread Joe Felsenstein

Emmanuel Paradis wrote:

> The new default for ace will be to use REML for continuous 
> characters and ML for discrete ones.

after Ted Garland wrote:

>>  Yes, we have always used REML for this!

(Emmanuel wrote: )

>>>   You can also try method="REML" which gives much better 
>>> estimates of sigma^2 than ML. I think I'll make it the 
>>> default for continuous characters.

** bragging mode on **

Glad to see you have all caught up to this paper:

Felsenstein, J. 1973. Maximum likelihood estimation
  of evolutionary trees from continuous characters.
  American Journal of Human Genetics 25: 471-492.

(which can be accessed at that journal free of charge)

** bragging mode off **

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] testing for correlates of rates of evolution

2013-03-15 Thread Joe Felsenstein

Rob Lanfear wrote:

> In particular, Liam's using squared contrasts in y, so that's 
> asking whether the absolute size of changes in y depends on  
> x at the ancestral node. I might have missed something here, 
> but that sounds very similar in principle to Freckleton's 
> test of whether the variance of trait differences is 
> unrelated to their absolute values [1], except that the 
> latter looks for correlations between the absolute value of 
> differences in x versus x at the ancestral node. It might be 
> useful to consult that paper [1] to get some more ideas for 
> how to interpret those kinds of results.

If there is proportionality between standard deviation of 
change and the value of the character, that is essentially 
saying that the log(x) changes at rate independent of its 
value.  Similarly, if the variance of change is proportional 
to the value of the character, that is essentially the same 
as saying that the square root of the character changes at a 
rate independent of its value.

See this Wikipedia page:

http://en.wikipedia.org/wiki/Variance-stabilizing_transformation

Perhaps the whole test can be done by considering different
transforms of the data (there are parameterized families of them)
and use the likelihood to test values of the transform parameters (with
appropriate correction of the likelihood by having a Jacobian term).

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] From ClustalW2 Tree to Heat Map in R

2013-01-10 Thread Joe Felsenstein

Klaus Schliep wrote:

> There is quite some irony that in phylogenetic reconstruction often
> non-ultrametric methods are preferred, even though the time to the
> last common ancestor (LCA) should be for each extend species the same.
> However other fields use heavily ultrametric methods (hclust) even
> there exist no evident equivalent property like the LCA.

The irony, such as it is, is due to the fact that we can only see amounts of 
difference, not amounts of time, and, darn it, different organisms have 
different biologies so they change at different rates.  Which makes biology 
more interesting but makes molecular change less clocklike.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Joe Felsenstein

Emmanuel Paradis wrote:

> If you are Bayesian, the trees sampled from an MCMC are here for estimation 
> including of the branch lengths, so you use them to compute some sort of 
> consensus topology as well as its branch lengths. So it makes sense that 
> MrBayes can do a consensus tree with branch lengths.


I endorse the rest of Emmanuel's advice but let me quibble with this one.  The 
posterior on trees may not consist mostly of trees varying around a single 
consensus.  If the posterior had, for example, two modes, each centered around 
a different tree, a single consensus tree might not be appropriate, and branch 
lengths computed by averaging lengths over the two modes might not be a good 
guide to what the trees in the posterior looked like.  I don't know enough 
about MrBayes features to know whether they have some way around this.

There is a similar issue with parsimony methods -- the set of most parsimonious 
trees may have a consensus, which may well not be a most parsimonious tree. 
People who see the consensus of most parsimonious trees may not realize that 
the particular tree they are looking at is not most parsimonious.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-21 Thread Joe Felsenstein

Daniel Barker wrote:

> What should branch lengths on a consensus tree represent?
> 
> Scott Chamberlain had written:
> 
>> When making a consensus tree using ape::consensus the branch lengths are
>> lost. Is there a way to not lose the branch lengths? Or to add them
>> somehow
>> to the consensus tree after making it.

The issue of what branch lengths ought to be on a consensus tree is not simple. 
 If we have three rooted trees:
((A:1,B:1):1,C:2);
((A:1,B:1):1,C:2);
(A:2,(B:1,C:1):1);

the consensus tree should be the first tree, but what branch length should be 
used for (say) the branch ancestral to the AB clade?  1?   0.667?

The minute you open this can of worms it becomes clear that the answer depends 
on what you want that number to convey and what interpretations your audience 
will tend to draw from the number.  There is no "obvious" answer. So this is 
not a mere technical computing question.

By the way, in my 2004 book, you will find me agonizing about this on page 526, 
coming down on the side of 0.667, but not overwhelmingly convincingly.  You 
could argue that a branch length should be set 0 when the branch is not there, 
and all the resulting values averaged, or you could argue that the average 
should only be taken over those trees for which that branch is present.

One possible way to solve the problem is to take the consensus tree as if it 
were a user-defined tree, use your whole data set, and infer branch lengths on 
that tree.  Daniel has already expressed his legitimate concerns in such a case 
as to whether it takes (for example) trifurcations as if they were real rather 
than an expression of our uncertainty.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Cluster tips into OTUs

2012-11-15 Thread Joe Felsenstein

Miguel Verdu --

> I want to perform a mothur-like OTU-based approach, but based on phylogenetic 
> instead of DNA sequence distances. Is there any way to cluster tips of a tree 
> into OTUs determined by a threshold of phylogenetic distances?. In other 
> words, I want to collapse all the tips connected with distances less than X 
> into the same OTUs.

I don't know which R program will do this (probably just a one-line "if" 
command) but the obvious method would be to take the distance matrix and reduce 
all the elements of it that are less than X to 0.000. Then run any distance 
method.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative analyse and heritability

2012-09-06 Thread Joe Felsenstein

Tristan Lefebure wrote:

> We are struggling with a comparative analysis that involves 2 traits:
> - trait #1 is heritable (genome size) and clearly has a strong phylogenetic 
> signal
> - trait #2 is non heritable (time of colonisation of a new habitat, that 
> convergently happened on the terminal branches), calculating contrasts on 
> this 
> trait would be meaningless.
> 
> If one wants to test the link between these two variables, how would you do 
> this?
> 
> We thought about using two different phylogenies in a phylogenetic 
> independent 
> contrast analysis: the species tree phylogeny for trait 1, and a star tree 
> for 
> trait 2. But how do you deal with a unique internal node? 

The time it takes to colonize a new habitat seems to me to
be a function of various underlying variables.  Those may
have changed along the tree, and not necessarily very
quickly.  So it seems that an ordinary comparative
methods analysis is justified.

It is very similar to having a character such as time to
escape a predator.  The fact that it does not take millions
of years to escape one attack is really irrelevant.

Joe`

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] testing binomial characters

2012-08-29 Thread Joe Felsenstein

Theodore Garland Jr wrote:

> Check this:
> 
> Ives, A. R., and T. Garland, Jr. 2010. Phylogenetic logistic regression for 
> binary dependent variables. Systematic Biology 59:9-26.

In addition, check this:

Felsenstein, J.  2012. A comparative method for both discrete and continuous 
characters using the threshold model. American Naturalist 179: 145-156. 

Joe

Joe Felsenstein  j...@gs.washington.edu
 Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, 
Seattle Wa 98195-5065

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-18 Thread Joe Felsenstein

Daniel Barker wrote:

> On ancestral state reconstructions. I've recently started using Ziheng
> Yang's terminology - of referring to reconstructions, derived from ML
> transition rates and equilibrium frequencies, as 'empirical Bayes'
> reconstructions. I believe this to be the most useful way to describe
> these methods.
> 
> My question. Were does empirical Bayes stand, w.r.t. the Likelihood
> Principle?
> 
> Empirical Bayes seems beyond the scope of the Likelihood Principle, or in
> violation of it. The biological hypotheses (here, of ancestral state) are
> not expressed as parameters of the model, so the relative support is not
> judged by a likelihood ratio. On the other hand, by only using the data at
> hand, empirical Bayes does comply with 'the irrelevance of outcomes not
> actually observed'.
> 
> If anyone can provide or point to some thoughts on this, I'd be very
> grateful - for ancestral states or in general.

It has long been recognized -- since Anthony Edwards's 1970 paper and
Elizabeth Thompson's brilliant 1975 thesis and book --
that the interior node reconstructions are not, strictly speaking, MLEs.
They are maximum posterior probability estimates instead.  The root
ancestral states can be considered MLEs if that state is one of the
parameters of the model.  If instead (as often done for DNA and
protein data) the root ancestral state is considered to be drawn from
the equilibrium distribution of the stochastic process, then they too are
MPP estimates.

I would only call them empirical Bayes estimators if one made an ML
estimate of some stuff (the whole tree topology, which sounds like an
extreme case) and then, assuming the correctness of that estimate,
the ancestral states are inferred by being Bayesian.   In that case
probably the only thing that would be taken to have a prior on it
would be the ancestral state.  Then you could take the MPP as
the modal estimate from the posterior.

So, if I have understood his point correctly, Ziheng is formally correct, 
but it is a case where one is not being very Bayesian.  For my money,
you are a Bayesian not just if you use a prior, but if you are willing to
use a controversial prior.   And in this case the prior is pretty
uncontroversial.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Joe Felsenstein


Mark Holder wrote:


With respect to Jarrod's simulations, I have a few thoughts:
	1. I don't understand the claim (in the original email) that "its  
fairly straightforward to prove that asymmetric transition rates  
cannot be identified using data collected on the tips of a  
phylogeny" It seems like this is something that is routinely done  
in phylogenetics, and proofs of identifiability of GTR exist  
(demonstrating that  this indeed feasible and not some  
computational artifact).


I agree.  This whole discussion may have started from a "fact" that  
is *not* "fairly straightforward to prove".   In which case it is not  
necessary to bring speciation rates or priors into it.


A reversible two-state model should be able to have its parameters  
estimated on a given tree, clocklike or not.


Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] simulate traits evolution in correlated with body mass

2012-07-24 Thread Joe Felsenstein

Liam Revell wrote:

> library(phytools)
> y<-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2))

> This should give you the correlation r on average and the y|x should be 
> Brownian. (If x is Brownian, then both y & x & y|x will be too.)

If  y  is evolving in response to  x, and  x  is changing too, then wouldn't 
the expectation of  y  be a value that is predicted, not by the current  x,  
but to some extent also from past  x's?

One would want some much more specific model.

*** whining on ***
This is part of my stream of oft-repeated, widely-ignored complaints that one 
ought not regress present-day  y's  on present-day  x's.  Lots of people are 
doing it -- and they're all wrong.
*** whining off ***

Joe

Joe Felsenstein  j...@gs.washington.edu
 Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, 
Seattle Wa 98195-5065

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Joe Felsenstein

Ben Bolker --

>> Now we can apply Fisher's  Likelihood Ratio Test of Fisher as well
>> as the AIC.  The LRT tells us that the expectation 
> 
>  ... under the null hypothesis where M0 (the simpler model) is true?

Yup.  I'm considering that case to see how the AIC fits in with the LRT.
Of course the AIC is proposed for a much wider range of cases.

>> of   
>> 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
>> Chi-Square with that number of degrees of freedom).
>> 
>> But the AIC tells us that the expectation is  2(d1 - d0).
> 
>  Maybe I'm missing something, but I don't see how the AIC tells us
> something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
> the expectation of the Kullback-Leibler distance, which is something
> like sum(p(i) log(p(i)/q(i)) where p(i) is the true distribution of
> outcomes and q(i) is the predicted distribution of outcomes ... so it's
> something more like a marginal log-likelihood difference rather than a
> maximum log-likelihood difference ...

Well, the AIC ends up with comparing   -2 log(L) + 2d  for the two
hypotheses.   The difference of these for models  M1 and M0
is just (the negative of) 2 log(L1/L0) - 2(d1-d0).Or have I
missed something here?  So the expectation of the difference
is log likelihood  *is*  described by the AIC, right?   And isn't it
(in view of Fisher's distribution) wrong too?   That is what
disturbs me and makes me feel there is something I don't
understand about the AIC argument.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LL ratio test

2012-06-14 Thread Joe Felsenstein

Carl Boettiger wrote:

> Others on the list can weigh in with more authority, but perhaps this will
> get the discussion started.

Yes, it's important to know whether the parameters are nested, and
the issue of being at the end of a parameter range is serious.

> Recall that AIC values are a fequentist statistic: and they obey the very
> same same distribution as the likelihood ratio, (recall it is a difference
> log likelihoods, just shifted by the difference in the number of parameters
> (e.g. -2 [ log L1 -  log L0 - (k1 - k0)]).  Recall that the maximum
> likelihood estimate (MLE) is a biased estimate of the likelihood of your
> data and that AIC penalty is simply creating an asymptotically unbiased**
> estimator of the true model likelihood, which is a frequentist concept to
> begin with.  Why we report confidence intervals/p-values in the case of one
> of these statistics but not the other is not obvious to me either.

I will confess my relative ignorance of AIC issues (my phylogeny book
has a simple, elegant, and clear explanation -- which I wrote in a hurry 
while excited that I finally understood this, and which turns out to
make no sense whatsoever and should be firmly ignored by all).

But I do know this: If we have the likelihood ratio  R = L(p')/L(p)  where
p'  is the ML parameter values and  p  is the true parameter values,
and where  p  is in the interior of the set of possible parameters,
then RA Fisher showed about 1922 that asymptotically with large
amounts of data:

2 log(R)   is distributed as  chi-square with  D  degrees of freedom,
where  D  is the difference of the number of parameters being 
estimated in  p' and the number of parameters being estimated in
p.   Now we know that the expectation of that chi-squared variable
is  D.   So to correct the bias in  R   we should subtract  D.   That sounds
like what Carl is explaining too.

It sounds like a very simple and clear explanation of the AIC.  Unfortunately
that subtraction is *not* what AIC does.   It subtracts  2D.   The reason
it does so is unclear to me.  It involves some kind of prior on models,
I think.  As far as I am concerned it is "like the peace of god", in that it
"passeth human understanding".

Maybe the experts here can give me a simple explanation.  Otherwise
maybe we should honor Fisher (not me) and only subtract  D, and call the
result the FIC,  But that works only for nested hypotheses, and the main
point of the AIC is to deal with non-nested hypotheses.  To make matters
worse, in my field the AIC has the reputation of too easily favoring the
most complex hypothesis, so maybe we should be subtracting more than
2D, not less.

Clueless in Seattle.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] are partition of a tree statistically independent?

2012-06-08 Thread Joe Felsenstein

in response to Inti Pedroso's original question:

>> Let say we have a phylogenetic tree and we generate a partition on the tree, 
>> i.e., split the in two by "cutting" at an internal node. Is the evolutionary 
>> process in the two partition independent? If that is the case it means that 
>> p(T_1,T_2) = p(T_1)*p(T_2) with p = probability and T_1 and T_2 the tree 
>> partitions 1 and 2.


Arne Mooer wrote:

>> It depends on what process you are considering, but this is usually an 
>> assumption folks make for, e.g. simple trait evolution or diversification.  
>> For any autocorrelated process (e.g. some common relaxed clock models), it 
>> will depend on whether your splits are or are not nested on the directed 
>> (rooted) tree.


I disagree.  If we have the assumption of a Markov process, when we divide the 
tree into two partitions at a point   p0,  then the likelihood of the full tree 
is *not*  P(X|T) =  P(X1|T1) P(X2|T2).   What is true is that if we knew the 
state of the process at p0, and call that  X0, then if the Markov process is 
reversible,

P(X|T)  =  P(X0) P(X1|X0, T1) P(X2|X0, T2)

where P(X0) is the prior probability of the process being in state X0.

We typically do not know X0 so one must sum over all values of X0:

P(X|T)  =   \sum_{X0}  P(X0) P(X1|X0, T1) P(X2|X0, T2)

where  \sum_(X0}  is summation over all  X0.


This conditional independence given X0 is a basic assumption used in the 
statistical field of graphical model, and phylogeny models are graphical 
models..

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Using AIC/AICc/BIC to select trait a model of trait evolution

2012-05-02 Thread Joe Felsenstein

Matt Pennell wrote:

> My question is: how many observations do we have when we compare trait
> evolutionary models? People tend to use the number of tips of taxa for
> which we have trait values. However, this may not be technically  
> accurate.
> First, of course, both the branch lengths and the tip values factor  
> into
> the likelihood equations so it seems sensible that these are both  
> somehow
> included as observations. Second, the trait values we observe are  
> of course
> not independent (that is the whole reason we are using a phylogeny  
> in the
> first place!!). It is unclear whether/how this fact should factor  
> into our
> calculation of the n. I know that it phylogenetics, when people do  
> model
> selection for the model of sequence evolution, they use the number  
> of sites
> in the alignment though i am not sure there is a clear  
> justification for
> this either.


On just one of these points:  number of sites in a molecular  
alignment is completely justified as the number to use in computing  
the number of data points, if the evolution at the sites is  
independent (i.e. independent, given the true phylogeny).   In that  
case each column of the data matrix is drawn i.i.d. from a distribution.

Of course, even correlations of rates of evolution among neighboring  
sites violates this.

Whether and how number of species comes in is trickier.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-14 Thread Joe Felsenstein

Liam Revell wrote:

> > Poaching Intensity = beta0 + beta1*Body Size + e
>
> I think it depends on how the residual error in the model is  
> distributed (esp. correlated) among species.  It seems possible to  
> invent hypothetical scenarios (as I did in my previous email) about  
> how the residual error in poaching intensity given body size could  
> be phylogenetically autocorrelated, but this is fundamentally an  
> empirical question.  If the residual error of poaching intensity  
> given body size is phylogenetically correlated and we ignore this  
> then we risk overestimating the predictive value of our model.
>
> In addition, the residual error is likely/guaranteed to be non- 
> Brownian if the response variable is binary (e.g., extant v.  
> extinct).  For these type of data the tree should not be ignored,  
> but simple GLS regression is probably not appropriate.  One option  
> might be the phylogenetic logistic regression of Ives & Garland  
> (2009), but I'm not too familiar with this method.

The real problem would come if the characters evolved to respond to  
the poaching intensity, and that evolution was inherited along the  
tree.  Then our uncertainty about the poaching intensity in the past  
would be a big problem.

But if poaching is only  a present-day phenomenon, it would  
(presumably) respond to only today's characters, and they would not  
yet have responded to it, so there would be no problem.

But I do think it is a Big Mistake (and apparently a frequent one),  
when people measure the residual of a character regressed on  
environmental variables,
then casually assume that it is undergoing Brownian Motion, when the  
environmental variables may have been different in the past.   That's  
probably not an issue here.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-10 Thread Joe Felsenstein
Liam wrote:

> I'm not sure I entirely agree that we need to assume that the environmental 
> trait is evolving on the tree by Brownian motion.  I believe that so long as 
> Y|X (in David's example, growth rate given habitat degradation) is evolving 
> by Brownian motion, we should be OK to use PIC regression.

I worry.  Why are we to assume that current phenotypes are distributed around 
optima that are based on  *current* environmental variables?

Joe
---
Joe Felsenstein, j...@gs.washington.edu
Department of Genome Sciences and Department of Biology
University of Washington
Box 355065
Seattle WA 98195-5065

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-10 Thread Joe Felsenstein

David Bapst wrote --

> Let's say that there exists a worker who is measuring several different
> traits across a number of species and then testing for correlations among
> these traits. The first test is body size versus growth rate and they use
> independent contrasts or PGLS to test for a the correlation, accounting for
> phylogeny. Both of these traits are inherited, evolving variables. Now
> let's say they'd like to test for the relationship between growth rate and
> some metric of the anthropogenic degradation of that species' habitat. Now
> what? It is even valid to apply PIC to the habitat degradation metric even
> though it is not an inherited, evolving trait? It's unclear to me.

...

> One explanation I know of is that when we apply phylogenetic comparative
> methods to these quasi-traits to consider their relationship to another
> trait, we are assuming that these variables are actually the result of some
> underlying, unobserved set of traits which are evolving along the
> phylogeny.

I don't have an easy answer to this: it is a serious issue.  One is assuming 
that the environmental variable is evolving along the phylogeny according to a 
Browian motion process.   This may well not be a reasonable assumption.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?

2011-11-05 Thread Joe Felsenstein

Anthony Ives wrote:

> Just a quick comment on your response to Dan.  You can use REML to compare 
> models, but only to compare the "variance" components and only if they have 
> the same "mean" components (e.g., contain the same predictor variables).  If 
> there are differences in the mean components, you need to use ML, because 
> (very loosely speaking) REML is based on estimating the variances conditional 
> on the mean.  I have come across examples where the results using ML and REML 
> (accidentally) are quite different, with ML being correct.

This is a good point.  But in many cases a REML analysis drops only the grand 
mean.  In effect it is ML, after passing the data through a "filter" which 
replaces it by the differences
of each point from the grand mean for that character.

If the grand mean is not of interest, REML should work as well as ML.  If we 
are trying to test some hypothesis about the value of the grand mean, of course 
REML is inappropriate.

An example would be simple one-way ANOVA which is actually a REML analysis, and 
it does test means of the groups.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Is correlation of PICs, with tip data and each terminal node split into male and female, a valid method?

2011-11-04 Thread Joe Felsenstein

Dan Rabosky wrote:

> A quick addendum to this thread (and I think Ted and others have  
> discussed this previously) is that, to compare models in this  
> fashion, you want to make sure you are fitting the models with  
> maximum likelihood and not restricted maximum likelihood (REML).  
> THis is not the default for the gls function from nlme, so to  
> obtain comparable likelihoods, you want
>
> myModel <- gls(, method = 'ML')
>
> or something similar.

I  haven't been following the details of this discussion, but let me  
just add that (as a fan of REML) that for most tests you can do ML  
for two models and compare
the likelihoods, or you can do REML for two models and compare those  
likelihoods.  The results will usually not be very different.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Show Informative Sites?

2011-09-22 Thread Joe Felsenstein


 Leandro Jones wrote:


I just wanted to clarify this so that the outbursts of probabilistic
fundamentalism don't confuse the boys that started this post.


On my part I was just wanting to ensure that parsimony fundamentalism  
did not confuse people and get in the way of a reasonable statistical  
analysis.


Joe
----
Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Show Informative Sites?

2011-09-21 Thread Joe Felsenstein

Leandro Jones --

> Nick's rules won't allways work in a Parsimony context. For example, a
> position like this one:
> 1 A
> 2 A
> 3 T
> 4 C
> would be "informative" under rules (a) and (b), but it is in reality
> uninformative, as any of the possible trees have a length of 2. Thus,
> this character tells us nothing about _phylogeny_.

I disagree.  If the only way you can interpret anything is by parsimony,
sure.  But for statistical phylogenetics, it has information.  It works
against any phylogeny that has all its branch lengths short, for example.

Eliminating that character, not telling the statistical method you did
that, and then going ahead with the analysis is a Big Mistake.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Show Informative Sites?

2011-09-17 Thread Joe Felsenstein

Nick Matzke wrote:

> All sites are informative under likelihood,

... and under Bayesian methods, and under distance matrix methods too.  If you 
leave out "uninformative" sites without compensating for doing that by an 
explicit statistical correction, you will have some very unpleasant surprises.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Geiger's phy.manova returns same p-value for different data

2011-08-16 Thread Joe Felsenstein


Luke Harmon wrote:

That represents the p-value you get when the test statistic for the  
data is more extreme than any of the 1000 (default) simulations.  
It's exactly 1/1001.


You can probably get a lower p-value if you run more simulations!


A good argument for making the default 999 simulations.  It was  
pointed out to me years ago by the late Zygmund "Bill" Birnbaum, who  
wrote a paper on it, that in a simulation test, under the null  
hypothesis, the data too comes from the null hypothesis and therefore  
when you want to have 1000 reps you take the data plus 999 simulated  
cases.


Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: Re: R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Pas said:

> What I', trying to do now is writing a R routine to back-calculate  
> the "expected" branch lengths for the "unusual" critters, given the  
> fitted ancestral values and tip values of the phenotypes, and  
> assuming BM, in order to compare the actual branch lengths to the  
> expected. The ratio of these lengths, if I'm not delusional and  
> definitely lucky, is a per-lineage rate of phenotypic evolution.  
> The bottom line is answering the question: how long should the  
> branch leading to that particular species be if it evolved at the  
> same rate of its sister species?

Good way to approach it.  If you can calculate the likelihood of  
trees, one way would be to not bother fitting any ancestral values:  
just try different lengths for the branch that connects the fossil to  
the tree, and see which one maximizes the likelihood.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Folks --

I was intending my most recent message to be apologetic --
that I was perhaps overreactive.  Certainly Pas has not
raised unreasonable objections or been obstructive with
my grants! (Others have).

Let me raise an issue so I understand him more clearly:
Pas, are you saying that you see phenotypes in the fossils
that seem incompatible with the Brownian Motion assumption?

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

David Bapst wrote:

> I think Joe is right that realizing a model is an inaccurate or
> imprecise description of reality should impel us to develop better
> models of the world around us, because this partly how science moves
> forward. However, I don't think pointing out that a model is deficient
> requires that that person must themselves develop an alternative.
> After all, an alternative model that capture a more realistic level of
> complexity may not be possible in some situations (it is certainly
> possible in trait evolution models, however.) Requiring such a thing
> would put too much pressure on scientific whistle-blowers, who play a
> very important role in reminding the rest of us that the world is more
> than the models we use to understand it and make our predictions.

As a theoretician, I am perhaps oversensitive to the folks who, after
a lecture in which I present a simple model, triumphantly declare
"but you didn't include predator satiation".  Then they walk away
looking very pleased with themselves.

There is a similar problem with the quibblers who inhabit grant
review panels, and are always asking me to do much more
complicated models that are impossibly hard (and they
are not aware how hard they are).

Just understand, when you raise legitimate concerns, that us
model-analyzers are also used to getting a lot of these unreasonable
demands too, and may be grumpy as a result.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] R: Re: R: ancestral state reconstruction for tips

2011-08-05 Thread Joe Felsenstein

Pasquale Raia said:

> Of course Ted is right, but my problem with this computation, or  
> with the
> simple exercise I was proposing is well another: as a  
> paleontologist I often
> come across pretty exceptional phenotypes (dwarf hippos and  
> elephants, huge
> flightless birds, to make a few examples). When you use methods  
> like this (I
> mean Garland and Ives') and compare the output with those  
> phenotypes, as I did,
> you immediately realize what the the bottom line is: no matter if  
> they are
> nodes or tips, by using the expected (under BM) covariance the  
> estimated
> phenotypes are dull, perfectly reasonable but very different from  
> anything
> exceptional you may find yourself to work with. This is why I feel  
> it is
> difficult to rely on those (unobserved) values to begin with.

I think that what is being said is that Brownian Motion is too sedate  
a process
and does not predict some of the large changes actually seen in the  
fossil
record.

That's a legitimate point but does put the onus on the maker of the  
point to
propose some other stochastic process that is tractable and has these  
large
changes (and that fits with known Mendelian and Darwinian mechanisms).
Just complaining that the Brownian stochastic process is no good is  
insufficient.

If we want to add the fossils to the calculation, then they will of  
course
pressure the Brownian Motion process to change more in their vicinity,
which may help some.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] comparative methods at subspecies level

2011-07-24 Thread Joe Felsenstein

Paolo Piras wrote:

> I write you in order to ask a phylosophical opinion in
> applying phylogenetic comparative methods to
> subspecies belonging to the same species for which a
> phylogenetic/phylogeographic scenario is known on the
> basis of molecular studies. I have a number (>10) of
> population samples belonging to two species at
> continental scale. Some OTUs are surely NOT panmittic
> with the others while for some others this is not
> sure.  Someone could argument against the use of
> comparative methods when a gene flow is still present
> among OTUs. However, I think that, if possible,  AT
> LEAST the covariance due to populations relationships
> should be removed. You dont think that it is always
> better to take in account all possible relatedness 
> betwen OTUs? In this paper
> 
> D. C. BLACKBURN,G. J. MEASEY. 2009.Dispersal to or
> from an African biodiversity hotspot? Molecular
> Ecology; Volume 18, Issue 9, pages 1904–1915, May
> 2009.
> 
> the authors apply common comparative methods to a
> phylogeography of populations belonging to ONE
> species. I found their strategy appropriate. I wrote
> to ask your opinion about this issue.


Using standard tree-based methods to treat a single species which has 
populations that exchange migrants is inappropriate.  In a 2002 paper I 
discussed how to do this properly using contrasts which are derived from the 
eigenvectors and eigenvalues of (an estimate of) the migration matrix between 
populations.  Such migration matrices can now be inferred using programs LAMARC 
and Migrate.

The paper is here:
Felsenstein, J.  2002. Contrasts for a within-species comparative method.
  pp.  118-129 in "Modern Developments in Theoretical Population Genetics",
  ed. M. Slatkin and M. Veuille.  Oxford University Press, Oxford.
Here
   http://evolution.gs.washington.edu/papers/spectrum/
is where you can get a PDF of a preprint version of it.

The method is univariate at the moment but a multivariate version is under 
development -- the univariate version can be used to see whether a character 
has significant covariation with an environmental measurement.   

The method is also described in a recent paper by Stone, Nee and me:

Stone, G. N., S. Nee, and J. Felsenstein. 2011. Controlling for 
non-independence in comparative analysis of patterns across populations within 
species. Philosophical Transactions of the Royal Society B 366 (1569): 
1410-1424.

which discusses the general issue.  The (other) authors preferred, for greater 
comprehensibility, to describe my method slightly incorrectly (I got outvoted).

Anyone considering this issue should consider these two papers carefully.   Of 
course mixing within- and between-species analyses is more difficult yet.   I 
hope to release an R package later this year to do the one-character analysis 
(it is not too hard to put one together yourself in the meantime).

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Joe Felsenstein


Paolo Piras wrote --


Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?


The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] comparative analysis using multiple regression of contrasts?

2011-05-25 Thread Joe Felsenstein


Not only can't I understand the underlying model here, I can't type  
correctly either:   Julien Claude, not Clause.  If it's any excuse,   
S is next to D on my keyboard.


Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] comparative analysis using multiple regression of contrasts?

2011-05-25 Thread Joe Felsenstein


Folks --

Julien Clause wrote:

for type II sums of squares you are right, however, when there are  
multiple factors pics and gls usually still provide different F  
values and p-values even if you set the marginality stuff in the  
analysis of variance. (they are not much different, but i still  
wonder why results are the same with one explanatory variable but  
different when you consider several).


 As I see it, contrast analyses through the origin are not a so  
usual regressions since no intercept is estimated: they however  
result in similar output when only one explanatory variable is  
included. Although I did not investigate type I and type II error  
rate when the response was continuous and the explanatory variable  
was dummy, I still guess that there are still something to do for  
modifying ancestral character state reconstruction in the contrast  
analysis for the dummy variable and computing their contrast: it is  
hard to believe that the brownian model will apply to that dummy  
variable because the expected variance of such a character can not  
be properly gaussian, but certainly more following something that  
has to see with the binomial law and logit family link. I wonder if  
someone has worked in this direction for contrasts...if not, this  
is probably something interesting i would try to investigate.


I'm trying to understand what sort of model everyone is talking about  
here (not the details of how to do it in R but what process is assumed).


So the "independent" variables are all characters of these species,  
and they change along the tree by a process of multivariate Brownian  
motion (with, of course, covariation of their change due to genetic  
correlations and selective correlation?   If you have not yet heard  
of "selective correlation" please check chapter 24 of my book or  
review articles of mine going back to 1988).


OK.   Now what about the response variable (or variables)?  Is it  
assumed that they are also characters that covary with the other  
variables?  In which case, why treat them specially?  Why not just  
infer the mutual covariance matrix of phylogenetic change of all  
variables and use that?  Because in that model the means and  
covariances of the full set of characters are the sufficient  
statistics, and anything else interesting that you want to estimate  
must just be functions of them.


Or is it assumed that the response variable is not a character of the  
species?  That the error of its regression on the other variables is  
i.i.d.?


Regressing on reconstructed ancestral states seems not correct in any  
of these cases, as those are not observations and are a linear  
function of the observed characters at the tip of the tree, in which  
case one is just making life harder by regressing on that  
reconstruction rather than on those tip phenotypes.  But even that  
seems wrong if the response variable is a character of the species,  
as I just noted.


Someone please educate me.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Joe Felsenstein

Ted wrote --

> Based on zillions of BM simulations we have done with DOS PDSIMUL,  
> you should never see a shift in mean value (versus starting value  
> at root of tree), unless you are intentionally modeling a trend.   
> Something must be wrong.

in response to Dean Adams:

>> For these simulations I generate a tree (in this
> case a
>> perfectly-balanced tree) and simulate 100 data sets on the same
>> phylogeny using a particular initial BM rate parameter (sigma).
...
>> However, the most curious finding is that for all methods, as sigma
>> increases, so too does the mean trait value across the tips (and
> the
>> converse occurs as sigma decreases). This observation is curious to
> me,
>> as one should not see a predictable shift in the mean under
> Brownian
>> motion.


Is it possible that the simulation is doing Brownian Motion on some
other scale, such as a log scale?   If one does BM on the logs and then
looks at the original phenotype scale, you *would* expect that as the
variance among species increases, so does the mean of the species
means.

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-09 Thread Joe Felsenstein


Emmanuel wrote:


Is it a problem with ties or with identical sequences? I guess you can
solve the latter easily (eg, using the haplotype function in  
pegas), and

this will solve the vast majority of ties. Other cases of ties will
certainly not result in such high bootstrap values (that's my  
intuition).


My intuition disagrees with this.  If several sequences are nearly
identical, but each differs from their consensus at  K  sites,
them if the tree-making algorithm does not randomise
addition order of species (or otherwise somehow
randomize the resolution of ties), there are likely
to be artificially high bootstrap values even with
nonzero branch lenghs.

1. Dropping identical sequences is a good thing to do, especially  
with a distance-based method.


One issue is that after bootstrap sampling, some
sequences that were not identical may become
identical.  So the tree-making method ought to
be able to handle identical sequences.

2. A high bootstrap support (or another form of support) associated  
with a zero-length branch is an indication that something's wrong  
there.


Again, it can be a problem even when the branch lengths
are nonzero.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-05 Thread Joe Felsenstein

Klaus Schliep wrote --

> it is not that surprising. NJ normally does not produce poytomies,
> just edge weights of length 0. How these are broken may depends from
> the input order (from labels in the distance matrix like in this
> implementation) or could be broken randomly.  I added some code below
> to highlight it.

When there are ties in the distance matrix, NJ and UPGMA can lead
to a situation where the ties get resolved in an arbitrary way
dependent only on the input order of species.

This can lead to excessive support for group AB in a situation
where there are three species ABC and their further resolution is
arbitrary.   See Backeljau et al. MBE 1996.  Farris et al. (Cladistics,
1996) pointed out that this can lead to strong support for AB that
is illusory.  (I commented on this on pages 168-169 of my phylogeny
book).

In PHYLIP we have put code in that, when multiple bootstrap
replicates are being analyzed, automatically defaults to randomizing
the order of species in the data set.  That solves the problem, and
is perhaps what PAUP* does too.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] simulating trees

2011-05-01 Thread Joe Felsenstein

Back on April 27 in the discussion of how to simulate a birth-death process
that starts T units of time ago (where that is not the first split time)

Luke Harmon wrote:
> That seems correct to me. The probability that the tree died out before the 
> first speciation event (which is excluded, as Liam says) is d/(b+d).

in reaction to Liam's statement that

> > In both cases, you get only trees that did not go extinct along the root 
> edge (that is, you exclude trees that went extinct before the first 
> speciation).

I think that any program doing the simulation necessarily conditions on
not losing all lineages before the present.  But that is different from
conditioning on the original lineage dying before it gets a chance to
speciate.  While that is  d/(b+d), we need to also exclude the more complicated 
case that also allows for the possibility that some speciations occurred,
but that all of those lineages then died before the present.

Kendall (Annals of Mathematical Statistics, 1948), in the original solution of 
the birth-death process gave that probability (see my phylogeny book, page 565 
which gives the opposite, the probability of the descendants of one species 
*not* dying out):

P(n = 0 | t)  =  (d*exp((b-d)*t) - d) / (b*exp((b-d)*t) - d)

Interestingly, Kendall (1948, p. 5) actually cites some earlier people who got 
these formulae.  I cited only Kendall, but should have looked at his paper 
rather than secondary sources, which cite only him.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


  1   2   >