Re: [R-sig-phylo] summarising a frequentist test over a Bayesian sample of trees

2011-03-11 Thread Carl Boettiger
Hi Rob,

Interesting question.  Others on this list can probably give a better answer,
but I'll make a few comments in the hope that others will chime in.  Here's
my understanding.

Let me see if I have understood each of the approaches:

1) Use the "best tree" for the estimate of the "observed" MPD.  Then
construct the null distribution by randomizing the tips and calculating MPD
on each tree (or on randomly sampled trees) in the posterior distribution of
trees.

This tests whether the best tree with the true tips is unlikely to be a
random draw from that null.  The null sounds appropriate, but the
observation doesn't sound like what you want, since you want to make a
statement about the true tips while integrating over trees.

2) Not sure if I got this one: first they calculate the "observed" MPD on
all trees (rather than just the best).  It seems to me you would want to
average these estimates to get the "expected" MPD integrated over all trees,
and compare that to the null as above.  Asking whether 50% of these are
significant is equivalent if one assumes this distribution is symmetric, but
it seems like a strange choice.

At least when using the expected value, this sounds better.  You also get an
uncertainty on your observed MPD.

3) For each tree, calculate the p-value by comparing the MPD of the real data
to that of the randomized data.  Repeating over trees gives a distribution of
p-values.  You then discuss asking if 95% of these p-values are significant.
Instead, I think you would rather ask if the expected/average p-value of this
distribution is significant.  This sounds different from number 2, but I'm
not convinced that it is.  If you looked over the uncertainty on your
"observed" value in number 2 you would also get a distribution of p-values,
and I believe these are the same.
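
As a rough illustration of approaches 2 and 3, here is a minimal base-R
sketch.  It is not code from this thread and does not use picante: the helper
names (mpd, perm_p, toy_tree_dist) are hypothetical, the "posterior sample" is
a toy stand-in built by clustering jittered points, and the one-sided test
for clustering (small MPD) is an assumption about the direction of interest.

```r
set.seed(1)

# Mean pairwise distance among `tips` (integer indices) in distance matrix `d`.
mpd <- function(d, tips) {
  sub <- d[tips, tips]
  mean(sub[lower.tri(sub)])
}

# One-sided permutation p-value (small MPD = clustered tips). Shuffling tip
# labels is equivalent to drawing a random tip set of the same size.
perm_p <- function(d, tips, nperm = 199) {
  obs  <- mpd(d, tips)
  null <- replicate(nperm, mpd(d, sample(nrow(d), length(tips))))
  (sum(null <= obs) + 1) / (nperm + 1)
}

# Toy stand-in for a posterior sample: ultrametric distance matrices from
# clustering jittered points (NOT real posterior trees).
n_tips  <- 20
base_xy <- matrix(rnorm(n_tips * 2), ncol = 2)
toy_tree_dist <- function() {
  xy <- base_xy + rnorm(length(base_xy), sd = 0.2)
  as.matrix(cophenetic(hclust(dist(xy), method = "average")))
}
tree_dists <- replicate(50, toy_tree_dist(), simplify = FALSE)

# Focal tips: tip 1 and its four nearest neighbours in the toy data.
focal <- order(as.matrix(dist(base_xy))[1, ])[1:5]

# Approach 3: one p-value per tree, then summarise the distribution.
p_per_tree <- vapply(tree_dists, perm_p, numeric(1), tips = focal)
mean_p     <- mean(p_per_tree)          # expected/average p-value
frac_sig   <- mean(p_per_tree < 0.05)   # fraction of trees with p < 0.05

# Approach 2: average the observed MPD over trees and compare it with a
# null pooled over the same trees.
obs_per_tree <- vapply(tree_dists, mpd, numeric(1), tips = focal)
pooled_null  <- unlist(lapply(tree_dists, function(d) {
  replicate(99, mpd(d, sample(nrow(d), length(focal))))
}))
p_pooled <- (sum(pooled_null <= mean(obs_per_tree)) + 1) /
  (length(pooled_null) + 1)
```

With real data, each element of tree_dists would instead be the tip-to-tip
distance matrix of one tree from the posterior sample, and comparing mean_p,
frac_sig and p_pooled shows how much the choice of summary matters.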

I believe integrating over uncertainty isn't distinctly Bayesian or
frequentist; the philosophy enters in the interpretation.  If you want to be
a frequentist, you can imagine a world where a tree is drawn from one
distribution, and then tips are assigned according to another distribution,
and you want to know if that second distribution is distinct from a random
assignment, which is the interpretation given above.  I don't see why you
couldn't take a Bayesian perspective if you'd prefer, and ask for the
posterior distribution of MPD; I just don't know what you'd choose for a
prior.

-Carl


On Tue, Mar 8, 2011 at 2:16 AM, Rob Lanfear wrote:

> Hi All,
>
> I have a frequentist test statistic (mean phylogenetic distance (MPD)
> calculated using picante) that I want to calculate while accounting (as
> best as I can!) for phylogenetic uncertainty. The test involves calculating
> a p-value by first calculating the observed MPD between a particular set of
> tips, then comparing this to a null distribution calculated by randomising
> the tip labels and re-calculating MPD 1000 times.
>
> I have a posterior sample of 100,000 trees, which I'm assuming is a
> reasonable representation of the uncertainty in my tree, and I want to know
> if my test statistic is significant when accounting for this uncertainty.
> Does anyone know of the most appropriate way to do this (if there is one)?
>
> I have seen a couple of different approaches in the recent literature. For
> instance, some authors have calculated the observed MPD using the 'best'
> Bayesian tree of some description (e.g. the maximum clade credibility tree)
> and then used the Bayesian posterior sample of trees when calculating the
> null distribution. Other authors (using a slightly different approach, but
> with the same basic philosophy) have re-calculated the test statistic for
> all trees in the posterior sample, and classed a dataset as rejecting the
> null hypothesis if >50% of the trees in the posterior sample show a
> significant result. In neither of these cases can I find (or figure out) a
> rigorous statistical reasoning behind the approach.
>
> It occurs to me that yet another way to approach this would be to
> re-calculate the p-value on each tree in the posterior sample, and thus
> obtain a 'posterior distribution' of p-values. I might then class as
> significant any dataset for which the 95% HPD of the p-value excludes
> values greater than 0.05. However, I worry that this represents some unholy
> alliance of Bayesian and frequentist thinking, so thought I'd ask here
> first.
>
> Any ideas or pointers to the literature gratefully received,
>
> Cheers
>
> Rob
>
> --
> Rob Lanfear
> Postdoc,
> Centre for Macroevolution and Macroecology,
> Research School of Biology,
> Australian National University
>
> Tel: +61 2 6125 7270
> www.robertlanfear.com
>
> ___
> R-sig-phylo mailing list
> R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>



-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


Re: [R-sig-phylo] Assign node ages to a tree

2011-03-11 Thread Hunt, Gene
Scott,

I have written a function to transform branch lengths on a tree according to 
age constraints in the terminal taxa and (optionally) the nodes (attached).  If 
your phylo object is tr, the command

scalePhylo(tr, tip.ages, node.mins)

will return a phylo object with new branch lengths scaled to the age 
constraints.  Ages are positive numbers (e.g., in Ma).  If all your taxa are 
extant, set tip.ages to be a vector of zeroes.  The argument node.mins can have 
NA entries for nodes without constraints.

I developed this for paleontological trees so the tip ages usually constrain 
branch lengths as well.  This may not work exactly as you would like for trees 
of extant species with only some of the nodes constrained (some internal nodes 
can be pushed up to the recent).
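
The basic relationship between node ages and branch lengths can be shown
without the attached function.  This is a minimal base-R sketch, not
scalePhylo (whose internals are not shown in this message): it assumes an
edge matrix in the parent/child layout used by ape's phylo objects (tips
numbered first, then internal nodes) and assumes an age is known for every
node, so it does not deal with unconstrained (NA) nodes.  The tree and the
node ages are made up for illustration.

```r
# Toy tree ((A,B),C): tips are nodes 1-3, the root is node 4, and the
# ancestor of A and B is node 5. Each row of `edge` is c(parent, child).
edge <- rbind(c(4, 5),
              c(5, 1),
              c(5, 2),
              c(4, 3))

n_tips    <- 3
tip.ages  <- rep(0, n_tips)   # all taxa extant: a vector of zeroes
node.ages <- c(10, 4)         # hypothetical ages (Ma) for nodes 4 and 5

# Ages indexed by node number: tips first, then internal nodes.
ages <- c(tip.ages, node.ages)

# Each branch length is the age of the parent minus the age of the child.
edge_length <- ages[edge[, 1]] - ages[edge[, 2]]

# A child older than its parent would give a negative branch length.
stopifnot(all(edge_length >= 0))
```

Here edge_length comes out as 6, 4, 4, 10.  When only some node ages are
known, the missing ones have to be filled in somehow before this subtraction
makes sense, which is the part a scaling function needs to handle.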

You may also want to look at Graeme Lloyd's function for doing this, which 
handles zero-length branches in a more elegant way:  
http://www.graemetlloyd.com/methdpf.html.  Again, this was designed for 
paleontological applications, so it might not get you exactly what you need.  
The code may be a useful starting point, though.

Best,
Gene


On 3/11/11 10:01 AM, "Scott Chamberlain" wrote:

Hello,


We have trees for which we have estimated node ages for many nodes (but not all 
nodes in any one tree). We would like to assign node ages to the trees, and 
then later transform branch lengths according to node ages. How does one assign 
node ages in R?


Thanks, Scott Chamberlain



--
Gene Hunt
Curator, Department of Paleobiology
National Museum of Natural History
Smithsonian Institution [NHB, MRC 121]
P.O. Box 37012
Washington DC 20013-7012
Phone: 202-633-1331  Fax: 202-786-2832
http://paleobiology.si.edu/staff/individuals/hunt.cfm


Attachment: Tree-node-ages.R


Re: [R-sig-phylo] Assign node ages to a tree

2011-03-11 Thread Scott Chamberlain
Gene, 

Thanks very much. I should have said I am aware of bladj in the program
Phylocom, but it is great that there are at least two additional methods.

All of our trees have only extant species, and no extinct species. What exactly
is the problem with having only extant species in relation to your function?

Also, how do your and Graeme's functions handle polytomies? Do they have to be
resolved beforehand, or are they resolved within the code you or Graeme wrote?


Scott




[R-sig-phylo] Assign node ages to a tree

2011-03-11 Thread Scott Chamberlain
Hello, 


We have trees for which we have estimated node ages for many nodes (but not all 
nodes in any one tree). We would like to assign node ages to the trees, and 
then later transform branch lengths according to node ages. How does one assign 
node ages in R? 


Thanks, Scott Chamberlain 