Re: [R-sig-phylo] summarising a frequentist test over a Bayesian sample of trees
Hi Rob,

Interesting question. Others on this list can give a better answer, but maybe I'll make a few comments and others will chime in. Here's my understanding. Let me see if I got each of the approaches:

1) Use the "best tree" for the estimate of the "observed" MPD. Then construct the null distribution by randomizing the tips and calculating MPD for each tree (or randomly sampled trees) in the posterior distribution of trees. This makes the frequentist test that the best tree + true tips is not likely to be a random sample from that null. The null sounds appropriate, but the observation doesn't sound like what you want, since you just want to make a statement about true tips, and integrate over trees.

2) Not sure if I got this one: first they calculate the "observed" MPD over all trees (rather than just the best). It seems to me you would want to average these estimates to get the "expected" MPD integrated over all trees, and compare that to the null as above. Asking if 50% of these are significant is equivalent if one assumes this distribution is symmetric, but it seems like a strange choice. At least using the expected value, this sounds better. You also get an uncertainty on your observed MPD.

3) For each tree, calculate the p value by comparing MPD of the real data to the random data. Repeating over trees gives a distribution of p values. You then discuss asking if 95% of these p-values are significant. Instead, I think you would rather ask if the expected/average p value of this distribution is significant. This sounds different from number 2, but I'm not convinced that it is. If you looked over the uncertainty on your "observed" value in number 2 you also get a distribution of p-values, and I believe these are the same.

I believe integrating over uncertainty isn't distinctly Bayesian or frequentist; the philosophy enters in the interpretation.
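As a rough illustration of approach 3 and of the two ways of summarising it, here is a minimal sketch in R. It assumes the ape package; the random trees from rmtree() stand in for a real posterior sample, the focal tip set is invented, and mpd_of() and p_one_tree() are hypothetical helpers written for this example, not functions from picante. The one-sided direction of the test (observed MPD smaller than random) is also an assumption.

```r
library(ape)

set.seed(1)

# Stand-in for a posterior sample: 50 random trees with tips t1..t20.
trees <- rmtree(50, 20)

# Hypothetical focal set of tips whose MPD is being tested.
focal <- c("t1", "t2", "t3", "t4", "t5")

# Mean pairwise patristic distance among a set of tips.
mpd_of <- function(dist_mat, tips) {
  sub <- dist_mat[tips, tips]
  mean(sub[lower.tri(sub)])
}

# Permutation p-value on a single tree: compare the observed MPD with
# the MPD of randomly drawn tip sets of the same size.
p_one_tree <- function(tree, tips, n_perm = 199) {
  d <- cophenetic(tree)
  obs <- mpd_of(d, tips)
  null <- replicate(n_perm, mpd_of(d, sample(rownames(d), length(tips))))
  # One-sided (assumed): is the observed MPD unusually small?
  (sum(null <= obs) + 1) / (n_perm + 1)
}

# One p-value per tree in the sample.
p_values <- vapply(
  seq_along(trees),
  function(i) p_one_tree(trees[[i]], focal),
  numeric(1)
)

mean_p   <- mean(p_values)         # the average p-value over trees
frac_sig <- mean(p_values < 0.05)  # share of trees that are "significant"
```

Here mean_p is the average p-value suggested above, while frac_sig is the quantity behind the ">50% of trees" rule. With a real posterior sample one would read the trees in (for example with read.nexus()) and use more randomisations per tree; 199 is used here only to keep the sketch fast.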
If you want to be a frequentist, you can imagine a world where a tree is drawn from a distribution, and then tips are assigned according to another distribution, and you want to know if that second distribution is distinct from a random assignment, which is the interpretation given above. I don't see why you couldn't take a Bayesian perspective if you'd prefer, and ask for the posterior distribution of MPD; I just don't know what you'd choose for a prior.

-Carl

On Tue, Mar 8, 2011 at 2:16 AM, Rob Lanfear wrote:
> Hi All,
>
> I have a frequentist test statistic (mean phylogenetic distance (MPD)
> calculated using picante) that I want to calculate while accounting (as
> best as I can!) for phylogenetic uncertainty. The test involves
> calculating a p value by first calculating the observed MPD between a
> particular set of tips, then comparing this to a null distribution
> calculated by randomising the tip labels and re-calculating MPD 1000 times.
>
> I have a posterior sample of 100,000 trees, which I'm assuming is a
> reasonable representation of the uncertainty in my tree, and I want to know
> if my test statistic is significant when accounting for this uncertainty.
> Does anyone know of the most appropriate way to do this (if there is one)?
>
> I have seen a couple of different approaches in the recent literature. For
> instance, some authors have calculated the observed MPD using the 'best'
> Bayesian tree of some description (e.g. the maximum clade credibility tree)
> and then used the Bayesian posterior sample of trees when calculating the
> null distribution. Other authors (using a slightly different approach, but
> with the same basic philosophy) have re-calculated the test statistic for
> all trees in the posterior sample, and classed a dataset as rejecting the
> null hypothesis if >50% of the trees in the posterior sample show a
> significant result. In neither of these cases can I find (or figure out) a
> rigorous statistical reasoning behind the approach.
>
> It occurs to me that yet another way to approach this would be to
> re-calculate the p-value on each tree in the posterior sample, and thus
> obtain a 'posterior distribution' of p-values. I might then class as
> significant any dataset for which the 95% HPD of the p-value excludes
> values greater than 0.05. However, I worry that this represents some unholy
> alliance of Bayesian and frequentist thinking, so thought I'd ask here
> first.
>
> Any ideas or pointers to the literature gratefully received,
>
> Cheers
>
> Rob
>
> --
> Rob Lanfear
> Postdoc,
> Centre for Macroevolution and Macroecology,
> Research School of Biology,
> Australian National University
>
> Tel: +61 2 6125 7270
> www.robertlanfear.com
>
> ___
> R-sig-phylo mailing list
> R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

--
Carl Boettiger
UC Davis
http://www.carlboettiger.info/
Re: [R-sig-phylo] Assign node ages to a tree
Scott,

I have written a function to transform branch lengths on a tree according to age constraints in the terminal taxa and (optionally) the nodes (attached). If your phylo object is tr, the command

scalePhylo(tr, tip.ages, node.mins)

will return a phylo object with new branch lengths scaled to the age constraints. Ages are positive numbers (e.g., in Ma). If all your taxa are extant, set tip.ages to be a vector of zeroes. The argument node.mins can have NA entries for nodes without constraints.

I developed this for paleontological trees so the tip ages usually constrain branch lengths as well. This may not work exactly as you would like for trees of extant species with only some of the nodes constrained (some internal nodes can be pushed up to the recent).

You may also want to look at Graeme Lloyd's function for doing this, which handles zero-length branches in a more elegant way: http://www.graemetlloyd.com/methdpf.html. Again, this was designed for paleontological applications, so it might not get you exactly what you need. The code may be a useful starting point, though.

Best,
Gene

On 3/11/11 10:01 AM, "Scott Chamberlain" wrote:
> Hello,
>
> We have trees for which we have estimated node ages for many nodes (but not
> all nodes in any one tree). We would like to assign node ages to the trees,
> and then later transform branch lengths according to node ages. How does one
> assign node ages in R?
>
> Thanks, Scott Chamberlain

--
Gene Hunt
Curator, Department of Paleobiology
National Museum of Natural History
Smithsonian Institution [NHB, MRC 121]
P.O. Box 37012
Washington DC 20013-7012
Phone: 202-633-1331 Fax: 202-786-2832
http://paleobiology.si.edu/staff/individuals/hunt.cfm

Attachment: Tree-node-ages.R
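For the simplest version of the problem in this thread, where every node has an age and all tips are extant, the idea can be sketched as follows. This is not the attached scalePhylo (which is not reproduced here); it is a minimal example assuming the ape package, with a made-up four-taxon tree and made-up ages. Each branch length is set to the age of the parent node minus the age of the child node. Nodes without an age estimate would first need ages filled in by some interpolation rule, which this sketch does not attempt.

```r
library(ape)

# Made-up topology without branch lengths.
tr <- read.tree(text = "((A,B),(C,D));")

# One age per node, indexed by ape's node numbers (tips first, then
# internal nodes). Tips are extant, so their ages stay at 0.
ages <- numeric(Ntip(tr) + Nnode(tr))

# Illustrative node ages (e.g., in Ma), assigned via each node's
# descendant tips so the code does not depend on node numbering.
ages[getMRCA(tr, c("A", "D"))] <- 10  # root
ages[getMRCA(tr, c("A", "B"))] <- 4
ages[getMRCA(tr, c("C", "D"))] <- 6

# Branch length = age of parent node - age of child node.
tr$edge.length <- ages[tr$edge[, 1]] - ages[tr$edge[, 2]]

# Check: the ages recovered from the new branch lengths.
bt <- branching.times(tr)
```

A parent must be older than each of its children for the resulting branch lengths to be positive, so it is worth checking all(tr$edge.length > 0) before using the tree.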
Re: [R-sig-phylo] Assign node ages to a tree
Gene,

Thanks very much. I should have said I am aware of bladj in the program Phylocom, but it is great that there are at least two additional methods.

All of our trees have only extant species, and no extinct species. What exactly is the problem with having only extant species in relation to your function?

Also, how do your and Graeme's functions handle polytomies? Do they have to be resolved beforehand, or are they resolved within the code you or Graeme wrote?

Scott

On Friday, March 11, 2011 at 9:49 AM, Hunt, Gene wrote:
> Scott,
>
> I have written a function to transform branch lengths on a tree according to
> age constraints in the terminal taxa and (optionally) the nodes (attached).
> If your phylo object is tr, the command
>
> scalePhylo(tr, tip.ages, node.mins)
>
> will return a phylo object with new branch lengths scaled to the age
> constraints. Ages are positive numbers (e.g., in Ma). If all your taxa are
> extant, set tip.ages to be a vector of zeroes. The argument node.mins can
> have NA entries for nodes without constraints.
>
> I developed this for paleontological trees so the tip ages usually constrain
> branch lengths as well. This may not work exactly as you would like for trees
> of extant species with only some of the nodes constrained (some internal
> nodes can be pushed up to the recent).
>
> You may also want to look at Graeme Lloyd's function for doing this, which
> handles zero-length branches in a more elegant way:
> http://www.graemetlloyd.com/methdpf.html. Again, this was designed for
> paleontological applications, so it might not get you exactly what you need.
> The code may be a useful starting point, though.
>
> Best,
> Gene
>
> On 3/11/11 10:01 AM, "Scott Chamberlain" wrote:
> > Hello,
> >
> > We have trees for which we have estimated node ages for many nodes (but not
> > all nodes in any one tree). We would like to assign node ages to the trees,
> > and then later transform branch lengths according to node ages. How does one
> > assign node ages in R?
> >
> > Thanks, Scott Chamberlain
>
> --
> Gene Hunt
> Curator, Department of Paleobiology
> National Museum of Natural History
> Smithsonian Institution [NHB, MRC 121]
> P.O. Box 37012
> Washington DC 20013-7012
> Phone: 202-633-1331 Fax: 202-786-2832
> http://paleobiology.si.edu/staff/individuals/hunt.cfm
>
> Attachments:
> - Tree-node-ages.R
[R-sig-phylo] Assign node ages to a tree
Hello,

We have trees for which we have estimated node ages for many nodes (but not all nodes in any one tree). We would like to assign node ages to the trees, and then later transform branch lengths according to node ages. How does one assign node ages in R?

Thanks,
Scott Chamberlain