Re: [R-sig-phylo] How to test stratification of sampling across tree?

Brian O'Meara Wed, 15 Mar 2017 14:08:14 -0700

One thing you could do is take all nonoverlapping pairs of taxa
(Felsenstein's other technique in the contrasts paper): that is, for a tree
(A,(B,(C,(D,E)))), you can look at D-E and B-C, *or* D-E and A-B, *or* D-E
and A-C, *or* A-B and C-D, etc. (so, still leaving out one taxon each time,
but only using each edge once) and then compare the states within each pair
of taxa. If state 0 has freq p and state 1 has freq q, you should see 0-0,
0-1, and 1-1 pairs in proportions p^2, 2pq, and q^2 with truly random
sampling; if it's maximally stratified, you should see only two of these
pair types (i.e., if 1 is less common, you should see only 0-1 and 0-0
pairs). You could rig up a test statistic (probably comparing two
multinomial models) from this.
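
A minimal sketch of that comparison in base R, assuming the nonoverlapping pairs have already been picked. The names pair_type_counts() and g_stat() are made up for illustration, and the likelihood-ratio (G) statistic is just one way to compare the two multinomial models. With only a handful of pairs, any chi-square approximation to G would be rough.

```r
# states: named 0/1 vector of tip states
# pairs:  two-column character matrix, one nonoverlapping pair of tips per row
pair_type_counts <- function(states, pairs) {
  s <- states[pairs[, 1]] + states[pairs[, 2]]   # 0 = "0-0", 1 = "0-1", 2 = "1-1"
  observed <- as.vector(table(factor(s, levels = 0:2)))
  names(observed) <- c("0-0", "0-1", "1-1")
  q <- mean(states)                              # frequency of state 1 over all tips
  p <- 1 - q
  expected <- c("0-0" = p^2, "0-1" = 2 * p * q, "1-1" = q^2) * nrow(pairs)
  list(observed = observed, expected = expected)
}

# Likelihood-ratio statistic: multinomial with probabilities fixed at
# (p^2, 2pq, q^2) versus a multinomial with freely estimated probabilities
g_stat <- function(res) {
  n <- sum(res$observed)
  ll_null <- dmultinom(res$observed, prob = res$expected / n, log = TRUE)
  ll_free <- dmultinom(res$observed, prob = res$observed / n, log = TRUE)
  2 * (ll_free - ll_null)
}

# Toy data on the 5-taxon tree above, using the pairs D-E and B-C
states <- c(A = 0, B = 1, C = 0, D = 1, E = 0)
pairs  <- rbind(c("D", "E"), c("B", "C"))
res <- pair_type_counts(states, pairs)
res$observed
g_stat(res)
```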


I have some code that purportedly gets all such independent pairs in R at
https://r-forge.r-project.org/scm/viewvc.php/pkg/R/independentTaxa.R?view=markup&revision=366&root=omearalab.
However, I haven't rigorously tested it (this was in the dark ages before
we all used testthat, too), so feel free to take, hack, republish, etc.,
but test first [anyone else should feel free to take this, too -- it could
naturally go into phytools, ape, or phangorn, for example, assuming it
actually works].
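
A small check of the "each edge used only once" condition, as a sketch: this is not the r-forge function, and path_edges() and pairs_independent() are hypothetical helper names. It relies on ape::nodepath() to get the nodes along the path between two tips, then tests whether two pairs share any edge.

```r
library(ape)

# Edges on the path between tips a and b, each labelled by its two end nodes
path_edges <- function(phy, a, b) {
  tips  <- match(c(a, b), phy$tip.label)
  nodes <- nodepath(phy, from = tips[1], to = tips[2])
  from  <- head(nodes, -1)
  to    <- tail(nodes, -1)
  paste(pmin(from, to), pmax(from, to), sep = "-")
}

# TRUE if the paths joining the two pairs have no edge in common
pairs_independent <- function(phy, pair1, pair2) {
  shared <- intersect(path_edges(phy, pair1[1], pair1[2]),
                      path_edges(phy, pair2[1], pair2[2]))
  length(shared) == 0
}

tr <- read.tree(text = "(A,(B,(C,(D,E))));")
pairs_independent(tr, c("D", "E"), c("B", "C"))  # TRUE: no shared edge
pairs_independent(tr, c("A", "B"), c("C", "D"))  # TRUE: no shared edge
pairs_independent(tr, c("B", "C"), c("C", "D"))  # FALSE: both use the edge to C
```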

Best,
Brian


_______________________________________________________________________
Brian O'Meara, http://www.brianomeara.info, especially Calendar
<http://brianomeara.info/calendars/omeara/>, CV
<http://brianomeara.info/cv/>, and Feedback
<http://brianomeara.info/teaching/feedback/>

Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Director for Postdoctoral Activities, National Institute for
Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS)
Communication Director, Society of Systematic Biologists

On Tue, Mar 14, 2017 at 1:37 PM, Ross Mounce <ross.mou...@gmail.com> wrote:

> So I tried a 12-taxon fully pectinate tree with Blomberg's K as calculated
> by picante::Kcalc()
>
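> library(ape)
> library(picante)
> # 12-taxon fully pectinate tree
> aa <- "(A,(B,(C,(D,(E,(F,(G,(H,(I,(J,(K,L)))))))))));"
> t1 <- read.tree(text = aa)
> # Grafen branch lengths with power = 1
> t4 <- compute.brlen(t1, method = "Grafen", power = 1)
> # alternating 0/1 states, named so Kcalc() can match them to the tips
> tipvals <- setNames(c(0,1,0,1,0,1,0,1,0,1,0,1), t1$tip.label)
> Kcalc(tipvals, t4)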
>
> K = 0.3487135
>
>
> There are 924 possible arrangements (12 choose 6) of 6 'painted' tips, from
> 000000111111 to 111111000000 (each of those two extreme distributions gives
> the maximum value of K for this particular tree & number [6] of painted
> tips: 2.37703)
>
> There are 336 distinct values of K for this tree when painting 6 tips:
>
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.3438  0.3676  0.4489  0.5332  0.5945  2.3770
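
One way the enumeration above might be set up, as a sketch rather than the code that produced those numbers: combn() lists every choice of 6 painted tips out of 12, and picante::Kcalc() is run on each painting. The rounding to 8 decimals before counting distinct values is an arbitrary choice to absorb floating-point noise.

```r
library(ape)
library(picante)

tree <- read.tree(text = "(A,(B,(C,(D,(E,(F,(G,(H,(I,(J,(K,L)))))))))));")
tree <- compute.brlen(tree, method = "Grafen", power = 1)
n_tips <- length(tree$tip.label)

# Each column holds the indices of the 6 painted tips: choose(12, 6) = 924 columns
painted <- combn(n_tips, 6)

# Blomberg's K for every possible painting
k_all <- apply(painted, 2, function(idx) {
  x <- setNames(numeric(n_tips), tree$tip.label)
  x[idx] <- 1
  as.numeric(Kcalc(x, tree))
})

summary(k_all)
length(unique(round(k_all, 8)))   # number of distinct K values
quantile(k_all, 0.05)             # one estimate of a 5% lower-tail cutoff
```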
>
>
> A histogram of all 924 possible values of K (for 6 painted tips) shows that
> Blomberg's K in terms of value distribution is extremely positively skewed
> (it has a skewness of 2.671139), which is great if you're looking to test
> phylogenetic signal without false positives, but not so great if you're
> trying to assess "evenness" at the other tail of the distribution. In the
> case of this exact tree, because I've enumerated all possible permutations
> of 6 painted tips, I can calculate the 5% significance threshold as values
> of K that are 0.3480158 or less.  It seems some normalisation procedure
> might be needed before safely using Blomberg's K to assess the significance
> of evenness, if one is not going to exhaustively examine/enumerate all
> possible values (which I can't do for a 1000+ tip tree).
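
For a tree too large to enumerate, a Monte Carlo version of the same lower-tail comparison could look like the sketch below. lower_tail_p() is a hypothetical name, and shuffling tip states is only one possible null model. It asks how often a random painting with the same number of painted tips gives a K at least as small as the observed one.

```r
library(ape)
library(picante)

# x: named 0/1 vector of tip states; phy: tree with branch lengths
lower_tail_p <- function(x, phy, nperm = 999) {
  k_obs <- as.numeric(Kcalc(x, phy))
  k_null <- replicate(nperm, {
    shuffled <- setNames(sample(x), names(x))   # same states, random tips
    as.numeric(Kcalc(shuffled, phy))
  })
  # proportion of paintings (observed one included) with K <= observed K
  (sum(k_null <= k_obs) + 1) / (nperm + 1)
}

set.seed(1)
tree <- read.tree(text = "(A,(B,(C,(D,(E,(F,(G,(H,(I,(J,(K,L)))))))))));")
tree <- compute.brlen(tree, method = "Grafen", power = 1)
x <- setNames(rep(c(0, 1), 6), tree$tip.label)
p_low <- lower_tail_p(x, tree, nperm = 199)
p_low
```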
>
> Does that make sense?
>
> Certainly interesting...
>
>
> Ross
>
> On 14 March 2017 at 14:53, Ross Mounce <ross.mou...@gmail.com> wrote:
>
> > Thanks Dave,
> >
> > I'll try Blomberg's K with small simulated fully-bifurcating trees of
> > simple shape (e.g. fully pectinate), where I can easily paint the tips
> > myself in what I believe to be a "maximally stratified manner" e.g.
> > 010101010 to see if Blomberg's K does actually reach its minimum (i.e.
> > 0.00000 ?) for such a distribution. If it does, great! This is the measure
> > I need.
> >
> > I still wonder though, for a complex tree structure in terms of
> > balance/shape, somewhere intermediate between fully balanced and fully
> > pectinate: how does one arrive empirically at _the_ optimal
> > stratified/even sampling ('painting') of tips if, say, only 25% of tips
> > are/can be 'painted'? I guess a lot depends on how one defines what 'even
> > sampling' on a phylogeny actually is: does it include branch lengths, et
> > cetera...
> >
> > I'll give it a try anyway,
> >
> > Thanks again,
> >
> > Ross
> >
> >
> > On 14 March 2017 at 14:33, David Bapst <dwba...@gmail.com> wrote:
> >
> >> Ross,
> >>
> >> An interesting question. I understand it as wanting to test whether
> >> the trait is overdispersed relative to phylogeny, which still makes me
> >> think that measures of 'phylogenetic signal' might still be useful,
> >> even though the typical interpretation is 'signal' as 'heritability'.
> >> I would try some toy examples with smallish trees and artificial data
> >> and play with different signal measures; in particular, your idea
> >> that the variance is high at the level of closest relatedness suggests
> >> that you should perhaps investigate Blomberg's K as a measure, rather
> >> than Pagel's lambda:
> >>
> >> Blomberg, S. P., T. Garland, and A. R. Ives. 2003. Testing for
> >> phylogenetic signal in comparative data: behavioral traits are more
> >> labile. Evolution 57:717-745.
> >>
> >> However, your soft polytomies are worrisome; I suggest using the MPT
> >> or posterior tree sample, if such exists, or considering resolving
> >> those polytomies somehow.
> >>
> >> Cheers,
> >> -Dave
> >>
> >> On Tue, Mar 14, 2017 at 5:45 AM, Ross Mounce <ross.mou...@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm interested in the distribution of a non-heritable binary
> >> > trait/observation across a large (1000+ tip) tree. The tree is
> >> > non-distinct in shape and balance; it is neither fully pectinate nor
> >> > fully balanced. It has many soft polytomies too.
> >> >
> >> > I believe the distribution of this trait to be significantly
> >> > stratified such that, just for the sake of explanation, every other tip
> >> > is "present" for the trait. So essentially I'm interested in testing the
> >> > evenness of distribution of "present" tips across the tree.
> >> >
> >> > In this instance it doesn't seem to me that I should be testing for
> >> > "phylogenetic signal" or using models that do that, nor am I testing
> >> > the randomness of distribution of the trait.
> >> > Specifically, I want to test if the observed distribution is
> >> > significantly close to "perfect" stratification for the given number of
> >> > "presences" (which is ~33% of the tips of the tree), on the given fixed
> >> > tree shape.
> >> >
> >> > TL;DR
> >> >
> >> > How can I meaningfully test the evenness of the distribution of a
> >> > binary trait across a tree, with R?
> >> >
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks,
> >> >
> >> > Ross
> >> >
> >> >
> >> > --
> >> > --
> >> > -/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/
> >> -/-/-/-/-/-/-/-
> >> > Ross Mounce, PhD
> >> > Software Sustainability Institute Fellow
> >> > Dept. of Plant Sciences, University of Cambridge
> >> > www.rossmounce.co.uk <http://rossmounce.co.uk/>
> >> > -/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/
> >> -/-/-/-/-/-/-/-
> >> >
> >> >         [[alternative HTML version deleted]]
> >> >
> >> > _______________________________________________
> >> > R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> > Searchable archive at http://www.mail-archive.com/r-
> >> sig-ph...@r-project.org/
> >>
> >>
> >>
> >> --
> >> David W. Bapst, PhD
> >> Adjunct Asst. Professor, Geology and Geol. Eng.
> >> South Dakota School of Mines and Technology
> >> 501 E. St. Joseph
> >> Rapid City, SD 57701
> >>
> >> http://webpages.sdsmt.edu/~dbapst/
> >> http://cran.r-project.org/web/packages/paleotree/index.html
> >>
> >
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
