Hi all,

I agree with Ted that models motivated by biological phenomena we know to be
true are the most useful, even though the apparent Brownian motion of traits
might have more to do with fluctuating environments, etc., than with genetic
drift. Brownian motion on a star phylogeny is indistinguishable from the
white-noise model (equivalently, an OU model with infinite alpha), and may be
a reasonable model of selection mattering more than shared ancestry.
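To make that equivalence concrete, here is a minimal sketch in plain Python (not geiger or ape; the function name and numbers are invented for illustration). Simulating BM independently along each branch of a star tree with contemporaneous tips gives tip values that are i.i.d. normal with variance sigma^2 * depth, i.e., exactly the white-noise model:

```python
import random
import statistics

random.seed(1)

def bm_tips_star(root, sigma2, depth, n_tips, n_steps=50):
    """Brownian motion run independently along each branch of a star
    phylogeny whose tips are all at the same depth from the root."""
    dt = depth / n_steps
    sd = (sigma2 * dt) ** 0.5
    tips = []
    for _ in range(n_tips):
        x = root
        for _ in range(n_steps):
            x += random.gauss(0.0, sd)
        tips.append(x)
    return tips

tips = bm_tips_star(root=0.0, sigma2=1.0, depth=2.0, n_tips=5000)

# With no shared internal branches, the tips are i.i.d.
# Normal(root, sigma2 * depth) -- indistinguishable from white noise.
print(round(statistics.mean(tips), 2))      # close to 0.0
print(round(statistics.variance(tips), 2))  # close to sigma2 * depth = 2.0
```

(With shared internal branches, by contrast, tips covary in proportion to shared path length, which is what separates BM on a real tree from WN.)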

Dave asks,

> Thus, it is of interest to me to know if none of the models I have
> considered are adequate descriptions of the real processes.
I can see how we can compare among reasonable baselines suggested by others
(OU, BM, etc.), but I don't believe it's meaningful to try to reject every
model in the set.  Even if every model fits extremely poorly, that by itself,
absent an alternative hypothesis, isn't grounds for rejecting a model --
for very stochastic models like BM, any particular outcome is extremely
unlikely.  How, then, do we best address Dave's question?
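One way around the "any particular outcome is unlikely" problem is a Monte Carlo (parametric-bootstrap) adequacy check on a summary statistic, along the lines Brian suggested earlier in the thread: simulate replicate datasets under the fitted model and ask whether the observed statistic falls inside the simulated distribution. A toy sketch in plain Python -- none of this is geiger's API, and the data, statistic, and function names are invented for illustration:

```python
import random
import statistics

random.seed(2)

def mc_adequacy_p(observed, simulate, stat, n_rep=999):
    """Monte Carlo adequacy check: compare an observed summary statistic
    with its distribution under the fitted model, instead of asking how
    probable the single observed outcome is."""
    s_obs = stat(observed)
    s_sim = [stat(simulate()) for _ in range(n_rep)]
    # two-sided Monte Carlo p-value
    more_extreme = sum(1 for s in s_sim if abs(s) >= abs(s_obs))
    return (more_extreme + 1) / (n_rep + 1)

# Toy data with a strong serial trend, to which we fit a white-noise model
obs = [0.1 * i + random.gauss(0, 0.2) for i in range(30)]
mu, sd = statistics.mean(obs), statistics.stdev(obs)

def sim_white_noise():
    return [random.gauss(mu, sd) for _ in range(len(obs))]

def lag1_corr(x):
    """Lag-1 autocorrelation: near 0 for white noise, high for a trend."""
    m = statistics.mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

p = mc_adequacy_p(obs, sim_white_noise, lag1_corr)
print(p)  # small: white noise is an inadequate description of a trended series
```

For comparative data the same recipe would apply with simulations under the fitted evolutionary model (in R, e.g. via geiger) and a statistic targeted at the feature one cares about (phylogenetic signal, bimodality, etc.).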

Carl

On Mon, Jan 31, 2011 at 10:10 AM, <tgarl...@ucr.edu> wrote:

> Hi All,
>
> My perspective has been that we know (1) phylogeny happened (descent with
> modification, speciation) and (2) random genetic drift happened.  Given that
> knowledge, Brownian motion along your best estimate of the phylogenetic
> topology and branch lengths (in units proportional to divergence times) is
> the best (simplest) "baseline" we can have for a polygenic, quantitative
> trait.
>
> One might argue that Brownian motion along a star phylogeny with
> contemporaneous tips is another useful baseline, but it's sort of a different
> kettle of fish, in my opinion.
>
> In any case, you can fit both of these models and compare them with models
> that have more parameters (via log-likelihood ratio tests or
> information-theoretic statistics).
>
> Cheers,
> Ted
>
> Theodore Garland, Jr.
> Professor
> Department of Biology
> University of California, Riverside
> Riverside, CA 92521
> Office Phone:  (951) 827-3524
> Lab Phone:  (951) 827-5724
> Home Phone:  (951) 328-0820
> Facsimile:  (951) 827-4286 = Dept. office (not confidential)
> Email:  tgarl...@ucr.edu
>
> Main Departmental page:
> http://www.biology.ucr.edu/people/faculty/Garland.html
>
> List of all Publications:
> http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html
>
> Garland and Rose, 2009
> http://www.ucpress.edu/books/pages/10604.php
>
>
>  ---- Original message ----
>
>     Date: Mon, 31 Jan 2011 09:53:25 -0800
>    From: Luke Harmon <lu...@uidaho.edu>
>    Subject: Re: [R-sig-phylo] Model-Selection vs. Finding Models that
>    "Fit Well"
>     To: David Bapst <dwba...@uchicago.edu>
>    Cc: "r-sig-phylo@r-project.org mailing list"
>    <r-sig-phylo@r-project.org>
>
>    >I agree with Dave here. White noise has two parameters, mean and
>    variance, and - to me - is an interesting model to test. But I'm not
>    sure it should be considered as a "baseline."
>    >
>    >One can link Brownian motion and white noise through the
>    Ornstein-Uhlenbeck model - BM is OU with alpha (constraint)
>    parameter equal to zero, and WN is OU with infinite alpha.
>    >
>    >lh
>    >On Jan 30, 2011, at 3:18 PM, David Bapst wrote:
>    >
>    >> Florian-
>    >> Doesn't white noise have two parameters, mean and variance, and
>    thus is just
>    >> as complex as the Brownian Motion model? I guess LRT could be
>    done against
>    >> both.
>    >>
>    >> That said, it isn't clear to me what it means for the White Noise to
>    >> fit best; WN is itself interpretable as an evolutionary scenario, so
>    it doesn't
>    >> seem a very clean null. If WN fits best in a set of models, the
>    trait
>    >> examined must be exceedingly low in phylogenetic signal and thus
>    the trait
>    >> must be evolving under one of the scenarios which can produce low
>    signal (OU
>    >> with high rates and strong attraction, and/or evolution where
>    descendant
>    >> values are not a function of ancestral values).
>    >> -Dave
>    >>
>    >> On Fri, Jan 28, 2011 at 5:31 PM, Florian Boucher
>    <floflobouc...@gmail.com>wrote:
>    >>
>    >>> Hi David and list,
>    >>>
>    >>> just a quick comment on one of your questions :
>    >>> for quantitative traits on a phylogeny you can compare your
>    "best" model to
>    >>> the "white noise" model implemented in geiger, which assumes
>    that your
>    >>> traits are drawn from a normal distribution.
>    >>> This last model would be the "baseline model" Ted evoked in his
>    post.
>    >>>
>    >>> I hope it helps...
>    >>>
>    >>>
>    >>> Florian Boucher
>    >>> PhD student, Laboratoire d'Ecologie Alpine,
>    >>> Grenoble, France
>    >>>
>    >>> 2011/1/28 David Bapst <dwba...@uchicago.edu>
>    >>>
>    >>>> Hello all,
>    >>>>
>    >>>> Apologies for leaving the replies to get cold for a week, but
>    now I
>    >>>> finally have some time to respond.
>    >>>>
>    >>>> On Thu, Jan 20, 2011 at 12:17 PM, Brian O'Meara
>    <omeara.br...@gmail.com>
>    >>>> wrote:
>    >>>>> I think considering model adequacy is something that would be
>    useful to
>    >>>> do
>    >>>>> and is not done much now. One general way to do this is to
>    simulate
>    >>> under
>    >>>>> your chosen model and see if the real data "look" very
>    different from
>    >>> the
>    >>>>> simulated data. For example, I might try a one rate vs. a two
>    rate
>    >>>> Brownian
>    >>>>> motion model and find the latter fits better. If the actual
>    true model
>    >>> is
>    >>>> an
>    >>>>> OU model with two very distinct peaks and strong selection,
>    which is
>    >>> not
>    >>>> in
>    >>>>> my model set, I'll find that my simulated data under the two
>    rate
>    >>>> Brownian
>    >>>>> model may look very different from my actual data, which will
>    be fairly
>    >>>>> bimodal. Aha, my model is inadequate. [but then what -- keep
>    adding new
>    >>>>> models, just report that your model is inadequate, ...?]
>    >>>>
>    >>>> Certainly, data exploration is a step that cannot be skipped;
>    I've
>    >>>> found that Ackerly's traitgrams work well for me for
>    visualizing my
>    >>>> data, although I know some people who find them simply
>    confusing
>    >>>> (particularly my traitgrams, as they have fossil taxa all over
>    them).
>    >>>>
>    >>>>> Of course, you need a method for evaluating how similar the
>    data
>    >>> "look".
>    >>>>> There's been some work on this in models for tree inference
>    using
>    >>>> posterior
>    >>>>> predictive performance (work by Jonathan Bollback and Jeremy
>    Brown come
>    >>>> to
>    >>>>> mind) or using other approaches (some of Peter Waddell's
>    work), but it
>    >>>>> hasn't really taken off yet. It'd be easy to implement such
>    approaches
>    >>> in
>    >>>> R
>    >>>>> for comparative methods given capabilities in ape, Geiger, and
>    other
>    >>>>> packages.
>    >>>>
>    >>>> I don't entirely follow, but I'll look into the posterior predictive
>    >>>> performance work you mention. But wouldn't judging how 'similar the
>    >>>> data look' be a function of what we want to look at?
>    >>>>
>    >>>> On Thu, Jan 20, 2011 at 12:27 PM, <tgarl...@ucr.edu> wrote:
>    >>>>> One quick comment. In many cases what you can do is also fit a
>    model
>    >>>> with no independent variables. It, too, will have a likelihood,
>    and can
>    >>> be
>    >>>> used as a "baseline" to argue whether your "best" model is
>    actually any
>    >>>> good. You could then do a maximum likelihood ratio test of your
>    best
>    >>> model
>    >>>> versus your baseline model. If the baseline model does not have
>    a
>    >>>> significant lack of fit by a LRT (e.g., P not < 0.05), then
>    your best
>    >>> model
>    >>>> arguably isn't of much use.
>    >>>>>
>    >>>>> Cheers,
>    >>>>> Ted
>    >>>>
>    >>>> That seems like a pretty good idea! How would we go about doing that
>    >>>> in an example case, such as for a continuous trait on a phylogeny?
>    >>>>
>    >>>> On Thu, Jan 20, 2011 at 2:00 PM, Nick Matzke
>    <mat...@berkeley.edu>
>    >>> wrote:
>    >>>>> If one is interested in absolute goodness of fit, rather than
>    model
>    >>>>> comparison (which model fits best, which might not be useful
>    if you are
>    >>>>> worried that all your models are horrible), wouldn't
>    cross-validation
>    >>> be
>    >>>> a
>    >>>>> good technique? I.e. leave out one tip, calculate the model
>    and the
>    >>>>> estimated node values from the rest, then put an estimate and
>    >>> uncertainty
>    >>>> on
>    >>>>> the tip and see how often it matches the observed value.
>    Repeat for
>    >>> all
>    >>>>> observations...
>    >>>>
>    >>>> Would jack-knifing data like that be appropriate for the
>    continuous
>    >>>> trait analyses? That would seem to deal with whether we have
>    enough
>    >>>> data to distinguish the models at hand (which is still
>    important), but
>    >>>> not deal with whether any of the models we are considering are
>    >>>> appropriate.
>    >>>>
>    >>>> On Thu, Jan 20, 2011 at 12:48 PM, Carl Boettiger
>    <cboet...@gmail.com>
>    >>>> wrote:
>    >>>>> Hi David, List,
>    >>>>>
>    >>>>> I think you make a good point. After all, the goal isn't to
>    match the
>    >>>>> pattern but to match the process. If we just wanted to match
>    the data
>    >>>> we'd
>    >>>>> use the most complicated model we could make (or some machine
>    learning
>    >>>>> pattern) and dispense with AIC.
>    >>>>>
>    >>>>> If a model has errors that are normally distributed around the
>    >>>>> predicted path, then minimizing the squared error (maximizing R^2)
>    >>>>> is the same as maximizing likelihood, so
>    >>> I'm
>    >>>>> afraid I don't understand what is meant by not having a
>    >>> goodness-of-fit.
>    >>>>> Isn't likelihood a measure of fit?
>    >>>>
>    >>>> It is; I partly made an error in what I said and also was
>    quoting
>    >>>> others, who apparently did not appreciate the relationship
>    between R2
>    >>>> and likelihood in their discussions with me.
>    >>>>
>    >>>>> If we consider very stochastic models that have a large range
>    of
>    >>> possible
>    >>>>> outcomes, no outcome is very likely, and we don't expect a
>    good fit.
>    >>> If
>    >>>> we
>    >>>>> had replicates we might hope to compare the distributions of
>    possible
>    >>>>> outcomes.
>    >>>>>
>    >>>>> I don't see getting around this with a simple test. I think
>    Brian's
>    >>>> example
>    >>>>> is very instructive, it depends why we are trying to fit the
>    model in
>    >>> the
>    >>>>> first place (to go back to Levins 1968). If we want to learn
>    about
>    >>>> optima
>    >>>>> and strengths of selection, we won't learn anything by fitting
>    a BM
>    >>> model
>    >>>> to
>    >>>>> the data, as it has no parameters that represent these things.
>    >>> However,
>    >>>>> Brian's two-rate BM fit will still test that the rates of
>    >>> diversification
>    >>>>> don't differ substantially between the peaks (or conversely,
>    if one
>    >>> peak
>    >>>> had
>    >>>>> very weak stabilizing selection, this would be detected as a
>    difference
>    >>> in
>    >>>>> Brownian rates between the clades)
>    >>>>
>    >>>> More or less, that is what happened to Sidlauskas (2006,
>    2008).
>    >>>>
>    >>>>> If our goal wasn't to compare parameter values but to make
>    predictions
>    >>>> (for
>    >>>>> instance, estimate trait values missing taxa), then a purely
>    >>>> goodness-of-fit
>    >>>>> approach might be better (and some machine learning algorithm
>    could
>    >>>> probably
>    >>>>> out-perform any of the simple mechanistic models). I think it
>    may be
>    >>>>> difficult to really answer David's question without an example
>    of what
>    >>>>> hypothesis we are really after. Perhaps I have missed
>    something?
>    >>>>
>    >>>> I personally tend to be interested in which pattern best
>    describes
>    >>>> evolution in a trait I have measured for a clade. For example,
>    is
>    >>>> thecal diameter in graptoloids best described by BM, OU or
>    trend?
>    >>>> Which model fits best informs us about the types of factors
>    that may
>    >>>> have been at work in the evolution of the trait (ala McShea and
>    >>>> Brandon, 2010). This is a typical sort of paleontological
>    question,
>    >>>> generally asked in terms of what factors drive trends (Alroy,
>    1998).
>    >>>> Thus, it is of interest to me to know if none of the models I
>    have
>    >>>> considered are adequate descriptions of the real processes.
>    >>>>
>    >>>> Interesting responses, all! Thank you very much for this
>    discussion!
>    >>>> -Dave
>    >>>>
>    >>>> --
>    >>>> David Bapst
>    >>>> Dept of Geophysical Sciences
>    >>>> University of Chicago
>    >>>> 5734 S. Ellis
>    >>>> Chicago, IL 60637
>    >>>>
>    >>>> http://home.uchicago.edu/~dwbapst/
>    >>>>
>    >>>> _______________________________________________
>    >>>> R-sig-phylo mailing list
>    >>>> R-sig-phylo@r-project.org
>    >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>    >>>>
>    >>>
>    >>>
>    >>>
>    >>
>    >>
>    >>
>    >> --
>    >> David Bapst
>    >> Dept of Geophysical Sciences
>    >> University of Chicago
>    >> 5734 S. Ellis
>    >> Chicago, IL 60637
>    >> http://home.uchicago.edu/~dwbapst/
>    >>
>    >>
>    >
>    >Luke Harmon
>    >Assistant Professor
>    >Biological Sciences
>    >University of Idaho
>    >208-885-0346
>    >lu...@uidaho.edu
>    >
>
>



-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


