Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Theodore Garland Jr
If you are trying to mimic real data, then perhaps you have some fossil data to 
go on?  If not, then you can try to pick a "reasonable" root value based on 
other biological knowledge.  See Garland et al. (1993) for how we did it.
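
To make the role of the root value concrete: under Brownian motion, or early 
burst without a trend, the expected tip value equals the root value, so the 
root sets the scale directly. Under OU the expected value moves from the root 
x0 toward the optimum theta, E[X(t)] = theta + (x0 - theta) * exp(-alpha * t), 
so both x0 and theta need "reasonable" values. What follows is only a sketch of 
that point in plain Python, not the procedure of Garland et al. (1993). The 
tree, the parameter values, and the names ou_step, simulate_ou, and 
expected_tip_value are all made up for illustration.

```python
import math
import random

# Hypothetical tree: a node is (children_or_tip_name, branch_length).
# Every tip is 2.0 time units from the root.
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0), ("C", 2.0)], 0.0)


def ou_step(x, length, alpha, theta, sigma2, rng):
    """Draw the OU value at the end of a branch, given the value x at its start."""
    decay = math.exp(-alpha * length)
    mean = theta + (x - theta) * decay
    var = sigma2 / (2.0 * alpha) * (1.0 - decay * decay)
    return rng.gauss(mean, math.sqrt(var))


def simulate_ou(node, x, alpha, theta, sigma2, rng, tips=None):
    """Evolve one trait from value x at the top of `node` down to the tips."""
    if tips is None:
        tips = {}
    content, length = node
    here = ou_step(x, length, alpha, theta, sigma2, rng)
    if isinstance(content, str):
        tips[content] = here  # reached a tip: record its simulated value
    else:
        for child in content:
            simulate_ou(child, here, alpha, theta, sigma2, rng, tips)
    return tips


def expected_tip_value(root, theta, alpha, depth):
    """Expected OU tip value after `depth` time units, starting from `root`."""
    return theta + (root - theta) * math.exp(-alpha * depth)


if __name__ == "__main__":
    rng = random.Random(1)
    # Root value 4.0, optimum 10.0: tips are pulled most of the way to 10.
    print(simulate_ou(TREE, 4.0, alpha=1.0, theta=10.0, sigma2=1.0, rng=rng))
    print(expected_tip_value(4.0, 10.0, alpha=1.0, depth=2.0))
```

With a strong pull (large alpha relative to tree depth) the root value hardly 
matters and theta sets the scale; with a weak pull the root value dominates, as 
under Brownian motion.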

Cheers,
Ted

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Bryan McLean
Thanks Joe and Ted,

By similar scaling, I just meant (as Ted guessed) that the root value depends 
on the empirical trait data rather than starting at, e.g., 0 or 1, and thus 
produces simulated values that can be directly compared to the true data. Under 
a Brownian model, the mean trait value is suitable as the root value, but how 
does one specify a root value under a different and potentially better fitting 
model (OU, EB)? I’m working mostly in R.

-Bryan



Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Theodore Garland Jr
This is a good point and one that is often glossed over.
We talked about it quite a bit here:

Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
Phylogenetic analysis of covariance by computer simulation. Systematic Biology 
42:265–292.
http://www.biology.ucr.edu/people/faculty/Garland/GarlEA93.pdf

Surely you want to do various descriptive statistics on your simulated data 
sets to see how they compare with the real one, and presumably you want some of 
those to include the phylogenetic versions (e.g., conventional and phylogenetic 
estimates of the correlation coefficient if you are simulating two traits).
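
One way to get both versions of the correlation, sketched in plain Python under 
some assumptions: a fully bifurcating tree with known branch lengths, 
phylogenetically independent contrasts as the phylogenetic estimate 
(correlation through the origin), and a made-up four-taxon tree. The function 
names are hypothetical.

```python
import math

# Hypothetical tree: a node is (children_or_tip_name, branch_length).
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0), ([("C", 0.5), ("D", 0.5)], 1.5)], 0.0)


def independent_contrasts(node, x, y, pairs):
    """Return (x, y, adjusted branch length) for `node`.

    Standardized contrasts for each internal node are appended to `pairs`.
    """
    content, length = node
    if isinstance(content, str):
        return x[content], y[content], length
    (x1, y1, v1), (x2, y2, v2) = (
        independent_contrasts(child, x, y, pairs) for child in content
    )
    sd = math.sqrt(v1 + v2)
    pairs.append(((x1 - x2) / sd, (y1 - y2) / sd))
    # Nodal value: average of the daughters, weighted by 1 / branch length.
    w1, w2 = 1.0 / v1, 1.0 / v2
    x_node = (w1 * x1 + w2 * x2) / (w1 + w2)
    y_node = (w1 * y1 + w2 * y2) / (w1 + w2)
    # Lengthen the branch below the node to reflect uncertainty in its value.
    return x_node, y_node, length + v1 * v2 / (v1 + v2)


def pearson(xs, ys):
    """Conventional correlation that ignores the phylogeny."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def contrast_correlation(pairs):
    """Correlation of contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def both_correlations(tree, x, y):
    """Return (conventional, phylogenetic) correlation for tip data x and y."""
    names = sorted(x)
    pairs = []
    independent_contrasts(tree, x, y, pairs)
    conventional = pearson([x[k] for k in names], [y[k] for k in names])
    return conventional, contrast_correlation(pairs)
```

Running both_correlations on the real data and on every simulated data set 
gives two distributions against which the empirical values can be placed.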

I think it is also really important to consider models that have limits to 
trait evolution (again, see the paper listed above).  Those limits can interact 
strongly with starting (root) values, especially if you include evolutionary 
trends.
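
A toy illustration of that interaction, with the assumptions stated up front: a 
single root-to-tip lineage, Brownian motion with a linear trend simulated in 
small time steps, and reflecting bounds. Reflection is just one possible way to 
impose limits and is not necessarily any of the rules used in the 1993 paper. 
All names and parameter values are invented.

```python
import math
import random


def reflect(x, lower, upper):
    """Fold x back into [lower, upper] by reflecting it at the bound it crossed."""
    while x < lower or x > upper:
        x = 2.0 * lower - x if x < lower else 2.0 * upper - x
    return x


def simulate_lineage(root, depth, sigma2, trend, lower, upper, rng, steps=100):
    """Bounded Brownian motion with a linear trend along one root-to-tip path."""
    dt = depth / steps
    x = root
    for _ in range(steps):
        x += trend * dt + rng.gauss(0.0, math.sqrt(sigma2 * dt))
        x = reflect(x, lower, upper)
    return x


def mean_tip_value(root, reps=1000, seed=1, **kwargs):
    """Average simulated tip value over `reps` replicate lineages."""
    rng = random.Random(seed)
    total = sum(simulate_lineage(root, rng=rng, **kwargs) for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    settings = dict(depth=2.0, sigma2=1.0, trend=1.0, lower=0.0, upper=10.0)
    # Without bounds the expected tip value would be root + trend * depth.
    for root in (5.0, 9.0):
        print(root, root + 2.0, mean_tip_value(root, **settings))
```

Starting in the middle of the allowed range, the trend shows up almost in full 
at the tips; starting near the upper limit, most of it is absorbed by the 
bound, so the same trend setting gives quite different simulated data.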

Check Figure 1 in this paper:

Diaz-Uriarte, R., and T. Garland. 1996. Testing hypotheses of correlated 
evolution using phylogenetically independent contrasts: sensitivity to 
deviations from Brownian motion. Systematic Biology 45:27–47.
http://www.biology.ucr.edu/people/faculty/Garland/DiazGa96.pdf

Cheers,
Ted

Theodore Garland, Jr., Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Facsimile:  (951) 827-4286 (not confidential)
Email:  tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html
http://scholar.google.com/citations?hl=en=iSSbrhwJ

Director, UCR Institute for the Development of Educational Applications

Editor in Chief, Physiological and Biochemical Zoology

Fail Lab: Episode One
http://testtube.com/faillab/zoochosis-episode-one-evolution
http://www.youtube.com/watch?v=c0msBWyTzU0




Re: [R-sig-phylo] simulating continuous data

2016-05-10 Thread Joe Felsenstein
Bryan McLean --


> I’m working to simulate multiple continuous characters on a known
> phylogeny (using several of the standard models), and I want to compare
> properties of the simulated datasets to an empirical dataset. My question
> is: what is the standard method for ensuring that those datasets
> (simulated, empirical) are actually directly comparable, i.e. scaled
> similarly? Does this involve specifying a sensible root state (e.g.
> ancestral reconstruction) OR just rescaling one or the other datasets
> before or after the analysis? Forgive me if this is a bit of a naive
> question, just trying to get a sense of standard practices.
>

It would seem to depend on what you consider to be "scaled similarly".

When I simulate multiple characters with correlated Brownian Motion, I
specify a covariance matrix for the evolutionary changes, as well as a
starting vector of means.  Using a matrix square root of the covariance
matrix, one can transform the characters so that the covariance matrix of
the new characters is an identity matrix.  Those are easy to simulate up
the tree, and then one transforms back to the original characters.

I do this with my own C programs, but it can be done in R too.
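
The transform, simulate, back-transform recipe above can be sketched as 
follows. This is plain Python rather than C or R, only to keep it 
self-contained, and it makes several assumptions: two characters, a Cholesky 
factor as the matrix square root (one choice among several), and a made-up 
three-taxon tree. The names cholesky_2x2, evolve_independent, and 
simulate_correlated_bm are hypothetical.

```python
import math
import random

# Hypothetical tree: a node is (children_or_tip_name, branch_length).
# Every tip is 2.0 time units from the root.
TREE = ([([("A", 1.0), ("B", 1.0)], 1.0), ("C", 2.0)], 0.0)


def cholesky_2x2(cov):
    """Lower-triangular L with L L^T = cov, for a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def evolve_independent(node, start, rng, tips):
    """Evolve uncorrelated, unit-rate Brownian characters down from `start`."""
    content, length = node
    here = [z + rng.gauss(0.0, math.sqrt(length)) for z in start]
    if isinstance(content, str):
        tips[content] = here
    else:
        for child in content:
            evolve_independent(child, here, rng, tips)
    return tips


def simulate_correlated_bm(tree, root_means, cov, seed=None):
    """Simulate two correlated characters; `cov` is the rate (per unit time) matrix."""
    rng = random.Random(seed)
    chol = cholesky_2x2(cov)
    # Simulate in the transformed space, where the covariance is the identity.
    # Starting at zero there and adding the root means afterwards is
    # equivalent to transforming the starting vector itself.
    z_tips = evolve_independent(tree, [0.0, 0.0], rng, {})
    tips = {}
    for name, z in z_tips.items():
        # Back-transform to the original characters: x = root + L z.
        tips[name] = [
            root_means[i] + chol[i][0] * z[0] + chol[i][1] * z[1]
            for i in range(2)
        ]
    return tips


if __name__ == "__main__":
    rates = [[4.0, 1.2], [1.2, 1.0]]
    print(simulate_correlated_bm(TREE, [10.0, 3.0], rates, seed=1))
```

The starting vector of means is what puts the simulated tips on the scale of 
the empirical data; the covariance matrix controls the spread around it.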

But what does it mean to be "scaled similarly"?

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA


[R-sig-phylo] simulating continuous data

2016-05-10 Thread Bryan McLean
Hi list,

I’m working to simulate multiple continuous characters on a known phylogeny 
(using several of the standard models), and I want to compare properties of the 
simulated datasets to an empirical dataset. My question is: what is the 
standard method for ensuring that those datasets (simulated, empirical) are 
actually directly comparable, i.e. scaled similarly? Does this involve 
specifying a sensible root state (e.g. ancestral reconstruction) OR just 
rescaling one or the other datasets before or after the analysis? Forgive me if 
this is a bit of a naive question, just trying to get a sense of standard 
practices.

-Bryan McLean