Re: [R-sig-phylo] simulating continuous data
If you are trying to mimic real data, then perhaps you have some fossil data to go on? If not, then you can try to pick a "reasonable" value based on other biological knowledge. Check Garland et al. (1993) for how we did it.

Cheers,
Ted

From: Bryan McLean [bryansmcl...@gmail.com]
Sent: Tuesday, May 10, 2016 3:32 PM
To: Theodore Garland Jr
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] simulating continuous data

> Under a Brownian model, the mean trait value is suitable as the root value, but how does one specify a root value under a different and potentially better fitting model (OU, EB)?

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] simulating continuous data
Thanks Joe and Ted,

By similar scaling, I just meant (as Ted guessed) that the root value depends on the empirical trait data rather than starting at, e.g., 0 or 1, and thus produces simulated values that can be directly compared to the true data.

Under a Brownian model, the mean trait value is suitable as the root value, but how does one specify a root value under a different and potentially better-fitting model (OU, EB)? I'm working mostly in R.

-Bryan

> On 10 May 2016, at 4:38 PM, Theodore Garland Jr wrote:
>
> This is a good point and one that is often glossed over.
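A minimal sketch of why the root value needs separate thought under OU, written in Python with only the standard library (the same arithmetic carries over to R). It assumes the standard single-optimum OU process, whose expected value after time t is theta + (x0 - theta) * exp(-alpha * t). The function names and the numbers for `alpha`, `theta`, `sigma2`, and the root value `x0` are hypothetical, chosen only for illustration. Under BM (and under EB, which changes the rate but adds no directional pull) the expected tip value stays at the root value, whereas under OU the root value and the optimum are two distinct parameters, and the tips drift from the former toward the latter.

```python
import math
import random


def ou_expected_tip(x0, t, alpha, theta):
    """Expected trait value after time t under OU, starting from root value x0."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_draw(x0, t, alpha, theta, sigma2, rng):
    """Draw one value after time t from the exact OU transition distribution."""
    decay = math.exp(-alpha * t)
    mean = theta + (x0 - theta) * decay
    var = sigma2 / (2.0 * alpha) * (1.0 - decay * decay)
    return rng.gauss(mean, math.sqrt(var))


rng = random.Random(1)
theta = 4.0  # hypothetical optimum
for x0 in (4.0, 8.0):  # root at the optimum vs. root away from it
    draws = [ou_draw(x0, 1.0, 0.5, theta, 1.0, rng) for _ in range(5)]
    print(x0, round(ou_expected_tip(x0, 1.0, 0.5, theta), 3),
          [round(d, 2) for d in draws])
```

If the root is set equal to the optimum, the expected tip value equals both, which is one common simplifying choice; any other root value is an additional assumption about the ancestor.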
Re: [R-sig-phylo] simulating continuous data
This is a good point and one that is often glossed over. We talked about it quite a bit here:

Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265–292.
http://www.biology.ucr.edu/people/faculty/Garland/GarlEA93.pdf

Surely you want to do various descriptive statistics on your simulated data sets to see how they compare with the real one, and presumably you want some of those to include the phylogenetic versions (e.g., conventional and phylogenetic estimates of the correlation coefficient if you are simulating two traits).

I think it is also really important to consider models that have limits to trait evolution (again, see the paper listed above). Those limits can interact strongly with starting (root) values, especially if you include evolutionary trends.

Check Figure 1 in this paper:

Diaz-Uriarte, R., and T. Garland. 1996. Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Systematic Biology 45:27–47.
http://www.biology.ucr.edu/people/faculty/Garland/DiazGa96.pdf

Cheers,
Ted

Theodore Garland, Jr., Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone: (951) 827-3524
Facsimile: (951) 827-4286 (not confidential)
Email: tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html
http://scholar.google.com/citations?hl=en&user=iSSbrhwJ

Director, UCR Institute for the Development of Educational Applications
Editor in Chief, Physiological and Biochemical Zoology

Fail Lab: Episode One
http://testtube.com/faillab/zoochosis-episode-one-evolution
http://www.youtube.com/watch?v=c0msBWyTzU0

From: R-sig-phylo [r-sig-phylo-boun...@r-project.org] on behalf of Bryan McLean [bryansmcl...@gmail.com]
Sent: Tuesday, May 10, 2016 1:24 PM
To: r-sig-phylo@r-project.org
Subject: [R-sig-phylo] simulating continuous data

> My question is: what is the standard method for ensuring that those datasets (simulated, empirical) are actually directly comparable, i.e. scaled similarly?
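To illustrate how limits can interact with the root value, here is a rough Python sketch (standard library only) of Brownian motion along a single branch with reflecting bounds, followed by a simple descriptive statistic of the simulated tips. The reflecting rule, the step count, the bounds, and the function names `reflect` and `bounded_bm` are assumptions made for illustration; this is not claimed to be the bounding algorithm used in Garland et al. (1993).

```python
import math
import random


def reflect(x, lower, upper):
    """Fold x back into [lower, upper] by reflecting off whichever bound it crossed."""
    while x < lower or x > upper:
        x = 2.0 * lower - x if x < lower else 2.0 * upper - x
    return x


def bounded_bm(x0, t, sigma2, lower, upper, rng, n_steps=100):
    """Brownian motion from x0 over time t, taken in small steps, with reflecting bounds."""
    sd = math.sqrt(sigma2 * t / n_steps)
    x = x0
    for _ in range(n_steps):
        x = reflect(x + rng.gauss(0.0, sd), lower, upper)
    return x


def mean(values):
    return sum(values) / len(values)


rng = random.Random(42)
lower, upper = 0.0, 10.0   # hypothetical trait limits
for root in (5.0, 0.5):    # root mid-range vs. root close to the lower limit
    tips = [bounded_bm(root, 1.0, 1.0, lower, upper, rng) for _ in range(2000)]
    print("root", root, "mean of simulated tips", round(mean(tips), 2))
```

With the root in the middle of the range the bounds are rarely touched and the tip mean stays near the root; with the root close to a limit, reflection tends to push the tip mean away from that limit, so the simulated mean no longer matches the root value even without any trend.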
Re: [R-sig-phylo] simulating continuous data
Bryan McLean --

> My question is: what is the standard method for ensuring that those datasets
> (simulated, empirical) are actually directly comparable, i.e. scaled
> similarly? Does this involve specifying a sensible root state (e.g.
> ancestral reconstruction) OR just rescaling one or the other datasets
> before or after the analysis?

It would seem to depend on what you consider to be "scaled similarly".

When I simulate multiple characters with correlated Brownian Motion, I specify a covariance matrix for the evolutionary changes, as well as a starting vector of means. Using a matrix square root of the covariance matrix, one can transform the characters so that the covariance matrix of the new characters is an identity matrix. Those are easy to simulate up the tree, and then one transforms back to the original characters. I do this with my own C programs, but it can be done in R too.

But what does it mean to be "scaled similarly"?

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA
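The transform, simulate, back-transform recipe described above can be sketched roughly as follows, in Python with only the standard library (the same steps carry over to R). The sketch assumes a Cholesky factor as the matrix square root, which is only one possible choice, and a toy tree encoded as nested `(name, branch_length, children)` tuples. The tree, the covariance matrix, the starting vector, and the function names `cholesky` and `simulate_bm` are all hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L * L^T == cov (one choice of matrix square root)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_bm(node, parent_state, L, rng, tips):
    """Evolve the trait vector along the branch leading to `node`, then recurse."""
    name, brlen, children = node
    n = len(parent_state)
    # Independent, unit-rate characters: increment variance equals branch length.
    z = [rng.gauss(0.0, math.sqrt(brlen)) for _ in range(n)]
    # Transform the increments back to the original, correlated characters.
    state = [parent_state[i] + sum(L[i][k] * z[k] for k in range(n))
             for i in range(n)]
    if not children:
        tips[name] = state
    for child in children:
        simulate_bm(child, state, L, rng, tips)
    return tips


# Hypothetical three-tip tree: ((A:1, B:1):1, C:2), root branch of length 0.
tree = ("root", 0.0, [
    ("anc", 1.0, [("A", 1.0, []), ("B", 1.0, [])]),
    ("C", 2.0, []),
])
cov = [[1.0, 0.5], [0.5, 2.0]]   # hypothetical evolutionary covariance matrix
root_means = [10.0, 5.0]         # hypothetical starting vector of means
tips = simulate_bm(tree, root_means, cholesky(cov), random.Random(7), {})
for name in sorted(tips):
    print(name, [round(v, 3) for v in tips[name]])
```

Because the starting vector is an explicit argument, simulated tip values come out on the same scale as the empirical traits whenever the root vector and covariance matrix are chosen on that scale.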
[R-sig-phylo] simulating continuous data
Hi list,

I’m working to simulate multiple continuous characters on a known phylogeny (using several of the standard models), and I want to compare properties of the simulated datasets to an empirical dataset.

My question is: what is the standard method for ensuring that those datasets (simulated, empirical) are actually directly comparable, i.e. scaled similarly? Does this involve specifying a sensible root state (e.g. ancestral reconstruction) OR just rescaling one or the other datasets before or after the analysis?

Forgive me if this is a bit of a naive question, just trying to get a sense of standard practices.

-Bryan McLean