Re: [R-sig-phylo] PGLS vs lm
link to code here: http://www.math.chalmers.se/~krzbar/GLSME/GLSME.R On Fri, Jul 12, 2013 at 1:59 PM, Joe Felsenstein wrote: > > Tom Schoenemann asked me: > > > With respect to your crankiness, is this the paper by Hansen that you > are referring to?: > > > > Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. > (2012). A phylogenetic comparative method for studying multivariate > adaptation. Journal of Theoretical Biology, 314(0), 204-215. > > > > I wrote Bartoszek to see if I could get his R code to try the method > mentioned in there. If I can figure out how to apply it to my data, that > will be great. I agree that it is clearly a mistake to assume one variable > is responding evolutionarily only to the current value of the other > (predictor variables). > > I'm glad to hear that *somebody* here thinks it is a mistake (because it > really is). I keep mentioning it here, and Hansen has published > extensively on it, but everyone keeps saying "Well, my friend used it, and > he got tenure, so it must be OK". > > The paper I saw was this one: > > Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the > evolutionary regression: The interplay between observational and biological > errors in phylogenetic comparative studies. Systematic Biology 61 (3): > 413-425. ISSN 1063-5157. > > J.F. > > Joe Felsenstein j...@gs.washington.edu > Department of Genome Sciences and Department of Biology, > University of Washington, Box 355065, Seattle, WA 98195-5065 USA > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Getting started making R packages
Tristan- I found this to be the most important how-to reference, but I am a windows user. http://robjhyndman.com/hyndsight/building-r-packages-for-windows/ For distributing on CRAN, making help files and examples is the most critical step. package.skeleton() is helpful in setting up the necessary directories and initial help documents for you, and then you can go fill them out. I found that looking at other people's package files on github or elsewhere can be quite useful, such as when trying to figure out formatting for stuff in the NAMESPACE file. For example: https://github.com/richfitz/diversitree http://www.phytools.org/ https://github.com/cboettig Yes, R CMD check is where much of your time will be spent, but once you are done with that, you will build the tarball for CRAN. You should definitely run check with the --as-cran option, since it is more stringent, and also preferably with R-devel. A word of warning: CRAN administrators have fairly strict regulations regarding how much time single help files or the entire package is allowed to take, in terms of error checking with the help examples. They may complain if your examples take too long, which may require putting those in 'don't run' sections. When I submit, I am often guilty of having examples that run too long. There are other policies, and some suggestions, beyond the extensions pdf which Elliot linked to. You should also look at these: http://cran.r-project.org/web/packages/policies.html http://developer.r-project.org/Rds.html Good luck and let me know if you have any more specific questions, -Dave On Fri, Jul 12, 2013 at 1:19 PM, Tristan Stayton wrote: > Hello, > > I'm relatively new to the world of R and need some general advice. I have > a number of R functions that I'd like to put together in a package and > distribute via CRAN, and I've found a few more or less helpful tutorials on > the subject. However, I seem to be getting stuck with rtools (running R > CMD check, etc...) and even if I get past that I'm sure other issues will > come up. So I'm wondering if there are any resources or tutorials > available which you all would recommend. What was helpful to you when you > were first learning to build and distribute packages? > > Thanks in advance. And feel free to let me know if this is not the proper > venue for such questions. > > Tristan > > -- > C. Tristan Stayton > Associate Professor of Biology > Bucknell University > Lewisburg, PA 17837 > > Office: 570-577-3272 > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- David W. Bapst, PhD Adjunct Asst. Professor, Geology and Geol. Eng. South Dakota School of Mines and Technology 501 E. St. Joseph Rapid City, SD 57701 http://home.uchicago.edu/~dwbapst/ http://cran.r-project.org/web/packages/paleotree/index.html ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
Tom Schoenemann asked me: > With respect to your crankiness, is this the paper by Hansen that you are > referring to?: > > Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. > (2012). A phylogenetic comparative method for studying multivariate > adaptation. Journal of Theoretical Biology, 314(0), 204-215. > > I wrote Bartoszek to see if I could get his R code to try the method > mentioned in there. If I can figure out how to apply it to my data, that will > be great. I agree that it is clearly a mistake to assume one variable is > responding evolutionarily only to the current value of the other (predictor > variables). I'm glad to hear that *somebody* here thinks it is a mistake (because it really is). I keep mentioning it here, and Hansen has published extensively on it, but everyone keeps saying "Well, my friend used it, and he got tenure, so it must be OK". The paper I saw was this one: Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary regression: The interplay between observational and biological errors in phylogenetic comparative studies. Systematic Biology 61 (3): 413-425. ISSN 1063-5157. J.F. Joe Felsenstein j...@gs.washington.edu Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Getting started making R packages
Tristan, Check out the package devtools, and before submitting to CRAN make sure you read: cran.us.r-project.org/doc/manuals/R-exts.pdf There are obviously people on here who know a million times more than me, and not that I've gotten very far, but I'm trying to figure this out myself at the moment. Email me directly and I'll let you know what I find as I go. Cheers, Eliot On Fri, Jul 12, 2013 at 1:19 PM, Tristan Stayton wrote: > Hello, > > I'm relatively new to the world of R and need some general advice. I have > a number of R functions that I'd like to put together in a package and > distribute via CRAN, and I've found a few more or less helpful tutorials on > the subject. However, I seem to be getting stuck with rtools (running R > CMD check, etc...) and even if I get past that I'm sure other issues will > come up. So I'm wondering if there are any resources or tutorials > available which you all would recommend. What was helpful to you when you > were first learning to build and distribute packages? > > Thanks in advance. And feel free to let me know if this is not the proper > venue for such questions. > > Tristan > > -- > C. Tristan Stayton > Associate Professor of Biology > Bucknell University > Lewisburg, PA 17837 > > Office: 570-577-3272 > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Getting started making R packages
Hello, I'm relatively new to the world of R and need some general advice. I have a number of R functions that I'd like to put together in a package and distribute via CRAN, and I've found a few more or less helpful tutorials on the subject. However, I seem to be getting stuck with rtools (running R CMD check, etc...) and even if I get past that I'm sure other issues will come up. So I'm wondering if there are any resources or tutorials available which you all would recommend. What was helpful to you when you were first learning to build and distribute packages? Thanks in advance. And feel free to let me know if this is not the proper venue for such questions. Tristan -- C. Tristan Stayton Associate Professor of Biology Bucknell University Lewisburg, PA 17837 Office: 570-577-3272 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
Thanks Liam, OK, I'm starting to understand this better. But I'm not sure what now to do. Given that the mathematics are such that a PGLS gives significance in one direction, but not in another, what is the most convincing way to show that the two variables really ARE associated (at some level of probability) independent of phylogeny? Ultimately I want to investigate the following: Given 2 (or more) behavioral measures, what is the probability that they are independently associated with brain size in my sample, controlling for phylogeny. I'd also like to create a prediction model that allows me to estimate what the behavioral values would be for a given brain size (of course with confidence intervals, so I could assess whether the model is really actually useful at all for prediction). Thanks for any suggestions, -Tom On Jul 11, 2013, at 5:23 PM, Liam J. Revell wrote: > Hi Tom. > > This is actually not a property of GLS - but of using different correlation > structures when fitting y~x vs. x~y. When you set > correlation=corPagel(...,fixed=FALSE) (the default for corPagel), gls will > fit Pagel's lambda model to the residual error in y|x. The fitted value of > lambda will almost always be different between y|x and x|y. Since the fitted > correlation structure of the residual error is used to calculate our standard > error for beta, this will affect any hypothesis test about beta. > > By contrast, if we assume a fixed error structure (OLS, as in lm; or > correlation=corBrownian(...) - the latter being the same as contrasts > regression), we will find that the P values are the same for y~x vs. x~y. > > library(phytools) > library(nlme) > tree<-pbtree(n=100) > x<-fastBM(tree) > # note I have intentionally simulated y without phylogenetic signal > y<-setNames(rnorm(n=100),names(x)) > fit.a<-gls(y~x,data.frame(x,y),correlation=corBrownian(1,tree)) > summary(fit.a) > fit.b<-gls(x~y,data.frame(x,y),correlation=corBrownian(1,tree)) > summary(fit.b) > # fit.a & fit.b should have the same P-values > fit.c<-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree)) > summary(fit.c) > fit.d<-gls(x~y,data.frame(x,y),correlation=corPagel(1,tree)) > summary(fit.d) > # fit.c & fit.d will most likely have different P-values > > All the best, Liam > > Liam J. Revell, Assistant Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu > blog: http://blog.phytools.org > > On 7/11/2013 12:03 AM, Tom Schoenemann wrote: >> Hi all, >> >> I ran a PGLS with two variables, call them VarA and VarB, using a >> phylogenetic tree and corPagel. When I try to predict VarA from VarB, I get >> a significant coefficient for VarB. However, if I invert this and try to >> predict VarB from VarA, I do NOT get a significant coefficient for VarA. >> Shouldn't these be both significant, or both insignificant (the actual >> outputs and calls are pasted below)? >> >> If I do a simple lm for these, I get the same significance level for the >> coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA), though >> the values of the coefficients of course differ. >> >> Can someone help me understand why the PGLS would not necessarily be >> symmetric in this same way? >> >> Thanks, >> >> -Tom >> >>> outTree_group_by_brain_LambdaEst_redo1 <- gls(log_group_size_data ~ >>> log_brain_weight_data, correlation = bm.t.100species_lamEst_redo1,data = >>> DF.brain.repertoire.group, method= "ML") >>> summary(outTree_group_by_brain_LambdaEst_redo1) >> Generalized least squares fit by maximum likelihood >> Model: log_group_size_data ~ log_brain_weight_data >> Data: DF.brain.repertoire.group >>AIC BIClogLik >> 89.45152 99.8722 -40.72576 >> Correlation Structure: corPagel >> Formula: ~1 >> Parameter estimate(s): >>lambda >> 0.7522738 >> Coefficients: >>Value Std.Error t-value p-value >> (Intercept) -0.0077276 0.2628264 -0.029402 0.9766 >> log_brain_weight_data 0.4636859 0.1355499 3.420778 0.0009 >> >> Correlation: >> (Intr) >> log_brain_weight_data -0.637 >> Standardized residuals: >>Min Q1Med Q3Max >> -1.7225003 -0.1696079 0.5753531 1.0705308 3.0685637 >> Residual standard error: 0.5250319 >> Degrees of freedom: 100 total; 98 residual >> >> >> Here is the inverse: >> >>> outTree_brain_by_group_LambdaEst_redo1 <- gls(log_brain_weight_data ~ >>> log_group_size_data, correlation = bm.t.100species_lamEst_redo1,data = >>> DF.brain.repertoire.group, method= "ML") >>> summary(outTree_brain_by_group_LambdaEst_redo1) >> Generalized least squares fit by maximum likelihood >> Model: log_brain_weight_data ~ log_group_size_data >> Data: DF.brain.repertoire.group >> AIC BIC logLik >> -39.45804 -29.03736 23.72902 >> Correlation Structure: corPagel >> Formula: ~1 >> Parameter estimate(s): >> lambd
Re: [R-sig-phylo] PGLS vs lm
With respect to your crankiness, is this the paper by Hansen that you are referring to?: Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. (2012). A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology, 314(0), 204-215. I wrote Bartoszek to see if I could get his R code to try the method mentioned in there. If I can figure out how to apply it to my data, that will be great. I agree that it is clearly a mistake to assume one variable is responding evolutionarily only to the current value of the other (predictor variables). Regarding your comments: > If the "regressions" are being done in a model which implies > that the two variables are multivariate normal, then we can > simply estimate the parameters of that joint distribution, > which are of course the two means and the three elements of the > covariance matrix. > > If we then test whether Cov(X,Y) is different from zero, that > should be equivalent to a test of significance of either > regression. I'm not clear on what you are suggesting I do here. Isn't PGLS essentially testing Cov(X,Y) taking the phylogeny into account? And are you saying there is a way to show that my variables are significantly associated with each other even though PGLS shows different things depending on which way I run the associations? -Tom On Jul 11, 2013, at 5:46 PM, Joe Felsenstein wrote: > > If the "regressions" are being done in a model which implies > that the two variables are multivariate normal, then we can > simply estimate the parameters of that joint distribution, > which are of course the two means and the three elements of the > covariance matrix. > > If we then test whether Cov(X,Y) is different from zero, that > should be equivalent to a test of significance of either > regression. > > /* crankiness on */ > Note of course that most "phylogenetic" regressions are being > done wrong: if they assume that Y responds to the current value > of X, but when the value of Y may actually be the result of > optimum selection which is affected by past values of X which > we do not observe directly. > > I've complained about this here in the past, to no avail, > Thomas Hansen, in a recent paper, made the same point, with > evidence too. > /* crankiness off */ > > J.F. > > Joe Felsenstein j...@gs.washington.edu > Department of Genome Sciences and Department of Biology, > University of Washington, Box 355065, Seattle, WA 98195-5065 USA _ P. Thomas Schoenemann Associate Professor Department of Anthropology Cognitive Science Program Indiana University Bloomington, IN 47405 Phone: 812-855-8800 E-mail: t...@indiana.edu Open Research Scan Archive (ORSA) Co-Director Consulting Scholar Museum of Archaeology and Anthropology University of Pennsylvania http://www.indiana.edu/~brainevo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
OK, I started going through the Ives et al. paper - thanks for that. Note that my data is not brain size vs. body size, but brain size vs. social group size (not a measure for which brain size is a subset). For our particular dataset, I believe we were not able to find much in the way of within-species variation for one of the variables - typically one report per species, and usually no variation given (but I'm not sure on that - I'll have to check). Regarding what exactly we want to do: 1) is there a significant association between brain size and two other behavioral dimensions (reported in the literature), after taking into account (as best we can) phylogeny. This is why I was trying PGLS. We probably also want to look at the relationship within clades (is there a phylogenetically appropriate version of ANCOVA?). 2) are these two other behavioral measures independently associated with brain size (after controlling for the other) - I'm assuming this would be a phylogenetically appropriate version of multiple regression But my issue is that, if I use PGLS, I get significant coefficients if I do it one direction, and not in the other. This makes me skeptical that there is a significant association in the first place. -Tom On Jul 11, 2013, at 4:32 PM, Theodore Garland Jr wrote: > I think the issue is largely one of conceptualizing the problem. > People often view body size as an "independent variable" when analyzing brain > size, but obviously this is a serious oversimplificaiton -- usually done for > statistical convenience -- that does not reflect the biology (yes, I have > also done this!). Moreover, brain mass is part of body mass, so if you use > body mass per se as an independent variable then you have potential > part-whole correlation statistical issues. > > I would think carefully about what you are really wanting to do (e.g., > regression vs. correlation vs. ANCOVA), and check over this paper: > > Ives, A. R., P. E. Midford, and T. Garland, Jr. 2007. Within-species > variation and measurement error in phylogenetic comparative methods. > Systematic Biology 56:252-270. > > > And maybe this one: > > Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. > Phylogenetic analysis of covariance by computer simulation. Systematic > Biology 42:265-292. > > > Cheers, > Ted > > Theodore Garland, Jr., Professor > Department of Biology > University of California, Riverside > Riverside, CA 92521 > Office Phone: (951) 827-3524 > Wet Lab Phone: (951) 827-5724 > Dry Lab Phone: (951) 827-4026 > Home Phone: (951) 328-0820 > Skype: theodoregarland > Facsimile: (951) 827-4286 = Dept. office (not confidential) > Email: tgarl...@ucr.edu > http://www.biology.ucr.edu/people/faculty/Garland.html > http://scholar.google.com/citations?hl=en&user=iSSbrhwJ > > Inquiry-based Middle School Lesson Plan: > "Born to Run: Artificial Selection Lab" > http://www.indiana.edu/~ensiweb/lessons/BornToRun.html > > From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] > on behalf of Tom Schoenemann [t...@indiana.edu] > Sent: Thursday, July 11, 2013 11:19 AM > To: Emmanuel Paradis > Cc: r-sig-phylo@r-project.org > Subject: Re: [R-sig-phylo] PGLS vs lm > > Thanks Emmanuel, > > OK, so this makes sense in terms of the math involved. However, from a > practical, interpretive perspective, shouldn't I assume this to mean that we > actually cannot say (from this data) whether VarA and VarB ARE actually > associated with each other? In the real world, if VarA is causally related to > VarB, then by definition they will be associated. Doesn't this type of > situation - where the associations are judged to be statistically significant > in one direction but not in the other - suggest that we actually DON'T have > confidence that - independent of phylogeny - VarA is associated with VarB? > Putting this in the context of the actual variables involved, doesn't this > mean that we actually can't be sure brain size is associated with social > group size (in this dataset) independent of phylogeny? > > I notice that the maximum likelihood lambda estimates are different (though > I'm not sure they are significantly so). I understand this could > mathematically be so, but I'm concerned with how to interpret this. In the > real world, how could phylogenetic relatedness affect group size predicting > brain size, more than brain size predicting group size? Isn't this a logical > problem (for interpretation - not for the math)? In other words, in > evolutionary history, shouldn't phylogeny affect the relationship between two > variables in only one way, which would show up whichever way we approached > the association? Again, I understand the math may allow it, I just don't > understand how it could actually be true over evolutionary time. > > Thanks in advance for helping me understand this better, > > -Tom > > > On Jul 11, 2013, at 5:12 AM,
Re: [R-sig-phylo] Distance matrix from phylogeny
Hi Eugen, to compute a distance matrix from a phylogeny, you can use the cophenetic.phylo function in {ape}. You then have to rescale all distances in you matrix (which are expressed in million years) by the total depth of the tree: dist=cophenetic.phylo(tree) dist2=dist/max(dist) Best, Florian 2013/7/12 Eugen > Hi folks, > > I have phylogeny with absolute branch lengths in million years. How do I > get a distance matrix with values between 0 and 1? > > Best, > > Eugen > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Distance matrix from phylogeny
Hi folks, I have phylogeny with absolute branch lengths in million years. How do I get a distance matrix with values between 0 and 1? Best, Eugen ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/