Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)
In addition to the references to papers by Hansen and Bartoszek, and by Ives, Midford and Garland, I would (with admitted bias) suggest this paper:

Felsenstein, J. 2008. Comparative methods with sampling error and within-species variation: contrasts revisited and revised. American Naturalist 171: 713-725.

The method estimates the within-species phenotypic variation (which, when you are analysing species means, is the relevant measurement error, and which also includes actual measurement error) and corrects for it. The software announced there is not in R, but I believe that Liam Revell's phytools package can call our program.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology, University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] question about measurement error in phylogenetic signal
Small follow-up to Liam's suggestion: if you do use an arcsine transformation for proportional data, the variance of arcsin(sqrt(p)) is approximately 1/(4N), where p is the proportion and N is the sample size. The approximation is good unless the proportion is very close to 0 or 1.

Best,
Gene

--
Gene Hunt
Curator, Department of Paleobiology
National Museum of Natural History
Smithsonian Institution [NHB, MRC 121]
P.O. Box 37012
Washington DC 20013-7012
Phone: 202-633-1331 Fax: 202-786-2832
http://paleobiology.si.edu/staff/individuals/hunt.cfm

From: Liam J. Revell liam.rev...@umb.edu
Date: Sunday, July 7, 2013 3:10 PM
To: Xavier Prudent prudentxav...@gmail.com
Cc: mailman, r-sig-phylo r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] question about measurement error in phylogenetic signal

Hi Eliot & Xavier.

I think that Xavier's suggestion is not a particularly good idea in this case, because random error will tend to depress phylogenetic signal. In other words, random error in the data does not introduce random error in phylogenetic signal; rather, it biases phylogenetic signal towards 0. A better approach is to incorporate error in the estimation of species means directly, following Ives et al. (2007). This is implemented in phylosig of the phytools package.

Your formula for the standard error of a proportion is indeed the correct standard error given your data; however, it raises the question of whether the assumed model (Brownian motion, BM) is suitable for your data (or perhaps this is what you are trying to find out). For small samples (n < 30), some people have recommended an n+4 correction, in which 2 successes and 2 failures are added during calculation of the SE.

If you are using an arcsine transformation, as is common for proportion data, you need to be aware that your standard errors are on the original scale! (I don't know the formula for standard errors on the transformed scale.)

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 7/4/2013 3:36 AM, Xavier Prudent wrote:

Dear Eliot,

One way to cope with uncertainty in the inputs to an analysis is to vary these inputs by some amount (like +- 1 standard deviation) and rerun your analysis. The spread of the results then tells you how robust your analysis is. Note that the inputs may be varied independently only if they ARE independent; if they are highly correlated, you may prefer to vary them simultaneously.

Hope that helps,
Regards,
Xavier

2013/7/4 Eliot Miller eliotmil...@umsl.edu

Hello all,

I have been trying today to get something to work in a number of different packages and with a number of different approaches, and I couldn't get it to run in a believable way. Before I spend another day on this, I was wondering what people think about the idea in general.

I have a dataset of disease prevalence across ~100 species. There are ~2000 individuals total across the dataset, with 4 individuals per species. Prevalence per individual is coded as 0 or 1. I am interested in the phylogenetic signal of disease prevalence across the species. One approach that works is to simply calculate prevalence as the species-specific mean, i.e. if 3 individuals of 6 for a species had the disease, the prevalence would be 3/6 = 0.5. Then one can use these values with e.g. phylosig() (I arcsine square-root transformed these proportions here). As in the few other published tests of phylogenetic signal in disease prevalence, there is little signal here. I could leave it at that, because in general there are very few detections in this dataset and it's probably not ideally suited to address this question anyhow.

That aside, however, because not all individuals of a given species always have the disease, I wanted to incorporate measurement error. So, based on the calculation of the SE for binary data from the site http://www.researchgate.net/post/Can_standard_deviation_and_standard_error_be_calculated_for_a_binary_variable , I also calculated species-specific SEs as sqrt(mean(prevalence) * ((1 - mean(prevalence)) / individuals)).

What do people think about this? It's hardly measurement error in the sense we normally mean it. On the other hand, I think it would be neat if there were some way to account for variation among individuals in prevalence, and the influence this has on phylogenetic signal.

Cheers,
Eliot
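The standard-error calculations discussed in this thread can be sketched in base R. This is only an illustration under stated assumptions: the counts in `infected` and `sampled` are made up (they are not Eliot's data), and all variable names are hypothetical. It computes the binomial SE from Eliot's formula, the n+4 variant Liam mentions, and the approximate SE on the arcsine square-root scale from Gene's 1/(4N) variance.

```r
# Hypothetical per-species counts (illustrative only, not from the thread)
infected <- c(3, 0, 12, 1)   # individuals scored 1 (diseased)
sampled  <- c(6, 10, 25, 8)  # individuals examined per species

# Prevalence and its binomial standard error on the original scale
p_hat  <- infected / sampled
se_raw <- sqrt(p_hat * (1 - p_hat) / sampled)

# n+4 correction: add 2 successes and 2 failures before computing the SE.
# This avoids an SE of exactly 0 when no (or all) individuals are infected.
p_adj  <- (infected + 2) / (sampled + 4)
se_adj <- sqrt(p_adj * (1 - p_adj) / (sampled + 4))

# Arcsine square-root scale: var(asin(sqrt(p))) is approximately 1/(4N),
# so the SE depends only on sample size (poor when p is near 0 or 1)
x_trans  <- asin(sqrt(p_hat))
se_trans <- sqrt(1 / (4 * sampled))

print(data.frame(p_hat, se_raw, p_adj, se_adj, x_trans, se_trans))
```

If the transformed values are the ones analysed, `se_trans` rather than `se_raw` is the quantity on the matching scale; either vector would need species names matching the tree before being supplied as standard errors to a function such as phylosig().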
Re: [R-sig-phylo] Count Deep Coal in R?
I think the package phybase has functions for looking at gene trees in species trees that could be modified for this. The package has been purged from CRAN (well, archived), but you can still install from source. I'm CCing the package author, Liang Liu, to see if he has any ideas.

Best,
Brian

___
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info
Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara

On Mon, Jul 8, 2013 at 7:27 AM, Melisa Olave melizz...@hotmail.com wrote:

just wondering if you found the way to count deep coal in R?? I'm trying to do the same... with no success! thank you!

Melisa
Re: [R-sig-phylo] Count Deep Coal in R?
Hi Melisa,

I agree with Brian; so far I have used phybase to do this. I haven't tested it completely, but I believe you can use the getcoaltime() command from phybase to get at this information. You can try this code (most of which is from the phybase manual):

    install.packages('/home/dan/Downloads/phybase_1.3.tar.gz', repos = NULL, type = 'source')
    library(phybase)
    set.seed()  # the seed value is missing from the archived message; supply an integer
    # species tree in phybase format: branch length, then '#' and a per-branch parameter
    tree <- "(((H:0.00402#0.01,C:0.00402#0.01):0.00304#0.01,G:0.00707#0.01):0.00929#0.01,O:0.01635#0.01)#0.01;"
    nodematrix <- read.tree.nodes(tree)$nodes
    rootnode <- 7
    spname <- species.name(tree)
    ## define the vector seq as [2,2,2,2], which means that there are 2 sequences in each species
    seq <- rep(2, 4)
    # simulate one gene tree within the species tree
    str <- sim.coaltree.sp(rootnode, nodematrix, 4, seq, name = spname)$gt
    gtnod <- read.tree.nodes(str)$nodes
    getcoaltime(gtnod, nodematrix, 8, 4, spstructure(rep(2, 4)))
    # what I got looks like:
    #      [,1]     [,2]
    # [1,]    1 0.000870
    # [2,]    5 0.001126
    # [3,]    5 0.002557
    # [4,]    6 0.000580
    # [5,]    7 0.000193
    # [6,]    2 0.001416
    # [7,]    4 0.002865
    plottree(str)  # should show deep coalescences of Hs1 and Hs2

Given the description in the phybase manual, you can see that the first coalescence is at 0.000870 in species tree node 1; then Hs2 and Hs1 coalesce to (Cs1,Cs2) and ((Cs1,Cs2),Hs2), respectively, in species tree branch 5, etc.

One simple way of looking for deep coalescences would be to find over- or under-represented branches in the getcoaltime output. For example, there was no coalescence in species tree branch 3 because of a deep coalescence event, so in the getcoaltime matrix there is no 3 in the first column and there are two 5's. Obviously this will get much trickier if you have uneven taxon sampling across species, but it should still be possible to make simple predictions based on your sampling.

Cheers,
-Dan
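Dan's suggestion of looking for over- or under-represented branches can be sketched in base R, without phybase installed. The matrix below is the getcoaltime() output quoted in his message. The expectation of exactly one coalescence per species-tree node is an assumption that holds only for this balanced design (two sequences in each of four species, so 7 nodes and 7 coalescences), and the variable names are illustrative.

```r
# getcoaltime() output as quoted in the message:
# column 1 = species-tree node/branch, column 2 = coalescence time
coal <- matrix(c(1, 0.000870,
                 5, 0.001126,
                 5, 0.002557,
                 6, 0.000580,
                 7, 0.000193,
                 2, 0.001416,
                 4, 0.002865),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("branch", "time")))

n_nodes <- 7  # 4 tips + 3 internal nodes in the species tree

# Observed number of coalescences on each species-tree branch
observed <- tabulate(coal[, "branch"], nbins = n_nodes)

# Assumed expectation without deep coalescence: one event per branch
expected <- rep(1L, n_nodes)

excess <- observed - expected
names(excess) <- seq_len(n_nodes)

under_represented <- which(excess < 0)  # lineages failed to coalesce here
over_represented  <- which(excess > 0)  # extra coalescences pushed deeper

print(excess)
print(names(under_represented))
print(names(over_represented))
```

For these numbers the tabulation flags branch 3 as missing a coalescence and branch 5 as carrying an extra one, matching the reading given in the message. With uneven sampling, `expected` would have to be derived from the number of sequences per species instead of being set to 1 everywhere.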
Re: [R-sig-phylo] Count Deep Coal in R?
Thanks Dan for the explanation. getcoaltime() was written as an internal function to calculate coalescent probabilities, not for the purpose of outputting the coalescence times. So the output of getcoaltime() is not user-friendly (sorry about that).

-- Liang