Re: [R-sig-phylo] estimate ancestral state with OUwie models
It's a shame there aren't awards for great threads, because this is one! The minor twist I would throw in is that it's difficult to make universal generalizations about the quality of ancestral state estimation. If one is interested in the ancestral state value at node N, it might be reasonably estimated if it is nested high up within the phylogeny, if the rates of change aren't high, etc. And (local) trends etc might well be reliably inferred. We are pretty confident that the common ancestor of humans and chimps was larger than many deeper primate ancestors, for instance. If N is the root of your available phylogeny, however, you have to be much more cautious. Cheers, Nick On Wed, Jun 13, 2018 at 6:57 AM, Joe Felsenstein wrote: > Let me add more warnings to Marguerite and Thomas's excellent > responses. People may be tempted to infer ancestral states and then > treat those inferences as data (and also to infer ancestral > environments and then treat those inferences as data). In fact, I > wonder whether that is not the main use people make of these > inferences. > > But not only are those inferences very noisy, they are correlated with > each other. So if you infer the ancestral state for the clade (Old > World Monkeys, Apes) and also the ancestral state for the clade (New > World Monkeys, (Old World Monkeys, Apes)) the two will typically not > only be error-prone, but will also typically be subject to strongly > correlated errors. Using them as data for further inferences is very > dubious. It is better to figure out what your hypothesis is and then > test it on the data from the tips of the tree, without the > intermediate step of taking ancestral state inferences as > observations. > > The popular science press in particular demands a fly-on-the-wall > account of what happened in evolution, and giving them the ancestral > state inferences as if they were known precisely is a mistake. > > Joe > > Joe Felsenstein j...@gs.washington.edu > Department of Genome Sciences and Department of Biology, > University of Washington, Box 355065, Seattle, WA 98195-5065 USA > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] dist.nodes crashing with big trees
Hi! I re-did chainsaw at some point, now there is chainsaw2. However, googling that gets you horror movies, so here is a link with example code: https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ (the discussion there points out a rare case where this crashes, but for most trees it should work fine) Cheers, Nick On Fri, Oct 16, 2015 at 2:17 PM, David Bapst <dwba...@gmail.com> wrote: > Hi Gustavo, > > I'm paleotree's author and maintainer. Just to be clear that I > understand your problem, I believe you are saying that when you use > timeSliceTree, you are getting an error that the internal call to > dist.nodes is failing? Is that right? > > The first thought I have is that maybe the solution here is to avoid > dist.nodes, as it is somewhat overkill. I use dist.nodes in that code, > which I wrote in 2011, to get the distance of tips and nodes from the > root. A better solution may now exist in another R package. I'd have > to investigate (although maybe someone on the list can suggest one). > > The second thought I have is that there might be alternative functions > that do something lie timeSliceTree in another R package. Off the top > of my head, I recall that Nick Matzke had a similar, 'chainsaw' > function, which you can find here and appears not to call dist.nodes: > > https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001483.html > > Again, maybe someone on the list knows of a good alternative function. > > I'll try to give this more thought, but for now, maybe see if you can > use Nick's function succesfully. Overall though, I've discovered the > use of truly gigantic trees can often run into unexpected problems. > > Cheers, > -Dave > > > > On Fri, Oct 16, 2015 at 12:47 PM, Gustavo Burin Ferreira > <ariete...@gmail.com> wrote: > > Dear list, > > > > I'm trying to perform a time travel in simulated phylogenies with both > > extant and extinct species using the timeSliceTree function form the > > paleotree package. My aim is to have the molecular phylogenies derived > from > > the complete phylogeny (attached) in different points in time. > > > > However, when I try that with big trees (bigger than 2 tips total), I > > get an error of integer overflow coming from the dist.nodes function. > After > > slightly tweaking the dist.nodes function (changing nm from integer to > > numeric/double), I get the following message: > > > > Error in dist.nodes(tree) (from #7) : > > long vectors (argument 7) are not supported in .Fortran > > > > Since I don't know much about C or Fortran, I couldn't find a way of > solving > > this by myself, so any help will be greatly appreciated. > > > > I'm sending one tree attached for example. > > > > Thank you very much in advance! > > > > Best, > > > > Gustavo Burin Ferreira, Msc. > > Instituto de Biociências > > Universidade de São Paulo > > Tel: (11) 98525-8948 > > > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > > > -- > David W. Bapst, PhD > Adjunct Asst. Professor, Geology and Geol. Eng. > South Dakota School of Mines and Technology > 501 E. St. Joseph > Rapid City, SD 57701 > > http://webpages.sdsmt.edu/~dbapst/ > http://cran.r-project.org/web/packages/paleotree/index.html > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] issue in ape's rphylo (and workaround)
Hi all, I have been hitting intermittent problems using trees generated by ape::rphylo. Here is a reproducible example. library(ape) sessionInfo() # Simulate a tree with e.g. 5 species nspecies = 5 set.seed(123465) tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50, fossils=TRUE, eps=1e-06) # Ladderize tree with fossils ltr = ladderize(tr_wFossils, right=FALSE) ltr$edge round(ltr$edge.length, 4) write.tree(phy=ltr, file="") tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") ) tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label)) plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils") Output: > library(ape) > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.5 (Yosemite) locale: [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ape_3.3 loaded via a namespace (and not attached): [1] nlme_3.1-121grid_3.2.2 lattice_0.20-33 > > # Simulate a tree with e.g. 5 species > nspecies = 5 > set.seed(123465) > tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50, fossils=TRUE, eps=1e-06) > > # Ladderize tree with fossils > ltr = ladderize(tr_wFossils, right=FALSE) > ltr$edge [,1] [,2] [1,]7 11 [2,] 112 [3,]85 [4,]89 [5,]96 [6,]9 10 [7,] 104 [8,] 103 > round(ltr$edge.length, 4) [1] 2.6862 0.0670 1.2078 0.1028 0.4985 0.0915 1.0135 1.0135 > write.tree(phy=ltr, file="") [1] "((t2:0.06701971429):2.686176768);" > > tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") ) Warning message: In FUN(X[[i]], ...) : NAs introduced by coercion > tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label)) > plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils") NULL Warning message: In plot.phylo(tr_wFossils) : found less than 2 tips in the tree Error in get("last_plot.phylo", envir = .PlotPhyloEnv) : object 'last_plot.phylo' not found It looks like a problem with the edge matrix generated by rphylo - perhaps older simulation code. The workaround is Yea Olde Writeth To And Readeth >From Newick trick: ### # Simulate a tree with e.g. 5 species nspecies = 5 set.seed(123465) tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50, fossils=TRUE, eps=1e-06) tr_wFossils = read.tree(file="", text=write.tree(phy=tr_wFossils, file="") ) # Ladderize tree with fossils ltr = ladderize(tr_wFossils, right=FALSE) ltr$edge round(ltr$edge.length, 4) write.tree(phy=ltr, file="") tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") ) tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label)) plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils") ### Now there are no issues: ### > # Simulate a tree with e.g. 5 species > nspecies = 5 > set.seed(123465) > tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50, fossils=TRUE, eps=1e-06) > tr_wFossils = read.tree(file="", text=write.tree(phy=tr_wFossils, file="") ) > # Ladderize tree with fossils > ltr = ladderize(tr_wFossils, right=FALSE) > ltr$edge [,1] [,2] [1,]78 [2,]81 [3,]82 [4,]79 [5,]96 [6,]9 10 [7,] 103 [8,] 10 11 [9,] 114 [10,] 115 > round(ltr$edge.length, 4) [1] 2.6862 0.0670 0.0670 1.5454 1.2078 0.1028 0.4985 0.0915 1.0135 1.0135 > write.tree(phy=ltr, file="") [1] "((t2:0.06701971429,t1:0.06701971429):2.686176768,(t5:1.207768321,(t6:0.4984853178,(t4:1.013489985,t3:1.013489985):0.09146602338):0.1028123128):1.545428161);" > > tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") ) > tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label)) > plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils") ### Cheers, Nick [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Plot phylogeny with increasing or decreasing node ages?
Hi all, In FigTree, there is an option (Left menu-Trees-Order nodes-increasing, or decreasing) to plot trees and order them such that the higher nodes/tips in the tree plot at the top or bottom of the plot. Does anything like this exist for plotting trees in R? Or do I have to hunt-n-peck and manually rotate nodes 1-by-1 until it looks right? (yes, I could save the tree out from FigTree and use that, but I want to automate this plotting function without a FigTree step) Cheers! Nick [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Plot phylogeny with increasing or decreasing node ages?
Yep, that's it exactly. Thanks guys! Nick On Sun, Apr 26, 2015 at 6:42 PM, Liam J. Revell liam.rev...@umb.edu wrote: Hi Nick. I'm not sure if this is what you mean, but have you tried the function ladderize from the ape package? All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 4/26/2015 6:37 PM, Nick Matzke wrote: Hi all, In FigTree, there is an option (Left menu-Trees-Order nodes-increasing, or decreasing) to plot trees and order them such that the higher nodes/tips in the tree plot at the top or bottom of the plot. Does anything like this exist for plotting trees in R? Or do I have to hunt-n-peck and manually rotate nodes 1-by-1 until it looks right? (yes, I could save the tree out from FigTree and use that, but I want to automate this plotting function without a FigTree step) Cheers! Nick [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] read.dna warnings and pitfalls
I have written several custom mutations of various data-reading functions to get around some of the common limitations and to read e.g. ambiguous characters in morphology datasets. But wouldn't the best solution in the long run be to implement the equivalent of readseq and/or the Nexus Class Library? I'm not volunteering ;-) Cheers, Nick On Mon, Apr 30, 2012 at 7:19 PM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: Hi Dan, The original motivation behind read.dna() was to allow users to read their DNA alignments stored in the Phylip formats -- support for the Clustal format came later. You may be right that this is not so frequent. I presume a commoner workflow is to first use read.GenBank, then write the sequences for analyses with other software. In the situation you describe, that'd imply using write.dna. In this function, the rule is (say L is the length of the longest taxon name): if L 11 then the 1st nucleotide is written at the 11th column in the output file, otherwise at the (L+2)th column with a space at the (L+1)th one. In the short term, I can change this in both read.dna and write.dna and impose a space (or a tabulation) between the longest taxon name and the first nucleotide. This would imply, of course, that taxa names cannot have spaces. What do others think? In the long term, I think we may discuss deprecating the sequential and interleaved formats for the reasons I listed below. For instance, Clustal can output its alignments in FASTA, Muscle outputs by default in this format. This is an open discussion. Best, Emmanuel Dan Rabosky wrote on 26/04/2012 22:01: Hi Emmanuel- Thanks for fixing the whitespace issue. I think this fix will be useful to many users. On the issue of recognizing 10 IUPAC characters: I think this is a real problem, and may come up again in short order. Maybe it is just that use of this function has been limited? In the single dataset with a modest number of sequences that caused me problems yesterday, I had the following species and/or genus names - all of which constitute 10 character strings drawn from the set of IUPAC codes: brachyurus (x 2) savannarum graduacauda caudacutus Camarhynchus (x 3) madagascariensis I don't suggest deprecating the phylip sequential, but rather, using something that is compatible with raxml (surely one of the most widely used phylogenetics programs today). I think raxml uses a relaxed sequential version of the phylip format with whitespace delimitation. I could read the same alignment in raxml with no problems, but I had multiple issues when reading the same file with read.dna (including the whitespace character on the first line). My guess is that very few people are using the original phylip format, with its limit of 10 characters per taxon name, and with dna seqs beginning immediately after this. So maybe deprecate sequential phylip, but you could use what Stamatakis calls relaxed sequential PHYLIP, which appears to be: (1) taxon names cannot include spaces but can be up to 100 characters; and (2) names separated from sequences by whitespace character (ideally, this should recognize any number of spaces or tabs to prevent user confusion). For users with tab-delimited raxml files (eg each taxon name separated from its dna sequence by a tab), you can use a regular-expressions enabled text editor (like textwrangler) to quickly find potential problems. Just search for [ACGTUMRWSYKVHDBN]{10}.+\t with grep matching enabled. Cheers, ~Dan On Apr 26, 2012, at 2:16 AM, Emmanuel Paradis wrote: Hi Dan, The reason for this implementation (searching the first 10 IUPAC-coded bases) is because the exact formatting is not inconsistent among different programs. Some files have: 0123456789acgt. that is a 10-character name and the sequence starting on the 11th position. I think this is typical for Phylip. Other software (e.g., PhyML) accepts longer taxa names and require a space before the start of the sequence. About your example: it depends on the order of the data. The following file can be read: 2 10 x AA madagascarAA But if you invert the two sequence lines, it fails. It is the first time I hear about this problem in 9 years, maybe because it requires a particular combination of circumstances. Another drawback of this implementation is that files with less than 10 bases cannot be read. How to solve this? If it were left only to me, I would deprecate the interleaved and sequential formats. FASTA is more flexible, more widespread, easier to parse, can store exactly the same information, and labels are only constrained to be on a single line (but can contain any characters including \n, \t, ...) But I guess many programs use the Phylip formats, so I'd be glad to read other suggestions. As for your 2nd problem, it is now fixed in ape. Best, Emmanuel -Original Message- From: Dan
Re: [R-sig-phylo] Show Informative Sites?
All sites are informative under likelihood, but I assume you mean parsimony-informative, in which case all you have to do is count which sites are either (a) uniform or (b) uniform except for differences found only in a single species. Probably easiest if you convert the read.nexus.data output to a dataframe with as.data.frame, then use unique and == to count the number of states in each site... On 9/17/11 11:21 AM, Jimmy O'Donnell wrote: Hi all, A simple question to which I can't seem to find an answer: is there a way to show only the informative sites of a DNA sequence dataset in R? Thanks, Jimmy -- Jimmy O'Donnell PhD Candidate Ecology and Evolutionary Biology University of California Santa Cruz, CA 95060 jodonn...@biology.ucsc.edu (407)744-3377 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] post-order tree traversal
Thanks...here's the code. However, it looks like the output I am processing does not actually come with post-order labeling. I.e., in the 2nd plot below: node 1 should be labeled 1 node 10 should be labeled 2 node 2 should be labeled 3 node 14 should be labeled 4 node 3 should be labeled 5 ...etc... What is the name of this sort of ordering? Or are there 2 sorts of post-ordering? Cheers, Nick trstr = 1:61.9979,2:61.9979):10.0994,3:72.0973):30.5957,(4:97.17,5:97.17):5.52294):165.913,(6:45.8837,7:42.7899):90.6427,8:11.748):9.81895,((9:71.9983,10:30.1075):59.5189,(((11:48.3447,12:48.3447):77.8739,(13:86.8531,(14:53.665,15:53.665):33.1881):39.3656):26.0816,((16:94.7537,17:94.7537):44.1509,(18:82.3045,(19:72.9737,20:72.9737):9.33074):56.6002):13.3956):27.9983):10.886):22.824,21:214.009):34.3554,(22:186.877,23:186.877):61.4866):20.2417); tr = read.tree(file=, text=trstr) plot(tr) nodelabels() tr2 = reorder(tr, order=pruningwise) plot(tr2) nodelabels() internal_nodenums_in_postorder = unique(tr2$edge[,1]) internal_nodenums_in_postorder = internal_nodenums_in_postorder - length(tr2$tip.label) tr3 = tr2 tr3$node.label[internal_nodenums_in_postorder] = 1:tr3$Nnode plot(tr3) nodelabels(tr3$node.label) On 9/14/11 6:36 PM, Liam J. Revell wrote: Hi Nick. If you use the function reorder.phylo(...,order=pruningwise) the resultant tree edge matrix is suitable for post-order traversal (i.e., the daughters always precede the parents in top-to-bottom matrix traversal). This might be helpful for what you would like to do - for instance if you just label the nodes as they are first encountered in the matrix this would be in the order of a post-order traversal of the tree. All the best, Liam -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] LTT plot non-ultrametric trees
Somewhere I wrote a function that samples at a series of user-set timepoints and counts the # of lineages crossing each timepoint -- this is pretty flexible, allows for increases/decreases in diversity etc., let me know if the other options aren't working for you I can dig it up... On 8/10/11 9:57 PM, Emmanuel Paradis wrote: Hi Liam Rob, You may try using dist.nodes() for this: x - dist.nodes(phy) n - Ntip(phy) ROOT - n + 1 x[ROOT, ] The last command returns the distance from the root to all nodes and tips which are ordered in the usual way. So you may create a vector with +1 for the nodes, -1 for the tips, and drop the tips not extinct. Then you sort on the values of x, and plot cumsum() of the vector of +/-1 against them. I can add this to ape if there's a general interest for it. Best, Emmanuel Liam J. Revell wrote on 11/08/2011 11:05: Hi Rob. I can reproduce your error, but I haven't figured out the problem yet. You can try an earlier version of this function, which seems to work: source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ltt/v0.3/ltt.R;) p2 - ltt(t1, log.lineages=FALSE, drop.extinct=FALSE) Sorry about this. Also: max(p1$times)==max(p2$times) can be FALSE because if drop.extinct is set to TRUE, then the crown age of the pruned tree can be smaller than in the full tree if some lineages arising at the root of the tree do not leave any extant descendants. - Liam -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] cut a tree in a given time interval
On 7/14/11 2:45 AM, ppi...@uniroma3.it wrote: Thankyou NIck, now...I have an error when running it: Oops. Apologies, this stuff is in-house code, I haven't organized it. Add these to the text file... === get_daughters - function(nodenum, t) { daughter_edgenums = findall(nodenum, t$edge[,1]) daughter_nodenums = t$edge[,2][daughter_edgenums] return(daughter_nodenums) } # Get indices of all matches to a list findall - function(what, inlist) { TFmatches = inlist == what indices = 1:length(inlist) matching_indices = indices[TFmatches] return(matching_indices) } === object get_daughters not found It does not seem I miss something some help? thanks again best paolo Oops, left that out. Here it is as text and an attached file. You will have to do library(ape) and maybe library(phangorn) ...ahead of time (hopefully not others...) chainsaw- function(tr, timepoint=10) { # Take a tree and saw it off evenly across a certain timepoint. # This removes any tips above the timepoint, and replaces them # with a single tip representing the lineage crossing # the timepoint (with a new tip name). # Get the tree in a table tr_table = prt(tr, printflag=FALSE) # Find the tips that are less than 10 my old and drop them TF_exists_more_recently_than_10mya = tr_table$time_bp timepoint # Get the corresponding labels labels_for_tips_existing_more_recently_than_10mya = tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ] ### # Draft chainsaw function ### # loop through the branches that cross 10 mya # get a list of the edge start/stops in the phylogeny's edges edge_times_bp = get_edge_times_before_present(tr) # which of these branches cross 10 mya? edges_start_earlier_than_10mya = edge_times_bp[, 1] timepoint edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint edges_to_chainsaw = edges_start_earlier_than_10mya + edges_end_later_than_10mya == 2 # then, for each of these edges, figure out how many tips exist descending from it nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw] nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw length(tr$tip.label)] # create a copy of the tree to chainsaw tree_to_chainsaw = tr for (i in 1:length(nodes_to_chainsaw)) { tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i]) #print(tmp_subtree$tip.label) tmp_number_of_tips = length(tmp_subtree$tip.label) #print(tmp_number_of_tips) # number of tips to drop = (numtips -1) numtips_to_drop = tmp_number_of_tips - 1 # tips_to_drop tmp_labels = tmp_subtree$tip.label labels_to_drop = tmp_labels[1:numtips_to_drop] # new label label_kept_num = length(tmp_labels) label_kept = tmp_labels[label_kept_num] new_label = paste(CA_, label_kept, +, numtips_to_drop, _tips, sep=) tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == label_kept] = new_label # chop off e.g. 2 of the 3 tips tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop) } #plot(tree_to_chainsaw) #axisPhylo() tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE) tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = tree_to_chainsaw_table$time_bp timepoint tmp_edge_lengths = tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] times_bp_for_edges_to_chainsaw = tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] adjustment = times_bp_for_edges_to_chainsaw - timepoint revised_tmp_edge_lengths = tmp_edge_lengths + adjustment tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] = revised_tmp_edge_lengths # revised ordered_nodenames = get_nodenums(tree_to_chainsaw) parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, tree_to_chainsaw$edge[,2]) NA_false = is.not.na(tree_to_chainsaw_table$edge.length) tree_to_chainsaw$edge.length[parent_branches[NA_false]] = tree_to_chainsaw_table$edge.length[NA_false] return(tree_to_chainsaw) } # print tree in hierarchical format prt- function(t, printflag=TRUE, relabel_nodes = FALSE, time_bp_digits=7) { # assemble beginning table # check if internal node labels exist if (node.label %in% attributes(t)$names == FALSE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } # or manually relabel
Re: [R-sig-phylo] cut a tree in a given time interval
Oops, left that out. Here it is as text and an attached file. You will have to do library(ape) and maybe library(phangorn) ...ahead of time (hopefully not others...) chainsaw - function(tr, timepoint=10) { # Take a tree and saw it off evenly across a certain timepoint. # This removes any tips above the timepoint, and replaces them # with a single tip representing the lineage crossing # the timepoint (with a new tip name). # Get the tree in a table tr_table = prt(tr, printflag=FALSE) # Find the tips that are less than 10 my old and drop them TF_exists_more_recently_than_10mya = tr_table$time_bp timepoint # Get the corresponding labels labels_for_tips_existing_more_recently_than_10mya = tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ] ### # Draft chainsaw function ### # loop through the branches that cross 10 mya # get a list of the edge start/stops in the phylogeny's edges edge_times_bp = get_edge_times_before_present(tr) # which of these branches cross 10 mya? edges_start_earlier_than_10mya = edge_times_bp[, 1] timepoint edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint edges_to_chainsaw = edges_start_earlier_than_10mya + edges_end_later_than_10mya == 2 # then, for each of these edges, figure out how many tips exist descending from it nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw] nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw length(tr$tip.label)] # create a copy of the tree to chainsaw tree_to_chainsaw = tr for (i in 1:length(nodes_to_chainsaw)) { tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i]) #print(tmp_subtree$tip.label) tmp_number_of_tips = length(tmp_subtree$tip.label) #print(tmp_number_of_tips) # number of tips to drop = (numtips -1) numtips_to_drop = tmp_number_of_tips - 1 # tips_to_drop tmp_labels = tmp_subtree$tip.label labels_to_drop = tmp_labels[1:numtips_to_drop] # new label label_kept_num = length(tmp_labels) label_kept = tmp_labels[label_kept_num] new_label = paste(CA_, label_kept, +, numtips_to_drop, _tips, sep=) tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == label_kept] = new_label # chop off e.g. 2 of the 3 tips tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop) } #plot(tree_to_chainsaw) #axisPhylo() tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE) tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = tree_to_chainsaw_table$time_bp timepoint tmp_edge_lengths = tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] times_bp_for_edges_to_chainsaw = tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] adjustment = times_bp_for_edges_to_chainsaw - timepoint revised_tmp_edge_lengths = tmp_edge_lengths + adjustment tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] = revised_tmp_edge_lengths # revised ordered_nodenames = get_nodenums(tree_to_chainsaw) parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, tree_to_chainsaw$edge[,2]) NA_false = is.not.na(tree_to_chainsaw_table$edge.length) tree_to_chainsaw$edge.length[parent_branches[NA_false]] = tree_to_chainsaw_table$edge.length[NA_false] return(tree_to_chainsaw) } # print tree in hierarchical format prt - function(t, printflag=TRUE, relabel_nodes = FALSE, time_bp_digits=7) { # assemble beginning table # check if internal node labels exist if (node.label %in% attributes(t)$names == FALSE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } # or manually relabel the internal nodes, if desired if (relabel_nodes == TRUE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } labels = c(t$tip.label, t$node.label) ordered_nodenames = get_nodenums(t) #nodenums = 1:length(labels) node.types1 = rep(tip, length(t$tip.label)) node.types2 = rep(internal, length(t$node.label)) node.types2[1] = root node.types = c(node.types1, node.types2) # These are the index numbers of the edges below each node parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, t$edge[,2]) #parent_edges = parent_branches brlen_to_parent = t$edge.length[parent_branches] parent_nodes = t$edge[,1][parent_branches] daughter_nodes = lapply(ordered_nodenames,
Re: [R-sig-phylo] cut a tree in a given time interval
I wrote a function called chainsaw to do something like this, it saws off all the branches at a particular time point, then you just have to drop.tip on branch tips older than your time period. Would that be helpful? On 7/12/11 3:19 PM, ppi...@uniroma3.it wrote: Hi all, someone knows a smart coding to cut a tree in order to retain only taxa present in a given time interval bin? Thanks in advance for any suggestion best paolo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] cut a tree in a given time interval
Here's chainsaw(). It also requires sourcing a few other functions. Please cite me if you use it :-). = chainsaw - function(tr, timepoint=10) { # Take a tree and saw it off evenly across a certain timepoint. # This removes any tips above the timepoint, and replaces them # with a single tip representing the lineage crossing # the timepoint (with a new tip name). # Get the tree in a table tr_table = prt(tr, printflag=FALSE) # Find the tips that are less than 10 my old and drop them TF_exists_more_recently_than_10mya = tr_table$time_bp timepoint # Get the corresponding labels labels_for_tips_existing_more_recently_than_10mya = tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ] ### # Draft chainsaw function ### # loop through the branches that cross 10 mya # get a list of the edge start/stops in the phylogeny's edges edge_times_bp = get_edge_times_before_present(tr) # which of these branches cross 10 mya? edges_start_earlier_than_10mya = edge_times_bp[, 1] timepoint edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint edges_to_chainsaw = edges_start_earlier_than_10mya + edges_end_later_than_10mya == 2 # then, for each of these edges, figure out how many tips exist descending from it nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw] nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw length(tr$tip.label)] # create a copy of the tree to chainsaw tree_to_chainsaw = tr for (i in 1:length(nodes_to_chainsaw)) { tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i]) #print(tmp_subtree$tip.label) tmp_number_of_tips = length(tmp_subtree$tip.label) #print(tmp_number_of_tips) # number of tips to drop = (numtips -1) numtips_to_drop = tmp_number_of_tips - 1 # tips_to_drop tmp_labels = tmp_subtree$tip.label labels_to_drop = tmp_labels[1:numtips_to_drop] # new label label_kept_num = length(tmp_labels) label_kept = tmp_labels[label_kept_num] new_label = paste(CA_, label_kept, +, numtips_to_drop, _tips, sep=) tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == label_kept] = new_label # chop off e.g. 2 of the 3 tips tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop) } #plot(tree_to_chainsaw) #axisPhylo() tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE) tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = tree_to_chainsaw_table$time_bp timepoint tmp_edge_lengths = tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] times_bp_for_edges_to_chainsaw = tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] adjustment = times_bp_for_edges_to_chainsaw - timepoint revised_tmp_edge_lengths = tmp_edge_lengths + adjustment tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] = revised_tmp_edge_lengths # revised ordered_nodenames = get_nodenums(tree_to_chainsaw) parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, tree_to_chainsaw$edge[,2]) NA_false = is.not.na(tree_to_chainsaw_table$edge.length) tree_to_chainsaw$edge.length[parent_branches[NA_false]] = tree_to_chainsaw_table$edge.length[NA_false] return(tree_to_chainsaw) } # print tree in hierarchical format prt - function(t, printflag=TRUE, relabel_nodes = FALSE, time_bp_digits=7) { # assemble beginning table # check if internal node labels exist if (node.label %in% attributes(t)$names == FALSE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } # or manually relabel the internal nodes, if desired if (relabel_nodes == TRUE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } labels = c(t$tip.label, t$node.label) ordered_nodenames =
Re: [R-sig-phylo] DNA sequence management for phylogenetics in R
After a fair amount of annoyment involving in shifting back and forth between BioPython and R, I also think it would be useful to have BioPython-like sequence management capabilities in R. It would even be good to be able to do some things like access NCBI genbank records and download them, remote BLAST, etc. My understanding is that the bioconductor package is supposed to have some of these capabilities, but (a) to get their genbank function to work I had to hack it myself to update the appropriate URL etc., which indicates that this part of bioconductor, at least, is not well-maintained ...and... (b) the bioconductor set of packages is massive, but most of it seems to be devoted to microarray analysis, which makes finding the sequence stuff a bit of a needle-in-a-haystack Has anyone else had experience/success with bioconductor for sequence phylogenetics purposes? Cheers, Nick On 3/17/09 12:06 PM, Christoph Heibl wrote: Hi Dan, Emmanuel, Brian, Rphyloers ... Now that Brian pointed towards the phyloch package, I think I have to add a little more information. First of all, although it goes perhaps into the direction of what Dan is looking for, this not a mature system and surely aimed to work on a smaller scale (and tailored towards my specific needs which include a strong spatial emphasis). But to let you be the judge - my approach is as follows: (1) All my own sequences are stored as ASCII files with their PCR number as unique identifier in a set of directories. (They could be stored in database, of course, but in my opinion the benefits of this don't outweigh the additional step of work, especially if you work actively on the electropherograms.) (2) Attribute data (taxonomy, marker, primers, collector, acc no., locality, coordinates, etc) is stored in a postgreSQL database. (3) Queries of the database generate vectors containing PCR numbers, which are used to select the corresponding sequences and bundle them into an alignment object (ape) with 'make.fasta' (phyloch). (4) If necessary, additional sequences from GenBank are retrieved with Emmanuel´s 'read.GenBank' function and fused to my sequences via 'c.alignment' (phyloch). (5) I assemble partitions separately by calling MAFFT with 'mafft' (phyloch) and then fuse them with 'c.genes' (phylo). Thereby I can create alignments where missing sequences are filled with Ns or choose to delete all those sequences which are not represented in all of the partitions. (6) 'c.genes' matches sequences via their name. That means before I concatenate partitions, I have to set appropriate taxon names. Once again this is done with the postgreSQL database using the function 'tax.labels' (phyloch). This allows me to concatenate alignments with different degrees of specificity. Example: If I want to create an interspecific sampling covering geographic range of species, I can choose taxonnames AND locality as sequence names in order to get an alignment where more than one accession of each/some species is represented and only those conspecifics stemming from the same sites will be concatenated. I admit that this is a very crude patchwork of functions, but up to a certain dimension it serves its purpose. If think in your endeavor, Dan, SQL is your friend, but the main task will be: How to automate the extraction of the sequences' attributes from varying sources. For Genbank this could be done by more sophisticated version of 'read.GenBank'. Some time ago I tried to build a function 'search.GenBank', but was not successful. Perhaps Emmanuel could help here. His class 'DNAbin' might also prove important if you plan to handle real big datasets, as he just pointed out. In this case, it would be desirable to extend the binary format to unaligned sequences to speed up data assembly prior to alignment. Best wishes, Christoph PS: Parts of phyloch are poorly documented. Anyone interested, please do not hesitate to ask. On Mar 17, 2009, at 5:46 PM, Brian O'Meara wrote: Christoph Heibl has some R code that calls mafft for alignment (which I currently like better than Clustal, btw) and others that can interact with a postgreSQL database for storing info [according to the software description -- I haven't tried this]. See http://www.christophheibl.de/Rpackages.html. Brian On Mar 17, 2009, at 12:09 PM, Emmanuel Paradis wrote: Dan, It seems that the way DNA sequences are coded in ape with the class DNAbin meets some of the criteria you list below. Sequences are stored in vectors, lists of vectors, or matrices. The usual methods for extracting and subsetting ([, [[, $) have been written for this class. There are also methods for rbind and cbind. I have modified them recently so that super-matrices can be built eventually filling some columns/rows with gaps. There is no way to do sequence alignment directly into R at the moment, but Clustal can be called with the system() function and read.dna() can read clustal alignment files, so this
Re: [R-sig-phylo] Generating a tree based on a distance matrix?
The APE command NJ (neighbor-joining) will form a tree from a distance matrix, so that's one option. You could do it and then see if you get the same topology from NJ as from your topology tree. The branch lengths will reflect whatever distances were calculated from the data (which might be one of several corrected or uncorrected distances, depending on the input sequence/character data). Cheers, Nick On 5/12/11 2:18 PM, mgavil2 wrote: All, I have a tree topology (tree_name.tre), and a distance matrix, based on that tree topology. However I cant not seem to find the nexus file from which the matrix was generated. Is there a way to use that distance matrix to incorporate branch lengths into my topology? I have looked into all the threads of questions posted in the list, but still can not find an answer. my final objective is just to generate a tree with branch lengths proportional to the distances on the matrix (reviewers requirement for a publication). Any suggestions would be greatly appreciated! Thanks. Maria Mercedes -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] weirdness in confidence intervals returned by ace, pic option
On Wed, Mar 23, 2011 at 10:24 PM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: Hi Nick, With method = pic, the CIs are computed using the expected variances under the model, so they depend only on the tree. I've added a paragraph in the man page to explain this. Cheers, Emmanuel Obviously, though, if one set of tip data ranges from 100-1000, and another set of tip data ranges from 0.001-1, and both are mapped on the same tree, the variances confidence intervals will be different at the estimated internal nodes for each trait. But I was getting identical confidence intervals across ~20 traits of widely different magnitudes -- and then when I rescaled the trait data, the problem went away, and each trait had different-sized CIs like you would expect. It was as if there some bizarre bug where there was a memory error or some such and the same CI values were getting copied from one trait to the next in some situations. It's probably pointless to discuss, though, unless someone else can replicate the problem I saw with the code I posted. If the problem disappears on other computers, then it's a local problem. R.app sometimes throws messages about memory errors and the like for no apparent reason (not specifically associated with this, though), so it might not even be an APE issue. Cheers! Nick Nick Matzke wrote on 22/03/2011 12:30: Hi all, This isn't crucial to my work at the moment since I am not using the PIC option of ace to do ancestral character estimation. But while trying it out I noticed a very weird result that I can't explain...basically when I run ace on my raw trait values, I get the same sized confidence interval (97.5% CI minus 2.5% confidence interval) for all of my (drastically different) traits. ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] weirdness in confidence intervals returned by ace, pic option
Ah -- I get it now. Even my rescaled trait data have the same width 95% CI, e.g. my CI for a particular node could be mean +/- 2.345 whether my input trait data ranges from 0.3-0.5, or 400-800. My fix with scaled data occurred because I was back-transforming, which scaled the size of the 95% CI with the size of the mean. So I guess neither of these options is a real estimate of the CI, unlike when one runs ace, method=ML. I noticed the same behavior using ace, method=gls, so that should be noted as well. I have been using method=ml for ancestral character estimation, the width of its CIs vary as you might expect, so I was just surprised when PIC GLS didn't exhibit the same behavior. Cheers, Nick On 3/24/11 12:21 AM, Nick Matzke wrote: On Wed, Mar 23, 2011 at 10:24 PM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: Hi Nick, With method = pic, the CIs are computed using the expected variances under the model, so they depend only on the tree. I've added a paragraph in the man page to explain this. Cheers, Emmanuel ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1
Hi all, It seems to be a popular week for questions! I am running fitContinuous on a variety of continuous trait data. I am noticing that when the traits are in units where the max is less than 1 (these are not ratio data, though), many of the various models produce log-likelihoods that are positive, which ought to be impossible. If I rescale the trait values, e.g. by multiplying them by 100, the problem goes away and all log-likelihoods are negative, and come out somewhat similar to each other between the BM, OU, etc. models, as one would expect. Any hint about why this might be? Cheers, Nick -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1
Doh! Really should have remembered that, likelihoods-can-be-greater-than-1 is likelihood 101... I am still a little puzzled by the dramatically different results between rescaling and not, will try to post an example in a sec... On 3/7/11 12:37 PM, Nick Matzke wrote: Hi all, It seems to be a popular week for questions! I am running fitContinuous on a variety of continuous trait data. I am noticing that when the traits are in units where the max is less than 1 (these are not ratio data, though), many of the various models produce log-likelihoods that are positive, which ought to be impossible. If I rescale the trait values, e.g. by multiplying them by 100, the problem goes away and all log-likelihoods are negative, and come out somewhat similar to each other between the BM, OU, etc. models, as one would expect. Any hint about why this might be? Cheers, Nick -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1
Ah, so while re-creating my problem for copy-paste-debug goodness on the listserv, I discovered what was confusing me. Originally, when I ran the various models, I got these log-likelihoods for results: == tf2ic2kzkr turn BM -9.923018 -9.725328 -9.740485 -44.67030 OU 17.939356 61.492326 43.530827 -42.22548 lambda -9.923018 -9.725328 -9.740485 -42.26222 kappa -9.923018 -9.725328 -9.740485 -44.45598 delta 18.121709 61.731899 42.452448 -43.09885 EB 9.649302 11.810710 11.612804 -44.67030 white 17.224360 58.832699 43.138356 -42.26222 == The trait data is roughly normal, the phylogeny is pretty standard, so it was damned odd and rather hard to see why OU and white noise (!) would be dozens of log-likelihood units higher than Brownian motion, for instance. turn trait data were all above 1, but for tf2ic2, kz, and kr, they were below 1. So I was suspecting a bug or something. As it turns out, back when first experimenting with fitContinuous, I had input non-default bounds on the models, and then forgot about it. I think I probably ruled some of the optimal parameter values out-of-bounds for the trait data 1, and this gave pathological results. Going back to defaults, we get: tf2ic2 kz kr turn BM 17.46596 60.96182 40.75615 -44.67030 OU 17.93936 61.49233 43.53083 -42.22548 lambda 17.46596 60.96182 43.13836 -42.26235 kappa 17.46596 60.96182 41.43779 -44.45598 delta 18.12171 61.73190 42.45245 -43.09885 EB 17.46596 60.96182 40.75615 -44.67030 white 17.22436 58.83270 43.13836 -42.26222 ...which seems far more reasonable... Multiplying the trait data by 100 actually doesn't appear to change the subsequent log-likelihoods in this case, in the situation where the bounds are defaults rather than my screwed-up bounds... tf2ic2 kz kr turn BM 17.46596 60.96182 40.75615 -44.67030 OU 17.93936 61.49233 43.53083 -42.22548 lambda 17.46596 60.96182 43.13836 -42.26235 kappa 17.46596 60.96182 41.43779 -44.45598 delta 18.12171 61.73190 42.45245 -43.09885 EB 17.46596 60.96182 40.75615 -44.67030 white 17.22436 58.83270 43.13836 -42.26222 ...which is also encouraging. So, basically, thanks for the help and sorry for the trouble! Nick On 3/7/11 4:41 PM, Dan Rabosky wrote: Hi Nick- Are you are getting differences in relative AICs between models from simple rescaling (multiplying by a constant)? The actual values of the traits *might* matter for optimization, depending on various parameters associated with optimization (and whatever algorithm is being used - this should be L-BFGS-B for fitContinuous, I think). So...if relative AICs are different, the first thing I would check is whether some of your model AICs reflect local optima. Do lots of optimizations with random starting parameters, which is usually sufficient - and failing that, you can get into the guts of optim and mess with the actual arguments that control the optimization (e.g., parscale etc). FYI, this is not immediately transparent in Geiger, as many of the functions are hidden. To get at the optimization, look at geiger:::fitContinuousModel which does the heavy lifting within fitContinuous, and if you need to, you can recode the relevant call to optim to have a bit more flexibility. ~Dan On Mar 7, 2011, at 4:04 PM, Nick Matzke wrote: Doh! Really should have remembered that, likelihoods-can-be-greater-than-1 is likelihood 101... I am still a little puzzled by the dramatically different results between rescaling and not, will try to post an example in a sec... -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong
Re: [R-sig-phylo] Multiple sequence alignment in R?
You can call any command-line thing from R with system(). Typically I use R to write the control file (e.g. for r8s), then do something like... cmdstr = paste(program_name, -options, control_file, output.log, sep= ) system(cmdstr) Cheers! Nick On 2/22/11 5:42 AM, Scott Chamberlain wrote: Hello, Is clustal multiple sequence alignment implemented in any R packages, or is there an easy way to call ClustalW on your hard drive from R, perhaps with system() Sincerely, Scott Chamberlain Rice University, EEB Dept. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200B Principles of Phylogenetics: Ecology and Evolution http://ib.berkeley.edu/courses/ib200b/ http://phylo.wikidot.com/ Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: mat...@berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 - [W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together. Isaac Asimov (1989). The Relativity of Wrong. The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo