Re: [R-sig-phylo] estimate ancestral state with OUwie models

2018-06-12 Thread Nick Matzke
It's a shame there aren't awards for great threads, because this is one!

The minor twist I would throw in is that it's difficult to make universal
generalizations about the quality of ancestral state estimation.  If one is
interested in the ancestral state value at node N, it might be reasonably
estimated if it is nested high up within the phylogeny, if the rates of
change aren't high, etc. And (local) trends etc might well be reliably
inferred.  We are pretty confident that the common ancestor of humans and
chimps was larger than many deeper primate ancestors, for instance. If N is
the root of your available phylogeny, however, you have to be much more
cautious.

Cheers,
Nick


On Wed, Jun 13, 2018 at 6:57 AM, Joe Felsenstein 
wrote:

> Let me add more warnings to Marguerite and Thomas's excellent
> responses.   People may be tempted to infer ancestral states and then
> treat those inferences as data (and also to infer ancestral
> environments and then treat those inferences as data).  In fact, I
> wonder whether that is not the main use people make of these
> inferences.
>
> But not only are those inferences very noisy, they are correlated with
> each other.  So if you infer the ancestral state for the clade (Old
> World Monkeys, Apes) and also the ancestral state for the clade (New
> World Monkeys, (Old World Monkeys, Apes)) the two will typically not
> only be error-prone, but will also typically be subject to strongly
> correlated errors.  Using them as data for further inferences is very
> dubious.  It is better to figure out what your hypothesis is and then
> test it on the data from the tips of the tree, without the
> intermediate step of taking ancestral state inferences as
> observations.
>
> The popular science press in particular demands a fly-on-the-wall
> account of what happened in evolution, and giving them the ancestral
> state inferences as if they were known precisely is a mistake.
>
> Joe
> 
> Joe Felsenstein j...@gs.washington.edu
>  Department of Genome Sciences and Department of Biology,
>  University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-
> sig-ph...@r-project.org/
>

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-16 Thread Nick Matzke
Hi!  I re-did chainsaw at some point, now there is chainsaw2.  However,
googling that gets you horror movies, so here is a link with example code:

https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ

(the discussion there points out a rare case where this crashes, but for
most trees it should work fine)

Cheers, Nick

On Fri, Oct 16, 2015 at 2:17 PM, David Bapst <dwba...@gmail.com> wrote:

> Hi Gustavo,
>
> I'm paleotree's author and maintainer. Just to be clear that I
> understand your problem, I believe you are saying that when you use
> timeSliceTree, you are getting an error that the internal call to
> dist.nodes is failing? Is that right?
>
> The first thought I have is that maybe the solution here is to avoid
> dist.nodes, as it is somewhat overkill. I use dist.nodes in that code,
> which I wrote in 2011, to get the distance of tips and nodes from the
> root. A better solution may now exist in another R package. I'd have
> to investigate (although maybe someone on the list can suggest one).
>
> The second thought I have is that there might be alternative functions
> that do something lie timeSliceTree in another R package. Off the top
> of my head, I recall that Nick Matzke had a similar, 'chainsaw'
> function, which you can find here and appears not to call dist.nodes:
>
> https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001483.html
>
> Again, maybe someone on the list knows of a good alternative function.
>
> I'll try to give this more thought, but for now, maybe see if you can
> use Nick's function succesfully. Overall though, I've discovered the
> use of truly gigantic trees can often run into unexpected problems.
>
> Cheers,
> -Dave
>
>
>
> On Fri, Oct 16, 2015 at 12:47 PM, Gustavo Burin Ferreira
> <ariete...@gmail.com> wrote:
> > Dear list,
> >
> > I'm trying to perform a time travel in simulated phylogenies with both
> > extant and extinct species using the timeSliceTree function form the
> > paleotree package. My aim is to have the molecular phylogenies derived
> from
> > the complete phylogeny (attached) in different points in time.
> >
> > However, when I try that with big trees (bigger than 2 tips total), I
> > get an error of integer overflow coming from the dist.nodes function.
> After
> > slightly tweaking the dist.nodes function (changing nm from integer to
> > numeric/double), I get the following message:
> >
> > Error in dist.nodes(tree) (from #7) :
> >   long vectors (argument 7) are not supported in .Fortran
> >
> > Since I don't know much about C or Fortran, I couldn't find a way of
> solving
> > this by myself, so any help will be greatly appreciated.
> >
> > I'm sending one tree attached for example.
> >
> > Thank you very much in advance!
> >
> > Best,
> >
> > Gustavo Burin Ferreira, Msc.
> > Instituto de Biociências
> > Universidade de São Paulo
> > Tel: (11) 98525-8948
> >
> > ___
> > R-sig-phylo mailing list - R-sig-phylo@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
>
>
> --
> David W. Bapst, PhD
> Adjunct Asst. Professor, Geology and Geol. Eng.
> South Dakota School of Mines and Technology
> 501 E. St. Joseph
> Rapid City, SD 57701
>
> http://webpages.sdsmt.edu/~dbapst/
> http://cran.r-project.org/web/packages/paleotree/index.html
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] issue in ape's rphylo (and workaround)

2015-10-04 Thread Nick Matzke
Hi all,

I have been hitting intermittent problems using trees generated by
ape::rphylo. Here is a reproducible example.



library(ape)
sessionInfo()

# Simulate a tree with e.g. 5 species
nspecies = 5
set.seed(123465)
tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50,
fossils=TRUE, eps=1e-06)

# Ladderize tree with fossils
ltr = ladderize(tr_wFossils, right=FALSE)
ltr$edge
round(ltr$edge.length, 4)
write.tree(phy=ltr, file="")

tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") )
tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label))
plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils")


Output:

> library(ape)
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ape_3.3

loaded via a namespace (and not attached):
[1] nlme_3.1-121grid_3.2.2  lattice_0.20-33
>
> # Simulate a tree with e.g. 5 species
> nspecies = 5
> set.seed(123465)
> tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50,
fossils=TRUE, eps=1e-06)
>
> # Ladderize tree with fossils
> ltr = ladderize(tr_wFossils, right=FALSE)
> ltr$edge
 [,1] [,2]
[1,]7   11
[2,]   112
[3,]85
[4,]89
[5,]96
[6,]9   10
[7,]   104
[8,]   103
> round(ltr$edge.length, 4)
[1] 2.6862 0.0670 1.2078 0.1028 0.4985 0.0915 1.0135 1.0135
> write.tree(phy=ltr, file="")
[1] "((t2:0.06701971429):2.686176768);"
>
> tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") )
Warning message:
In FUN(X[[i]], ...) : NAs introduced by coercion
> tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label))
> plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils")
NULL
Warning message:
In plot.phylo(tr_wFossils) : found less than 2 tips in the tree
Error in get("last_plot.phylo", envir = .PlotPhyloEnv) :
  object 'last_plot.phylo' not found




It looks like a problem with the edge matrix generated by rphylo - perhaps
older simulation code. The workaround is Yea Olde Writeth To And Readeth
>From Newick trick:

###
# Simulate a tree with e.g. 5 species
nspecies = 5
set.seed(123465)
tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50,
fossils=TRUE, eps=1e-06)
tr_wFossils = read.tree(file="", text=write.tree(phy=tr_wFossils, file="") )
# Ladderize tree with fossils
ltr = ladderize(tr_wFossils, right=FALSE)
ltr$edge
round(ltr$edge.length, 4)
write.tree(phy=ltr, file="")

tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") )
tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label))
plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils")
###



Now there are no issues:
###
> # Simulate a tree with e.g. 5 species
> nspecies = 5
> set.seed(123465)
> tr_wFossils = rphylo(n=nspecies, birth=0.3, death=0.25, T0=50,
fossils=TRUE, eps=1e-06)
> tr_wFossils = read.tree(file="", text=write.tree(phy=tr_wFossils,
file="") )
> # Ladderize tree with fossils
> ltr = ladderize(tr_wFossils, right=FALSE)
> ltr$edge
  [,1] [,2]
 [1,]78
 [2,]81
 [3,]82
 [4,]79
 [5,]96
 [6,]9   10
 [7,]   103
 [8,]   10   11
 [9,]   114
[10,]   115
> round(ltr$edge.length, 4)
 [1] 2.6862 0.0670 0.0670 1.5454 1.2078 0.1028 0.4985 0.0915 1.0135 1.0135
> write.tree(phy=ltr, file="")
[1]
"((t2:0.06701971429,t1:0.06701971429):2.686176768,(t5:1.207768321,(t6:0.4984853178,(t4:1.013489985,t3:1.013489985):0.09146602338):0.1028123128):1.545428161);"
>
> tr_wFossils = read.tree(file="", text=write.tree(phy=ltr, file="") )
> tr_wFossils$tip.label = paste0("sp", 1:length(tr_wFossils$tip.label))
> plot(tr_wFossils); axisPhylo(); title("Sim Tree w Fossils")
###

Cheers,
Nick

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Plot phylogeny with increasing or decreasing node ages?

2015-04-26 Thread Nick Matzke
Hi all,

In FigTree, there is an option (Left menu-Trees-Order nodes-increasing,
or decreasing) to plot trees and order them such that the higher nodes/tips
in the tree plot at the top or bottom of the plot.

Does anything like this exist for plotting trees in R?  Or do I have to
hunt-n-peck and manually rotate nodes 1-by-1 until it looks right?

(yes, I could save the tree out from FigTree and use that, but I want to
automate this plotting function without a FigTree step)

Cheers!
Nick

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Plot phylogeny with increasing or decreasing node ages?

2015-04-26 Thread Nick Matzke
Yep, that's it exactly. Thanks guys!
Nick


On Sun, Apr 26, 2015 at 6:42 PM, Liam J. Revell liam.rev...@umb.edu wrote:

 Hi Nick.

 I'm not sure if this is what you mean, but have you tried the function
 ladderize from the ape package?

 All the best, Liam

 Liam J. Revell, Assistant Professor of Biology
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 email: liam.rev...@umb.edu
 blog: http://blog.phytools.org


 On 4/26/2015 6:37 PM, Nick Matzke wrote:

 Hi all,

 In FigTree, there is an option (Left menu-Trees-Order nodes-increasing,
 or decreasing) to plot trees and order them such that the higher
 nodes/tips
 in the tree plot at the top or bottom of the plot.

 Does anything like this exist for plotting trees in R?  Or do I have to
 hunt-n-peck and manually rotate nodes 1-by-1 until it looks right?

 (yes, I could save the tree out from FigTree and use that, but I want to
 automate this plotting function without a FigTree step)

 Cheers!
 Nick

 [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/


 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] read.dna warnings and pitfalls

2012-05-01 Thread Nick Matzke
I have written several custom mutations of various data-reading
functions to get around some of the common limitations and to read
e.g. ambiguous characters in morphology datasets.

But wouldn't the best solution in the long run be to implement the
equivalent of readseq and/or the Nexus Class Library?   I'm not
volunteering ;-)

Cheers, Nick



On Mon, Apr 30, 2012 at 7:19 PM, Emmanuel Paradis
emmanuel.para...@ird.fr wrote:
 Hi Dan,

 The original motivation behind read.dna() was to allow users to read their
 DNA alignments stored in the Phylip formats -- support for the Clustal
 format came later. You may be right that this is not so frequent. I presume
 a commoner workflow is to first use read.GenBank, then write the sequences
 for analyses with other software. In the situation you describe, that'd
 imply using write.dna. In this function, the rule is (say L is the length of
 the longest taxon name): if L  11 then the 1st nucleotide is written at the
 11th column in the output file, otherwise at the (L+2)th column with a space
 at the (L+1)th one.

 In the short term, I can change this in both read.dna and write.dna and
 impose a space (or a tabulation) between the longest taxon name and the
 first nucleotide. This would imply, of course, that taxa names cannot have
 spaces. What do others think?

 In the long term, I think we may discuss deprecating the sequential and
 interleaved formats for the reasons I listed below. For instance, Clustal
 can output its alignments in FASTA, Muscle outputs by default in this
 format. This is an open discussion.

 Best,

 Emmanuel

 Dan Rabosky wrote on 26/04/2012 22:01:


 Hi Emmanuel-

 Thanks for fixing the whitespace issue. I think this fix will be useful to
 many users.

 On the issue of recognizing 10 IUPAC characters: I think this is a real
 problem, and may come up again in short order. Maybe it is just that use of
 this function has been limited? In the single dataset with a modest number
 of sequences that caused me problems yesterday, I had the following species
 and/or genus names - all of which constitute 10 character strings drawn from
 the set of IUPAC codes:

 brachyurus (x 2)
 savannarum
 graduacauda
 caudacutus
 Camarhynchus (x 3)
 madagascariensis

 I don't suggest deprecating the phylip sequential, but rather, using
 something that is compatible with raxml (surely one of the most widely used
 phylogenetics programs today). I think raxml uses a relaxed sequential
 version of the phylip format with whitespace delimitation. I could read the
 same alignment in raxml with no problems, but I had multiple issues when
 reading the same file with read.dna (including the whitespace character on
 the first line). My guess is that very few people are using the original
 phylip format, with its limit of 10 characters per taxon name, and with dna
 seqs beginning immediately after this. So maybe deprecate sequential
 phylip, but you could use what Stamatakis calls relaxed sequential
 PHYLIP, which appears to be: (1) taxon names cannot include spaces but can
 be up to 100 characters; and (2) names separated from sequences by
 whitespace character (ideally, this should recognize any number of spaces or
 tabs to prevent user confusion).

 For users with tab-delimited raxml files (eg each taxon name separated
 from its dna sequence by a tab), you can use a regular-expressions enabled
 text editor (like textwrangler) to quickly find potential problems. Just
 search for

 [ACGTUMRWSYKVHDBN]{10}.+\t

 with grep matching enabled.

 Cheers,
 ~Dan


 On Apr 26, 2012, at 2:16 AM, Emmanuel Paradis wrote:

 Hi Dan,

 The reason for this implementation (searching the first 10 IUPAC-coded
 bases) is because the exact formatting is not inconsistent among different
 programs. Some files have:

 0123456789acgt.

 that is a 10-character name and the sequence starting on the 11th
 position. I think this is typical for Phylip. Other software (e.g., PhyML)
 accepts longer taxa names and require a space before the start of the
 sequence.

 About your example: it depends on the order of the data. The following
 file can be read:

 2 10
 x     AA
 madagascarAA

 But if you invert the two sequence lines, it fails.

 It is the first time I hear about this problem in 9 years, maybe because
 it requires a particular combination of circumstances. Another drawback of
 this implementation is that files with less than 10 bases cannot be read.

 How to solve this? If it were left only to me, I would deprecate the
 interleaved and sequential formats. FASTA is more flexible, more widespread,
 easier to parse, can store exactly the same information, and labels are only
 constrained to be on a single line (but can contain any characters including
 \n, \t, ...) But I guess many programs use the Phylip formats, so I'd be
 glad to read other suggestions.

 As for your 2nd problem, it is now fixed in ape.

 Best,

 Emmanuel
 -Original Message-
 From: Dan 

Re: [R-sig-phylo] Show Informative Sites?

2011-09-17 Thread Nick Matzke
All sites are informative under likelihood, but I assume you 
mean parsimony-informative, in which case all you have to do 
is count which sites are either (a) uniform or (b) uniform 
except for differences found only in a single species.


Probably easiest if you convert the read.nexus.data output 
to a dataframe with as.data.frame, then use unique and == to 
count the number of states in each site...



On 9/17/11 11:21 AM, Jimmy O'Donnell wrote:

Hi all,

A simple question to which I can't seem to find an answer: is there a way to 
show only the informative sites of a DNA sequence dataset in R?

Thanks,

Jimmy
--
Jimmy O'Donnell
PhD Candidate
Ecology and Evolutionary Biology
University of California
Santa Cruz, CA 95060
jodonn...@biology.ucsc.edu
(407)744-3377

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] post-order tree traversal

2011-09-14 Thread Nick Matzke
Thanks...here's the code.  However, it looks like the output 
I am processing does not actually come with post-order 
labeling.


I.e., in the 2nd plot below:

node 1 should be labeled 1
node 10 should be labeled 2
node 2 should be labeled 3
node 14 should be labeled 4
node 3 should be labeled 5
...etc...

What is the name of this sort of ordering?  Or are there 2 
sorts of post-ordering?


Cheers,
Nick



trstr = 
1:61.9979,2:61.9979):10.0994,3:72.0973):30.5957,(4:97.17,5:97.17):5.52294):165.913,(6:45.8837,7:42.7899):90.6427,8:11.748):9.81895,((9:71.9983,10:30.1075):59.5189,(((11:48.3447,12:48.3447):77.8739,(13:86.8531,(14:53.665,15:53.665):33.1881):39.3656):26.0816,((16:94.7537,17:94.7537):44.1509,(18:82.3045,(19:72.9737,20:72.9737):9.33074):56.6002):13.3956):27.9983):10.886):22.824,21:214.009):34.3554,(22:186.877,23:186.877):61.4866):20.2417);


tr = read.tree(file=, text=trstr)
plot(tr)
nodelabels()


tr2 = reorder(tr, order=pruningwise)
plot(tr2)
nodelabels()

internal_nodenums_in_postorder = unique(tr2$edge[,1])
internal_nodenums_in_postorder = 
internal_nodenums_in_postorder - length(tr2$tip.label)


tr3 = tr2
tr3$node.label[internal_nodenums_in_postorder] = 1:tr3$Nnode
plot(tr3)
nodelabels(tr3$node.label)




On 9/14/11 6:36 PM, Liam J. Revell wrote:

Hi Nick.

If you use the function
reorder.phylo(...,order=pruningwise) the resultant tree
edge matrix is suitable for post-order traversal (i.e., the
daughters always precede the parents in top-to-bottom matrix
traversal). This might be helpful for what you would like to
do - for instance if you just label the nodes as they are
first encountered in the matrix this would be in the order
of a post-order traversal of the tree.

All the best, Liam



--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LTT plot non-ultrametric trees

2011-08-11 Thread Nick Matzke
Somewhere I wrote a function that samples at a series of 
user-set timepoints and counts the # of lineages crossing 
each timepoint -- this is pretty flexible, allows for 
increases/decreases in diversity etc., let me know if the 
other options aren't working for you  I can dig it up...



On 8/10/11 9:57 PM, Emmanuel Paradis wrote:

Hi Liam  Rob,

You may try using dist.nodes() for this:

x - dist.nodes(phy)
n - Ntip(phy)
ROOT - n + 1
x[ROOT, ]

The last command returns the distance from the root to all
nodes and tips which are ordered in the usual way. So you
may create a vector with +1 for the nodes, -1 for the tips,
and drop the tips not extinct. Then you sort on the values
of x, and plot cumsum() of the vector of +/-1 against them.

I can add this to ape if there's a general interest for it.

Best,

Emmanuel

Liam J. Revell wrote on 11/08/2011 11:05:

Hi Rob.

I can reproduce your error, but I haven't figured out the
problem yet.

You can try an earlier version of this function, which
seems to work:

source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ltt/v0.3/ltt.R;)

p2 - ltt(t1, log.lineages=FALSE, drop.extinct=FALSE)

Sorry about this.

Also:

max(p1$times)==max(p2$times)

can be FALSE because if drop.extinct is set to TRUE, then
the crown age of the pruned tree can be smaller than in
the full tree if some lineages arising at the root of the
tree do not leave any extant descendants.

- Liam





--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-14 Thread Nick Matzke

On 7/14/11 2:45 AM, ppi...@uniroma3.it wrote:
 Thankyou NIck,
 now...I have an error when running it:




Oops. Apologies, this stuff is in-house code, I haven't 
organized it. Add these to the text file...


===

get_daughters - function(nodenum, t)
{
daughter_edgenums = findall(nodenum, t$edge[,1])
daughter_nodenums = t$edge[,2][daughter_edgenums]
return(daughter_nodenums)
}

# Get indices of all matches to a list
findall - function(what, inlist)
{
TFmatches = inlist == what
indices = 1:length(inlist)
matching_indices = indices[TFmatches]
return(matching_indices)
}

===





object get_daughters not found
It does not seem I miss something
some help?
thanks again
best
paolo


Oops, left that out.  Here it is as text and an
attached file.

You will have to do
library(ape)

and maybe

library(phangorn)

...ahead of time (hopefully not others...)





chainsaw- function(tr, timepoint=10)
  {
  # Take a tree and saw it off evenly across a
certain
timepoint.
  # This removes any tips above the timepoint, and
replaces them
  # with a single tip representing the lineage
crossing
  # the timepoint (with a new tip name).

  # Get the tree in a table
  tr_table = prt(tr, printflag=FALSE)

  # Find the tips that are less than 10 my old and
drop them
  TF_exists_more_recently_than_10mya =
tr_table$time_bp
timepoint

  # Get the corresponding labels
  labels_for_tips_existing_more_recently_than_10mya
=
tr_table$label[ TF_exists_more_recently_than_10mya ==
TRUE ]

  ###
  # Draft chainsaw function
  ###
  # loop through the branches that cross 10 mya

  # get a list of the edge start/stops in the
phylogeny's
edges
  edge_times_bp = get_edge_times_before_present(tr)

  # which of these branches cross 10 mya?
  edges_start_earlier_than_10mya = edge_times_bp[,
1]
timepoint
  edges_end_later_than_10mya = edge_times_bp[, 2]
=
timepoint
  edges_to_chainsaw =
edges_start_earlier_than_10mya +
edges_end_later_than_10mya == 2

  # then, for each of these edges, figure out how
many
tips exist descending from it
  nodes_to_chainsaw = tr$edge[,
2][edges_to_chainsaw]
  nodes_to_chainsaw =
nodes_to_chainsaw[nodes_to_chainsaw
length(tr$tip.label)]

  # create a copy of the tree to chainsaw
  tree_to_chainsaw = tr

  for (i in 1:length(nodes_to_chainsaw))
  {
  tmp_subtree = extract.clade(tr,
nodes_to_chainsaw[i])
  #print(tmp_subtree$tip.label)

  tmp_number_of_tips =
length(tmp_subtree$tip.label)
  #print(tmp_number_of_tips)

  # number of tips to drop = (numtips -1)
  numtips_to_drop = tmp_number_of_tips - 1

  # tips_to_drop
  tmp_labels = tmp_subtree$tip.label

  labels_to_drop =
tmp_labels[1:numtips_to_drop]

  # new label
  label_kept_num = length(tmp_labels)
  label_kept = tmp_labels[label_kept_num]
  new_label = paste(CA_, label_kept, +,
numtips_to_drop, _tips, sep=)

tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label
==
label_kept] = new_label

  # chop off e.g. 2 of the 3 tips
  tree_to_chainsaw = drop.tip(tree_to_chainsaw,
labels_to_drop)

  }
  #plot(tree_to_chainsaw)
  #axisPhylo()

  tree_to_chainsaw_table = prt(tree_to_chainsaw,
printflag=FALSE)

  tree_to_chainsaw_table_tips_TF_time_bp_LT_10my =
tree_to_chainsaw_table$time_bp  timepoint


  tmp_edge_lengths =
tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]

  times_bp_for_edges_to_chainsaw =
tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]

  adjustment = times_bp_for_edges_to_chainsaw -
timepoint

  revised_tmp_edge_lengths = tmp_edge_lengths +
adjustment


tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]
= revised_tmp_edge_lengths

  # revised
  ordered_nodenames =
get_nodenums(tree_to_chainsaw)
  parent_branches =
get_indices_where_list1_occurs_in_list2(ordered_nodenames,
tree_to_chainsaw$edge[,2])

  NA_false =
is.not.na(tree_to_chainsaw_table$edge.length)

  tree_to_chainsaw$edge.length[parent_branches[NA_false]]
= tree_to_chainsaw_table$edge.length[NA_false]

  return(tree_to_chainsaw)
  }


# print tree in hierarchical format
prt- function(t, printflag=TRUE, relabel_nodes =
FALSE,
time_bp_digits=7)
  {
  # assemble beginning table

  # check if internal node labels exist
  if (node.label %in% attributes(t)$names ==
FALSE)
  {
  rootnum = get_nodenum_structural_root(t)

  new_node_labels = paste(inNode,
rootnum:(rootnum+t$Nnode-1), sep=)
  t$node.label = new_node_labels
  }

  # or manually relabel 

Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-13 Thread Nick Matzke

Oops, left that out.  Here it is as text and an attached file.

You will have to do
library(ape)

and maybe

library(phangorn)

...ahead of time (hopefully not others...)





chainsaw - function(tr, timepoint=10)
{
# Take a tree and saw it off evenly across a certain 
timepoint.
# This removes any tips above the timepoint, and 
replaces them

# with a single tip representing the lineage crossing
# the timepoint (with a new tip name).

# Get the tree in a table
tr_table = prt(tr, printflag=FALSE)

# Find the tips that are less than 10 my old and drop them
TF_exists_more_recently_than_10mya = tr_table$time_bp  
timepoint


# Get the corresponding labels
labels_for_tips_existing_more_recently_than_10mya = 
tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ]


###
# Draft chainsaw function
###
# loop through the branches that cross 10 mya

# get a list of the edge start/stops in the phylogeny's 
edges

edge_times_bp = get_edge_times_before_present(tr)

# which of these branches cross 10 mya?
edges_start_earlier_than_10mya = edge_times_bp[, 1]  
timepoint
edges_end_later_than_10mya = edge_times_bp[, 2] = 
timepoint
edges_to_chainsaw = edges_start_earlier_than_10mya + 
edges_end_later_than_10mya == 2


# then, for each of these edges, figure out how many 
tips exist descending from it

nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw]
nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw 
 length(tr$tip.label)]


# create a copy of the tree to chainsaw
tree_to_chainsaw = tr

for (i in 1:length(nodes_to_chainsaw))
{
tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i])
#print(tmp_subtree$tip.label)

tmp_number_of_tips = length(tmp_subtree$tip.label)
#print(tmp_number_of_tips)

# number of tips to drop = (numtips -1)
numtips_to_drop = tmp_number_of_tips - 1

# tips_to_drop
tmp_labels = tmp_subtree$tip.label

labels_to_drop = tmp_labels[1:numtips_to_drop]

# new label
label_kept_num = length(tmp_labels)
label_kept = tmp_labels[label_kept_num]
new_label = paste(CA_, label_kept, +, 
numtips_to_drop, _tips, sep=)


tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == 
label_kept] = new_label


# chop off e.g. 2 of the 3 tips
tree_to_chainsaw = drop.tip(tree_to_chainsaw, 
labels_to_drop)


}
#plot(tree_to_chainsaw)
#axisPhylo()

tree_to_chainsaw_table = prt(tree_to_chainsaw, 
printflag=FALSE)


tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = 
tree_to_chainsaw_table$time_bp  timepoint



tmp_edge_lengths = 
tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


times_bp_for_edges_to_chainsaw = 
tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


adjustment = times_bp_for_edges_to_chainsaw - timepoint

revised_tmp_edge_lengths = tmp_edge_lengths + adjustment


tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] 
= revised_tmp_edge_lengths


# revised
ordered_nodenames = get_nodenums(tree_to_chainsaw)
parent_branches = 
get_indices_where_list1_occurs_in_list2(ordered_nodenames, 
tree_to_chainsaw$edge[,2])


NA_false = is.not.na(tree_to_chainsaw_table$edge.length)

tree_to_chainsaw$edge.length[parent_branches[NA_false]] 
= tree_to_chainsaw_table$edge.length[NA_false]


return(tree_to_chainsaw)
}


# print tree in hierarchical format
prt - function(t, printflag=TRUE, relabel_nodes = FALSE, 
time_bp_digits=7)

{
# assemble beginning table

# check if internal node labels exist
if (node.label %in% attributes(t)$names == FALSE)
{
rootnum = get_nodenum_structural_root(t)

new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

# or manually relabel the internal nodes, if desired
if (relabel_nodes == TRUE)
{
rootnum = get_nodenum_structural_root(t)

new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

labels = c(t$tip.label, t$node.label)
ordered_nodenames = get_nodenums(t)
#nodenums = 1:length(labels)
node.types1 = rep(tip, length(t$tip.label))
node.types2 = rep(internal, length(t$node.label))
node.types2[1] = root
node.types = c(node.types1, node.types2)

# These are the index numbers of the edges below each node
parent_branches = 
get_indices_where_list1_occurs_in_list2(ordered_nodenames, 
t$edge[,2])

#parent_edges = parent_branches
brlen_to_parent = t$edge.length[parent_branches]

parent_nodes = t$edge[,1][parent_branches]
daughter_nodes = lapply(ordered_nodenames, 

Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-12 Thread Nick Matzke
I wrote a function called chainsaw to do something like 
this, it saws off all the branches at a particular time 
point, then you just have to drop.tip on branch tips older 
than your time period.  Would that be helpful?


On 7/12/11 3:19 PM, ppi...@uniroma3.it wrote:

Hi all,
someone knows a smart coding to cut a tree in order to
retain only taxa present in a given time interval bin?

Thanks in advance for any suggestion
best
paolo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-12 Thread Nick Matzke


Here's chainsaw().  It also requires sourcing a few other 
functions.  Please cite me if you use it :-).


=

chainsaw - function(tr, timepoint=10)
{
# Take a tree and saw it off evenly across a certain timepoint.
# This removes any tips above the timepoint, and replaces them
# with a single tip representing the lineage crossing
# the timepoint (with a new tip name).

# Get the tree in a table
tr_table = prt(tr, printflag=FALSE)

# Find the tips that are less than 10 my old and drop them
	TF_exists_more_recently_than_10mya = tr_table$time_bp  
timepoint


# Get the corresponding labels
	labels_for_tips_existing_more_recently_than_10mya = 
tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ]


###
# Draft chainsaw function
###
# loop through the branches that cross 10 mya

# get a list of the edge start/stops in the phylogeny's edges
edge_times_bp = get_edge_times_before_present(tr)

# which of these branches cross 10 mya?
edges_start_earlier_than_10mya = edge_times_bp[, 1]  timepoint
edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint
	edges_to_chainsaw = edges_start_earlier_than_10mya + 
edges_end_later_than_10mya == 2


	# then, for each of these edges, figure out how many tips 
exist descending from it

nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw]
	nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw  
length(tr$tip.label)]


# create a copy of the tree to chainsaw
tree_to_chainsaw = tr

for (i in 1:length(nodes_to_chainsaw))
{
tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i])
#print(tmp_subtree$tip.label)

tmp_number_of_tips = length(tmp_subtree$tip.label)
#print(tmp_number_of_tips)

# number of tips to drop = (numtips -1)
numtips_to_drop = tmp_number_of_tips - 1

# tips_to_drop
tmp_labels = tmp_subtree$tip.label

labels_to_drop = tmp_labels[1:numtips_to_drop]

# new label
label_kept_num = length(tmp_labels)
label_kept = tmp_labels[label_kept_num]
		new_label = paste(CA_, label_kept, +, numtips_to_drop, 
_tips, sep=)
		tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == 
label_kept] = new_label


# chop off e.g. 2 of the 3 tips
tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop)

}
#plot(tree_to_chainsaw)
#axisPhylo()

tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE)

	tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = 
tree_to_chainsaw_table$time_bp  timepoint



	tmp_edge_lengths = 
tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


	times_bp_for_edges_to_chainsaw = 
tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


adjustment = times_bp_for_edges_to_chainsaw - timepoint

revised_tmp_edge_lengths = tmp_edge_lengths + adjustment


tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]
 = revised_tmp_edge_lengths

# revised
ordered_nodenames = get_nodenums(tree_to_chainsaw)
	parent_branches = 
get_indices_where_list1_occurs_in_list2(ordered_nodenames, 
tree_to_chainsaw$edge[,2])


NA_false = is.not.na(tree_to_chainsaw_table$edge.length)

	tree_to_chainsaw$edge.length[parent_branches[NA_false]] = 
tree_to_chainsaw_table$edge.length[NA_false]


return(tree_to_chainsaw)
}


# print tree in hierarchical format
prt - function(t, printflag=TRUE, relabel_nodes = FALSE, 
time_bp_digits=7)

{
# assemble beginning table

# check if internal node labels exist
if (node.label %in% attributes(t)$names == FALSE)
{
rootnum = get_nodenum_structural_root(t)

		new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

# or manually relabel the internal nodes, if desired
if (relabel_nodes == TRUE)
{
rootnum = get_nodenum_structural_root(t)

		new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

labels = c(t$tip.label, t$node.label)
ordered_nodenames = 

Re: [R-sig-phylo] DNA sequence management for phylogenetics in R

2011-06-27 Thread Nick Matzke
After a fair amount of annoyment involving in shifting back 
and forth between BioPython and R, I also think it would be 
useful to have BioPython-like sequence management 
capabilities in R.  It would even be good to be able to do 
some things like access NCBI genbank records and download 
them, remote BLAST, etc.


My understanding is that the bioconductor package is 
supposed to have some of these capabilities, but


(a) to get their genbank function to work I had to hack it 
myself to update the appropriate URL etc., which indicates 
that this part of bioconductor, at least, is not well-maintained


...and...

(b) the bioconductor set of packages is massive, but most of 
it seems to be devoted to microarray analysis, which makes 
finding the sequence stuff a bit of a needle-in-a-haystack


Has anyone else had experience/success with bioconductor for 
sequence  phylogenetics purposes?


Cheers,
Nick


On 3/17/09 12:06 PM, Christoph Heibl wrote:

Hi Dan, Emmanuel, Brian, Rphyloers ...

Now that Brian pointed towards the phyloch package, I think
I have to add a little more information.

First of all, although it goes perhaps into the direction of
what Dan is looking for, this not a mature system and surely
aimed to work on a smaller scale (and tailored towards my
specific needs which include a strong spatial emphasis). But
to let you be the judge - my approach is as follows:

(1) All my own sequences are stored as ASCII files with
their PCR number as unique identifier in a set of
directories. (They could be stored in database, of course,
but in my opinion the benefits of this don't outweigh the
additional step of work, especially if you work actively on
the electropherograms.)

(2) Attribute data (taxonomy, marker, primers, collector,
acc no., locality, coordinates, etc) is stored in a
postgreSQL database.

(3) Queries of the database generate vectors containing PCR
numbers, which are used to select the corresponding
sequences and bundle them into an alignment object (ape)
with 'make.fasta' (phyloch).

(4) If necessary, additional sequences from GenBank are
retrieved with Emmanuel´s 'read.GenBank' function and fused
to my sequences via 'c.alignment' (phyloch).

(5) I assemble partitions separately by calling MAFFT with
'mafft' (phyloch) and then fuse them with 'c.genes' (phylo).
Thereby I can create alignments where missing sequences are
filled with Ns or choose to delete all those sequences which
are not represented in all of the partitions.

(6) 'c.genes' matches sequences via their name. That means
before I concatenate partitions, I have to set appropriate
taxon names. Once again this is done with the postgreSQL
database using the function 'tax.labels' (phyloch). This
allows me to concatenate alignments with different degrees
of specificity. Example: If I want to create an
interspecific sampling covering geographic range of species,
I can choose taxonnames AND locality as sequence names in
order to get an alignment where more than one accession of
each/some species is represented and only those conspecifics
stemming from the same sites will be concatenated.

I admit that this is a very crude patchwork of functions,
but up to a certain dimension it serves its purpose. If
think in your endeavor, Dan, SQL is your friend, but the
main task will be: How to automate the extraction of the
sequences' attributes from varying sources. For Genbank this
could be done by more sophisticated version of
'read.GenBank'. Some time ago I tried to build a function
'search.GenBank', but was not successful. Perhaps Emmanuel
could help here. His class 'DNAbin' might also prove
important if you plan to handle real big datasets, as he
just pointed out. In this case, it would be desirable to
extend the binary format to unaligned sequences to speed up
data assembly prior to alignment.

Best wishes,

Christoph

PS: Parts of phyloch are poorly documented. Anyone
interested, please do not hesitate to ask.





On Mar 17, 2009, at 5:46 PM, Brian O'Meara wrote:


Christoph Heibl has some R code that calls mafft for
alignment (which I currently like better than Clustal,
btw) and others that can interact with a postgreSQL
database for storing info [according to the software
description -- I haven't tried this]. See
http://www.christophheibl.de/Rpackages.html.

Brian

On Mar 17, 2009, at 12:09 PM, Emmanuel Paradis wrote:


Dan,

It seems that the way DNA sequences are coded in ape with
the class DNAbin meets some of the criteria you list
below. Sequences are stored in vectors, lists of vectors,
or matrices. The usual methods for extracting and
subsetting ([, [[, $) have been written for this class.
There are also methods for rbind and cbind. I have
modified them recently so that super-matrices can be
built eventually filling some columns/rows with gaps.

There is no way to do sequence alignment directly into R
at the moment, but Clustal can be called with the
system() function and read.dna() can read clustal
alignment files, so this 

Re: [R-sig-phylo] Generating a tree based on a distance matrix?

2011-05-12 Thread Nick Matzke
The APE command NJ (neighbor-joining) will form a tree from 
a distance matrix, so that's one option.  You could do it 
and then see if you get the same topology from NJ as from 
your topology tree.  The branch lengths will reflect 
whatever distances were calculated from the data (which 
might be one of several corrected or uncorrected distances, 
depending on the input sequence/character data).


Cheers,
Nick

On 5/12/11 2:18 PM, mgavil2 wrote:

All,
I have a tree topology (tree_name.tre), and a distance matrix, based
on that tree topology. However I cant not seem to find the nexus file
from which the matrix was generated.  Is there a way to use that
distance matrix to incorporate branch lengths into my topology?
I have looked into all the threads of questions posted in the list,
but still can not find an answer. my final objective is just to
generate a tree with branch lengths proportional to the distances on
the matrix (reviewers requirement for a publication).

Any suggestions would be greatly appreciated!

Thanks.

Maria Mercedes



--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] weirdness in confidence intervals returned by ace, pic option

2011-03-24 Thread Nick Matzke
On Wed, Mar 23, 2011 at 10:24 PM, Emmanuel Paradis
emmanuel.para...@ird.fr wrote:
 Hi Nick,

 With method = pic, the CIs are computed using the expected variances under
 the model, so they depend only on the tree. I've added a paragraph in the
 man page to explain this.

 Cheers,

 Emmanuel


Obviously, though, if one set of tip data ranges from 100-1000, and
another set of tip data ranges from 0.001-1, and both are mapped on
the same tree, the variances  confidence intervals will be different
at the estimated internal nodes for each trait.  But I was getting
identical confidence intervals across ~20 traits of widely different
magnitudes -- and then when I rescaled the trait data, the problem
went away, and each trait had different-sized CIs like you would
expect.  It was as if there some bizarre bug where there was a memory
error or some such and the same CI values were getting copied from one
trait to the next in some situations.

It's probably pointless to discuss, though, unless someone else can
replicate the problem I saw with the code I posted.  If the problem
disappears on other computers, then it's a local problem.  R.app
sometimes throws messages about memory errors and the like for no
apparent reason (not specifically associated with this, though), so it
might not even be an APE issue.

Cheers!
Nick




 Nick Matzke wrote on 22/03/2011 12:30:

 Hi all,

 This isn't crucial to my work at the moment since I am not using the PIC
 option of ace to do ancestral character estimation.  But while trying it out
 I noticed a very weird result that I can't explain...basically when I run
 ace on my raw trait values, I get the same sized confidence interval (97.5%
 CI minus 2.5% confidence interval) for all of my (drastically different)
 traits.



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] weirdness in confidence intervals returned by ace, pic option

2011-03-24 Thread Nick Matzke
Ah -- I get it now.  Even my rescaled trait data have the 
same width 95% CI, e.g. my CI for a particular node could be 
mean +/- 2.345 whether my input trait data ranges from 
0.3-0.5, or 400-800.


My fix with scaled data occurred because I was 
back-transforming, which scaled the size of the 95% CI with 
the size of the mean.  So I guess neither of these options 
is a real estimate of the CI, unlike when one runs ace, 
method=ML.


I noticed the same behavior using ace, method=gls, so that 
should be noted as well.  I have been using method=ml for 
ancestral character estimation, the width of its CIs vary as 
you might expect, so I was just surprised when PIC  GLS 
didn't exhibit the same behavior.


Cheers,
Nick

On 3/24/11 12:21 AM, Nick Matzke wrote:

On Wed, Mar 23, 2011 at 10:24 PM, Emmanuel Paradis
emmanuel.para...@ird.fr  wrote:

Hi Nick,

With method = pic, the CIs are computed using the expected variances under
the model, so they depend only on the tree. I've added a paragraph in the
man page to explain this.

Cheers,

Emmanuel


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1

2011-03-07 Thread Nick Matzke

Hi all,

It seems to be a popular week for questions!

I am running fitContinuous on a variety of continuous trait 
data.  I am noticing that when the traits are in units where 
the max is less than 1 (these are not ratio data, though), 
many of the various models produce log-likelihoods that are 
positive, which ought to be impossible.


If I rescale the trait values, e.g. by multiplying them by 
100, the problem goes away and all log-likelihoods are 
negative, and come out somewhat similar to each other 
between the BM, OU, etc. models, as one would expect.


Any hint about why this might be?

Cheers,
Nick


--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1

2011-03-07 Thread Nick Matzke
Doh!  Really should have remembered that, 
likelihoods-can-be-greater-than-1 is likelihood 101...


I am still a little puzzled by the dramatically different 
results between rescaling and not, will try to post an 
example in a sec...




On 3/7/11 12:37 PM, Nick Matzke wrote:

Hi all,

It seems to be a popular week for questions!

I am running fitContinuous on a variety of continuous trait
data. I am noticing that when the traits are in units where
the max is less than 1 (these are not ratio data, though),
many of the various models produce log-likelihoods that are
positive, which ought to be impossible.

If I rescale the trait values, e.g. by multiplying them by
100, the problem goes away and all log-likelihoods are
negative, and come out somewhat similar to each other
between the BM, OU, etc. models, as one would expect.

Any hint about why this might be?

Cheers,
Nick




--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] fitContinuous in geiger: positive log-likelihoods when trait values 1

2011-03-07 Thread Nick Matzke
Ah, so while re-creating my problem for copy-paste-debug 
goodness on the listserv, I discovered what was confusing me.


Originally, when I ran the various models, I got these 
log-likelihoods for results:


==
  tf2ic2kzkr  turn
BM -9.923018 -9.725328 -9.740485 -44.67030
OU 17.939356 61.492326 43.530827 -42.22548
lambda -9.923018 -9.725328 -9.740485 -42.26222
kappa  -9.923018 -9.725328 -9.740485 -44.45598
delta  18.121709 61.731899 42.452448 -43.09885
EB  9.649302 11.810710 11.612804 -44.67030
white  17.224360 58.832699 43.138356 -42.26222
==

The trait data is roughly normal, the phylogeny is pretty 
standard, so it was damned odd and rather hard to see why OU 
and white noise (!) would be dozens of log-likelihood units 
higher than Brownian motion, for instance.  turn trait 
data were all above 1, but for tf2ic2, kz, and kr, they were 
below 1.  So I was suspecting a bug or something.


As it turns out, back when first experimenting with 
fitContinuous, I had input non-default bounds on the models, 
and then forgot about it.  I think I probably ruled some of 
the optimal parameter values out-of-bounds for the trait 
data 1, and this gave pathological results.


Going back to defaults, we get:


 tf2ic2   kz   kr  turn
BM 17.46596 60.96182 40.75615 -44.67030
OU 17.93936 61.49233 43.53083 -42.22548
lambda 17.46596 60.96182 43.13836 -42.26235
kappa  17.46596 60.96182 41.43779 -44.45598
delta  18.12171 61.73190 42.45245 -43.09885
EB 17.46596 60.96182 40.75615 -44.67030
white  17.22436 58.83270 43.13836 -42.26222


...which seems far more reasonable...


Multiplying the trait data by 100 actually doesn't appear to 
change the subsequent log-likelihoods in this case, in the 
situation where the bounds are defaults rather than my 
screwed-up bounds...



 tf2ic2   kz   kr  turn
BM 17.46596 60.96182 40.75615 -44.67030
OU 17.93936 61.49233 43.53083 -42.22548
lambda 17.46596 60.96182 43.13836 -42.26235
kappa  17.46596 60.96182 41.43779 -44.45598
delta  18.12171 61.73190 42.45245 -43.09885
EB 17.46596 60.96182 40.75615 -44.67030
white  17.22436 58.83270 43.13836 -42.26222


...which is also encouraging.

So, basically, thanks for the help and sorry for the trouble!

Nick


On 3/7/11 4:41 PM, Dan Rabosky wrote:


Hi Nick-

Are you are getting differences in relative AICs between
models from simple rescaling (multiplying by a constant)?

The actual values of the traits *might* matter for
optimization, depending on various parameters associated
with optimization (and whatever algorithm is being used -
this should be L-BFGS-B for fitContinuous, I think). So...if
relative AICs are different, the first thing I would check
is whether some of your model AICs reflect local optima. Do
lots of optimizations with random starting parameters, which
is usually sufficient - and failing that, you can get into
the guts of optim and mess with the actual arguments that
control the optimization (e.g., parscale etc).

FYI, this is not immediately transparent in Geiger, as many
of the functions are hidden. To get at the optimization, look at

geiger:::fitContinuousModel

which does the heavy lifting within fitContinuous, and if
you need to, you can recode the relevant call to optim to
have a bit more flexibility.

~Dan




On Mar 7, 2011, at 4:04 PM, Nick Matzke wrote:


Doh! Really should have remembered that,
likelihoods-can-be-greater-than-1 is likelihood 101...

I am still a little puzzled by the dramatically different
results between rescaling and not, will try to post an
example in a sec...








--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong

Re: [R-sig-phylo] Multiple sequence alignment in R?

2011-02-22 Thread Nick Matzke

You can call any command-line thing from R with system().

Typically I use R to write the control file (e.g. for r8s), 
then do something like...


cmdstr = paste(program_name, -options, control_file, 
 output.log, sep= )

system(cmdstr)

Cheers!
Nick

On 2/22/11 5:42 AM, Scott Chamberlain wrote:

Hello,


Is clustal multiple sequence alignment implemented in any R packages, or is 
there an easy way to call ClustalW on your hard drive from R, perhaps with 
system()


Sincerely,

Scott Chamberlain
Rice University, EEB Dept.



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--

Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: 
http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-
[W]hen people thought the earth was flat, they were wrong. 
When people thought the earth was spherical, they were 
wrong. But if you think that thinking the earth is spherical 
is just as wrong as thinking the earth is flat, then your 
view is wronger than both of them put together.


Isaac Asimov (1989). The Relativity of Wrong. The 
Skeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo