Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread ppiras
Hi all,

Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?



Hi Nick.

For your first (simple) problem, I believe you want to
do:

  read.csv(file=Mason_data.csv,row.names=1)-smdata

Regarding the more complicated issue, the problem of
non-independence in
linear regression comes in the residual error of the
model.  Thus, you
should fit a correlation structure for the error in Y
given X (not for X
 Y separately as you have done with PICs here).  This
indeed can be
done using gls() - so, for instance:

pglsModel1a-gls(Pause_Rate~Comp.1,
correlation=corMartins(1,
spbm),data=smdata)

fits a linear model in which the residual error of
Pause_Rate given
Comp.1 is distributed according to the correlation
structure
corMartins() with alpha estimated using REML.

The way to reproduce this result using contrasts is
the following:

  alpha-attr(pglsModel1a$apVar,Pars)[corStruct]
# get fitted alpha
  smdata-as.matrix(smdata) # important
  pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha))
  pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha))
  summary(lm(pr_contrast~pc1_contrast-1))

Which if you compare to:

  summary(pglsModel1a)

you should find yields the same results and P-value.
(Note that
smdata-as.matrix(smdata) seems to be important here
as if smdata is a
data frame, then smdata[,1] does not inherit the
rownames of smdata and
pic() will return an incorrect set of contrasts.)

I hope this helps.  No doubt other subscribers to the
list may have
something to add.

Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/13/2011 11:07 PM, Nicholas Mason wrote:
 Dear R-sig-phylo users,
 I have a question regarding comparative analyses of
 contrasts done with
 the functions fitContinuous() and pic() compared to
 using PGLS (using
 the gls() function).

  From my understanding the first method involving
 pic() below fits alpha
 (estimated using fitContinuous()) to each character
 independently then
 performs a regression on the resulting contrasts.

 The second (PGLS) method involving gls() fits both
 the regression and
 contrasts simultaneously and reports a single alpha
 value for the
 relationship between two traits.

 These two methods appear very similar, yet I get
 slightly different
 results between the two functions, particularly with
 respect to
 p-values. Using fitContinuous(), the results from
 the data set attached
 are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p
 =0.0519.
 Furthermore, if I switch the dependent and
 independent variables (V1~V2
 vs V2~V1), I get the same answer with pic(), but
 gls() differs: r2 =
 -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in
 the attachment).

 Could anyone explain why these functions vary (in my
 mind they were
 doing essentially the same thing) and if there are
 situations where one
 should be favored over another? Also, why does PGLS
 vary if you switch
 the dependent/independent variables in the linear
 model?

 Thanks in advance for any advice or comments
 offered!!

 Cheers,
 Nick Mason

 Below I have the code I have been using (also
 attached as Mason.R):

 require(ape)
 require(nlme)
 require(geiger)
 read.csv(file=Mason_data.csv)-smdata
 rownames(smdata)-smdata[,1]
 smdata[,1]-NULL#ASIDE: If anyone could tell me how
 to get around these
 two lines of code, that would also be awesome.
 Header=T doesn't seem to
 work for me.
 read.tree(file=Mason_tree.tre)-spbm
 name.check(smdata,spbm)

 fitContinuous(spbm,smdata,model=OU)-ou2

 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))
 pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))

 summary(lm(pr_contrast~pc1_contrast-1))

 pglsModel1a-gls(Pause_Rate~Comp.1,
 correlation=corMartins(1,
 spbm),data=smdata)
 summary(pglsModel1a)

 pglsModel1b-gls(Comp.1~Pause_Rate,
 correlation=corMartins(1,
 spbm),data=smdata)
 summary(pglsModel1b)








 -
 Nicholas Albert Mason
 MS Candidate, Burns Lab
 Department of Biology - EB Program
 San Diego State University
 5500 Campanile Drive
 San Diego, CA 92182-4614
 845-240-0649 (cell)





 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org

Re: [R-sig-phylo] Assigning node ages to a tree, revisited

2011-07-14 Thread Hunt, Gene
Roger,

If you like, go ahead and send me off-list your code and tree file.  The 
function does not do a lot of error checking, so it is probably hanging on tip 
or node names; an incorrect tree structure can flummox it, too.  If it is 
working ,it shouldn't take more than a few seconds.

Best,
Gene


On 7/14/11 12:15 AM, Scott Chamberlain myrmecocys...@gmail.com wrote:


 Hi Roger,

Can you provide a reproducible example (perhaps a subset of your tree and their 
tip and node ages if the tree is very large)? I don't know what the problem 
could be without the data, but perhaps Gene knows?


Scott


On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote:


Hello all,

I wish to transform branch lengths on a tree according to ages of
terminal taxa and internal node ages; to this end I have tried to
implement the script written by Gene Hunt mentioned in a recent post
to this list 
(http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html;
script available at https://gist.github.com/938313).

However, when attempting to run the command scalePhylo(tr, tip.ages,
node.mins), it crunches away for hours on my computer (a 2010 MacBook
Pro Core i7, which is quite fast). Clearly there's something wrong.
I'm afraid I don't know enough about R to debug a script.

I thought I'd send this to the list in case someone other than Gene
Hunt or Scott Chamberlain has had a go at using this script.

Thanks in advance,

Roger









--
Gene Hunt
Curator, Department of Paleobiology
National Museum of Natural History
Smithsonian Institution [NHB, MRC 121]
P.O. Box 37012
Washington DC 20013-7012
Phone: 202-633-1331  Fax: 202-786-2832
http://paleobiology.si.edu/staff/individuals/hunt.cfm

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Joe Felsenstein


Paolo Piras wrote --


Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?


The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Theodore Garland Jr
The independent contrasts algebra for what Joe is talking about can be found in 
these papers:

Garland, T., Jr., P. E. Midford, and A. R. Ives. 1999. An introduction to 
phylogenetically based statistical methods, with a new method for confidence 
intervals on ancestral values. American Zoologist 39:374-388.

Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: 
Confidence intervals for regression equations in phylogenetic comparative 
methods. American Naturalist 155:346-364.

Cheers,
Ted

Theodore Garland, Jr.
Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Facsimile:  (951) 827-4286 = Dept. office (not confidential)
Email:  tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html

Experimental Evolution: Concepts, Methods, and Applications of Selection 
Experiments
Edited by Theodore Garland, Jr. and Michael R. Rose
http://www.ucpress.edu/book.php?isbn=9780520261808
(PDFs of chapters are available from me or from the individual authors)


From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Joe Felsenstein [j...@gs.washington.edu]
Sent: Thursday, July 14, 2011 6:35 AM
To: ppi...@uniroma3.it
Cc: r-sig-phylo
Subject: Re: [R-sig-phylo] pic() vs gls()

Paolo Piras wrote --

 Citing
 http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
 Using Felsenstein's (1985) phylogenetic independent
 contrasts (pic); This is also a Brownian-motion based
 estimator, but it only takes descendants of each node
 into account in reconstructing the state at that node.
 More basal nodes are ignored.

 I  THINK that, on the opposite, more basal nodes are
 NOT ignored in gls and for this reason results can
 differ slightly
 I'm wrong?

The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
  Dept. of Genome Sciences, Univ. of Washington
  Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-14 Thread Nick Matzke

On 7/14/11 2:45 AM, ppi...@uniroma3.it wrote:
 Thankyou NIck,
 now...I have an error when running it:




Oops. Apologies, this stuff is in-house code, I haven't 
organized it. Add these to the text file...


===

get_daughters - function(nodenum, t)
{
daughter_edgenums = findall(nodenum, t$edge[,1])
daughter_nodenums = t$edge[,2][daughter_edgenums]
return(daughter_nodenums)
}

# Get indices of all matches to a list
findall - function(what, inlist)
{
TFmatches = inlist == what
indices = 1:length(inlist)
matching_indices = indices[TFmatches]
return(matching_indices)
}

===





object get_daughters not found
It does not seem I miss something
some help?
thanks again
best
paolo


Oops, left that out.  Here it is as text and an
attached file.

You will have to do
library(ape)

and maybe

library(phangorn)

...ahead of time (hopefully not others...)





chainsaw- function(tr, timepoint=10)
  {
  # Take a tree and saw it off evenly across a
certain
timepoint.
  # This removes any tips above the timepoint, and
replaces them
  # with a single tip representing the lineage
crossing
  # the timepoint (with a new tip name).

  # Get the tree in a table
  tr_table = prt(tr, printflag=FALSE)

  # Find the tips that are less than 10 my old and
drop them
  TF_exists_more_recently_than_10mya =
tr_table$time_bp
timepoint

  # Get the corresponding labels
  labels_for_tips_existing_more_recently_than_10mya
=
tr_table$label[ TF_exists_more_recently_than_10mya ==
TRUE ]

  ###
  # Draft chainsaw function
  ###
  # loop through the branches that cross 10 mya

  # get a list of the edge start/stops in the
phylogeny's
edges
  edge_times_bp = get_edge_times_before_present(tr)

  # which of these branches cross 10 mya?
  edges_start_earlier_than_10mya = edge_times_bp[,
1]
timepoint
  edges_end_later_than_10mya = edge_times_bp[, 2]
=
timepoint
  edges_to_chainsaw =
edges_start_earlier_than_10mya +
edges_end_later_than_10mya == 2

  # then, for each of these edges, figure out how
many
tips exist descending from it
  nodes_to_chainsaw = tr$edge[,
2][edges_to_chainsaw]
  nodes_to_chainsaw =
nodes_to_chainsaw[nodes_to_chainsaw
length(tr$tip.label)]

  # create a copy of the tree to chainsaw
  tree_to_chainsaw = tr

  for (i in 1:length(nodes_to_chainsaw))
  {
  tmp_subtree = extract.clade(tr,
nodes_to_chainsaw[i])
  #print(tmp_subtree$tip.label)

  tmp_number_of_tips =
length(tmp_subtree$tip.label)
  #print(tmp_number_of_tips)

  # number of tips to drop = (numtips -1)
  numtips_to_drop = tmp_number_of_tips - 1

  # tips_to_drop
  tmp_labels = tmp_subtree$tip.label

  labels_to_drop =
tmp_labels[1:numtips_to_drop]

  # new label
  label_kept_num = length(tmp_labels)
  label_kept = tmp_labels[label_kept_num]
  new_label = paste(CA_, label_kept, +,
numtips_to_drop, _tips, sep=)

tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label
==
label_kept] = new_label

  # chop off e.g. 2 of the 3 tips
  tree_to_chainsaw = drop.tip(tree_to_chainsaw,
labels_to_drop)

  }
  #plot(tree_to_chainsaw)
  #axisPhylo()

  tree_to_chainsaw_table = prt(tree_to_chainsaw,
printflag=FALSE)

  tree_to_chainsaw_table_tips_TF_time_bp_LT_10my =
tree_to_chainsaw_table$time_bp  timepoint


  tmp_edge_lengths =
tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]

  times_bp_for_edges_to_chainsaw =
tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]

  adjustment = times_bp_for_edges_to_chainsaw -
timepoint

  revised_tmp_edge_lengths = tmp_edge_lengths +
adjustment


tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]
= revised_tmp_edge_lengths

  # revised
  ordered_nodenames =
get_nodenums(tree_to_chainsaw)
  parent_branches =
get_indices_where_list1_occurs_in_list2(ordered_nodenames,
tree_to_chainsaw$edge[,2])

  NA_false =
is.not.na(tree_to_chainsaw_table$edge.length)

  tree_to_chainsaw$edge.length[parent_branches[NA_false]]
= tree_to_chainsaw_table$edge.length[NA_false]

  return(tree_to_chainsaw)
  }


# print tree in hierarchical format
prt- function(t, printflag=TRUE, relabel_nodes =
FALSE,
time_bp_digits=7)
  {
  # assemble beginning table

  # check if internal node labels exist
  if (node.label %in% attributes(t)$names ==
FALSE)
  {
  rootnum = get_nodenum_structural_root(t)

  new_node_labels = paste(inNode,
rootnum:(rootnum+t$Nnode-1), sep=)
  t$node.label = new_node_labels
  }

  # or manually relabel 

Re: [R-sig-phylo] Assigning node ages to a tree, revisited

2011-07-14 Thread Emmanuel Paradis

Hi all,

As a side note to this discussion: ape has now the function 
compute.brtime (though it considers only node ages), available on ape's 
SVN in R/compute.brtime.R.


Cheers,

Emmanuel

Scott Chamberlain wrote on 14/07/2011 11:15:
Hi Roger, 


Can you provide a reproducible example (perhaps a subset of your tree
and their tip and node ages if the tree is very large)? I don't know
what the problem could be without the data, but perhaps Gene knows? 



Scott

On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote:


Hello all,

I wish to transform branch lengths on a tree according to ages of
terminal taxa and internal node ages; to this end I have tried to
implement the script written by Gene Hunt mentioned in a recent post
to this list 
(http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html;
script available at https://gist.github.com/938313).

However, when attempting to run the command scalePhylo(tr, tip.ages,
node.mins), it crunches away for hours on my computer (a 2010 MacBook
Pro Core i7, which is quite fast). Clearly there's something wrong.
I'm afraid I don't know enough about R to debug a script.

I thought I'd send this to the list in case someone other than Gene
Hunt or Scott Chamberlain has had a go at using this script.

Thanks in advance,

Roger



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo