Hi all,
This isn't crucial to my work at the moment since I am not
using the PIC option of ace to do ancestral character
estimation. But while trying it out I noticed a very weird
result that I can't explain...basically when I run ace on my
raw trait values, I get the same sized confidence interval
(97.5% CI minus 2.5% confidence interval) for all of my
(drastically different) traits.
E.g.:
===================
s1n s1t s2en s2et s2in s2it
20 4.852213 4.852213 4.852213 4.852213 4.852213
21 2.445078 2.445078 2.445078 2.445078 2.445078
22 3.200703 3.200703 3.200703 3.200703 3.200703
23 2.960947 2.960947 2.960947 2.960947 2.960947
24 2.240474 2.240474 2.240474 2.240474 2.240474
25 2.654838 2.654838 2.654838 2.654838 2.654838
===================
When I transform the data in various ways, the problem goes
away, even though the rest of the code is identical
(copy-paste identical, except for the input data), so I have
(I think!) excluded some idiotic error on my part. Although
the result is so strange, I can't imagine what would cause it.
An example of the expected result:
=================
s1n s1t s2en s2et s2in s2it
20 63.84491 1.4760944 44.97236 0.6305324 47.01539
21 32.17208 0.7438184 22.66201 0.3177314 23.69152
22 42.11451 0.9736874 29.66546 0.4159229 31.01312
23 38.95982 0.9007511 27.44330 0.3847672 28.69001
24 29.47992 0.6815757 20.76565 0.2911437 21.70901
25 34.93208 0.8076297 24.60616 0.3449892 25.72398
=================
I got the same results on 2 different computers, on APE 2.4
and 2.7, so I doubt it is something specific to my computer.
The code below should load the data & analysis from scratch
& reproduce the weirdness. Thoughts welcome!
Nick
==================
library(ape)
#############################
# Setup
#############################
# as.data.frame shortcut function
adf <- function(x)
{
return(as.data.frame(x, row.names=NULL,
stringsAsFactors=FALSE))
}
#############################
# Read in the ultrametric tree
#############################
trstring =
"(((tessulatus:0.5015255487,(((gloriamaris:0.3047954399,(ammiralis:0.2417876544,(dalli:0.1923168953,textile:0.1923168953):0.04947075908):0.06300778553):0.07060254417,(furvus:0.3149966906,((crocatus:0.2453337596,omaria:0.2453337596):0.009171713364,(aulicus:0.1887595997,episcopatus:0.1887595997):0.06574587326):0.06049121772):0.06040129347):0.07403402886,(bandanus:0.06858755301,marmoreus:0.06858755301):0.38084446):0.05209353574):0.06339155354,((laterculatus:0.4407953441,(arenatus:0.1438581051,pulicarius:0.1438581051):0.296937239):0.06667390592,(consors:0.3221133412,(aurisiacus:0.2279167829,stercusmuscarum:0.2279167829):0.09419655831):0.1853559088):0.05744785226):0.4350828977,orbignyi:1);"
chtr2 = read.tree(file="", trstring)
num_internal_nodes = chtr2$Nnode
treenodes = c(20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37)
#############################
# Read in trait values
#############################
tmpdata = c(8, 15, 15, 15, 15, 8, 15, 15, 15, 15, 15, 15, 6,
15, 15, 10, 15, 15, 8, 0.42, 0.28, 0.19, 0.28, 0.28, 0.42,
0.3, 0.28, 0.28, 0.28, 0.25, 0.25, 0.3, 0.3, 0.3, 0.35, 0.3,
0.3, 0.42, 6.7, 14, 17, 14, 14, 6.7, 5, 14, 14, 14, 10, 10,
5, 5, 5, 5, 5, 5, 6.7, 0.033, 0.17, 0.3, 0.17, 0.17, 0.033,
0.1, 0.17, 0.17, 0.17, 0.17, 0.17, 0.01, 0.1, 0.1, 0.2, 0.1,
0.1, 0.033, 8.3, 12.7, 3, 12.7, 12.7, 8.3, 5, 12.7, 12.7,
12.7, 20, 20, 5, 5, 5, 10, 5, 5, 8.3, 0.074, 0.1, 0.54, 0.1,
0.1, 0.074, 0.06, 0.1, 0.1, 0.1, 0.35, 0.35, 0.06, 0.06,
0.06, 0.15, 0.06, 0.06, 0.074, 30, 19.5, 20, 19.5, 19.5, 31,
7.9, 25, 19.5, 19.5, 40, 40, 5, 8, 8, 6, 8, 22, 30, 0.4,
0.1, 0.37, 0.1, 0.1, 0.37, 0.14, 0.1, 0.1, 0.1, 0.1, 0.1,
0.1, 0.15, 0.15, 0.1, 0.15, 0.03, 0.4, 5, 4.5, 5, 7, 4.5, 5,
3.5, 5, 4, 4, 6, 6, 2.5, 3.5, 3.5, 1.5, 3.1, 1.3, 5, 0.015,
0.005, 0.008, 0.006, 0.006, 0.006, 0.02, 0.0065, 0.006,
0.0065, 0.007, 0.01, 0.009, 0.0044, 0.02, 0.06, 0.015,
0.0067, 0.008, 0.029, 0.007, 0.041, 0.007, 0.007, 0.018,
0.1, 0.007, 0.007, 0.007, 0.12, 0.18, 0.03, 0.012, 0.055,
0.13, 0.041, 0.009, 0.015, 5, 3.665, 3.96, 3.665, 3.665, 5,
2.28, 3.665, 3.665, 3.665, 3.79, 3.79, 3.28, 2.28, 2.28, 1,
2.28, 2.28, 5, 0.02, 0.21, 0, 0.21, 0.21, 0.02, 0, 0.21,
0.21, 0.21, 0.45, 0.45, 0, 0, 0, 0, 0, 0, 0.02, 4, 2.665,
2.96, 2.665, 2.665, 4, 1.28, 2.665, 2.665, 2.665, 2.79,
2.79, 2.28, 1.28, 1.28, 0, 1.28, 1.28, 4, 0.01, 0.33, 0.17,
0.33, 0.33, 0.01, 0.32, 0.33, 0.33, 0.33, 0.56, 0.56, 0.3,
0.32, 0.32, 0, 0.3, 0.3, 0.01, 1.2, 1.2, 6.82, 1.2, 1.2,
1.2, 1.78, 1.2, 1.2, 1.2, 2.47, 2.47, 2.58, 1.78, 1.78, 1,
1.78, 1.78, 1.2, 0.15, 0.085, 0, 0.085, 0.085, 0.15, 0,
0.085, 0.085, 0.085, 0.03, 0.03, 0, 0, 0, 0, 0, 0, 0.15,
0.2, 0.2, 5.82, 0.2, 0.2, 0.2, 0.78, 0.2, 0.2, 0.2, 1.47,
1.47, 1.58, 0.78, 0.78, 0, 0.78, 0.78, 0.2, 0.14, 0.18,
0.04, 0.18, 0.18, 0.14, 0.32, 0.18, 0.18, 0.18, 0.25, 0.25,
0.4, 0.32, 0.32, 0, 0.29, 0.29, 0.14, 0.03, 0.03, 0.03,
0.03, 0.03, 0.04, 0.01, 0.03, 0.03, 0.03, 0, 0, 0.03, 0.02,
0.01, 0.03, 0.03, 0.03, 0.03, 0.1, 0.05, 0.08, 0.1, 0.05,
0.09, 0.01, 0.05, 0.1, 0.05, 0.1, 0.09, 0.05, 0.05, 0.05,
0.05, 0.05, 0.05, 0.05, 6.5, 5, 5, 2, 2, 4, 1.5, 6, 2.5, 2,
1.3, 4, 8, 3.5, 2.5, 5, 8, 2, 8, 251, 251, 251, 251, 251,
351, 251, 251, 251, 251, 451, 451, 251, 251, 251, 251, 251,
251, 251, 500, 1750, 500, 500, 500, 500, 500, 500, 500, 500,
750, 750, 500, 500, 500, 300, 500, 500, 500)
dim(tmpdata) = c(19,24)
tmp_rownames = c("tessulatus", "gloriamaris", "ammiralis",
"dalli", "textile", "furvus", "crocatus", "omaria",
"aulicus", "episcopatus", "bandanus", "marmoreus",
"laterculatus", "arenatus", "pulicarius", "consors",
"aurisiacus", "stercusmuscarum", "orbignyi")
tmp_colnames = c("s1n", "s1t", "s2en", "s2et", "s2in",
"s2it", "s3n", "s3t", "sf2ea", "sf2ed", "sf2id", "tf2eb1",
"tf2ec1", "tf2eb2", "tf2ec2", "tf2ib1", "tf2ic1", "tf2ib2",
"tf2ic2", "kz", "kr", "turn" , "N", "T" )
char_dtf = tmpdata
char_dtf = adf(char_dtf)
names(char_dtf) = tmp_colnames
row.names(char_dtf) = tmp_rownames
################################################
# I get very strange confidence intervals when I run ACE
# under PIC on the raw values
#
# i.e., every trait column has exactly the same range, no
# matter which trait
################################################
PICcont_means = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_lower025 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_upper975 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
for (i in 1:ncol(char_dtf))
{
PICcont = ace(char_dtf[,i], chtr2, type="continuous",
method="pic", CI=TRUE, model="BM")
PICcont_means[, i] = PICcont$ace
PICcont_lower025[, i] = PICcont$CI95[, 1]
PICcont_upper975[, i] = PICcont$CI95[, 2]
}
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)
row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)
# Look at this output. The means look OK...
PICcont_means
# But the confidence intervals are the same size
# for each variable!
# Strange, no?
PICcont_upper975-PICcont_lower025
################################
# For some reason, taking each trait and
# dividing by the mean of the tips,
# then multiplying each ancestral ACE
# estimate by the relevant tip mean
# seems to fix the problem -- at least,
# they aren't the same for each variable
#
################################
low_char_dtf = char_dtf
# Divide observed traits by their mean
for (i in 1:nrow(char_dtf))
{
low_char_dtf[i, ] = low_char_dtf[i,] / mean(char_dtf)
}
PICcont_means = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_lower025 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_upper975 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
# Be sure to transform back to normal coordinates
for (i in 1:ncol(char_dtf))
{
PICcont_ln = ace(low_char_dtf[,i], chtr2,
type="continuous", method="pic", CI=TRUE, model="BM")
# Multiply estimated traits by their mean
PICcont_means[, i] = mean(char_dtf[,i])*(PICcont_ln$ace)
PICcont_lower025[, i] =
mean(char_dtf[,i])*(PICcont_ln$CI95[, 1])
PICcont_upper975[, i] =
mean(char_dtf[,i])*(PICcont_ln$CI95[, 2])
}
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)
row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)
# means OK
PICcont_means
# range of 95% CI is also variable now
# and node 20, the deepest node, also has
# the largest confidence intervals
PICcont_upper975-PICcont_lower025
#######################################
# Doing it in log space also works
#######################################
ln_char_dtf = char_dtf
min_char_dtf = NULL
for (col in 1:ncol(char_dtf))
{
# take min of nonzero values
tmpmin = min(char_dtf[,col][char_dtf[,col] != 0])
min_char_dtf = c(min_char_dtf, tmpmin)
}
# change 0s to 10% of trait minimum
for (col in 1:ncol(ln_char_dtf))
{
# assign 10% of nonzero minimum
ln_char_dtf[,col][ln_char_dtf[, col] == 0] =
min_char_dtf[col] * 0.1
}
# log-transform
ln_char_dtf = log(ln_char_dtf)
PICcont_means = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_lower025 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
PICcont_upper975 = array(NA, dim=c(num_internal_nodes,
ncol(char_dtf)) )
for (i in 1:ncol(ln_char_dtf))
{
PICcont = ace(ln_char_dtf[,i], chtr2, type="continuous",
method="pic", CI=TRUE, model="BM")
# convert back to nonlog space
PICcont_means[, i] = exp(PICcont$ace)
PICcont_lower025[, i] = exp(PICcont$CI95[, 1])
PICcont_upper975[, i] = exp(PICcont$CI95[, 2])
}
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)
row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)
# Look at this output. The means look OK...
PICcont_means
# And the 95% CI range looks OK also
PICcont_upper975-PICcont_lower025
==================================================
--
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley
Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/
Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page:
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page:
http://fisher.berkeley.edu/cteg/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: mat...@berkeley.edu
Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140
-----------------------------------------------------
"[W]hen people thought the earth was flat, they were wrong.
When people thought the earth was spherical, they were
wrong. But if you think that thinking the earth is spherical
is just as wrong as thinking the earth is flat, then your
view is wronger than both of them put together."
Isaac Asimov (1989). "The Relativity of Wrong." The
Skeptical Inquirer, 14(1), 35-44. Fall 1989.
http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo