[R-sig-phylo] weirdness in confidence intervals returned by ace, pic option

Nick Matzke Mon, 21 Mar 2011 22:31:07 -0700

Hi all,

This isn't crucial to my work at the moment since I am notusing the PIC option of ace to do ancestral characterestimation. But while trying it out I noticed a very weirdresult that I can't explain...basically when I run ace on myraw trait values, I get the same sized confidence interval(97.5% CI minus 2.5% confidence interval) for all of my(drastically different) traits.


E.g.:

===================
s1n      s1t     s2en     s2et     s2in     s2it
20 4.852213 4.852213 4.852213 4.852213 4.852213
21 2.445078 2.445078 2.445078 2.445078 2.445078
22 3.200703 3.200703 3.200703 3.200703 3.200703
23 2.960947 2.960947 2.960947 2.960947 2.960947
24 2.240474 2.240474 2.240474 2.240474 2.240474
25 2.654838 2.654838 2.654838 2.654838 2.654838
===================

When I transform the data in various ways, the problem goesaway, even though the rest of the code is identical(copy-paste identical, except for the input data), so I have(I think!) excluded some idiotic error on my part. Althoughthe result is so strange, I can't imagine what would cause it.


An example of the expected result:
=================
s1n       s1t     s2en      s2et     s2in      s2it
20 63.84491 1.4760944 44.97236 0.6305324 47.01539
21 32.17208 0.7438184 22.66201 0.3177314 23.69152
22 42.11451 0.9736874 29.66546 0.4159229 31.01312
23 38.95982 0.9007511 27.44330 0.3847672 28.69001
24 29.47992 0.6815757 20.76565 0.2911437 21.70901
25 34.93208 0.8076297 24.60616 0.3449892 25.72398
=================

I got the same results on 2 different computers, on APE 2.4and 2.7, so I doubt it is something specific to my computer.

The code below should load the data & analysis from scratch& reproduce the weirdness. Thoughts welcome!


Nick


==================



library(ape)

#############################
# Setup
#############################
# as.data.frame shortcut function
adf <- function(x)
        {

return(as.data.frame(x, row.names=NULL,stringsAsFactors=FALSE))

        }


#############################
# Read in the ultrametric tree
#############################

trstring ="(((tessulatus:0.5015255487,(((gloriamaris:0.3047954399,(ammiralis:0.2417876544,(dalli:0.1923168953,textile:0.1923168953):0.04947075908):0.06300778553):0.07060254417,(furvus:0.3149966906,((crocatus:0.2453337596,omaria:0.2453337596):0.009171713364,(aulicus:0.1887595997,episcopatus:0.1887595997):0.06574587326):0.06049121772):0.06040129347):0.07403402886,(bandanus:0.06858755301,marmoreus:0.06858755301):0.38084446):0.05209353574):0.06339155354,((laterculatus:0.4407953441,(arenatus:0.1438581051,pulicarius:0.1438581051):0.296937239):0.06667390592,(consors:0.3221133412,(aurisiacus:0.2279167829,stercusmuscarum:0.2279167829):0.09419655831):0.1853559088):0.05744785226):0.4350828977,orbignyi:1);"


chtr2 = read.tree(file="", trstring)
num_internal_nodes = chtr2$Nnode

treenodes = c(20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,31, 32, 33, 34, 35, 36, 37)



#############################
# Read in trait values
#############################

tmpdata = c(8, 15, 15, 15, 15, 8, 15, 15, 15, 15, 15, 15, 6,15, 15, 10, 15, 15, 8, 0.42, 0.28, 0.19, 0.28, 0.28, 0.42,0.3, 0.28, 0.28, 0.28, 0.25, 0.25, 0.3, 0.3, 0.3, 0.35, 0.3,0.3, 0.42, 6.7, 14, 17, 14, 14, 6.7, 5, 14, 14, 14, 10, 10,5, 5, 5, 5, 5, 5, 6.7, 0.033, 0.17, 0.3, 0.17, 0.17, 0.033,0.1, 0.17, 0.17, 0.17, 0.17, 0.17, 0.01, 0.1, 0.1, 0.2, 0.1,0.1, 0.033, 8.3, 12.7, 3, 12.7, 12.7, 8.3, 5, 12.7, 12.7,12.7, 20, 20, 5, 5, 5, 10, 5, 5, 8.3, 0.074, 0.1, 0.54, 0.1,0.1, 0.074, 0.06, 0.1, 0.1, 0.1, 0.35, 0.35, 0.06, 0.06,0.06, 0.15, 0.06, 0.06, 0.074, 30, 19.5, 20, 19.5, 19.5, 31,7.9, 25, 19.5, 19.5, 40, 40, 5, 8, 8, 6, 8, 22, 30, 0.4,0.1, 0.37, 0.1, 0.1, 0.37, 0.14, 0.1, 0.1, 0.1, 0.1, 0.1,0.1, 0.15, 0.15, 0.1, 0.15, 0.03, 0.4, 5, 4.5, 5, 7, 4.5, 5,3.5, 5, 4, 4, 6, 6, 2.5, 3.5, 3.5, 1.5, 3.1, 1.3, 5, 0.015,0.005, 0.008, 0.006, 0.006, 0.006, 0.02, 0.0065, 0.006,0.0065, 0.007, 0.01, 0.009, 0.0044, 0.02, 0.06, 0.015,0.0067, 0.008, 0.029, 0.007, 0.041, 0.007, 0.007, 0.018,0.1, 0.007, 0.007, 0.007, 0.12, 0.18, 0.03, 0.012, 0.055,0.13, 0.041, 0.009, 0.015, 5, 3.665, 3.96, 3.665, 3.665, 5,2.28, 3.665, 3.665, 3.665, 3.79, 3.79, 3.28, 2.28, 2.28, 1,2.28, 2.28, 5, 0.02, 0.21, 0, 0.21, 0.21, 0.02, 0, 0.21,0.21, 0.21, 0.45, 0.45, 0, 0, 0, 0, 0, 0, 0.02, 4, 2.665,2.96, 2.665, 2.665, 4, 1.28, 2.665, 2.665, 2.665, 2.79,2.79, 2.28, 1.28, 1.28, 0, 1.28, 1.28, 4, 0.01, 0.33, 0.17,0.33, 0.33, 0.01, 0.32, 0.33, 0.33, 0.33, 0.56, 0.56, 0.3,0.32, 0.32, 0, 0.3, 0.3, 0.01, 1.2, 1.2, 6.82, 1.2, 1.2,1.2, 1.78, 1.2, 1.2, 1.2, 2.47, 2.47, 2.58, 1.78, 1.78, 1,1.78, 1.78, 1.2, 0.15, 0.085, 0, 0.085, 0.085, 0.15, 0,0.085, 0.085, 0.085, 0.03, 0.03, 0, 0, 0, 0, 0, 0, 0.15,0.2, 0.2, 5.82, 0.2, 0.2, 0.2, 0.78, 0.2, 0.2, 0.2, 1.47,1.47, 1.58, 0.78, 0.78, 0, 0.78, 0.78, 0.2, 0.14, 0.18,0.04, 0.18, 0.18, 0.14, 0.32, 0.18, 0.18, 0.18, 0.25, 0.25,0.4, 0.32, 0.32, 0, 0.29, 0.29, 0.14, 0.03, 0.03, 0.03,0.03, 0.03, 0.04, 0.01, 0.03, 0.03, 0.03, 0, 0, 0.03, 0.02,0.01, 0.03, 0.03, 0.03, 0.03, 0.1, 0.05, 0.08, 0.1, 0.05,0.09, 0.01, 0.05, 0.1, 0.05, 0.1, 0.09, 0.05, 0.05, 0.05,0.05, 0.05, 0.05, 0.05, 6.5, 5, 5, 2, 2, 4, 1.5, 6, 2.5, 2,1.3, 4, 8, 3.5, 2.5, 5, 8, 2, 8, 251, 251, 251, 251, 251,351, 251, 251, 251, 251, 451, 451, 251, 251, 251, 251, 251,251, 251, 500, 1750, 500, 500, 500, 500, 500, 500, 500, 500,750, 750, 500, 500, 500, 300, 500, 500, 500)


dim(tmpdata) = c(19,24)

tmp_rownames = c("tessulatus", "gloriamaris", "ammiralis","dalli", "textile", "furvus", "crocatus", "omaria","aulicus", "episcopatus", "bandanus", "marmoreus","laterculatus", "arenatus", "pulicarius", "consors","aurisiacus", "stercusmuscarum", "orbignyi")

tmp_colnames = c("s1n", "s1t", "s2en", "s2et", "s2in","s2it", "s3n", "s3t", "sf2ea", "sf2ed", "sf2id", "tf2eb1","tf2ec1", "tf2eb2", "tf2ec2", "tf2ib1", "tf2ic1", "tf2ib2","tf2ic2", "kz", "kr", "turn" , "N", "T" )


char_dtf = tmpdata
char_dtf = adf(char_dtf)
names(char_dtf) = tmp_colnames
row.names(char_dtf) = tmp_rownames



################################################
# I get very strange confidence intervals when I run ACE
# under PIC on the raw values
#
# i.e., every trait column has exactly the same range, no
# matter which trait
################################################

PICcont_means = array(NA, dim=c(num_internal_nodes,ncol(char_dtf)) )PICcont_lower025 = array(NA, dim=c(num_internal_nodes,ncol(char_dtf)) )PICcont_upper975 = array(NA, dim=c(num_internal_nodes,ncol(char_dtf)) )


for (i in 1:ncol(char_dtf))
        {

PICcont = ace(char_dtf[,i], chtr2, type="continuous",method="pic", CI=TRUE, model="BM")

        PICcont_means[, i] = PICcont$ace
        PICcont_lower025[, i] = PICcont$CI95[, 1]
        PICcont_upper975[, i] = PICcont$CI95[, 2]

        }
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)


row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)


# Look at this output.  The means look OK...
PICcont_means

# But the confidence intervals are the same size
# for each variable!
# Strange, no?
PICcont_upper975-PICcont_lower025







################################
# For some reason, taking each trait and
# dividing by the mean of the tips,
# then multiplying each ancestral ACE
# estimate by the relevant tip mean
# seems to fix the problem -- at least,
# they aren't the same for each variable
#
################################


low_char_dtf = char_dtf

# Divide observed traits by their mean
for (i in 1:nrow(char_dtf))
        {
        low_char_dtf[i, ] = low_char_dtf[i,] / mean(char_dtf)
        }


# Be sure to transform back to normal coordinates
for (i in 1:ncol(char_dtf))
        {

PICcont_ln = ace(low_char_dtf[,i], chtr2,type="continuous", method="pic", CI=TRUE, model="BM")

        
        # Multiply estimated traits by their mean
        PICcont_means[, i] = mean(char_dtf[,i])*(PICcont_ln$ace)

PICcont_lower025[, i] =mean(char_dtf[,i])*(PICcont_ln$CI95[, 1])PICcont_upper975[, i] =mean(char_dtf[,i])*(PICcont_ln$CI95[, 2])


        }
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)


row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)


# means OK
PICcont_means

# range of 95% CI is also variable now
# and node 20, the deepest node, also has
# the largest confidence intervals
PICcont_upper975-PICcont_lower025






#######################################
# Doing it in log space also works
#######################################
ln_char_dtf = char_dtf

min_char_dtf = NULL
for (col in 1:ncol(char_dtf))
        {
        # take min of nonzero values
        tmpmin = min(char_dtf[,col][char_dtf[,col] != 0])
        min_char_dtf = c(min_char_dtf, tmpmin)
        }

# change 0s to 10% of trait minimum
for (col in 1:ncol(ln_char_dtf))
        {
        # assign 10% of nonzero minimum

ln_char_dtf[,col][ln_char_dtf[, col] == 0] =min_char_dtf[col] * 0.1

        }

# log-transform
ln_char_dtf = log(ln_char_dtf)


for (i in 1:ncol(ln_char_dtf))
        {

PICcont = ace(ln_char_dtf[,i], chtr2, type="continuous",method="pic", CI=TRUE, model="BM")

        # convert back to nonlog space
        PICcont_means[, i] = exp(PICcont$ace)
        PICcont_lower025[, i] = exp(PICcont$CI95[, 1])
        PICcont_upper975[, i] = exp(PICcont$CI95[, 2])

        }
PICcont_means = adf(PICcont_means)
names(PICcont_means) = names(char_dtf)


row.names(PICcont_means) = treenodes
PICcont_lower025 = adf(PICcont_lower025)
names(PICcont_lower025) = names(char_dtf)
row.names(PICcont_lower025) = row.names(PICcont_means)
PICcont_upper975 = adf(PICcont_upper975)
names(PICcont_upper975) = names(char_dtf)
row.names(PICcont_upper975) = row.names(PICcont_means)


# Look at this output.  The means look OK...
PICcont_means

# And the 95% CI range looks OK also
PICcont_upper975-PICcont_lower025










==================================================



--
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher

Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/


Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html

Dept. personal page:http://ib.berkeley.edu/people/students/person_detail.php?person=370Lab personal page:http://fisher.berkeley.edu/cteg/members/matzke.html

Lab phone: 510-643-6299
Dept. fax: 510-643-6264

Cell phone: 510-301-0179
Email: mat...@berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-----------------------------------------------------

"[W]hen people thought the earth was flat, they were wrong.When people thought the earth was spherical, they werewrong. But if you think that thinking the earth is sphericalis just as wrong as thinking the earth is flat, then yourview is wronger than both of them put together."

Isaac Asimov (1989). "The Relativity of Wrong." TheSkeptical Inquirer, 14(1), 35-44. Fall 1989.

http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

[R-sig-phylo] weirdness in confidence intervals returned by ace, pic option

Reply via email to