Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

David Bapst Wed, 08 Oct 2014 07:53:12 -0700

Hello all,

Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
reasons unknown), I've gone with Graham's suggestion to use the simple
output format from sumt. However, as I have a large number of analyses
which took considerable computational time (and I am stubborn to learn
how to run these on a cluster) I decided the more economical thing to
do was to write an R script that took the existing nexus files and
wrote a nexus file with MrBayes blocks that would output 'simple'
format consensus trees. I realize such string manipulation might be
easier in some other language, but I know R best.


Here it is, in case anyone else ever needs such a thing:

_________________________________________________

#change to dir for MB files
setwd("C:/dave/workspace/mrbayes")

#get files
files<-list.files()
#get tree files
treeFiles<-files[grep(x=files,pattern="run..t")]
#get the nexus files
inputs<-sort(unique(sub(".run..t","",treeFiles)))
inputs<-sort(unique(sub(".tree.","",inputs)))

#now start constructing sumt lines
bayesBlock<-"#NEXUS"
for(i in 1:length(inputs)){
    origNex<-scan(inputs[i],what="character",sep="\n",blank.lines.skip=FALSE)
    cropNex<-origNex[2:grep(origNex,pattern="mcmcp")]
    #remove log line
    cropNex<-cropNex[-grep(cropNex,pattern="log start")]
    sumtLine<-paste0("sumt filename=",inputs[i]," outputname=",
        paste0(inputs[i],".simple")," conformat=simple;")
    bayesBlock<-c(bayesBlock,cropNex,sumtLine,"","end;","")
    }

write(bayesBlock,file="simple_sumt.nex")

_______________________________________________

Some caveats: this script assumes you already have mrbayes blocks in
your nexus files (all of which are stored with all output in the same
directory), including a line where you set MCMC parameters (mcmcp)
just prior to calling the MCMC itself and after any other
modifications to the dataset, model or prior settings.

If you execute the nexus file created, it should run sumt and output
new output files with 'simple' in the file name. Analyses where there
are multiple trees (such as partitioned analyses with unlinked branch
lengths) will have multiple consensus trees estimated for them, as
expected if we had originally done this immediately after running
mcmc. Also note I crop out any lines that create new log files,
because I didn't want to write over my old log files.

The resulting simple-format consensus tree files can be read into ape
with read.nexus just fine. For whatever reason, these nexus files
contain two trees, otherwise identical other than that one contains
the posterior probabilities as its node-labels.

....However! There is a big however here.

If we re-root the tree in ape (and there are many reasons why we might
want to do so), the posterior probabilities, which are stored as node
labels, no longer make sense as those node labels are not properties
technically of the nodes but of the edges preceding those nodes.
Change the directionality of the graph by rerooting and those
posterior probabilities end up in the wrong place. (TreeFig apparently
figures this out on its own, automatically, if your reroot in
TreeFig.)

I haven't found any option in an R package that slides node-labels
around as if they were support values when the tree is rerooted (and
if there is one, please let me know!) so I had to write my own thing
for doing that to the newly created 'simple' consensus tree files...

So here's another script.

_____________________________________________

#change dir
setwd("C:/dave/workspace/mrbayes")

library(ape)

files<-list.files()
#get consensus nexus files (ugh regexp)
conFiles<-files[grep(x=files,pattern=".con.tre")]
#get only 'simple' files
conFiles<-conFiles[grep(x=conFiles,pattern="simple")]
conName<-sub(".nex.simple","",conFiles)
conName<-sub(".con.tre","",conName)

trees<-list()
for(i in 1:length(conFiles)){
    treesRead<-read.nexus(conFiles[i])
    treeA<-treesRead[[2]]
    treeB<-treesRead[[1]]
    #now we're going to reroot and let's use fake species names hahah
    treeA<-root(treeA,resolve.root=TRUE,outgroup=c("OutOfTheGroup","OutThere",
        "outgroupy","groupout"))
    #okay basically need to figure out
        #which taxa are included by a node
        #and match between treeB and tree A
    splitsB<-lapply(prop.part(treeB),function(x)
        if(any(x==1)){treeB$tip.label[x]
        }else{treeB$tip.label[-x]})
    splitsA<-lapply(prop.part(treeA),function(x)
        if(any(x==1)){treeA$tip.label[x]
        }else{treeA$tip.label[-x]})
    #plot(treeB,show.node.label=TRUE)
    treeA$node.label<-treeB$node.label[match(splitsA,splitsB)]
    trees[[i]]<-treeA
    }
class(trees)<-"multiPhylo"

#plot(trees)

pdf("consensus_trees_10-07-14.pdf",height=8,width=8)
for(i in 1:length(trees)){
    plot(trees[[i]],main=conName[i],show.node.label=TRUE)
    add.scale.bar()
    }
dev.off()

_____________________________________________________

The last bit just plots it in a pdf, which isn't important at all but
I included it for the heck of it. The important stuff is the innards
of the first for loop, which reads in the 'simple' consensus tree
nexus file, reroots it and then rearranges the posterior probabilities
to match the same splits across both trees. I've checked this against
the read out logged by MrBayes and the result of rerooted trees in
FigTree, and the resulting posterior probabilities match those shown
across all three programs.

Anyway, I hope this helps the next person trying to figure out how to
plot posterior probabilites from MrBayes in R!

Cheers,
-Dave



On Fri, Oct 3, 2014 at 10:29 AM, David Bapst <dwba...@gmail.com> wrote:
> Just to clarify, off-list discussion with Graham reveals (after some
> confusion on my part) that if the simple format option is used in sumt
> in MrBayes, then ape's read.nexus will natively read the posterior
> probability values as node-labels. Which is interesting and I had not
> come across that information previously.
>
> Graham's comment is not in reference to the functioning of 
> read.annotated.nexus.
>
> Cheers,
> -Dave
>
>
> On Fri, Oct 3, 2014 at 8:49 AM, Slater, Graham <slat...@si.edu> wrote:
>> Dave,
>>
>> I’ve had the same trouble with shuffling. However, all of this can be
>> avoided if you specify the simple format for your .con file in the mrBayes
>> block.
>>
>> sumt conformat = simple;
>>
>> The resulting tree will correctly display posterior probabilities in a
>> phyloformat tree.
>>
>> Graham
>>
>>
>>
>> ------------------------------------------------------------
>> Graham Slater
>> Peter Buck Post-Doctoral Fellow
>> Department of Paleobiology
>> National Museum of Natural History
>> The Smithsonian Institution [NHB, MRC 121]
>> P.O. Box 37012
>>
>>
>> (202) 633-1316
>> slat...@si.edu
>> www.fourdimensionalbiology.com
>>
>>
>>
>>
>>
>> On Oct 3, 2014, at 10:42 AM, David Bapst <dwba...@gmail.com> wrote:
>>
>> Hello all,
>>
>> Recently, I wanted to display posterior probabilities on a 50%
>> compatibility tree from a MrBayes run, created with the 'sumt'
>> command. I looked around for ways to do this and found this email
>> thread from last year:
>>
>> https://stat.ethz.ch/pipermail/r-sig-phylo/2013-June/002825.html
>>
>> ...which suggests using read.annotated.nexus() in package epibase,
>> which has been renamed OutbreakTools more recently.
>>
>> Unfortunately, it appears read.annotated.nexus in OutbreakTools
>> (v0.1-11, in R 3.1.1) improperly creates the new edge matrix,
>> resulting in an apparent shuffling of tip labels. As this function was
>> discussed on R-Sig-Phylo, I am sending this bug report to both the
>> list and to the package maintainer (Thibaut J.).
>>
>> I don't have any BEAST output files, so I cannot test if this occurs
>> with files that are not created by MrBayes.
>>
>> To show this phenomen, I have created an example .con.tre file using
>> an example matrix of standard characters for several elephants that I
>> stole off of Joe F.'s website. You can find the .con.tre file here on
>> gdrive:
>>
>> https://drive.google.com/file/d/0B_xvEcEvKno_LTFhSVJrdHMtNFU/view?usp=sharing
>>
>> And now in R, a comparison with a tree:
>>
>> library(OutbreakTools)
>> library(ape)
>>
>> tree1<-read.nexus("mat.nex.con.tre")
>> tree2<-read.annotated.nexus("mat.nex.con.tre")
>>
>> layout(1:2)
>> plot(tree1,no.margin=TRUE)
>> plot(tree2,no.margin=TRUE)
>>
>> identical(tree1$tip.label,tree2$tip.label)
>> identical(tree1$edge.length,tree2$edge.length)
>> identical(tree1$edge,tree2$edge)
>>
>> Inspection of the tree file with FigTree suggests that the ape
>> function is producing the correct tree and the OutbreakTools function
>> is not. We can see the why with last three lines in the console:
>>
>> identical(tree1$tip.label,tree2$tip.label)
>>
>> [1] TRUE
>>
>> identical(tree1$edge.length,tree2$edge.length)
>>
>> [1] TRUE
>>
>> identical(tree1$edge,tree2$edge)
>>
>> [1] FALSE
>>
>> Closer inspection suggests that the issue appears to be that some
>> terminal edges are inappropriately shuffled, thus shuffling the
>> terminal tip labels on those edges, while leaving the tip.label order
>> and the edge lengths the same. Also, larger trees seem to produce
>> larger inconsistencies in the tree produced.
>>
>> So... now for the list, are there any other methods on CRAN for
>> obtaining the posterior probabilities from a .con.tre file? Otherwise,
>> next I guess I'll be trying phyloch.
>>
>> Cheers,
>> -Dave
>>
>> --
>> David W. Bapst, PhD
>> Adjunct Asst. Professor, Geology and Geol. Eng.
>> South Dakota School of Mines and Technology
>> 501 E. St. Joseph
>> Rapid City, SD 57701
>>
>> http://webpages.sdsmt.edu/~dbapst/
>> http://cran.r-project.org/web/packages/paleotree/index.html
>>
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
>>
>
>
>
> --
> David W. Bapst, PhD
> Adjunct Asst. Professor, Geology and Geol. Eng.
> South Dakota School of Mines and Technology
> 501 E. St. Joseph
> Rapid City, SD 57701
>
> http://webpages.sdsmt.edu/~dbapst/
> http://cran.r-project.org/web/packages/paleotree/index.html



-- 
David W. Bapst, PhD
Adjunct Asst. Professor, Geology and Geol. Eng.
South Dakota School of Mines and Technology
501 E. St. Joseph
Rapid City, SD 57701

http://webpages.sdsmt.edu/~dbapst/
http://cran.r-project.org/web/packages/paleotree/index.html

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

Reply via email to