[R-sig-phylo] Running R on a Computer Cluster in the Cloud - cloudnumbers.com
Dear phylogenetic and comparative methods experts, cloudnumbers.com provides researchers and companies with the resources to perform high performance calculations in the cloud. As cloudnumbers.com's community manager I may invite you to register and test R on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register Our aim is to change the way of research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving for enabling everyone to benefit from the High Performance Computing related advantages of the cloud. We provide easy access to applications running on any kind of computer hardware: from single core high memory machines up to 1000 cores computer clusters. Our platform provides several advantages: * Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving costs with cloudnumbers.com video: http://www.youtube.com/watch?v=ln_BSVigUhgfeature=player_embedded * Enter the cloud using an intuitive and user friendly platform. Watch our latest cloudnumbers.com in a nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0feature=player_embedded * Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies) * Accelerated your R, C, C++, Fortran, Python, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available and GPUs are coming soon. * Share your results worldwide (coming soon). * Get high speed access to public databases (please let us know, if your favorite database is missing!). * We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf This is only a selection of our top features. To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications (with R): http://cloudnumbers.com/blog Register and test for free now at cloudnumbers.com: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Take the chance and have an impact to the development of a new cloud computing calculation platform. Best Markus -- Dr. rer. nat. Markus Schmidberger Senior Community Manager Cloudnumbers.com GmbH Chausseestraße 6 10119 Berlin www.cloudnumbers.com E-Mail: markus.schmidber...@cloudnumbers.com * Amtsgericht München, HRB 191138 Geschäftsführer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] cut a tree in a given time interval
Oops, left that out. Here it is as text and an attached file. You will have to do library(ape) and maybe library(phangorn) ...ahead of time (hopefully not others...) chainsaw - function(tr, timepoint=10) { # Take a tree and saw it off evenly across a certain timepoint. # This removes any tips above the timepoint, and replaces them # with a single tip representing the lineage crossing # the timepoint (with a new tip name). # Get the tree in a table tr_table = prt(tr, printflag=FALSE) # Find the tips that are less than 10 my old and drop them TF_exists_more_recently_than_10mya = tr_table$time_bp timepoint # Get the corresponding labels labels_for_tips_existing_more_recently_than_10mya = tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ] ### # Draft chainsaw function ### # loop through the branches that cross 10 mya # get a list of the edge start/stops in the phylogeny's edges edge_times_bp = get_edge_times_before_present(tr) # which of these branches cross 10 mya? edges_start_earlier_than_10mya = edge_times_bp[, 1] timepoint edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint edges_to_chainsaw = edges_start_earlier_than_10mya + edges_end_later_than_10mya == 2 # then, for each of these edges, figure out how many tips exist descending from it nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw] nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw length(tr$tip.label)] # create a copy of the tree to chainsaw tree_to_chainsaw = tr for (i in 1:length(nodes_to_chainsaw)) { tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i]) #print(tmp_subtree$tip.label) tmp_number_of_tips = length(tmp_subtree$tip.label) #print(tmp_number_of_tips) # number of tips to drop = (numtips -1) numtips_to_drop = tmp_number_of_tips - 1 # tips_to_drop tmp_labels = tmp_subtree$tip.label labels_to_drop = tmp_labels[1:numtips_to_drop] # new label label_kept_num = length(tmp_labels) label_kept = tmp_labels[label_kept_num] new_label = paste(CA_, label_kept, +, numtips_to_drop, _tips, sep=) tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == label_kept] = new_label # chop off e.g. 2 of the 3 tips tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop) } #plot(tree_to_chainsaw) #axisPhylo() tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE) tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = tree_to_chainsaw_table$time_bp timepoint tmp_edge_lengths = tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] times_bp_for_edges_to_chainsaw = tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] adjustment = times_bp_for_edges_to_chainsaw - timepoint revised_tmp_edge_lengths = tmp_edge_lengths + adjustment tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] = revised_tmp_edge_lengths # revised ordered_nodenames = get_nodenums(tree_to_chainsaw) parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, tree_to_chainsaw$edge[,2]) NA_false = is.not.na(tree_to_chainsaw_table$edge.length) tree_to_chainsaw$edge.length[parent_branches[NA_false]] = tree_to_chainsaw_table$edge.length[NA_false] return(tree_to_chainsaw) } # print tree in hierarchical format prt - function(t, printflag=TRUE, relabel_nodes = FALSE, time_bp_digits=7) { # assemble beginning table # check if internal node labels exist if (node.label %in% attributes(t)$names == FALSE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } # or manually relabel the internal nodes, if desired if (relabel_nodes == TRUE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } labels = c(t$tip.label, t$node.label) ordered_nodenames = get_nodenums(t) #nodenums = 1:length(labels) node.types1 = rep(tip, length(t$tip.label)) node.types2 = rep(internal, length(t$node.label)) node.types2[1] = root node.types = c(node.types1, node.types2) # These are the index numbers of the edges below each node parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, t$edge[,2]) #parent_edges = parent_branches brlen_to_parent = t$edge.length[parent_branches] parent_nodes = t$edge[,1][parent_branches] daughter_nodes = lapply(ordered_nodenames,
[R-sig-phylo] pic() vs gls()
Dear R-sig-phylo users,I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function).From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts.The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits.These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model?Thanks in advance for any advice or comments offered!!Cheers,Nick MasonBelow I have the code I have been using (also attached as Mason.R):require(ape)require(nlme)require(geiger)read.csv(file="Mason_data.csv")-smdatarownames(smdata)-smdata[,1]smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me.read.tree(file="Mason_tree.tre")-spbmname.check(smdata,spbm)fitContinuous(spbm,smdata,model="OU")-ou2pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))summary(lm(pr_contrast~pc1_contrast-1))pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1a)pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1b) Mason_tree.tre Description: Binary data ,Pause_Rate,Comp.1,Comp.2 Sporophila_schistacea,0.97572108797618,1.3796099841333,-1.12587956726316 Sporophila_intermedia,0.314864247370089,2.36946494397968,1.16313695174602 Sporophila_plumbea,0.525348288841447,2.71609091582214,-0.0201502845760047 Sporophila_torqueola,0.090114188687708,3.55768691291833,0.0588440560485339 Sporophila_collaris,0.258598106973663,-0.866576621272912,-0.375459372210033 Sporophila_lineola,1.91924407069217,5.38973476610418,-0.719604956651293 Sporophila_luctuosa,0.316867010852651,3.51591652591355,-1.01122149539569 Sporophila_nigricollis,0.646766139216069,4.60967961843748,0.403572095059875 Sporophila_caerulescens,1.19967606461856,4.42942981620688,0.266281245585206 Sporophila_albogularis,-0.747926431242122,2.14986795813319,0.453292241973436 Sporophila_peruviana,-0.662055276873553,-5.75668334764164,0.194910239070253 Sporophila_simplex,-0.0159932866483984,0.50911606352083,-0.961689651320771 Sporophila_bouvreuil,-0.113798179556032,4.96419876880559,-0.0444962146993295 Sporophila_minuta,-0.653039978630658,4.51909264308507,0.271433436605055 Sporophila_hypoxantha,-1.33520075061216,3.6424747461571,-0.143885556798834 Sporophila_ruficollis,-1.10188388209151,3.69620160094318,-1.122292891 Sporophila_castaneiventris,-0.957617880681526,5.13264024147439,0.71993032372333 Sporophila_hypochroma,-0.860526127835558,3.70604616959997,-0.317101374278479 Sporophila_cinnamomea,-0.859253764452833,4.92903311573274,-0.397590156130398 Sporophila_melanogaster,0.246280339669374,3.57517426978125,0.492140024454993 Sporophila_telasco,0.512066210743585,4.57687350229544,0.151380525693064 Oryzoborus_nuttingi,-0.803111018941645,-16.0737467067366,-1.53209487276288 Oryzoborus_crassirostris,-0.57221657664358,-10.5639148825931,1.92861328894315 Oryzoborus_atrirostris,-0.871304566495871,-18.0081579920927,-0.767118053337893 Oryzoborus_maximiliani,-0.326024013387329,-14.0150572642919,-0.0204930609520138 Oryzoborus_angolensis,-0.142855044536879,-6.48632361126553,0.96227143885814 Dolospingus_fringilloides,1.26961240149379,-2.93723705653849,1.74362516139737 Mason.R Description: Binary data -Nicholas Albert MasonMS Candidate, Burns LabDepartment of Biology - EB ProgramSan Diego State University5500 Campanile DriveSan Diego, CA 92182-4614845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Hi Nick. For your first (simple) problem, I believe you want to do: read.csv(file=Mason_data.csv,row.names=1)-smdata Regarding the more complicated issue, the problem of non-independence in linear regression comes in the residual error of the model. Thus, you should fit a correlation structure for the error in Y given X (not for X Y separately as you have done with PICs here). This indeed can be done using gls() - so, for instance: pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) fits a linear model in which the residual error of Pause_Rate given Comp.1 is distributed according to the correlation structure corMartins() with alpha estimated using REML. The way to reproduce this result using contrasts is the following: alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha smdata-as.matrix(smdata) # important pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha)) summary(lm(pr_contrast~pc1_contrast-1)) Which if you compare to: summary(pglsModel1a) you should find yields the same results and P-value. (Note that smdata-as.matrix(smdata) seems to be important here as if smdata is a data frame, then smdata[,1] does not inherit the rownames of smdata and pic() will return an incorrect set of contrasts.) I hope this helps. No doubt other subscribers to the list may have something to add. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2011 11:07 PM, Nicholas Mason wrote: Dear R-sig-phylo users, I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function). From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts. The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits. These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment). Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model? Thanks in advance for any advice or comments offered!! Cheers, Nick Mason Below I have the code I have been using (also attached as Mason.R): require(ape) require(nlme) require(geiger) read.csv(file=Mason_data.csv)-smdata rownames(smdata)-smdata[,1] smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me. read.tree(file=Mason_tree.tre)-spbm name.check(smdata,spbm) fitContinuous(spbm,smdata,model=OU)-ou2 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha)) summary(lm(pr_contrast~pc1_contrast-1)) pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1a) pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1b) - Nicholas Albert Mason MS Candidate, Burns Lab Department of Biology - EB Program San Diego State University 5500 Campanile Drive San Diego, CA 92182-4614 845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Assigning node ages to a tree, revisited
Hi Roger, Can you provide a reproducible example (perhaps a subset of your tree and their tip and node ages if the tree is very large)? I don't know what the problem could be without the data, but perhaps Gene knows? Scott On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote: Hello all, I wish to transform branch lengths on a tree according to ages of terminal taxa and internal node ages; to this end I have tried to implement the script written by Gene Hunt mentioned in a recent post to this list (http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html; script available at https://gist.github.com/938313). However, when attempting to run the command scalePhylo(tr, tip.ages, node.mins), it crunches away for hours on my computer (a 2010 MacBook Pro Core i7, which is quite fast). Clearly there's something wrong. I'm afraid I don't know enough about R to debug a script. I thought I'd send this to the list in case someone other than Gene Hunt or Scott Chamberlain has had a go at using this script. Thanks in advance, Roger [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo