Re: [R-sig-phylo] problems with assign(), paste(), and data.frame() for folders containing trees
Hi John,

You seem to be turning a (relatively) simple problem into a complicated one.

First, you can get all the tree file names in one command, such as (I moved to a directory with one subdir with trees estimated by ML and another one by NJ; this is slightly arranged):

> f <- grep("\\.tre$", list.files(recursive = TRUE), value = TRUE)
> f
[1] "ML/trees_55species.tre"      "ML/treesNADH2_45species.tre"
[3] "ML/TR.ML.Dloop_55sp.tre"     "ML/TR.ML.NADH2_45sp.tre"
[5] "nj/trees_55species.tre"      "nj/treesNADH2_45species.tre"
[7] "nj/TR.NJ.Dloop_55sp.tre"     "nj/TR.NJ.NADH2_45sp.tre"

Because R resolves relative file paths against the current working directory, there is no need to navigate with setwd().

Second, I think you should use a list instead of a data frame because (I presume) you may have files with different numbers of trees (if this is not the case, you can transform the list into a data frame later). You may have commands like this, e.g., if you want to get the mean branch length of each tree:

    library(ape)
    ntree <- length(f)
    L <- vector("list", ntree)  # one slot per file
    for (i in seq_len(ntree)) {
        tr <- read.tree(f[i])
        # a file with a single tree gives a "phylo" object
        if (inherits(tr, "phylo"))
            L[[i]] <- mean(tr$edge.length)
        # a file with several trees gives a "multiPhylo" object
        if (inherits(tr, "multiPhylo"))
            L[[i]] <- sapply(tr, function(x) mean(x$edge.length))
    }

Finally, you may name your list with the file names:

    names(L) <- f

This has the advantage that you can select some of the results, e.g., the trees that were estimated by NJ:

> grep("nj/", names(L))
[1] 5 6 7 8

or those from D-loop:

> grep("Dloop", names(L))
[1] 3 7

You get the number of trees in each file with sapply(L, length). There can be many variations around this scheme.
For instance, if you want to extract the branch lengths as in your example, the two lines above with "mean" would become:

    if (inherits(tr, "phylo"))
        L[[i]] <- tr$edge.length
    if (inherits(tr, "multiPhylo"))
        L[[i]] <- lapply(tr, function(x) x$edge.length)

Best,

Emmanuel

-----Original Message-----
From: John Denton
Sender: r-sig-phylo-boun...@r-project.org
Date: Tue, 24 Apr 2012 20:30:26
To: R Sig Phylo Listserv
Subject: [R-sig-phylo] problems with assign(), paste(), and data.frame() for folders containing trees

Hi folks,

I am trying to recurse through several numbered subfolders in a directory. Each folder has many trees that I want to display summary values for. I have been expanding data frames using code with the structure name <- rbind(name, newvals) to produce a data frame with n rows equal to the number of files in one of the folders, and n columns equal to the number of values in the file.

I can loop over the values within a single subdirectory fine with, for example,

    library(ape)
    trees <- list.files(pattern="*.tre")
    iters=length(trees)
    branchdata.5 <- data.frame()
    iterations <- as.character(c(1:length(trees)))
    for (i in 1:iters) {
        tree <- read.tree(trees[i])
        iteration.edges.5 <- as.numeric(tree$edge.length)
        branchdata.5 <- rbind(branchdata.5, iteration.edges.5)
    }

The problem comes when I want to iterate through the numbered subdirectories while also iterating through the files in a given directory. I want to recursively assign these data frames as well, with something like

    f <- list.dirs(path = "/.../.../etc", full.names = FALSE, recursive = FALSE)
    for (j in 1:length(f)) {
        setwd(paste("/.../.../.",j,sep=""))
        assign( paste("branchdata.5",j,sep=""), data.frame() )
        iterations <- as.character(c(1:length(trees)))
        for (i in 1:iters) {
            tree <- read.tree(trees[i])
            assign(paste("iteration.edges.5",j,sep=""), as.numeric(tree$edge.length) )
            paste("branchdata.5",j,sep="") <- rbind(paste("branchdata.5",j,sep=""), paste("iteration.edges.5",j,sep=""))
        }
        names(iterations) <- NULL
        boxplot(t(paste("branchdata.5",j,sep="")) , horizontal=TRUE , names=iterations , ylim=c(0,2), xlab="Branch Lengths" , ylab="Iterations" , main = "")
    }

The problem seems to be in the rbind() when using values with assign() and paste(). I would love some help on this!

John S. S. Denton
Ph.D. Candidate
Department of Ichthyology and Richard Gilder Graduate School
American Museum of Natural History
www.johnssdenton.com

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
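The list-based approach from the reply can be tried end to end on a throwaway directory. What follows is only a minimal sketch, assuming the ape package is installed; the demo folder, the two small Newick files, and the helper name mean_edge_lengths are hypothetical and exist only for illustration.

```r
library(ape)

# Hypothetical demo layout: two subfolders with tiny Newick files.
root <- file.path(tempdir(), "trees_demo")
dir.create(file.path(root, "ML"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "nj"), recursive = TRUE, showWarnings = FALSE)
writeLines("((A:1,B:2):1,C:3);", file.path(root, "ML", "one.tre"))
writeLines(c("((A:1,B:1):1,C:1);", "((A:2,B:2):2,C:2);"),
           file.path(root, "nj", "two.tre"))

# All tree files below root, as paths relative to root (no setwd() needed).
f <- list.files(root, pattern = "\\.tre$", recursive = TRUE)

# Hypothetical helper: mean branch length per tree in one file.
mean_edge_lengths <- function(path) {
  tr <- read.tree(path)
  if (inherits(tr, "phylo")) {
    return(mean(tr$edge.length))                   # single tree in the file
  }
  sapply(tr, function(x) mean(x$edge.length))      # several trees in the file
}

# One list element per file, named by the relative file path.
L <- lapply(file.path(root, f), mean_edge_lengths)
names(L) <- f

# Select results by folder, and count the trees read from each file.
nj_only <- L[grepl("^nj/", names(L))]
n_trees <- lengths(L)
```

Because the results are keyed by file name, per-folder subsets are plain list indexing, and no object names have to be built with paste() and assign().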
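On the rbind() problem in the original post: paste() returns a character string, so rbind(paste(...), paste(...)) binds two strings rather than the objects they name, and a paste() call is not a valid target for `<-`. The sketch below illustrates this with made-up edge lengths (the values and the names edge_lengths and branchdata are hypothetical), and shows a named list doing the job that assign() and get() were meant to do.

```r
# Hypothetical stand-ins for the edge lengths read from two numbered folders;
# in the original post these would come from read.tree(...)$edge.length.
edge_lengths <- list(
  "1" = list(c(0.1, 0.2, 0.3), c(0.2, 0.1, 0.4)),
  "2" = list(c(0.5, 0.6, 0.7))
)

# paste() builds a name, not the object: rbind() here binds two strings.
nm <- paste("branchdata.5", 1, sep = "")
bad <- rbind(nm, paste("iteration.edges.5", 1, sep = ""))

# With assign(), the object has to be fetched back with get().
assign(nm, c(0.1, 0.2))
fetched <- get(nm)

# A named list avoids assign()/get(): one matrix per folder, one row per tree.
branchdata <- lapply(edge_lengths, function(x) do.call(rbind, x))
dim(branchdata[["1"]])
```

With this layout, a call such as boxplot(t(branchdata[["1"]]), horizontal = TRUE) draws one box per tree for folder 1, which appears to be what the boxplot() call in the post was aiming for. Note that do.call(rbind, ...) only gives a clean matrix when every tree in a folder has the same number of branches.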
[R-sig-phylo] Call for Software Bazaar entries open for Conference on Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio)
The Call for Software Bazaar entries is now open for the 2012 conference on Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio), at http://ievobio.org/ocs2/index.php/ievobio/2012. See below for instructions.

The Software Bazaar features presenters demonstrating their software live on a laptop. At iEvoBio, this session takes the place of a poster session, and will be approximately 2.0 hours in duration. Conference attendees will be able to walk from one demonstration to the next and talk with the presenters. Please also see our FAQ (http://ievobio.org/faq.html).

Entries should be software aimed at advancing research in phylogenetics, evolution, and biodiversity, and can include interactive visualizations that have been pre-computed (such as SVGs, or Google Earth-compatible KML files). Note that commercial marketing activities are not permitted - presenters wishing to promote commercial or proprietary services or products should contact the Evolution conference about exhibitor space (http://www.confersense.ca/Evolution2012).

Submissions consist of a title, which will typically be the name of the software (or visualization method) being presented, the URL of a website where more information can be obtained, and the license under which the source code is available. The website must contain a link to where the source code (and possibly binaries) can be downloaded. If it is not obvious from the website, the submission must describe what the software does. Reviewers will judge whether a submission is within scope of the conference (see above), and need to be able to verify whether the open-source requirement(*) is met.

Presenters are expected to bring their own laptops for presentation, and any auxiliary devices necessary (such as a mouse). Power will be available at the presentation tables (110V/60Hz, US-style plugs; international presenters need to bring a suitable adaptor).
Please let the organizing committee know as much in advance as possible if you expect to have unusually high demands for wireless network bandwidth, a large display, or other hardware.

Review and acceptance of Software Bazaar submissions will be on a rolling basis. The deadline for submission is the morning of the first day of the conference (July 10), but, because space for Software Bazaar presentations is finite, we cannot guarantee the availability of slots for late submissions. We cannot accept submissions until the open-source requirements are met.

Software Bazaar demonstrations are only 1 of 5 kinds of contributed content that iEvoBio will feature. The other 4 are: 1) Full talks (closed), 2) Lightning talks, 3) Challenge entries, and 4) Birds-of-a-Feather gatherings. The Calls for Challenge entries (http://ievobio.org/challenge.html) and Lightning Talks (same submission URL as above) remain open, and the Birds-of-a-Feather call is forthcoming.

More details about the conference and program are available at http://ievobio.org. You can also find continuous updates on the conference's Twitter feed at http://twitter.com/iEvoBio and Google+ page, or subscribe to the low-traffic iEvoBio announcements mailing list at http://groups.google.com/group/ievobio-announce.

iEvoBio 2012 is sponsored by the US National Evolutionary Synthesis Center (NESCent) and by Biomatters Ltd., in partnership with the Society for the Study of Evolution (SSE) and the Society of Systematic Biologists (SSB).
The iEvoBio 2012 Organizing Committee:
Hilmar Lapp, US National Evolutionary Synthesis Center (chair)
Robert Beiko, Dalhousie University
Nico Cellinese, University of Florida and Florida Museum of Natural History
Robert Guralnick, University of Colorado at Boulder
Rebecca Kao, Denver Botanic Gardens
Ellinor Michel, Natural History Museum, London
Nadia Talent, Royal Ontario Museum
Andrea Thomer, University of Illinois at Urbana-Champaign

(*) iEvoBio and its sponsors are dedicated to promoting the practice and philosophy of Open Source software development (see http://www.opensource.org/docs/definition.php) and reuse within the research community. For this reason, software to be demonstrated to conference attendees must be licensed with a recognized Open Source License (see http://www.opensource.org/licenses/), and be available for download, including source code, by a tar/zip file accessed through ftp/http or through a widely used version control system like cvs, Subversion, git, Bazaar, or Mercurial. Authors are advised that non-compliant submissions must be revised to meet the requirement by July 8 at the latest, and in the event that presentation slots run out, precedence is established by the date they are first found in compliance, not the date of submission.