Re: [R-sig-phylo] problems with assign(), paste(), and data.frame() for folders containing trees

2012-04-24 Thread Emmanuel Paradis
Hi John,

You seem to transform a (relatively) simple problem into a complicated one. 
First, you can get all the tree file names in one command, such as (I moved to 
a directory with one subdir with trees estimated by ML and another one by NJ; 
this is slightly arranged):

> f <- grep("\\.tre", list.files(recursive = TRUE), value = TRUE)
> f
[1] "ML/trees_55species.tre"  "ML/treesNADH2_45species.tre"
[3] "ML/TR.ML.Dloop_55sp.tre" "ML/TR.ML.NADH2_45sp.tre"
[5] "nj/trees_55species.tre"  "nj/treesNADH2_45species.tre"
[7] "nj/TR.NJ.Dloop_55sp.tre" "nj/TR.NJ.NADH2_45sp.tre"

Because R resolves relative file paths against the current working directory, 
there is no need to navigate with setwd().
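
A small self-contained sketch of this, assuming a throwaway directory under 
tempdir() with made-up file names and contents (none of this is your data). It 
uses the pattern argument of list.files() instead of grep(), which should be 
equivalent here:

```r
# Build a throwaway directory tree with two subfolders (names are made up)
top <- file.path(tempdir(), "trees_demo")
dir.create(file.path(top, "ML"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(top, "nj"), recursive = TRUE, showWarnings = FALSE)
writeLines("((a:1,b:2):1,c:3);", file.path(top, "ML", "x.tre"))
writeLines("((a:2,b:2):1,c:1);", file.path(top, "nj", "x.tre"))
writeLines("not a tree", file.path(top, "nj", "notes.txt"))

# One call finds every tree file below 'top'; the pattern is a regular
# expression, so the dot is escaped and "$" anchors it to the end of the name
f <- list.files(top, pattern = "\\.tre$", recursive = TRUE)

# full.names = TRUE prepends 'top', so the files can be read from anywhere
# without calling setwd()
ff <- list.files(top, pattern = "\\.tre$", recursive = TRUE, full.names = TRUE)
all(file.exists(ff))
```

Note that pattern = "*.tre" is a shell-style glob, not a regular expression; 
glob2rx("*.tre") converts one into the other if you prefer that notation.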

Second, I think you should use a list instead of a data frame because (I 
presume) you may have files with different numbers of trees (if this is not the 
case, you can transform the list into a data frame later).
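
For the case where all elements do have the same length, the conversion could 
look like the sketch below. The list and its values are made up for 
illustration; branchdata is just a name borrowed from your code:

```r
# Stand-in results: three "files" with four values each (made-up numbers)
L <- list(a.tre = c(0.1, 0.2, 0.3, 0.4),
          b.tre = c(0.5, 0.1, 0.2, 0.2),
          c.tre = c(0.3, 0.3, 0.1, 0.6))

# Only bind into a table if every element has the same length
if (length(unique(lengths(L))) == 1) {
    branchdata <- as.data.frame(do.call(rbind, L))
}
dim(branchdata)
```

The names of the list become the row names of the data frame, so each row can 
still be traced back to its file.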

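You may have commands like this, e.g., if you want to get the mean branch 
length of each tree:

library(ape)

ntree <- length(f)
L <- vector("list", ntree)
for (i in seq_len(ntree)) {
    tr <- read.tree(f[i])
    # a file with a single tree gives an object of class "phylo"
    if (inherits(tr, "phylo")) L[[i]] <- mean(tr$edge.length)
    # a file with several trees gives an object of class "multiPhylo"
    if (inherits(tr, "multiPhylo"))
        L[[i]] <- sapply(tr, function(x) mean(x$edge.length))
}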
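
A note on testing the class: class() may return more than one string for some 
objects, and comparing that with == then gives more than one TRUE/FALSE, which 
if() does not accept in recent versions of R. inherits() avoids this. A minimal 
sketch with stand-in objects, built by hand here so that it runs without ape 
(mean_brlen is a hypothetical helper name, not an ape function):

```r
# Stand-in objects carrying only the pieces used here; real ones would come
# from ape::read.tree()
one  <- structure(list(edge.length = c(1, 2, 3)), class = "phylo")
many <- structure(list(one, one), class = "multiPhylo")

# Hypothetical helper: mean branch length for either kind of object
mean_brlen <- function(tr) {
    if (inherits(tr, "phylo")) return(mean(tr$edge.length))
    if (inherits(tr, "multiPhylo"))
        return(sapply(tr, function(x) mean(x$edge.length)))
    stop("not a tree object")
}

mean_brlen(one)    # a single number
mean_brlen(many)   # one number per tree
```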

Finally, you may name your list with the file names:

names(L) <- f

This has the advantage that you can select some of the results, e.g., the trees 
that were estimated by NJ:

> grep("nj/", names(L))
[1] 5 6 7 8

or those from D-loop:

> grep("Dloop", names(L))
[1] 3 7

You get the number of trees in each file with sapply(L, length).
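
To go from the positions returned by grep() to the results themselves, you can 
index the list directly. A sketch with a made-up list that follows the same 
naming scheme (the values are invented); the split() by subdirectory is only 
one possible way to group things:

```r
# Stand-in for the named list (made-up values, same naming scheme as above)
L <- list("ML/TR.ML.Dloop_55sp.tre" = c(0.02, 0.03),
          "nj/TR.NJ.Dloop_55sp.tre" = c(0.01, 0.04, 0.02),
          "nj/TR.NJ.NADH2_45sp.tre" = 0.05)

# Index the list with the positions returned by grep()
nj <- L[grep("^nj/", names(L))]

# Or split everything by subdirectory in one step
by_dir <- split(L, dirname(names(L)))

names(by_dir)        # one element per subdirectory
sapply(L, length)    # number of values (trees) in each file
```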

There can be many variations around this scheme. For instance, if you want to 
extract the branch lengths as in your example, the two lines above with "mean" 
would become:

if (inherits(tr, "phylo")) L[[i]] <- tr$edge.length
if (inherits(tr, "multiPhylo")) L[[i]] <- lapply(tr, function(x) x$edge.length)
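
As for the error itself, as far as I can tell from your code: paste() only 
builds a character string, so it cannot stand on the left of <-, and rbind() 
on two such strings binds the text rather than the objects they name. The 
sketch below uses made-up values (newvals and res are illustrative names) and 
shows both the get()/assign() route and the simpler list route:

```r
j <- 1
newvals <- c(0.1, 0.2, 0.3)        # made-up branch lengths

# paste() gives a string; rbind() on it binds the text, not a data frame
nm <- paste("branchdata.5", j, sep = "")
is.character(rbind(nm, nm))

# With variables named on the fly, get() is needed to fetch the object
assign(nm, numeric(0))
assign(nm, rbind(get(nm), newvals))

# A list indexed by the same string does the job with ordinary subsetting
res <- list()
res[[nm]] <- rbind(res[[nm]], newvals)
res[[nm]] <- rbind(res[[nm]], newvals * 2)
dim(res[[nm]])
```

The list version keeps all the results in one object, which is easier to loop 
over afterwards than a set of separately named variables.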

Best,

Emmanuel
-Original Message-
From: John Denton 
Sender: r-sig-phylo-boun...@r-project.org
Date: Tue, 24 Apr 2012 20:30:26 
To: R Sig Phylo Listserv
Subject: [R-sig-phylo] problems with assign(), paste(),
 and data.frame() for folders containing trees

Hi folks,

I am trying to recurse through several numbered subfolders in a directory. Each 
folder has many trees that I want to display summary values for. I have been 
expanding data frames using code with the structure name <- rbind(name, 
newvals) to produce a data frame with as many rows as there are files in 
one of the folders, and as many columns as there are values in each file.

I can loop over the values within a single subdirectory fine with, for example,

library(ape)

trees <- list.files(pattern="*.tre")
iters=length(trees)

branchdata.5 <- data.frame()

iterations <- as.character(c(1:length(trees)))

for (i in 1:iters) {

tree <- read.tree(trees[i])
iteration.edges.5 <- as.numeric(tree$edge.length)

branchdata.5 <- rbind(branchdata.5, iteration.edges.5)

}

The problem comes when I want to iterate through the numbered subdirectories 
while also iterating through the files in a given directory. I want to 
recursively assign these data frames as well, with something like

f <- list.dirs(path = "/.../.../etc", full.names = FALSE, recursive = FALSE)

for (j in 1:length(f)) {

setwd(paste("/.../.../.",j,sep=""))

assign( paste("branchdata.5",j,sep=""), data.frame() )

iterations <- as.character(c(1:length(trees)))

for (i in 1:iters) {

tree <- read.tree(trees[i])
assign(paste("iteration.edges.5",j,sep=""), as.numeric(tree$edge.length) )

paste("branchdata.5",j,sep="") <- rbind(paste("branchdata.5",j,sep=""), 
paste("iteration.edges.5",j,sep=""))

}

names(iterations) <- NULL
boxplot(t(paste("branchdata.5",j,sep="")) , horizontal=TRUE , names=iterations 
, ylim=c(0,2), xlab="Branch Lengths" , ylab="Iterations" , main = "")

}

The problem seems to be in the rbind() when using values with assign() and 
paste(). I would love some help on this! 


John S. S. Denton
Ph.D. Candidate
Department of Ichthyology and Richard Gilder Graduate School
American Museum of Natural History
www.johnssdenton.com
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


[R-sig-phylo] Call for Software Bazaar entries open for Conference on Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio)

2012-04-24 Thread Hilmar Lapp
The Call for Software Bazaar entries is now open for the 2012 conference on 
Informatics for Phylogenetics, Evolution, and Biodiversity (iEvoBio), at 
http://ievobio.org/ocs2/index.php/ievobio/2012. See below for instructions. 

The Software Bazaar features presenters demonstrating their software live on a 
laptop. At iEvoBio, this session takes the place of a poster session, and will 
be approximately 2.0 hours in duration. Conference attendees will be able to 
walk from one demonstration to the next and talk with the presenters.  Please 
also see our FAQ (http://ievobio.org/faq.html).

Entries should be software aimed at advancing research in phylogenetics, 
evolution, and biodiversity, and can include interactive visualizations that 
have been pre-computed (such as SVGs, or Google Earth-compatible KML files). 
Note that commercial marketing activities are not permitted - presenters 
wishing to promote commercial or proprietary services or products should 
contact the Evolution conference about exhibitor space 
(http://www.confersense.ca/Evolution2012).

Submissions consist of a title, which will typically be the name of the 
software (or visualization method) being presented, the URL of a website where 
more information can be obtained, and the license under which the source code 
is available. The website must contain a link to where the source code (and 
possibly binaries) can be downloaded. If it is not obvious from the website, 
the submission must describe what the software does. Reviewers will judge 
whether a submission is within scope of the conference (see above), and need to 
be able to verify whether the open-source requirement(*) is met.

Presenters are expected to bring their own laptops for presentation, and any 
auxiliary devices necessary (such as a mouse). Power will be available at the 
presentation tables (110V/60Hz, US-style plugs; international presenters need 
to bring a suitable adaptor). Please let the organizing committee know as much 
in advance as possible if you expect to have unusually high demands for 
wireless network bandwidth, a large display, or other hardware.

Review and acceptance of Software Bazaar submissions will be on a rolling 
basis. The deadline for submission is the morning of the first day of the 
conference (July 10), but, because space for Software Bazaar presentations is 
finite, we cannot guarantee the availability of slots for late submissions. We 
cannot accept submissions until the open-source requirements are met.

Software Bazaar demonstrations are only 1 of 5 kinds of contributed content 
that iEvoBio will feature. The other 4 are: 1) Full talks (closed), 2) 
Lightning talks, 3) Challenge entries, and 4) Birds-of-a-Feather gatherings. 
The Calls for Challenge entries (http://ievobio.org/challenge.html) and 
Lightning Talks (same submission URL as above) remain open, and the 
Birds-of-a-Feather call is forthcoming.

More details about the conference and program are available at 
http://ievobio.org. You can also find continuous updates on the conference's 
Twitter feed at http://twitter.com/iEvoBio and Google+ page, or subscribe to 
the low-traffic iEvoBio announcements mailing list at 
http://groups.google.com/group/ievobio-announce.

iEvoBio 2012 is sponsored by the US National Evolutionary Synthesis Center 
(NESCent) and by Biomatters Ltd., in partnership with the Society for the Study 
of Evolution (SSE) and the Society of Systematic Biologists (SSB).

The iEvoBio 2012 Organizing Committee:
Hilmar Lapp, US National Evolutionary Synthesis Center (chair)
Robert Beiko, Dalhousie University 
Nico Cellinese, University of Florida and Florida Museum of Natural History
Robert Guralnick, University of Colorado at Boulder
Rebecca Kao, Denver Botanic Gardens
Ellinor Michel, Natural History Museum, London
Nadia Talent, Royal Ontario Museum
Andrea Thomer, University of Illinois at Urbana-Champaign

(*) iEvoBio and its sponsors are dedicated to promoting the practice and 
philosophy of Open Source software development (see 
http://www.opensource.org/docs/definition.php) and reuse within the research 
community. For this reason, software to be demonstrated to conference attendees 
must be licensed with a recognized Open Source License (see 
http://www.opensource.org/licenses/), and be available for download, including 
source code, by a tar/zip file accessed through ftp/http or through a widely 
used version control system like cvs, Subversion, git, Bazaar, or Mercurial.  
Authors are advised that non-compliant submissions must be revised to meet the 
requirement by July 8 at the latest, and in the event that presentation slots 
run out, precedence is established by the date they are first found in 
compliance, not the date of submission.