Hi, a few comments below.
On Fri, Sep 12, 2008 at 9:34 AM, Michael A. Gilchrist <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am currently using R to run an external program and then read the results
> the external program sends to the stdout which are tsv data.
>
> When R reads the results in it converts it to a list of strings which I
> then have to manipulate with a whole slew of commands (which, figuring out
> how to do was a real challenge for a newbie like myself)--see below.
>
> Here's the code I'm using. COMMAND runs the external program.
>
> rawInput= system(COMMAND,intern=TRUE);##read in tsv values

For debugging purposes, it is good to read the data into a buffer like this instead of wrapping everything up in one big nested expression. The overhead for doing this should be minimal.

> rawInput = strsplit(rawInput, split="\t");##split elements w/in the list

FYI, strsplit(x, split="\t", fixed=TRUE) is *heaps* faster (than fixed=FALSE), e.g.

> x <- paste(1:3e4, collapse="\t")
> t <- system.time(y <- strsplit(x, split="\t"))
> t
   user  system elapsed
   2.89    0.00    2.89
> t <- system.time(y <- strsplit(x, split="\t", fixed=TRUE))
> t
   user  system elapsed
      0       0       0

> ##of character strings by "\t"
> rawInput = unlist(rawInput); ##unlist, making it one long vector

FYI, unlist(x, use.names=FALSE) is faster, especially when 'x' is long/large.

> mode(rawInput)="double"; ##convert from strings to double
> finalInput = data.frame(t(matrix(rawInput, nrow=6))); ##convert

Taking the transpose with t() takes time, because it requires a copy in memory. Do you really need the data transposed? Converting a matrix to a data frame also takes time. Do you really need the data as a data frame?

>
> Because I will be doing this 100,000 of times as part of an optimization
> problem, I am interested in learning a more efficient way of doing this
> conversion.

Do you need the data in each iteration? If not, collect the data as strings, and then do the coercion to doubles and the conversion to a matrix in one go at the end.
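A minimal sketch of that batching idea, under a few assumptions: every line of output holds exactly 6 tab-separated values (the nrow=6 in your code suggests this), and 'runs' is a made-up stand-in for the results of several system(COMMAND, intern=TRUE) calls. It also fills the matrix row by row with byrow=TRUE, which gives the same layout as t(matrix(x, nrow=6)) without the explicit transpose:

```r
# Hypothetical stand-in for the output of several
# system(COMMAND, intern=TRUE) calls; each element is a character
# vector of tab-separated lines.
runs <- list(
  c("1\t2\t3\t4\t5\t6", "7\t8\t9\t10\t11\t12"),
  c("13\t14\t15\t16\t17\t18")
);

# Collect all raw lines first; nothing is parsed inside the loop.
rawLines <- unlist(runs, use.names=FALSE);

# Split, flatten and coerce once, for all iterations together.
fields <- strsplit(rawLines, split="\t", fixed=TRUE);
values <- as.double(unlist(fields, use.names=FALSE));

# Assumes 6 values per line. byrow=TRUE fills the matrix row by row,
# so no call to t() is needed.
result <- matrix(values, ncol=6, byrow=TRUE);
```

If the optimizer needs the values of one run before it can start the next, this does not apply as is, but the fixed=TRUE and use.names=FALSE hints still do.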
That is likely to be faster because there is a bit of overhead in each iteration.

As suggested, using scan() and providing R with as many hints as possible (explicit arguments to scan() when you know something about the input, so that R doesn't have to guess) will also speed things up.

parseA <- function(x, ...) {
  y <- strsplit(x, split="\t", fixed=FALSE);
  y <- unlist(y);
  y <- as.double(y);
}

parseB <- function(x, ...) {
  y <- strsplit(x, split="\t", fixed=TRUE);
  y <- unlist(y, use.names=FALSE);
  y <- as.double(y);
}

parseC <- function(x, ...) {
  con <- textConnection(x);
  on.exit(close(con));
  y <- scan(file=con, what=double(0), sep="\t", quiet=TRUE);
  y;
}

parseD <- function(x, ...) {
  con <- textConnection(x);
  on.exit(close(con));
  y <- scan(file=con, what=double(0), sep="\t", quote=NULL,
            na.strings=NULL, strip.white=FALSE, comment.char="",
            allowEscapes=FALSE, quiet=TRUE);
  y;
}

> x <- paste(1:3e4, collapse="\t");
> tA <- system.time(yA <- parseA(x));
> tA;
   user  system elapsed
   2.91    0.00    2.91
> tB <- system.time(yB <- parseB(x));
> tB;
   user  system elapsed
   0.03    0.00    0.04
> tC <- system.time(yC <- parseC(x));
> tC;
   user  system elapsed
   0.03    0.00    0.03
> tD <- system.time(yD <- parseD(x));
> tD;
   user  system elapsed
   0.03    0.00    0.03

> x <- paste(1:1e6, collapse="\t");
# parseA() painfully slow
> tB <- system.time(yB <- parseB(x));
> tB
   user  system elapsed
   2.30    0.00    2.31
> tC <- system.time(yC <- parseC(x));
> tC
   user  system elapsed
   1.14    0.00    1.16
> tD <- system.time(yD <- parseD(x));
> tD
   user  system elapsed
   1.16    0.01    1.17

Ok, so parseD() doesn't seem to be much faster than parseC(), but depending on your output format it may be.

Take home message: read the help pages and try to help R as much as possible so it does not have to guess. You can always make your code twice as fast!

/HB

>
> Any suggestions would be appreciated.
>
>
> Thanks in advance.
>
> Mike
>
>
> -----------------------------------------------------
> Department of Ecology & Evolutionary Biology
> 569 Dabney Hall
> University of Tennessee
> Knoxville, TN 37996-1610
>
> phone:(865) 974-6453
> fax: (865) 974-6042
>
> web: http://eeb.bio.utk.edu/gilchrist.asp
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.