> But I would appreciate help with:
>
>   load_parallel_results_split_on_newline(filenametable)
>   load_parallel_results_split_to_columns(filenametable)
>
>
I'm happy to write these, though I'm limited on time. Could you write a
generator for test data? In particular, it'd be good to be able to adjust
the size of the files, in case you're interested in testing how this scales
to large files and/or lots of files.

R has limited options for reading data whose record separator is a
character other than newline. My first approach here would be to pipe the
data through tr or sed to replace the record separator character with "\n",
so that we can read it into R with the usual commands. I'm assuming we're on
a POSIX system, or something else where we can do that. Otherwise, I think
we'd have to read each file as one giant string (as you're doing for 'raw')
and then parse it ourselves, which I suspect would be much slower.
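
A minimal sketch of the tr idea, assuming a POSIX system with tr on the
PATH. read_records_tr is a hypothetical name, and NUL as the default
separator is only an example; any single byte tr understands would do.

```r
## Hypothetical helper: read records terminated by a single non-newline byte
## by piping the file through tr(1), which turns each separator into "\n".
read_records_tr <- function(path, recsep = "\\000") {
  # recsep is handed to tr as-is, so octal escapes such as \000 (NUL) work
  cmd <- sprintf("tr '%s' '\\n' < %s", recsep, shQuote(path))
  con <- pipe(cmd)
  on.exit(close(con))
  # warn = FALSE: the last record may lack a trailing separator
  readLines(con, warn = FALSE)
}
```

The same pipe() connection could instead be handed to read.table or scan
when the records should also be split into columns.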

BTW, for 'raw', it might be worth comparing the performance of readLines
followed by pasting the lines back together with collapse="\n" against the
following approach:

readChar(fileName, file.info(fileName)$size)

(which I got from
http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string)
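
A quick way to run that comparison might look like this. The function names
are hypothetical, and the synthetic file (100,000 lines of 80 characters) is
an arbitrary choice; real --results files may behave differently.

```r
## Two ways to load a whole file as a single string, for timing side by side.
read_raw_readlines <- function(path) {
  # Splits the file into lines, then rejoins them; the final newline is lost
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

read_raw_readchar <- function(path) {
  # Reads all bytes in one call; useBytes = TRUE makes nchars count bytes,
  # matching file.info()$size. Newlines are kept exactly as stored.
  readChar(path, file.info(path)$size, useBytes = TRUE)
}

## Rough timing on a synthetic file
big <- tempfile()
writeLines(rep(strrep("x", 80), 1e5), big)
print(system.time(a <- read_raw_readlines(big)))
print(system.time(b <- read_raw_readchar(big)))
# The two results differ only in the trailing newline that readLines drops
stopifnot(identical(a, sub("\n$", "", b)))
```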


David


>
>
> load_parallel_results_filenames <- function(resdir) {
>   ## Find files called .../stdout
>   stdoutnames <- list.files(path=resdir, pattern="^stdout$", recursive=TRUE)
>   ## Find files called .../stderr
>   stderrnames <- list.files(path=resdir, pattern="^stderr$", recursive=TRUE)
>   if(length(stdoutnames) == 0) {
>     ## Return empty data frame if no files found
>     return(data.frame())
>   }
>   m <- matrix(unlist(strsplit(stdoutnames, "/")), nrow=length(stdoutnames), byrow=TRUE)
>   ## Keep every second path component (the argument values);
>   ## drop=FALSE keeps a matrix even when there is only one job
>   tbl <- m[, c(FALSE,TRUE), drop=FALSE]
>   ## Append the stdout and stderr filenames
>   tbl <- cbind(tbl, stdoutnames, stderrnames)
>   colnames(tbl) <- c(strsplit(stdoutnames[1],"/")[[1]][c(TRUE,FALSE)], "stderr")
>   return(tbl)
> }
>
> load_parallel_results_raw_content <- function(resdir, tbl) {
>   ## Read a whole file and join its lines back together with "\n"
>   readfile <- function(x) {
>     paste(readLines(file.path(resdir, x)), collapse="\n")
>   }
>   ## Read them
>   stdoutcontents <- vapply(tbl[,"stdout"], readfile, character(1))
>   stderrcontents <- vapply(tbl[,"stderr"], readfile, character(1))
>   ## Replace filenames with file contents
>   tbl[,c("stdout","stderr")] <- c(stdoutcontents, stderrcontents)
>   return(tbl)
> }
>
>
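
For what it's worth, a rough first sketch of
load_parallel_results_split_on_newline, assuming tbl is a character matrix
laid out like the one load_parallel_results_filenames above returns
(argument-value columns, then "stdout" and "stderr" holding paths relative
to resdir). It returns one data frame row per stdout line, with that job's
argument values repeated on each row; stderr is ignored here.

```r
## Hypothetical sketch: one row per line of each job's stdout file.
load_parallel_results_split_on_newline <- function(resdir, tbl) {
  argcols <- setdiff(colnames(tbl), c("stdout", "stderr"))
  pieces <- lapply(seq_len(nrow(tbl)), function(i) {
    lines <- readLines(file.path(resdir, tbl[i, "stdout"]), warn = FALSE)
    # Repeat this job's argument values once per output line
    args <- tbl[rep(i, length(lines)), argcols, drop = FALSE]
    # check.names = FALSE keeps column names like "1" unchanged
    data.frame(args, stdout = lines, stringsAsFactors = FALSE, check.names = FALSE)
  })
  do.call(rbind, pieces)
}
```

A split_to_columns variant could follow the same pattern, but call
read.table on each file instead of readLines.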
