All these have been really helpful. Once again I see that anything's possible in R!
Thank you for the suggestion Bill, I think arranging the data in one data
frame is a good idea.

-Daniel


William Dunlap wrote:
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of delnatan
>> Sent: Saturday, October 24, 2009 8:32 PM
>> To: r-help@r-project.org
>> Subject: [R] Importing data from text file with mixed format
>>
>> Hi,
>> I'm having difficulty importing my textfile that looks
>> something like this:
>>
>> #begin text file
>> Timepoint 1
>> ObjectNumber Volume SurfaceArea
>> 1 5.3 9.7
>> 2 4.9 8.3
>> 3 5.0 9.1
>> 4 3.5 7.8
>>
>> Timepoint 2
>> ObjectNumber Volume SurfaceArea
>> 1 5.1 9.0
>> 2 4.7 8.9
>> 3 4.3 8.3
>> 4 4.2 7.9
>>
>> ... #goes on to Timepoint 80
>>
>> How would I import this data into a list containing
>> data.frame for each timepoint?
>> I'd like my data to be organized like this:
>>
>> >myList
>> [[1]]
>>   ObjectNumber Volume SurfaceArea
>> 1            1    5.3         9.7
>> 2            2    4.9         8.3
>> 3            3    5.0         9.1
>> 4            4    3.5         7.8
>>
>> [[2]]
>>   ObjectNumber Volume SurfaceArea
>> 1            1    5.1         9.0
>> 2            2    4.7         8.9
>> 3            3    4.3         8.3
>> 4            4    4.2         7.9
>
> The following function reads that text file into one data.frame,
> which has a Timepoint column, which is a format I usually find
> more convenient.  You can use split(data, data$Timepoint)
> to get to the format you asked for.  If you use the one-data-frame
> format you can use the cast and melt functions from the reshape
> package to rearrange it.
>
> readMyData <- function (file) {
>    # read every line in the file
>    lines <- readLines(file)
>    # drop empty lines
>    lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
>    # find and check header lines
>    isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
>    if (sum(isHeaderLine)==0)
>       stop("No header lines of form 'ObjectNumber ...'")
>    if (length(u <- unique(lines[isHeaderLine]))>1)
>       stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
>    col.names <- strsplit(lines[which(isHeaderLine)[1]],
>       "[[:space:]]+")[[1]]
>    # after making column names from header lines, drop header lines
>    lines <- lines[!isHeaderLine]
>    # process Timepoint lines
>    isTimepointLine <- regexpr("^Timepoint", lines) > 0
>    if (sum(isTimepointLine)==0)
>       stop("No lines of form 'Timepoint <number>'")
>    timepoints <- sub("^Timepoint[[:space:]]*", "",
>       lines[isTimepointLine])
>    timepoints <- as.integer(timepoints)
>    if (any(is.na(timepoints)))
>       stop("Non-integer found in a Timepoint line: ",
>          sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
>    nRowsPerTimepoint <-
>       diff(c(which(isTimepointLine),length(isTimepointLine)+1)) - 1
>    # drop Timepoint lines.  Remaining lines should be data lines
>    lines <- lines[!isTimepointLine]
>    # An error in read.table means there were lines we should have dropped
>    result <- read.table(header=FALSE,
>       row.names=NULL,
>       col.names=col.names,
>       textConnection(lines))
>    # Add Timepoint column
>    result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
>    result
> }
>
> E.g.,
> > data <- readMyData("c:/temp/t.txt")
> > data
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > split(data, data$Timepoint)
> $`1`
>   ObjectNumber Volume SurfaceArea Timepoint
> 1            1    5.3         9.7         1
> 2            2    4.9         8.3         1
> 3            3    5.0         9.1         1
> 4            4    3.5         7.8         1
>
> $`2`
>   ObjectNumber Volume SurfaceArea Timepoint
> 5            1    5.1         9.0         2
> 6            2    4.7         8.9         2
> 7            3    4.3         8.3         2
> 8            4    4.2         7.9         2
> > mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> > cast(mdata, Timepoint~variable, fun.aggregate=c,
>      subset=variable=="SurfaceArea")
>   Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
> 1         1            9.7            8.3            9.1            7.8
> 2         2            9.0            8.9            8.3            7.9
> > cast(mdata, ObjectNumber~variable, fun.aggregate=c,
>      subset=variable=="SurfaceArea")
>   ObjectNumber SurfaceArea_X1 SurfaceArea_X2
> 1            1            9.7            9.0
> 2            2            8.3            8.9
> 3            3            9.1            8.3
> 4            4            7.8            7.9
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> -Daniel
>> --
>> View this message in context:
>> http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
-- 
View this message in context: http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26063496.html
Sent from the R help mailing list archive at Nabble.com.
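
For anyone who still wants the list-of-data-frames layout from the original question, here is a minimal sketch of the split() step mentioned in the reply. It builds the two-timepoint example by hand instead of reading a file, so `data` below only stands in for what readMyData() returns. Dropping the Timepoint column, resetting the row names, and the base reshape() call at the end are my own assumptions about what might be wanted; they are not part of Bill's reply.

```r
# Stand-in for the result of readMyData(): the two timepoints from the thread
data <- data.frame(
  ObjectNumber = rep(1:4, times = 2),
  Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
  SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
  Timepoint    = rep(1:2, each = 4)
)

# Split the measurement columns by Timepoint; each piece is a data.frame.
# Row names are reset so every piece is numbered from 1 (assumption: the
# original row numbers are not needed).
myList <- lapply(
  split(data[names(data) != "Timepoint"], data$Timepoint),
  function(d) {
    rownames(d) <- NULL
    d
  }
)

# Drop the "1", "2", ... names so the list prints as [[1]], [[2]], ...
# Elements are ordered by increasing Timepoint value.
myList <- unname(myList)

# Base-R alternative to melt/cast for one variable: one row per object,
# one SurfaceArea column per timepoint.
wide <- reshape(
  data[c("ObjectNumber", "Timepoint", "SurfaceArea")],
  idvar     = "ObjectNumber",
  timevar   = "Timepoint",
  direction = "wide"
)
```

Note that myList[[i]] is the i-th smallest Timepoint value, which only coincides with "Timepoint i" when the timepoints are numbered 1, 2, 3, ... without gaps, as in the example file.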