> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of delnatan
> Sent: Saturday, October 24, 2009 8:32 PM
> To: r-help@r-project.org
> Subject: [R] Importing data from text file with mixed format
>
>
> Hi,
> I'm having difficulty importing my textfile that looks
> something like this:
>
> #begin text file
> Timepoint 1
> ObjectNumber Volume SurfaceArea
> 1 5.3 9.7
> 2 4.9 8.3
> 3 5.0 9.1
> 4 3.5 7.8
>
> Timepoint 2
> ObjectNumber Volume SurfaceArea
> 1 5.1 9.0
> 2 4.7 8.9
> 3 4.3 8.3
> 4 4.2 7.9
>
> ... #goes on to Timepoint 80
>
> How would I import this data into a list containing
> data.frame for each
> timepoint?
> I'd like my data to be organized like this:
>
> >myList
> [[1]]
>   ObjectNumber Volume SurfaceArea
> 1            1    5.3         9.7
> 2            2    4.9         8.3
> 3            3    5.0         9.1
> 4            4    3.5         7.8
>
> [[2]]
>   ObjectNumber Volume SurfaceArea
> 1            1    5.1         9.0
> 2            2    4.7         8.9
> 3            3    4.3         8.3
> 4            4    4.2         7.9


The following function reads that text file into a single data.frame
with an added Timepoint column, a format I usually find more
convenient.  You can use split(data, data$Timepoint) to get the
list format you asked for.  If you keep the one-data-frame format,
you can use the cast and melt functions from the reshape package
to rearrange it.

readMyData <- function (file)
{
    # read every line in the file
    lines <- readLines(file)
    # drop empty lines
    lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
    # find and check header lines
    isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
    if (sum(isHeaderLine)==0)
        stop("No header lines of form 'ObjectNumber ...'")
    if (length(u <- unique(lines[isHeaderLine]))>1)
        stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
    col.names <- strsplit(lines[which(isHeaderLine)[1]], "[[:space:]]+")[[1]]
    # after making column names from header lines, drop header lines
    lines <- lines[!isHeaderLine]
    # process Timepoint lines
    isTimepointLine <- regexpr("^Timepoint", lines) > 0
    if (sum(isTimepointLine)==0)
        stop("No lines of form 'Timepoint <number>'")
    timepoints <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
    timepoints <- as.integer(timepoints)
    if (any(is.na(timepoints)))
        stop("Non-integer found in a Timepoint line: ",
            sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
    # number of data rows that follow each Timepoint line
    nRowsPerTimepoint <-
        diff(c(which(isTimepointLine), length(isTimepointLine)+1)) - 1
    # drop Timepoint lines.  Remaining lines should be data lines
    lines <- lines[!isTimepointLine]
    # An error in read.table means there were lines we should have dropped
    result <- read.table(header=FALSE, row.names=NULL,
        col.names=col.names, textConnection(lines))
    # Add Timepoint column
    result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
    result
}

E.g.,

> data <- readMyData("c:/temp/t.txt")
> data
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
> split(data, data$Timepoint)
$`1`
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1

$`2`
  ObjectNumber Volume SurfaceArea Timepoint
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2

> library(reshape)  # melt() and cast() come from the reshape package
> mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> cast(mdata, Timepoint~variable, fun.aggregate=c,
       subset=variable=="SurfaceArea")
  Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
1         1            9.7            8.3            9.1            7.8
2         2            9.0            8.9            8.3            7.9
> cast(mdata, ObjectNumber~variable, fun.aggregate=c,
       subset=variable=="SurfaceArea")
  ObjectNumber SurfaceArea_X1 SurfaceArea_X2
1            1            9.7            9.0
2            2            8.3            8.9
3            3            9.1            8.3
4            4            7.8            7.9

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> -Daniel
> --
> View this message in context:
> http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
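
For readers who want the list-of-data-frames layout directly, here is a
minimal sketch of an alternative that is not part of the reply above.  It
assumes the input starts with a 'Timepoint <number>' line and that every
block repeats its own 'ObjectNumber ...' header.  The names
readTimepointList and txt are hypothetical and used only for illustration;
with a real file one would pass readLines(file) instead of txt.

```r
# Hypothetical sample input copied from the question (two timepoints only).
txt <- c(
  "Timepoint 1",
  "ObjectNumber Volume SurfaceArea",
  "1 5.3 9.7",
  "2 4.9 8.3",
  "3 5.0 9.1",
  "4 3.5 7.8",
  "",
  "Timepoint 2",
  "ObjectNumber Volume SurfaceArea",
  "1 5.1 9.0",
  "2 4.7 8.9",
  "3 4.3 8.3",
  "4 4.2 7.9"
)

# readTimepointList is a hypothetical helper, not part of any package.
readTimepointList <- function(lines) {
  # drop empty lines
  lines <- lines[!grepl("^[[:space:]]*$", lines)]
  isTimepointLine <- grepl("^Timepoint", lines)
  # assumption: the first non-empty line is a Timepoint line
  stopifnot(length(lines) > 0, isTimepointLine[1])
  # block id increases by one at every Timepoint line
  block <- cumsum(isTimepointLine)
  # group the header and data lines of each block together
  chunks <- split(lines[!isTimepointLine], block[!isTimepointLine])
  # assumption: every Timepoint line is followed by at least a header line
  stopifnot(length(chunks) == sum(isTimepointLine))
  result <- lapply(chunks, function(chunk) {
    con <- textConnection(chunk)
    on.exit(close(con))
    # each block carries its own header line
    read.table(con, header = TRUE)
  })
  # name each list element after its timepoint number
  names(result) <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
  result
}

myList <- readTimepointList(txt)
```

myList[["2"]] (or myList[[2]]) is then the data.frame for the second
timepoint.  Compared with the single data.frame plus split() shown in the
reply, this sketch does less input checking, so treat it as a starting point
only.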