On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
> Dear All,
>
> thanks for the replies, Jim Holtman has given a solution which fits my
> needs, but Gabor Grothendieck did the same thing,
> but it looks like the coding will allow faster processing (should check this
> out tomorrow on a big datafile).
>
> @gabor: I don't understand the use of the grep command:
> grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> What is this expression ("^[1-9][0-9. ]*$|Time") actually doing?
> I looked in the help page, but couldn't find a suitable answer.

I briefly discussed it in the first paragraph of my response. With
value = TRUE, grep returns the matching lines themselves. The pattern
matches a line if it starts (^ matches start of line) with a nonzero
digit, i.e. [1-9], and contains only digits, dots and spaces,
i.e. [0-9. ]*, up to the end of the line ($ matches end of line),
or (| means or) if it contains the word Time anywhere.

If you don't have lines like ... (which you did in your example) then
the regexp could be simplified to "^[0-9. ]+$|Time". The leading [1-9]
is what rejects such lines, since they start with a dot. You may need
to match tabs too if your input contains those.

>
> Thanks to All
>
> Bart
>
> ----- Original Message -----
> From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
> To: "Bart Joosen" <[EMAIL PROTECTED]>
> Cc: <r-help@stat.math.ethz.ch>
> Sent: Thursday, March 01, 2007 6:35 PM
> Subject: Re: [R] How to read in this data format?
>
>
> > Read in the data using readLines, extract out
> > all desired lines (namely those containing only
> > numbers, dots and spaces or those with the
> > word Time) and remove Retention from all
> > lines so that all remaining lines have two
> > fields. Now that we have desired lines
> > and all lines have two fields read them in
> > using read.table.
> >
> > Finally, split them into groups and restructure
> > them using "by" and in the last line we
> > convert the "by" output to a data frame.
> >
> > At the end we display an alternate function f
> > for use with by should we wish to generate long
> > rather than wide output (using the terminology
> > of the reshape command).
> >
> >
> > Lines <- "$$ Experiment Number:
> > $$ Associated Data:
> >
> > FUNCTION 1
> >
> > Scan 1
> > Retention Time 0.017
> >
> > 399.8112 184
> > 399.8742 0
> > 399.9372 152
> > ....
> >
> > Scan 2
> > Retention Time 0.021
> >
> > 399.8112 181
> > 399.8742 1
> > 399.9372 153
> > "
> >
> > # replace next line with: Lines. <- readLines("myfile.dat")
> > Lines. <- readLines(textConnection(Lines))
> > Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> > Lines. <- gsub("Retention", "", Lines.)
> >
> > DF <- read.table(textConnection(Lines.), as.is = TRUE)
> > closeAllConnections()
> >
> > f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
> > out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
> > as.data.frame(do.call("rbind", out.by))
> >
> >
> > We could alternately consider producing long
> > format by replacing the function f with:
> >
> > f <- function(x) data.frame(x[-1,], id = x[1,2])
> >
> >
> > On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> I received an ascii file, containing following information:
> >>
> >> $$ Experiment Number:
> >> $$ Associated Data:
> >>
> >> FUNCTION 1
> >>
> >> Scan 1
> >> Retention Time 0.017
> >>
> >> 399.8112 184
> >> 399.8742 0
> >> 399.9372 152
> >> ....
> >>
> >> Scan 2
> >> Retention Time 0.021
> >>
> >> 399.8112 181
> >> 399.8742 1
> >> 399.9372 153
> >> .....
> >>
> >>
> >> I would like to import this data in R into a dataframe, where there is a
> >> column time, the first numbers as column names, and the second numbers as
> >> data in the dataframe:
> >>
> >> Time   399.8112  399.8742  399.9372
> >> 0.017  184       0         152
> >> 0.021  181       1         153
> >>
> >> I did take a look at the read.table, read.delim, scan, ... But I've no
> >> idea about how to solve this problem.
> >>
> >> Anyone?
> >>
> >>
> >> Thanks
> >>
> >> Bart
> >>
> >> ______________________________________________
> >> R-help@stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
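
To see what each part of the grep pattern discussed above does, here is a
minimal sketch that runs the original pattern, the simplified one and a
tab-tolerant variant on a few sample lines. The vector sample_lines, the
tab-separated line in it and the tab-tolerant pattern are assumptions made
for illustration; they are not taken from the actual data file.

```r
# Hypothetical sample input, modelled on the example file in this thread;
# the last element (tab-separated) is an assumption added for illustration.
sample_lines <- c(
  "$$ Experiment Number:",
  "Scan 1",
  "Retention Time 0.017",
  "",
  "399.8112 184",
  "....",
  "399.9372\t152"
)

# Original pattern: a leading digit 1-9, then only digits, dots and spaces
# up to the end of the line; or any line containing "Time".
kept <- grep("^[1-9][0-9. ]*$|Time", sample_lines, value = TRUE)

# Simplified pattern: also lets through lines made only of dots ("....").
kept_simple <- grep("^[0-9. ]+$|Time", sample_lines, value = TRUE)

# Assumed variant that additionally accepts tabs between the fields.
kept_tabs <- grep("^[1-9][0-9. \t]*$|Time", sample_lines, value = TRUE)

print(kept)
print(kept_simple)
print(kept_tabs)
```

Comparing the three results shows which lines each pattern lets through:
only the simplified pattern keeps the "...." line, and only the tab variant
keeps the tab-separated line.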