Gabor, thanks for the clarification; now I understand the expression.
Thanks to everyone,
Bart

>From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
>To: "Bart Joosen" <[EMAIL PROTECTED]>
>CC: r-help@stat.math.ethz.ch
>Subject: Re: [R] How to read in this data format?
>Date: Thu, 1 Mar 2007 16:46:21 -0500
>
>On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
>>Dear All,
>>
>>thanks for the replies. Jim Holtman has given a solution which fits my
>>needs, but Gabor Grothendieck did the same thing, and it looks like his
>>code will allow faster processing (I should check this out tomorrow on
>>a big data file).
>>
>>@gabor: I don't understand the use of the grep command:
>> grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
>>What is this expression ("^[1-9][0-9. ]*$|Time") actually doing?
>>I looked in the help page, but couldn't find a suitable answer.
>
>I briefly discussed it in the first paragraph of my response. It
>matches and returns only those lines that either start with a nonzero
>digit (^ matches the start of the line and [1-9] matches one digit from
>1 to 9) followed by nothing but digits, dots and spaces ([0-9. ]*) up to
>the end of the line ($ matches the end of the line), or (| means "or")
>contain the word Time.
>If you don't have lines made up only of dots, like the "...." lines in
>your example, then the regexp could be simplified to "^[0-9. ]+$|Time".
>You may need to match tabs too if your input contains those.
>
>>
>>
>>Thanks to All
>>
>>
>>Bart
>>
>>----- Original Message -----
>>From: "Gabor Grothendieck" <[EMAIL PROTECTED]>
>>To: "Bart Joosen" <[EMAIL PROTECTED]>
>>Cc: <r-help@stat.math.ethz.ch>
>>Sent: Thursday, March 01, 2007 6:35 PM
>>Subject: Re: [R] How to read in this data format?
>>
>>
>> > Read in the data using readLines, extract
>> > all desired lines (namely those containing only
>> > numbers, dots and spaces, or those with the
>> > word Time) and remove Retention from all
>> > lines so that all remaining lines have two
>> > fields. Now that we have the desired lines
>> > and all lines have two fields, read them in
>> > using read.table.
>> >
>> > Finally, we split them into groups and restructure
>> > them using "by", and in the last line we
>> > convert the "by" output to a data frame.
>> >
>> > At the end we display an alternative function f
>> > for use with by, should we wish to generate long
>> > rather than wide output (using the terminology
>> > of the reshape command).
>> >
>> >
>> > Lines <- "$$ Experiment Number:
>> > $$ Associated Data:
>> >
>> > FUNCTION 1
>> >
>> > Scan 1
>> > Retention Time 0.017
>> >
>> > 399.8112 184
>> > 399.8742 0
>> > 399.9372 152
>> > ....
>> >
>> > Scan 2
>> > Retention Time 0.021
>> >
>> > 399.8112 181
>> > 399.8742 1
>> > 399.9372 153
>> > "
>> >
>> > # replace next line with: Lines. <- readLines("myfile.dat")
>> > Lines. <- readLines(textConnection(Lines))
>> > Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
>> > Lines. <- gsub("Retention", "", Lines.)
>> >
>> > DF <- read.table(textConnection(Lines.), as.is = TRUE)
>> > closeAllConnections()
>> >
>> > f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
>> > out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
>> > as.data.frame(do.call("rbind", out.by))
>> >
>> >
>> > We could alternatively produce long
>> > format by replacing the function f with:
>> >
>> > f <- function(x) data.frame(x[-1,], id = x[1,2])
>> >
>> >
>> > On 3/1/07, Bart Joosen <[EMAIL PROTECTED]> wrote:
>> >> Hi,
>> >>
>> >> I received an ASCII file containing the following information:
>> >>
>> >> $$ Experiment Number:
>> >> $$ Associated Data:
>> >>
>> >> FUNCTION 1
>> >>
>> >> Scan 1
>> >> Retention Time 0.017
>> >>
>> >> 399.8112 184
>> >> 399.8742 0
>> >> 399.9372 152
>> >> ....
>> >>
>> >> Scan 2
>> >> Retention Time 0.021
>> >>
>> >> 399.8112 181
>> >> 399.8742 1
>> >> 399.9372 153
>> >> .....
>> >>
>> >>
>> >> I would like to import this data into an R data frame, with a
>> >> Time column, the first number of each pair as the column names, and
>> >> the second number of each pair as the data values:
>> >>
>> >> Time   399.8112 399.8742 399.9372
>> >> 0.017  184      0        152
>> >> 0.021  181      1        153
>> >>
>> >> I did take a look at read.table, read.delim, scan, ... but I have
>> >> no idea how to solve this problem.
>> >>
>> >> Anyone?
>> >>
>> >>
>> >> Thanks
>> >>
>> >> Bart
>> >>
>> >> ______________________________________________
>> >> R-help@stat.math.ethz.ch mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
>>
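A small base-R sketch may make the behaviour of the pattern concrete. The sample lines are modelled on the excerpt in this thread; the tab-separated line "399.9372\t152" is a hypothetical addition used only to illustrate the remark about tabs, and the variable names are illustrative. The last step uses the text= argument of read.table found in current R versions, rather than the textConnection approach used in the thread.

```r
# Sample input modelled on the thread's excerpt (as readLines would return it).
# The tab-separated line "399.9372\t152" is a hypothetical addition.
lines <- c("$$ Experiment Number:",
           "",
           "FUNCTION 1",
           "Scan 1",
           "Retention Time 0.017",
           "399.8112 184",
           "399.8742 0",
           "399.9372\t152",
           "....")

# Pattern from the thread: a nonzero digit first, then only digits, dots
# and spaces up to the end of the line, OR any line containing "Time".
strict <- grep("^[1-9][0-9. ]*$|Time", lines, value = TRUE)

# Simplified pattern: the whole line is one or more digits, dots or spaces.
# It also keeps the "...." placeholder line, which is why the stricter
# form is needed for the example data.
loose <- grep("^[0-9. ]+$|Time", lines, value = TRUE)

# Adding a tab (written "\t" in the R string) to the character class
# also keeps the tab-separated line.
with_tabs <- grep("^[1-9][0-9. \t]*$|Time", lines, value = TRUE)

# Dropping "Retention" leaves every kept line with exactly two fields.
cleaned <- gsub("Retention", "", strict)

# read.table now sees a two-column table: labels/masses and values.
two_cols <- read.table(text = cleaned, as.is = TRUE)

print(strict)
print(loose)
print(with_tabs)
print(two_cols)
```

Running it shows that the "...." line survives only the simplified pattern, and the tab-separated line survives only once a tab is added to the character class.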