Re: [R] How to read in this data format?
Hi,

Although the solution worked, I've run into trouble with some data files. These files are very large (600-700 MB), so my computer starts swapping. If I use the code Gabor posted, I get:

    Error in .Call("R_lazyLoadDBfetch", key, file, compressed, hook, PACKAGE = "base") :
      recursive default argument reference

after about 15 minutes of loading the data with the command

    Lines. <- readLines("myfile.dat")

When I looked in the help for readLines, I saw that there is an argument n to set a maximum number of lines, but is there a way to set a starting row number? If I could split my data files into 4-8 smaller datasets, that would be fine for me, but I couldn't figure out how.

Thanks
Bart

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to read in this data format?
If you want to process 'n' lines from the file at a time, just set up the file as a connection and read the desired number of lines in a loop like the one below:

    f.1 <- file('/tempxx.txt', 'r')
    nlines <- 0
    # read 1000 lines at a time
    while (TRUE){
        lines <- readLines(f.1, n=1000)
        if (length(lines) == 0) break  # quit when no lines are read
        # processing
        nlines <- nlines + length(lines)
    }
    close(f.1)  # release the connection when done
    cat(nlines, "lines read\n")

On 3/5/07, Bart Joosen wrote:
> When I looked in the help for readLines, I saw that there is an argument n
> to set a maximum number of lines, but is there a way to set a starting row
> number? If I could split my data files into 4-8 smaller datasets, that
> would be fine for me.
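On the question of a starting row: readLines itself has no such argument, but lines read from an open connection are consumed in order, so the lines before the wanted row can simply be read and discarded. The following is a minimal sketch of that idea, not code from the thread; the helper name read_chunk, its arguments, and the temporary demo file are all assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): return lines
# start .. (start + n - 1) of a text file without loading the whole file.
read_chunk <- function(path, start, n) {
  con <- file(path, "r")
  on.exit(close(con))                  # always release the connection
  skipped <- 0
  # Discard the lines before 'start' in modest batches to keep memory flat.
  while (skipped < start - 1) {
    batch <- readLines(con, n = min(1000, start - 1 - skipped))
    if (length(batch) == 0) break      # file is shorter than 'start'
    skipped <- skipped + length(batch)
  }
  readLines(con, n = n)                # the chunk that was asked for
}

# Small demonstration on a temporary file standing in for myfile.dat.
tmp <- tempfile()
writeLines(sprintf("line %d", 1:10), tmp)
chunk <- read_chunk(tmp, start = 4, n = 3)
print(chunk)
unlink(tmp)
```

Note that cutting a file by line count can split a scan across two chunks, so chunk boundaries would still need to be aligned with the Scan lines before parsing. scan() also has skip and nlines arguments that may serve the same purpose.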
Re: [R] How to read in this data format?
Well, not extremely elegant, but it should work:

1) Open your file in some ASCII text editor, delete the rubbish at the beginning up to the line Scan 1, and replace all spaces in names, e.g. do a mass replace of 'Retention Time' with, let's say, 'RetentionTime'.

2) Use read.table(), matrix() and data.frame():

    d <- read.table('yourfile')
    dd <- matrix(as.numeric(t(d)[2,]), byrow=TRUE, nrow=HowManyScansYouHave)
    dd <- data.frame(dd)
    names(dd) <- d[[1]][1:HowManyObservationsYouHavePerScan]

Petr

Bart Joosen wrote:
> Hi,
>
> I received an ASCII file containing the following information:
>
> $$ Experiment Number:
> $$ Associated Data:
>
> FUNCTION 1
>
> Scan 1
> Retention Time 0.017
>
> 399.8112 184
> 399.8742 0
> 399.9372 152
>
> Scan 2
> Retention Time 0.021
>
> 399.8112 181
> 399.8742 1
> 399.9372 153
> .
>
> I would like to import this data in R into a dataframe, where there is a
> column time, the first numbers as column names, and the second numbers as
> data in the dataframe:
>
> Time    399.8112  399.8742  399.9372
> 0.017   184       0         152
> 0.021   181       1         153
>
> I did take a look at read.table, read.delim, scan, ... but I have no
> idea how to solve this problem. Anyone?
>
> Thanks
> Bart

--
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic
Re: [R] How to read in this data format?
You can't expect general-purpose tools like read.table in R to be able to deal with highly specialized file formats. Here's where I'd start. It doesn't put the data in exactly the format you specified, but I doubt you'll need that. This might be sufficient for your purpose:

    dat <- readLines("yourdata.dat")
    ## Get rid of blank lines.
    dat <- dat[dat != ""]
    scan.lines <- grep("Scan", dat)
    ## Chop off the header rows.
    dat <- dat[scan.lines[1]:length(dat)]
    scan.lines <- scan.lines - scan.lines[1] + 1
    lines.per.scan <- c(scan.lines[-1], length(dat) + 1) - scan.lines
    ## Split the data into a list, with each scan taking up one component.
    ## (times=, not each=, so that scans of unequal length are handled.)
    dat <- split(dat, rep(seq(along=lines.per.scan), times=lines.per.scan))
    ## Process the data one scan at a time.
    result <- lapply(dat, function(x) {
        x <- strsplit(x, "\t")
        rtime <- x[[2]][2]  # second field of second line
        t(matrix(as.numeric(do.call(rbind, c(rtime, x[-(1:2)]))), ncol=2))
    })

This is what I get from the data you've shown:

    R> result
    $`1`
          [,1]     [,2]     [,3]     [,4]
    [1,] 0.017 399.8112 399.8742 399.9372
    [2,] 0.017 184.0000   0.0000 152.0000

    $`2`
          [,1]     [,2]     [,3]     [,4]
    [1,] 0.021 399.8112 399.8742 399.9372
    [2,] 0.021 181.0000   1.0000 153.0000

Note that you probably should avoid using numbers as column names in a data frame, even if it's possible.

Andy
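The list of per-scan matrices shown in Andy's output can be bound into the one-row-per-scan layout Bart asked for. The following is a minimal sketch, not part of Andy's reply: the result object is typed in by hand to mimic the printed output, the "m" prefix on the column names is an arbitrary choice (following the advice to avoid purely numeric names), and the code assumes every scan lists the same masses in the same order.

```r
# 'result' mimics the structure printed above: one 2-row matrix per scan,
# row 1 = retention time followed by the masses,
# row 2 = retention time followed by the counts.
result <- list(
  `1` = rbind(c(0.017, 399.8112, 399.8742, 399.9372),
              c(0.017, 184, 0, 152)),
  `2` = rbind(c(0.021, 399.8112, 399.8742, 399.9372),
              c(0.021, 181, 1, 153))
)

# Stack the second row of every scan: one row per scan.
wide <- as.data.frame(do.call(rbind, lapply(result, function(m) m[2, ])))

# Name the columns from the first scan's masses (assumes all scans share them).
names(wide) <- c("Time", paste0("m", result[[1]][1, -1]))
print(wide)
```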
Re: [R] How to read in this data format?
Read in the data using readLines, extract all desired lines (namely those containing only numbers, dots and spaces, or those with the word Time) and remove Retention from all lines so that every remaining line has two fields. Now that we have the desired lines and each has two fields, read them in using read.table. Finally, split them into groups and restructure them using by; in the last line we convert the by output to a data frame. At the end we show an alternate function f for use with by, should we wish to generate long rather than wide output (using the terminology of the reshape command).

    Lines <- "$$ Experiment Number:
    $$ Associated Data:

    FUNCTION 1

    Scan 1
    Retention Time 0.017

    399.8112 184
    399.8742 0
    399.9372 152

    Scan 2
    Retention Time 0.021

    399.8112 181
    399.8742 1
    399.9372 153
    "

    # replace next line with: Lines. <- readLines("myfile.dat")
    Lines. <- readLines(textConnection(Lines))
    Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
    Lines. <- gsub("Retention", "", Lines.)

    DF <- read.table(textConnection(Lines.), as.is = TRUE)
    closeAllConnections()

    f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
    out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
    as.data.frame(do.call(rbind, out.by))

We could alternatively produce long format by replacing the function f with:

    f <- function(x) data.frame(x[-1,], id = x[1,2])

On 3/1/07, Bart Joosen wrote:
> I would like to import this data in R into a dataframe, where there is a
> column time, the first numbers as column names, and the second numbers as
> data in the dataframe.
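To make the long-format variant concrete, here is a minimal sketch that applies the alternate f to lines that have already been filtered and stripped of the word Retention. The input vector is typed in by hand from the example data, and the column names mass, count and time are assumptions chosen for readability; they do not appear in the thread.

```r
# Lines as they look after the grep and gsub steps (typed in by hand).
kept <- c("Time 0.017", "399.8112 184", "399.8742 0", "399.9372 152",
          "Time 0.021", "399.8112 181", "399.8742 1", "399.9372 153")

# Two whitespace-separated fields per line; keep the first one as character.
DF <- read.table(text = kept, as.is = TRUE)

# Long-format f: drop the Time row of each group and carry its value
# along as an extra column.
f <- function(x) data.frame(x[-1, ], id = x[1, 2])

# A new group starts at every Time line.
long <- do.call(rbind, by(DF, cumsum(DF[, 1] == "Time"), f))
names(long) <- c("mass", "count", "time")   # hypothetical column names
print(long)
```

The result has one row per (scan, mass) pair rather than one row per scan, which is usually the more convenient shape for plotting or aggregation.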
Re: [R] How to read in this data format?
Here is one way of doing it using the reshape package:

    # test data from email
    x <- "$$ Experiment Number:
    $$ Associated Data:

    FUNCTION 1

    Scan 1
    Retention Time 0.017

    399.8112 184
    399.8742 0
    399.9372 152


    Scan 2
    Retention Time 0.021

    399.8112 181
    399.8742 1
    399.9372 153
    .
    "
    # read in the vector
    x.in <- readLines(textConnection(x))
    result <- list()  # output list
    i.result <- 1
    # process each line
    for (i in x.in){
        # if Retention, pick off the time
        if (regexpr("^Retention", i) > 0){
            time <- gsub("^Ret.*?([0-9.]+)", "\\1", i, perl=TRUE)
        } else if (regexpr("^\\d+", i, perl=TRUE) > 0){
            # if data, parse it and store in result
            idVal <- strsplit(i, "\\s+")
            result[[i.result]] <- c(time, idVal[[1]])
            i.result <- i.result + 1
        }
    }
    # create data frame
    df <- as.data.frame(do.call(rbind, result))
    colnames(df) <- c('time', 'id', 'value')
    require(reshape)  # use reshape package
    y <- melt(df)  # convert to long
    cast(y, time ~ id)

The last call prints:

       time X399.8112 X399.8742 X399.9372
    1 0.017       184         0       152
    2 0.021       181         1       153

Jim Holtman

What is the problem you are trying to solve?
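The long-to-wide step can also be done without the reshape package. The following is a minimal sketch using base R's reshape() function instead of melt() and cast(); it is a swapped-in alternative, not Jim's method, and the long data frame is typed in by hand from the example data.

```r
# Long-format data: one row per (time, id) pair, typed in by hand.
df <- data.frame(
  time  = rep(c(0.017, 0.021), each = 3),
  id    = rep(c("399.8112", "399.8742", "399.9372"), times = 2),
  value = c(184, 0, 152, 181, 1, 153)
)

# One row per time; each id becomes its own "value.<id>" column.
wide <- reshape(df, idvar = "time", timevar = "id", direction = "wide")
print(wide)
```

Base reshape() prefixes the new columns with the name of the measured variable, so the columns come out as value.399.8112 and so on rather than X399.8112.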
Re: [R] How to read in this data format?
Dear All,

Thanks for the replies. Jim Holtman has given a solution which fits my needs. Gabor Grothendieck's solution does the same thing, but it looks like its coding will allow faster processing (I should check this tomorrow on a big data file).

@Gabor: I don't understand the use of the grep command:

    grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)

What is this expression (^[1-9][0-9. ]*$|Time) actually doing? I looked in the help page, but couldn't find a suitable answer.

Thanks to all
Bart

----- Original Message -----
From: Gabor Grothendieck
To: Bart Joosen
Cc: r-help@stat.math.ethz.ch
Sent: Thursday, March 01, 2007 6:35 PM
Subject: Re: [R] How to read in this data format?

> Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
Re: [R] How to read in this data format?
On 3/1/07, Bart Joosen wrote:
> @gabor: I don't understand the use of the grep command:
>
>     grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
>
> What is this expression (^[1-9][0-9. ]*$|Time) actually doing?
> I looked in the help page, but couldn't find a suitable answer.

I briefly discussed it in the first paragraph of my response. It matches and returns only those lines that either start (^ matches start of line) with a digit from 1 to 9, i.e. [1-9], and contain only digits, dots and spaces, i.e. [0-9. ]*, up to the end of the line ($ matches end of line), or (| means or) contain the word Time.

If you don't have lines like ... (which you did in your example) then the regexp could be simplified to ^[0-9. ]+$|Time. You may need to match tabs too if your input contains those.
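To see what each form of the pattern keeps, it can be run against a handful of representative lines. This is a minimal sketch; the sample vector is typed in by hand from the example data, and the tab-tolerant third pattern is an assumption about how one might follow the advice to match tabs, not a pattern given in the thread.

```r
# Representative lines: header, scan marker, time line, blank, data, trailing dot.
sample_lines <- c("$$ Experiment Number:", "FUNCTION 1", "Scan 1",
                  "Retention Time 0.017", "", "399.8112 184", "399.8742 0", ".")

# Original pattern: lines starting with 1-9 and made only of digits, dots
# and spaces, or lines containing the word Time.
kept <- grep("^[1-9][0-9. ]*$|Time", sample_lines, value = TRUE)
print(kept)

# Simplified pattern: no leading [1-9], so the lone "." line is kept as well.
kept_simple <- grep("^[0-9. ]+$|Time", sample_lines, value = TRUE)
print(kept_simple)

# Assumed tab-tolerant variant: adds a tab to the allowed characters.
kept_tabs <- grep("^[1-9][0-9. \t]*$|Time",
                  c(sample_lines, "399.9372\t152"), value = TRUE)
print(kept_tabs)
```

The difference between the first two results shows why the leading [1-9] matters when the file contains lines made up only of dots.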
Re: [R] How to read in this data format?
Gabor,

Thanks for the clarification; now I understand the expression.

Thanks to everyone
Bart

From: Gabor Grothendieck
To: Bart Joosen
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] How to read in this data format?
Date: Thu, 1 Mar 2007 16:46:21 -0500

> It matches and returns only those lines that start (^ matches start of
> line) with a digit, i.e. [1-9], and contain only digits, dots and spaces,
> i.e. [0-9. ]*, to end of line, i.e. $ matches end of line, or (| means or)
> contain the word Time.