Re: [R] How to read in this data format?

2007-03-05 Thread Bart Joosen
Hi,

Although the solution worked, I've got some trouble with some data files.
These data files are very large (600-700 MB), so my computer starts swapping.

If I use Gabor's code, I get:
Error in .Call("R_lazyLoadDBfetch", key, file, compressed, hook, PACKAGE =
"base") :
  recursive default argument reference
after about 15 minutes of loading the data with the
Lines. <- readLines("myfile.dat") command.

When I looked at the help for readLines, I saw that there is an argument n to
set a maximum number of lines, but is there a way to set a starting row
number? If I can split up my data files into 4-8 smaller datasets, that's
fine for me, but I couldn't figure out how.


Thanks

Bart
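A minimal sketch for the "starting row" question, assuming an ordinary text file: readLines() has no skip argument, but on an open connection each call continues where the previous one stopped, so the first `skip` lines can be read and thrown away in chunks before reading the `n` wanted lines. The function name `read_line_range` and the `chunk` size are hypothetical, and the skipped lines are still read from disk; they are just not kept in memory.

```r
# Hypothetical helper: return lines (skip + 1) .. (skip + n) of a text file.
# Skipped lines are read and discarded in chunks, so at most `chunk`
# of them are held in memory at once.
read_line_range <- function(path, skip, n, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  remaining <- skip
  while (remaining > 0) {
    got <- length(readLines(con, n = min(chunk, remaining)))
    if (got == 0) break            # file has fewer than `skip` lines
    remaining <- remaining - got
  }
  readLines(con, n = n)            # character(0) if nothing is left
}

# Usage on a small throw-away file
tf <- tempfile()
writeLines(letters[1:5], tf)
read_line_range(tf, skip = 1, n = 2)   # returns c("b", "c")
```

Each call reopens the file and rescans from the top, so walking through a 700 MB file block by block this way repeats work; keeping a single connection open across blocks avoids that.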




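__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.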

Re: [R] How to read in this data format?

2007-03-05 Thread jim holtman
If you want to process 'n' lines of the file at a time, just open the file as
a connection and read the desired number of lines in a loop, like below:

f.1 <- file('/tempxx.txt', 'r')
nlines <- 0
# read 1000 lines at a time
while (TRUE){
    lines <- readLines(f.1, n=1000)
    if (length(lines) == 0) break  # quit when no lines are read
    # processing
    nlines <- nlines + length(lines)
}
close(f.1)
cat(nlines, "lines read\n")
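One way to fill in the "# processing" step for this file layout, sketched under assumptions not stated in the thread: every scan starts with a line containing "Time" followed by the retention time, and every data line is "<mass><whitespace><intensity>" starting with a digit 1-9 (the same rule as Gabor's regular expression, with tabs allowed). Each chunk is reduced to numeric (time, mass, intensity) rows in long format, and the current retention time is carried from one chunk to the next, so a scan split across a chunk boundary is still labelled correctly. The function name `parse_scans_chunked` is hypothetical.

```r
# Sketch: stream a large file through one connection and keep only numeric
# (time, mass, intensity) rows in long format.
parse_scans_chunked <- function(path, chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  current_time <- NA_real_        # time carried over from the previous chunk
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) break
    is_time <- grepl("Time", lines)
    is_data <- grepl("^[1-9][0-9. \t]*$", lines)
    # candidate times: the carried-over value, then each "Time" line in order
    times <- c(current_time,
               as.numeric(sub(".*Time[[:space:]]+", "", lines[is_time])))
    # time in effect at each line = most recent "Time" line seen so far
    line_time <- times[cumsum(is_time) + 1]
    current_time <- times[length(times)]
    if (any(is_data)) {
      d <- lines[is_data]
      pieces[[length(pieces) + 1]] <- data.frame(
        time      = line_time[is_data],
        mass      = as.numeric(sub("[[:space:]].*$", "", d)),
        intensity = as.numeric(sub("^[^[:space:]]+[[:space:]]+", "", d))
      )
    }
  }
  do.call(rbind, pieces)
}

# Usage on the sample layout; a tiny chunk forces scans to cross chunk boundaries
tf <- tempfile()
writeLines(c("$$ Experiment Number:", "", "Scan\t1", "Retention Time\t0.017", "",
             "399.8112\t184", "399.8742\t0", "399.9372\t152", "",
             "Scan\t2", "Retention Time\t0.021", "",
             "399.8112\t181", "399.8742\t1", "399.9372\t153"), tf)
long <- parse_scans_chunked(tf, chunk = 4)
long
```

Only the parsed numbers are accumulated, not the raw text lines, which is where most of the memory goes when the whole file is read with a single readLines() call.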




Re: [R] How to read in this data format?

2007-03-01 Thread Petr Klasterecky
Well, not extremely elegant, but it should work:
1) Open your file in some ASCII text editor, delete the rubbish at the
beginning up to the line 'Scan 1', and remove all spaces in names, e.g. do
a global replace of 'Retention Time' by, let's say, 'RetentionTime'.

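2) Use read.table(), matrix() and data.frame():
d <- read.table('yourfile')
dd <- matrix(as.numeric(t(d)[2,]), byrow=TRUE, nrow=HowManyScansYouHave)
dd <- data.frame(dd)
# HowManyObservationsYouHavePerScan counts every row of one scan,
# including its 'Scan' and 'RetentionTime' rows
names(dd) <- d[[1]][1:HowManyObservationsYouHavePerScan]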

Petr
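Both of Petr's steps can also be done inside R rather than in a text editor. The sketch below assumes the sample layout (tab-separated fields, every scan with the same number of rows); the names `txt`, `body`, `n_scans` and `per_scan` are hypothetical, and the two placeholders are computed from the data instead of being typed in.

```r
# Sample lines typed in from the question (hypothetical stand-in for the file)
txt <- c("$$ Experiment Number:", "$$ Associated Data:", "", "FUNCTION 1", "",
         "Scan\t1", "Retention Time\t0.017", "",
         "399.8112\t184", "399.8742\t0", "399.9372\t152", "", "",
         "Scan\t2", "Retention Time\t0.021", "",
         "399.8112\t181", "399.8742\t1", "399.9372\t153")
# Step 1: drop everything before the first "Scan" line and remove the space
# in "Retention Time" so every remaining line has exactly two fields.
body <- txt[grep("^Scan", txt)[1]:length(txt)]
body <- gsub("Retention Time", "RetentionTime", body)
# Step 2: read.table() skips blank lines by default (text= needs R >= 2.14;
# use textConnection(body) on older versions).
d <- read.table(text = body)
n_scans  <- sum(d[[1]] == "Scan")
per_scan <- nrow(d) / n_scans      # rows per scan, incl. Scan and RetentionTime
dd <- data.frame(matrix(d[[2]], byrow = TRUE, nrow = n_scans))
names(dd) <- as.character(d[[1]][1:per_scan])
dd
```

Using d[[2]] directly instead of t(d)[2,] keeps the second column numeric and avoids converting the whole table to character first.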

Bart Joosen wrote:
 Hi,
 
 I received an ASCII file containing the following information:
 
 $$ Experiment Number:
 $$ Associated Data:
 
 FUNCTION 1
 
 Scan  1
 Retention Time  0.017
 
 399.8112  184
 399.8742  0
 399.9372  152
 
 
 Scan  2
 Retention Time  0.021
 
 399.8112  181
 399.8742  1
 399.9372  153
 .
 
 
 I would like to import this data into a data frame in R, with a column
 Time, the first numbers as column names, and the second numbers as
 data in the data frame:
 
 Time   399.8112  399.8742  399.9372
 0.017  184       0         152
 0.021  181       1         153
 
 I did take a look at read.table, read.delim, scan, ... but I've no idea
 how to solve this problem.
 
 Anyone?
 
 
 Thanks
 
 Bart
 
 

-- 
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic



Re: [R] How to read in this data format?

2007-03-01 Thread Liaw, Andy
You can't expect general-purpose tools like read.table in R to be able
to deal with a highly specialized file format.  Here's what I'd start with.
It doesn't put the data in exactly the format you specified, but I doubt
you'll need that.  This might be sufficient for your purpose:

dat <- readLines(file("yourdata.dat"))
## Get rid of blank lines.
dat <- dat[dat != ""]
scan.lines <- grep("Scan", dat)
## Chop off the header rows.
dat <- dat[scan.lines[1]:length(dat)]
scan.lines <- scan.lines - scan.lines[1] + 1
lines.per.scan <- c(scan.lines[-1], length(dat) + 1) - scan.lines
## Split the data into a list, with each scan taking up one component.
## (times=, not each=, so scans of different lengths are labelled correctly)
dat <- split(dat, rep(seq(along=lines.per.scan), times=lines.per.scan))
## Process the data one scan at a time.
result <- lapply(dat, function(x) {
    x <- strsplit(x, "\t")
    rtime <- x[[2]][2]  # second field of second line
    t(matrix(as.numeric(do.call(rbind, c(rtime, x[-(1:2)]))), ncol=2))
})

This is what I get from the data you've shown:

R> result
$`1`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.017 399.8112 399.8742 399.9372
[2,] 0.017 184.0000   0.0000 152.0000

$`2`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.021 399.8112 399.8742 399.9372
[2,] 0.021 181.0000   1.0000 153.0000

Note that you probably should avoid using numbers as column names in a
data frame, even if it's possible.

Andy
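If the single-table layout from the question is still wanted, Andy's list can be stacked afterwards. This sketch types `result` in from the output printed above and assumes every scan lists the same masses in the same order; make.names() turns 399.8112 into "X399.8112", which follows Andy's advice not to use bare numbers as column names.

```r
# Andy's `result`, typed in from the printed output (one 2 x k matrix per scan)
result <- list(
  `1` = rbind(c(0.017, 399.8112, 399.8742, 399.9372),
              c(0.017, 184,      0,        152)),
  `2` = rbind(c(0.021, 399.8112, 399.8742, 399.9372),
              c(0.021, 181,      1,        153)))
# Row 2 of each matrix is (time, intensities); stack one row per scan
wide <- do.call(rbind, lapply(result, function(m) m[2, ]))
# Mass values from the first scan become syntactic column names
colnames(wide) <- c("time", make.names(result[[1]][1, -1]))
wide <- as.data.frame(wide)
wide
```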




Re: [R] How to read in this data format?

2007-03-01 Thread Gabor Grothendieck
Read in the data using readLines, extract out
all desired lines (namely those containing only
numbers, dots and spaces or those with the
word Time) and remove Retention from all
lines so that all remaining lines have two
fields.  Now that we have the desired lines
and all lines have two fields, read them in
using read.table.

Finally, split them into groups and restructure
them using by; in the last line we
convert the by output to a data frame.

At the end we display an alternate function f
for use with by should we wish to generate long
rather than wide output (using the terminology
of the reshape command).


Lines <- "$$ Experiment Number:
$$ Associated Data:

FUNCTION 1

Scan 1
Retention Time  0.017

399.8112 184
399.8742 0
399.9372 152


Scan 2
Retention Time  0.021

399.8112 181
399.8742 1
399.9372 153
"

# replace next line with: Lines. <- readLines("myfile.dat")
Lines. <- readLines(textConnection(Lines))
Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
Lines. <- gsub("Retention", "", Lines.)

DF <- read.table(textConnection(Lines.), as.is = TRUE)
closeAllConnections()

f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
as.data.frame(do.call(rbind, out.by))


We could alternately consider producing long
format by replacing the function f with:

f <- function(x) data.frame(x[-1,], id = x[1,2])
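The grouping step works because every "Time" row is TRUE in DF[,1] == "Time", so its running sum increases by one at the start of each scan and stays constant over that scan's data rows. The sketch below shows this on a small hand-made DF with the same two-column layout (values from the sample; the names `grp` and `wide` are hypothetical) and uses split() plus sapply() in place of by(), which should give the same wide rows.

```r
# Hand-made DF in the layout read.table() produces above
DF <- data.frame(V1 = c("Time", "399.8112", "399.8742",
                        "Time", "399.8112", "399.8742"),
                 V2 = c(0.017, 184, 0, 0.021, 181, 1),
                 stringsAsFactors = FALSE)
grp <- cumsum(DF[, 1] == "Time")   # each "Time" row starts a new group
grp                                # 1 1 1 2 2 2
# Same f as above: first row gives the id (time), the rest become named values
f <- function(x) c(id = x[1, 2], structure(x[-1, 2], .Names = x[-1, 1]))
wide <- t(sapply(split(DF, grp), f))
wide
```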




Re: [R] How to read in this data format?

2007-03-01 Thread jim holtman
Here is one way of doing it using the reshape package:

> # test data from email
> x <- "$$ Experiment Number:
+ $$ Associated Data:
+ 
+ FUNCTION 1
+ 
+ Scan  1
+ Retention Time 0.017
+ 
+ 399.8112 184
+ 399.8742 0
+ 399.9372 152
+ 
+ 
+ Scan  2
+ Retention Time 0.021
+ 
+ 399.8112 181
+ 399.8742 1
+ 399.9372 153
+ .
+ "
> # read in the vector
> x.in <- readLines(textConnection(x))
> result <- list()    # output list
> i.result <- 1
> # process each line
> for (i in x.in){
+     # if Retention, pick off the time
+     if (regexpr("^Retention", i) > 0){
+         time <- gsub("^Ret.*?([0-9.]+)", "\\1", i, perl=TRUE)
+     } else if (regexpr("^\\d+", i, perl=TRUE) > 0){
+         # if data, parse it and store in result
+         idVal <- strsplit(i, "\\s+")
+         result[[i.result]] <- c(time, idVal[[1]])
+         i.result <- i.result + 1
+     }
+ }
> # create data frame
> df <- as.data.frame(do.call(rbind, result))
> colnames(df) <- c('time', 'id', 'value')
> require(reshape) # use reshape package
Loading required package: reshape
[1] TRUE
> y <- melt(df)
> # cast to wide: one row per time, one column per id
> cast(y, time ~ id)
   time X399.8112 X399.8742 X399.9372
1 0.017       184         0       152
2 0.021       181         1       153
 
 

 
Jim Holtman

What is the problem you are trying to solve?
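Jim's last step uses the contributed reshape package. As a hedged alternative that stays in base R, stats::reshape() can do the same long-to-wide cast; the sketch types the long data in by hand (the name `long` is hypothetical) and stores the values as numbers rather than the strings Jim's loop produces.

```r
# Long data in the shape of Jim's `df` (time, id, value), typed in from the sample
long <- data.frame(time  = rep(c(0.017, 0.021), each = 3),
                   id    = rep(c("399.8112", "399.8742", "399.9372"), 2),
                   value = c(184, 0, 152, 181, 1, 153),
                   stringsAsFactors = FALSE)
# stats::reshape: one row per time, one column per id
wide <- reshape(long, idvar = "time", timevar = "id", direction = "wide")
wide
```

reshape() prefixes the new columns with the name of the value column (value.399.8112 and so on), which also avoids purely numeric column names.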



- Original Message 
From: Gabor Grothendieck [EMAIL PROTECTED]
To: Bart Joosen [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Thursday, March 1, 2007 12:35:43 PM
Subject: Re: [R] How to read in this data format?




Re: [R] How to read in this data format?

2007-03-01 Thread Bart Joosen
Dear All,

thanks for the replies. Jim Holtman has given a solution which fits my
needs, and Gabor Grothendieck did the same thing, but Gabor's code looks
like it will allow faster processing (I should check this out tomorrow on
a big data file).

@gabor: I don't understand the use of the grep command:
grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
What is this expression  (^[1-9][0-9. ]*$|Time) actually doing?
I looked in the help page, but couldn't find a suitable answer.


Thanks to All


Bart

- Original Message - 
From: Gabor Grothendieck [EMAIL PROTECTED]
To: Bart Joosen [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Thursday, March 01, 2007 6:35 PM
Subject: Re: [R] How to read in this data format?




Re: [R] How to read in this data format?

2007-03-01 Thread Gabor Grothendieck
On 3/1/07, Bart Joosen [EMAIL PROTECTED] wrote:
 Dear All,

 thanks for the replies, Jim Holtman has given a solution which fits my
 needs, but Gabor Grothendieck did the same thing,
 but it looks like the coding will allow faster processing (should check this
 out tomorrow on a big datafile).

 @gabor: I don't understand the use of the grep command:
grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
 What is this expression  (^[1-9][0-9. ]*$|Time) actually doing?
 I looked in the help page, but couldn't find a suitable answer.

I briefly discussed it in the first paragraph of my response.  It
matches and returns only those lines that either start (^ matches start
of line) with a nonzero digit, i.e. [1-9], and contain only digits, dots
and spaces, i.e. [0-9. ]*, up to the end of the line ($ matches end of
line), or (| means or) contain the word Time.
If you don't have lines like ... (which you did in your example), then
the regexp could be simplified to ^[0-9. ]+$|Time.  You may need to
match tabs too if your input contains those.
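The explanation above can be checked directly with grepl(), which returns TRUE for each line the pattern keeps. The test lines are made up to mirror the layout quoted in this thread, including one tab-separated data line and one "..." line; the names `pat` and `x` are hypothetical.

```r
pat <- "^[1-9][0-9. ]*$|Time"
x <- c("Scan 1", "Retention Time 0.017", "399.8112 184",
       "399.8742\t0", "...", "", "FUNCTION 1")
grepl(pat, x)
# FALSE TRUE TRUE FALSE FALSE FALSE FALSE  (the tab-separated line is dropped)

# Adding a tab to the character class keeps tab-separated data lines:
grepl("^[1-9][0-9. \t]*$|Time", x)
# FALSE TRUE TRUE TRUE FALSE FALSE FALSE

# The simplified pattern would also keep the "..." line:
grepl("^[0-9. ]+$|Time", x)
# FALSE TRUE TRUE FALSE TRUE FALSE FALSE
```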





Re: [R] How to read in this data format?

2007-03-01 Thread Bart Joosen
Gabor,

thanks for the clarification; now I understand the expression.

Thanks to everyone


Bart


From: Gabor Grothendieck [EMAIL PROTECTED]
To: Bart Joosen [EMAIL PROTECTED]
CC: r-help@stat.math.ethz.ch
Subject: Re: [R] How to read in this data format?
Date: Thu, 1 Mar 2007 16:46:21 -0500
