Re: [R] parsing numeric values

2009-11-18 Thread Henrique Dallazuanna
Try this:

strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind,
combine = as.numeric)

On Wed, Nov 18, 2009 at 9:57 AM, baptiste auguie
baptiste.aug...@googlemail.com wrote:
 Dear list,

 I'm seeking advice to extract some numeric values from a log file
 created by an external program. Consider the following example,

 input <-
 readLines(textConnection(
 "some text
  ax =    1.3770E-03     bx =    3.4644E-07
  ay =    1.9412E-04     by =    4.8840E-08

 other text
  aax  =    1.3770E-03     bbx =    3.4644E-07
  aay  =    1.9412E-04     bby =    4.8840E-08"))

 ## this is what I want
 results <- c(as.numeric(strsplit(grep("ax", input, val=T), " ")[[1]][8]),
              as.numeric(strsplit(grep("ay", input, val=T), " ")[[1]][8]),
              as.numeric(strsplit(grep("aax", input, val=T), " ")[[1]][9]),
              as.numeric(strsplit(grep("aay", input, val=T), " ")[[1]][9])
              )

 ## [1] 0.00137700 0.00019412 0.00137700 0.00019412

 The use of strsplit is not ideal here as there is a different number
 of space characters in the lines containing "ax" and "aax" for
 instance (hence the indices 8 and 9 respectively).

 I tried to use gsubfn for a cleaner construct,

 strapply(input, "ax += +([0-9.]+)", c, simplify=rbind, combine=as.numeric)

 but I can't seem to find the correct regular expression to deal with
 the exponent.


 Any tips are welcome!


 Best regards,

 baptiste

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
A minor variant might be the following:

   library(gsubfn)
   strapply(input, "\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify = rbind)

where:

- as.numeric is used in place of c, in which case the combine argument is not needed
- \\d+ matches one or more digits
- \\. matches a decimal point
- [-+]? matches -, + or nothing (i.e. an optional sign).
- parentheses around the regular expression are not needed
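
For readers without gsubfn, the same pattern can be tried with base R alone. This is only a minimal sketch and not part of the thread: it assumes a reasonably recent R (lengths() is needed), and the sample lines and the names pattern, hits and values are made up for illustration.

```r
# Sample lines mirroring the log excerpt from the original question.
input <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  ay =    1.9412E-04     by =    4.8840E-08")

# Same pattern as above: digits, a dot, digits, E, optional sign, digits.
pattern <- "\\d+\\.\\d+E[-+]?\\d+"

# gregexpr() locates every match per line; regmatches() extracts the text.
hits <- regmatches(input, gregexpr(pattern, input, perl = TRUE))

# Drop lines without any match and bind the rest into a numeric matrix,
# one row per matching line.
values <- do.call(rbind, lapply(hits[lengths(hits) > 0], as.numeric))
values
```

Each row of values corresponds to one line that contained numbers, which is roughly what simplify = rbind produces in the strapply call.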



Re: [R] parsing numeric values

2009-11-18 Thread baptiste auguie
Thanks a lot, both of you.

Incidentally, I made R crash when I forgot the X argument to strapply,

library(gsubfn)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
strapply(test, as.numeric)

 *** caught bus error ***
address 0x13c, cause 'non-existent physical address'

Traceback:
 1: .External("dotTclcallback", ..., PACKAGE = "tcltk")
 2: .Tcl.callback(x, e)
 3: makeAtomicCallback(x, e)
 4: makeCallback(get("value", envir = ref), get("envir", envir = ref))
 5: FUN(X[[3L]], ...)
 6: lapply(val, val2obj)
 7: .Tcl.args.objv(...)
 8: structure(.External("dotTclObjv", objv, PACKAGE = "tcltk"), class = "tclObj")
 9: .Tcl.objv(.Tcl.args.objv(...))
10: tcl("set", e, e)
11: strapply1(x, pattern, backref, ignore.case)
12: FUN(test[[1L]], ...)
13: lapply(X, FUN, ...)
14: sapply(X, ff, simplify = is.logical(simplify) && simplify, USE.NAMES = USE.NAMES)
15: strapply(test, as.numeric)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods
[8] base

other attached packages:
[1] ggplot2_0.8.3  reshape_0.8.3  plyr_0.1.9     proto_0.3-8    fortunes_1.3-6



Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
Thanks. This is now fixed in the development version so that it gives
an error rather than crashing:

> library(gsubfn)
Loading required package: proto
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> source("http://gsubfn.googlecode.com/svn/trunk/R/gsubfn.R")
> strapply(test, as.numeric)
Error in as.character(pattern) :
  cannot coerce type 'builtin' to vector of type 'character'




Re: [R] parsing numeric values

2009-11-18 Thread Bert Gunter
The previous elegant solutions required the use of the gsubfn package.
Nothing wrong with that, of course, but I'm always curious whether still
relatively simple base R solutions can be found, as they are often (but not
always!) much faster. And anyway, it seems to be in the spirit of your query
to try such a solution. So here is one base R approach that I believe works.
I'll break it up into 2 lines so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces

> z <- gsub("[^[:digit:]E.+-]", " ", input)
> z
[1]   
[2] 1.3770E-03   3.4644E-07   
[3] 1.9412E-04   4.8840E-08   
[4]   
[5]   
[6]   1.3770E-03   3.4644E-07
[7]   1.9412E-04   4.8840E-08

## Now it can be scanned to a numeric via

> z <- scan(textConnection(z), what=0)
Read 8 items
> z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
1.9412e-04 4.8840e-08


I believe this strategy is reasonably general, but I haven't checked it
carefully and would appreciate folks pointing out where it trips up (e.g.
perhaps with NA's).

Best,

Bert Gunter
Genentech Nonclinical Biostatistics
 


Re: [R] parsing numeric values

2009-11-18 Thread baptiste auguie
Hi,

Thanks for the alternative approach. However, I should have made my
example more complete in that other lines may also have numeric
values, which I'm not interested in. Below is an updated problem, with
my current solution,

tc <- textConnection(
"some text
 ax =    1.3770E-03     bx =    3.4644E-07
 ay =    1.9412E-04     by =    4.8840E-08

other text
 aax  =    1.3770E-03     bbx =    3.4644E-07
 aay  =    1.9412E-04     bby =    4.8840E-08

lots of other material,  including numeric values
 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")

input <-
readLines(tc)
close(tc)

## I want to retrieve the values for
## ax, ay, aax and aay only

results <- c(
strapply(input, "ax += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, "ay += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, "aax += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind),
strapply(input, "aay += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
simplify = rbind))

results

Using the suggested base R solution, I've come up with this variation,

z <- gsub("[^[:digit:]E.+-]", " ", grep("ax|ay|aax|aay", input,
value=TRUE))

test <- scan(textConnection(z), what=0)
test[seq(1, length(test), by=2)]
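
The per-label extraction can also be folded into one small helper in base R. What follows is only a sketch under stated assumptions: extract_value is a hypothetical helper (it is not part of gsubfn or of this thread), the \\b word boundary is an addition here so that "ax" is not also found inside "aax", and regexec(..., perl = TRUE) needs R 3.3 or later.

```r
# The updated example, one element per log line.
input <- c("some text",
           " ax =    1.3770E-03     bx =    3.4644E-07",
           " ay =    1.9412E-04     by =    4.8840E-08",
           "",
           "other text",
           " aax  =    1.3770E-03     bbx =    3.4644E-07",
           " aay  =    1.9412E-04     bby =    4.8840E-08",
           "",
           "lots of other material,  including numeric values",
           " 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5")

# Hypothetical helper: return the number that follows "<label> =".
extract_value <- function(label, lines) {
  # \\b stops "ax" from matching the tail of "aax".
  pattern <- paste0("\\b", label, " *= *(\\d+\\.\\d+E[-+]?\\d+)")
  m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  hits <- Filter(length, m)                  # keep only lines that matched
  as.numeric(vapply(hits, `[`, character(1), 2))  # 2nd element = capture group
}

labels <- c(ax = "ax", ay = "ay", aax = "aax", aay = "aay")
results <- vapply(labels, extract_value, numeric(1), lines = input)
results
```

Because the label is part of the pattern, the unrelated numeric lines are ignored and no every-other-element indexing is needed.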


Thanks again,

baptiste



Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
It only works if "some text" at the beginning has no digits, dots, E
characters or sign characters.
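
A small sketch of the two ways this can go wrong, assuming a recent R where scan() accepts a text argument. The wrapper clean_and_scan and the sample header lines are made up for illustration; only the gsub pattern comes from the thread.

```r
# The gsub() + scan() steps from the base R approach, wrapped for reuse.
clean_and_scan <- function(lines) {
  z <- gsub("[^[:digit:]E.+-]", " ", lines)
  scan(text = z, what = 0, quiet = TRUE)
}

data_line <- "  ax =    1.3770E-03     bx =    3.4644E-07"

# A digit in the free text is silently read as an extra value.
silent <- clean_and_scan(c("run 2", data_line))

# A capital E in the free text survives the gsub() and makes scan() fail;
# the error message is captured instead of stopping the script.
failure <- tryCatch(clean_and_scan(c("Energy report", data_line)),
                    error = function(e) conditionMessage(e))

silent
failure
```

The first case is arguably the more dangerous one, since nothing signals that the leading value did not come from a data line.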



Re: [R] parsing numeric values

2009-11-18 Thread Gabor Grothendieck
Here is a slight variation:

> read.table(textConnection(grep("aa?[xy]", input, value = TRUE)),
+    colClasses = c("NULL", "NULL", "numeric"))
          V3         V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08
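
A self-contained sketch of why only V3 and V6 come back: each matching line splits into six whitespace-separated fields (label, "=", value, label, "=", value), and the three-element colClasses is recycled across them, so "NULL" drops the label and "=" columns. The sample lines and the names keep and tab are illustrative, and read.table(text = ...) assumes a recent R.

```r
lines <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  aay  =    1.9412E-04     bby =    4.8840E-08",
           " 1.23E-4 123E5 12.3E-4")

# Keep only the lines that mention ax, ay, aax or aay.
keep <- grep("aa?[xy]", lines, value = TRUE)

# "NULL" skips a column; the pattern NULL, NULL, numeric repeats for
# fields 4 to 6, leaving columns 3 and 6.
tab <- read.table(text = keep, colClasses = c("NULL", "NULL", "numeric"))
tab
```

The first numeric column (V3) then holds the ax/ay/aax/aay values asked for in the original question.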




Re: [R] parsing numeric values

2009-11-18 Thread baptiste auguie
Another useful trick that could come in handy, thanks!

baptiste
