Re: [R] parsing numeric values
Try this:

strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind, combine = as.numeric)

On Wed, Nov 18, 2009 at 9:57 AM, baptiste auguie baptiste.aug...@googlemail.com wrote:

Dear list,

I'm seeking advice to extract some numeric values from a log file created by an external program. Consider the following example,

input <- readLines(textConnection(
"some text
  ax =    1.3770E-03     bx =    3.4644E-07
  ay =    1.9412E-04     by =    4.8840E-08

other text
  aax =     1.3770E-03     bbx =     3.4644E-07
  aay =     1.9412E-04     bby =     4.8840E-08"))

## this is what I want
results <- c(as.numeric(strsplit(grep("ax", input, val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("ay", input, val=T), " ")[[1]][8]),
             as.numeric(strsplit(grep("aax", input, val=T), " ")[[1]][9]),
             as.numeric(strsplit(grep("aay", input, val=T), " ")[[1]][9]))
## [1] 0.00137700 0.00019412 0.00137700 0.00019412

The use of strsplit is not ideal here as there is a different number of space characters in the lines containing ax and aax for instance (hence the indices 8 and 9 respectively).

I tried to use gsubfn for a cleaner construct,

strapply(input, "ax += +([0-9.]+)", c, simplify=rbind, combine=as.numeric)

but I can't seem to find the correct regular expression to deal with the exponent.

Any tips are welcome!

Best regards,

baptiste

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] parsing numeric values
A minor variant might be the following:

library(gsubfn)
strapply(input, "\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify = rbind)

where:

- as.numeric is used in place of c, in which case we do not need combine
- \\d+ matches one or more digits
- \\. matches a decimal point
- [-+]? matches -, + or nothing (i.e. an optional sign)
- parentheses around the regular expression are not needed
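For readers without gsubfn installed, the same pattern can be tried out in base R. The following is only a sketch: it swaps strapply for gregexpr() and regmatches(), and the sample lines (including their spacing) are an assumption modelled on the example in this thread.

```r
# Sample log lines; the exact spacing is an assumption for illustration.
input <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  ay =    1.9412E-04     by =    4.8840E-08",
           "other text",
           "  aax =     1.3770E-03     bbx =     3.4644E-07",
           "  aay =     1.9412E-04     bby =     4.8840E-08")

# Same idea as the pattern above: digits, a decimal point, digits,
# an E, an optional sign, digits.
pattern <- "[0-9]+\\.[0-9]+E[-+]?[0-9]+"

# gregexpr() locates every match on each line;
# regmatches() pulls out the matched text as a list of character vectors.
matches <- regmatches(input, gregexpr(pattern, input))

# Keep only the lines that had at least one match,
# then convert each to numeric and stack them row by row.
hits <- matches[lengths(matches) > 0]
values <- do.call(rbind, lapply(hits, as.numeric))
print(values)
```

Each row of values corresponds to one matching line, which mirrors what simplify = rbind does in the strapply call.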
Re: [R] parsing numeric values
Thanks a lot, both of you. Incidentally, I made R crash when I forgot the X argument to strapply,

> library(gsubfn)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> strapply(test, as.numeric)

 *** caught bus error ***
address 0x13c, cause 'non-existent physical address'

Traceback:
 1: .External(dotTclcallback, ..., PACKAGE = tcltk)
 2: .Tcl.callback(x, e)
 3: makeAtomicCallback(x, e)
 4: makeCallback(get(value, envir = ref), get(envir, envir = ref))
 5: FUN(X[[3L]], ...)
 6: lapply(val, val2obj)
 7: .Tcl.args.objv(...)
 8: structure(.External(dotTclObjv, objv, PACKAGE = tcltk), class = tclObj)
 9: .Tcl.objv(.Tcl.args.objv(...))
10: tcl(set, e, e)
11: strapply1(x, pattern, backref, ignore.case)
12: FUN(test[[1L]], ...)
13: lapply(X, FUN, ...)
14: sapply(X, ff, simplify = is.logical(simplify) && simplify, USE.NAMES = USE.NAMES)
15: strapply(test, as.numeric)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets grid methods
[8] base

other attached packages:
[1] ggplot2_0.8.3 reshape_0.8.3 plyr_0.1.9 proto_0.3-8 fortunes_1.3-6
Re: [R] parsing numeric values
Thanks. This is now fixed in the development version so that it gives an error rather than crashing:

> library(gsubfn)
Loading required package: proto
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> source("http://gsubfn.googlecode.com/svn/trunk/R/gsubfn.R")
> strapply(test, as.numeric)
Error in as.character(pattern) :
  cannot coerce type 'builtin' to vector of type 'character'
Re: [R] parsing numeric values
The previous elegant solutions required the use of the gsubfn package. Nothing wrong with that, of course, but I'm always curious whether still relatively simple base R solutions can be found, as they are often (but not always!) much faster. And anyway, it seems to be in the spirit of your query to try such a solution.

So here is one base R approach that I believe works. I'll break it up into 2 lines so you can see what's going on.

## Using your example...
## First replace everything but the number with spaces

> z <- gsub("[^[:digit:]E.+-]", " ", input)
> z
[1]
[2] 1.3770E-03 3.4644E-07
[3] 1.9412E-04 4.8840E-08
[4]
[5]
[6] 1.3770E-03 3.4644E-07
[7] 1.9412E-04 4.8840E-08

## Now it can be scanned to a numeric via

> z <- scan(textConnection(z), what=0)
Read 8 items
> z
[1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08

I believe this strategy is reasonably general, but I haven't checked it carefully and would appreciate folks pointing out where it trips up (e.g. perhaps with NA's).

Best,

Bert Gunter
Genentech Nonclinical Biostatistics
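The two steps above can be put together as a self-contained sketch. The sample lines and their spacing are an assumption, and scan(text = ...) is used here in place of an explicit textConnection purely for brevity; it is not what the original message used.

```r
# Sample log lines; the exact spacing is an assumption for illustration.
input <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  ay =    1.9412E-04     by =    4.8840E-08",
           "other text",
           "  aax =     1.3770E-03     bbx =     3.4644E-07",
           "  aay =     1.9412E-04     bby =     4.8840E-08")

# Step 1: blank out every character that cannot be part of a number
# (anything other than a digit, E, ".", "+" or "-").
z <- gsub("[^[:digit:]E.+-]", " ", input)

# Step 2: scan() splits on whitespace and parses what is left as numbers.
z_num <- scan(text = z, what = numeric(), quiet = TRUE)
print(z_num)
```

The result is a flat numeric vector in reading order, so any grouping by line or by label has to be rebuilt afterwards.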
Re: [R] parsing numeric values
Hi,

Thanks for the alternative approach. However, I should have made my example more complete in that other lines may also have numeric values, which I'm not interested in. Below is an updated problem, with my current solution,

tc <- textConnection(
"some text
  ax =    1.3770E-03     bx =    3.4644E-07
  ay =    1.9412E-04     by =    4.8840E-08

other text
  aax =     1.3770E-03     bbx =     3.4644E-07
  aay =     1.9412E-04     bby =     4.8840E-08

lots of other material, including numeric values
1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")

input <- readLines(tc)
close(tc)

## I want to retrieve the values for
## ax, ay, aax and aay only

results <- c(
  strapply(input, "ax += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "ay += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "aax += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "aay += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind))

results

Using the suggested base R solution, I've come up with this variation,

z <- gsub("[^[:digit:]E.+-]", " ", grep("ax|ay|aax|aay", input, value=TRUE))
test <- scan(textConnection(z), what=0)
test[seq(1, length(test), by=2)]

Thanks again,

baptiste
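The four near-identical calls above could be folded into a loop over the labels. Below is a rough base R sketch of that idea; extract_value is a hypothetical helper (not part of gsubfn or base R), the sample lines are an assumption, and the \\b word boundary is an addition so that "ax" does not also match inside "aax".

```r
# Sample log lines, including an unrelated numeric line; spacing is assumed.
input <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  ay =    1.9412E-04     by =    4.8840E-08",
           "other text",
           "  aax =     1.3770E-03     bbx =     3.4644E-07",
           "  aay =     1.9412E-04     bby =     4.8840E-08",
           "lots of other material, including numeric values",
           "1.23E-4 123E5 12.3E-4 123E5")

# Hypothetical helper: return the number that follows "<label> =".
extract_value <- function(label, lines) {
  # \\b stops "ax" from matching inside "aax";
  # the parentheses capture just the number.
  pattern <- paste0("\\b", label, " += +([0-9]+\\.[0-9]+E[-+]?[0-9]+)")
  m <- regmatches(lines, regexec(pattern, lines))
  # Matching lines give c(full match, captured number); others give character(0).
  hits <- m[lengths(m) > 0]
  as.numeric(vapply(hits, function(x) x[2], character(1)))
}

labels <- c("ax", "ay", "aax", "aay")
# Assumes each label occurs on exactly one line, as in the sample above.
results <- vapply(labels, extract_value, numeric(1), lines = input)
print(results)
```

Because the label is part of the pattern, the unrelated numbers on the last line are never considered, and the result comes back named by label.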
Re: [R] parsing numeric values
It only works if the "some text" at the beginning has no digits, dots, E characters or sign characters.
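To make that caveat concrete, here is a small sketch with a made-up log line whose label contains an E, a sign and a digit. The line itself is hypothetical; the gsub pattern is the one from the base R approach.

```r
# Hypothetical log line: the label contains characters that the pattern keeps.
tricky <- "Energy-1 = 1.3770E-03"

# Only the characters n, e, r, g, y and "=" are blanked out;
# "E", "-" and "1" from the label survive.
z <- gsub("[^[:digit:]E.+-]", " ", tricky)

# Split on runs of spaces to see which tokens scan() would be handed.
tokens <- strsplit(trimws(z), " +")[[1]]
print(tokens)

# scan() stops with an error on a token that is not a number;
# tryCatch() turns that error into its message so the script keeps running.
result <- tryCatch(scan(text = z, what = numeric(), quiet = TRUE),
                   error = function(e) conditionMessage(e))
print(result)
```

Filtering the lines first, or matching the numbers directly with a regular expression, avoids depending on what the surrounding text happens to contain.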
Re: [R] parsing numeric values
Here is a slight variation:

> read.table(textConnection(grep("aa?[xy]", input, value = TRUE)),
+   colClasses = c("NULL", "NULL", "numeric"))
          V3         V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08
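A self-contained version of the read.table idea might look like the sketch below. The sample lines are an assumption, and read.table(text = ...) is used here instead of an explicit textConnection; the regular expression and colClasses are the ones from the message above.

```r
# Sample log lines, including an unrelated numeric line; spacing is assumed.
input <- c("some text",
           "  ax =    1.3770E-03     bx =    3.4644E-07",
           "  ay =    1.9412E-04     by =    4.8840E-08",
           "other text",
           "  aax =     1.3770E-03     bbx =     3.4644E-07",
           "  aay =     1.9412E-04     bby =     4.8840E-08",
           "1.23E-4 123E5 12.3E-4 123E5")

# Keep only lines containing ax, ay, aax or aay.
wanted <- grep("aa?[xy]", input, value = TRUE)

# Each kept line has six whitespace-separated fields:
# label, "=", value, label, "=", value.
# "NULL" drops a column; the three-element vector is recycled over all six,
# so only columns 3 and 6 are read, as numeric.
tab <- read.table(text = wanted, colClasses = c("NULL", "NULL", "numeric"))
print(tab)

# The ax/ay/aax/aay values are in the first retained column.
a_values <- tab$V3
```

This relies on every kept line having the same number of fields, which holds for the sample but is worth checking against a real log.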
Re: [R] parsing numeric values
Another useful trick that could come in handy, thanks!

baptiste