Hi,

Thanks for the alternative approach. However, I should have made my example more complete: other lines may also contain numeric values that I'm not interested in. Below is an updated problem, with my current solution,

library(gsubfn)  # provides strapply()

tc <- textConnection(
"some text
<ax> = 1.3770E-03 <bx> = 3.4644E-07
<ay> = 1.9412E-04 <by> = 4.8840E-08

other text
<aax> = 1.3770E-03 <bbx> = 3.4644E-07
<aay> = 1.9412E-04 <bby> = 4.8840E-08

lots of other material, including numeric values
1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...")

input <- readLines(tc)
close(tc)

## I want to retrieve the values for
## <ax>, <ay>, <aax> and <aay> only

results <- c(
  strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind),
  strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric, simplify = rbind))

results

Using the suggested base R solution, I've come up with this variation,

## keep only the lines of interest, then blank out everything but the numbers
z <- gsub("[^[:digit:]E.+-]", " ",
          grep("<ax>|<ay>|<aax>|<aay>", input, value = TRUE))
test <- scan(textConnection(z), what = 0)
## each retained line holds two values; keep the first of each pair
test[seq(1, length(test), by = 2)]

Thanks again,

baptiste

2009/11/18 Bert Gunter <gunter.ber...@gene.com>:
> The previous elegant solutions required the use of the gsubfn package.
> Nothing wrong with that, of course, but I'm always curious whether still
> relatively simple base R solutions can be found, as they are often (but not
> always!) much faster. And anyway, it seems to be in the spirit of your query
> to try such a solution. So here is one base R approach that I believe works.
> I'll break it up into 2 lines so you can see what's going on.
>
> ## Using your example...
> ## First replace everything but the number with spaces
>
>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>> z
> [1] " "
> [2] " 1.3770E-03 3.4644E-07"
> [3] " 1.9412E-04 4.8840E-08"
> [4] ""
> [5] " "
> [6] " 1.3770E-03 3.4644E-07"
> [7] " 1.9412E-04 4.8840E-08"
>
> ## Now it can be scanned to a numeric via
>
>> z<-scan(textConnection(z),what=0)
> Read 8 items
>> z
> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
> 1.9412e-04 4.8840e-08
>
> ########
> I believe this strategy is reasonably general, but I haven't checked it
> carefully and would appreciate folks pointing out where it trips up (e.g.
> perhaps with NA's).
>
> Best,
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of baptiste auguie
> Sent: Wednesday, November 18, 2009 3:57 AM
> To: r-help
> Subject: [R] parsing numeric values
>
> Dear list,
>
> I'm seeking advice to extract some numeric values from a log file
> created by an external program. Consider the following example,
>
> input <-
> readLines(textConnection(
> "some text
> <ax> = 1.3770E-03 <bx> = 3.4644E-07
> <ay> = 1.9412E-04 <by> = 4.8840E-08
>
> other text
> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
> <aay> = 1.9412E-04 <bby> = 4.8840E-08"))
>
> ## this is what I want
> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
> as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
> as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
> as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
> )
>
> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>
> The use of strsplit is not ideal here as there is a different number
> of space characters in the lines containing <ax> and <aax> for
> instance (hence the indices 8 and 9 respectively).
>
> I tried to use gsubfn for a cleaner construct,
>
> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>
> but I can't seem to find the correct regular expression to deal with
> the exponent.
>
>
> Any tips are welcome!
>
>
> Best regards,
>
> baptiste
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
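
For comparison, here is a minimal base R sketch of the targeted extraction discussed in this thread. It does not rely on every matched line holding exactly two numbers, which is the assumption behind test[seq(1, length(test), by=2)]. Instead it pulls the value that follows each tag with regexpr() and regmatches(). The helper name extract_tag and the spacing in the sample input are assumptions made for illustration; the whitespace in the real log file may differ.

```r
# Sample input modelled on the example in this thread (exact spacing assumed).
input <- c(
  "some text",
  " <ax> = 1.3770E-03   <bx> = 3.4644E-07",
  " <ay> = 1.9412E-04   <by> = 4.8840E-08",
  "",
  "other text",
  " <aax> = 1.3770E-03  <bbx> = 3.4644E-07",
  " <aay> = 1.9412E-04  <bby> = 4.8840E-08",
  "lots of other material, including numeric values",
  "1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5",
  "etc..."
)

tags <- c("ax", "ay", "aax", "aay")

# Hypothetical helper: return the number following "<tag> =" as a numeric.
extract_tag <- function(tag, lines) {
  # "<ax>" cannot match inside "<aax>" because "<" must be followed by "ax>"
  pattern <- paste0("<", tag, "> *= *[0-9.]+E[-+]?[0-9]+")
  # regmatches() keeps only the lines where the pattern was found
  hits <- regmatches(lines, regexpr(pattern, lines))
  # drop everything up to and including "=" and any spaces after it
  as.numeric(sub(".*= *", "", hits))
}

# vapply() insists on exactly one value per tag, so a missing or
# repeated tag raises an error instead of silently shifting results.
results <- vapply(tags, extract_tag, numeric(1), lines = input)
results
```

The result is a numeric vector named by tag, so a value can be looked up as results["aax"]. If a tag can legitimately occur more than once in the log, lapply() in place of vapply() would return all occurrences per tag.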