On Tue, Jul 09, 2013 at 09:45:55AM +0000, PIKAL Petr wrote:
> Dear experts in regexpr.
> 
> I have this
> 
> dput(test[500:510])
> c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3", 
> "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1", 
> "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
> 
> and I want something like this
> 
> gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
>  [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
> [11] "6,13"
> 
> but with 10,04 values instead of 0,04.
> 
> I tried
> gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
> 
> or other variations but without any success.
> 
> Please help.

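The "1" in "10,04" is consumed by the leading ".*": that part is
greedy, so it takes as much as it can and leaves only a single digit
for "[[:digit:]]+". In your example, all floating comma numbers
you're trying to extract are preceded by "pH ", so replacing ".*"
with ".*pH " should do what you want.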
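
As a quick sketch of the difference, using a shortened vector made up
from three of the strings in your dput() output (the variable names
are mine, chosen only for illustration):

```r
# Three of the strings from the posted data, enough to show the effect.
test <- c("pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 3")

# Original attempt: the greedy ".*" eats the "1" of "10,04", because
# "[[:digit:]]+" is satisfied with a single digit.
greedy <- gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test)
print(greedy)    # "9,36" "0,04" "6,13"

# With the literal "pH " in front of the group, ".*" has to stop
# before "pH ", so all leading digits end up inside the group.
anchored <- gsub("^.*pH ([[:digit:]]+,[[:digit:]]*).*$", "\\1", test)
print(anchored)  # "9,36" "10,04" "6,13"
```

Note that gsub() returns a string unchanged if the pattern does not
match at all, so an input without "pH " would pass through silently.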

I'd be wary of the "RGLP 144006" prefix that shows up in some cases,
though. It might be better to clean up this rubbish earlier on (and
it would be ideal to never have it generated in the first place).
Regular expressions can be useful to separate some chaff
from the wheat, but relying on that too much comes with a risk of
extracting something that is valid in some syntactic / technical
sense but not correct semantically. If you can't be 100% certain
that the number you want is (1) always preceded by "pH ", (2)
always a floating comma number and (3) will always contain an
integer and a fractional part (i.e. you'll never get ",09" rather
than "0,09", or "10" rather than "10,0"), you have to be prepared
for more difficulties, and you may want to consider a more systematic
approach to parsing your input.
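
One possible way to make those three assumptions explicit is sketched
below. This is only an illustration, not a complete parser: the
function name extract_ph and the sample strings "pH ,09 1" and
"pH 10 1" are made up here. Anything that does not look like "pH "
followed by digits, a comma and more digits comes back as NA instead
of slipping through unnoticed.

```r
# Hypothetical helper: returns the pH value as a number, or NA if the
# string does not contain "pH <digits>,<digits>".
extract_ph <- function(x) {
  # Both the integer and the fractional part are required here.
  m <- regexpr("pH [[:digit:]]+,[[:digit:]]+", x)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched pieces, so assign them to
  # the positions where regexpr() reported a match.
  out[m != -1L] <- sub("^pH ", "", regmatches(x, m))
  # Convert the decimal comma to a decimal point before as.numeric().
  as.numeric(sub(",", ".", out, fixed = TRUE))
}

res <- extract_ph(c("pH 10,04 1", "RGLP 144006 pH 6,13 2",
                    "pH ,09 1", "pH 10 1"))
print(res)  # 10.04  6.13    NA    NA
```

Checking the result with anyNA() or which(is.na(res)) then tells you
which input lines need a closer look.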

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jtt...@gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
