Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside

David Winsemius Wed, 16 Sep 2009 07:53:52 -0700

That did not work on all three:

> strapply(x, "[A-Z]{3}[0-9]+")
[[1]]
NULL


[[2]]
[1] "CAA15575"

[[3]]
[1] "CAA17111"

But adding "_" to the initial character class and a period to the terminating one makes it match all three:

> library(gsubfn)
> strapply(x, "[A-Z_]{3}[0-9.]+")
[[1]]
[1] "YP_177963"

[[2]]
[1] "CAA15575"

[[3]]
[1] "CAA17111.1"
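If gsubfn is not installed, the same extraction can be done in base R. This is a minimal sketch that assumes the three example strings from the question are stored in `x`; the thread never shows how `x` was built, so that assignment is an assumption. `gregexpr()` followed by `regmatches()` returns every match in each string, as `strapply()` does.

```r
# Assumed input: the three example strings from the original question.
x <- c("1159_1; YP_177963; PPE FAMILY PROTEIN",
       "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575",
       "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE")

# Same pattern as the strapply() call: three characters that are capital
# letters or "_", then a run of digits and/or periods.
pattern <- "[A-Z_]{3}[0-9.]+"

# gregexpr() locates every match in each string; regmatches() extracts
# the matched substrings as a list with one element per input string.
codes <- regmatches(x, gregexpr(pattern, x))
print(codes)
```

As in the strapply() output, the period in the second character class keeps the version suffix, so the third code comes back as "CAA17111.1" rather than the "CAA17111" asked for.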

Maybe between the two of you and Jim Holtman, I can eventually learn how to use regular expressions.


--
David.



On Sep 16, 2009, at 10:14 AM, Henrique Dallazuanna wrote:

Try this:

library(gsubfn)
strapply(x, "[A-Z]{3}[0-9]+")

On Wed, Sep 16, 2009 at 10:53 AM, Giulio Di Giovanni
<perimessagg...@hotmail.com> wrote:
Hi all,

I have thousands of strings like these ones:

"1159_1; YP_177963; PPE FAMILY PROTEIN"
"1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575"
"1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE"
and various others.
I'm interested in extracting the protein code (in this example: YP_177963, CAA15575, CAA17111).
I found only one common criterion that identifies the protein codes in ALL my strings:
I need a sequence of characters selected in this way:

start:
the first capital letter that is followed, three characters later, by a digit
end:
the last digit of the run that follows, i.e. the digit just before a non-digit character or the end of the string.

Tricky, isn't it?
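Taken literally, the start/end rule above maps onto a single pattern. A minimal base-R sketch, assuming `x` holds the example strings and reading "followed, three characters later, by a digit" as: a capital letter, any two characters, then a digit (that reading is an assumption). `regexpr()` keeps only the leftmost match, which corresponds to "the first" capital letter, and `[[:digit:]]+` stops at the first non-digit, so the ".1" suffix is dropped.

```r
# Assumed input: the three example strings from the question.
x <- c("1159_1; YP_177963; PPE FAMILY PROTEIN",
       "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575",
       "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE")

# Literal reading of the rule (an assumption): a capital letter, any two
# characters, a digit in the fourth position, then any further digits.
pattern <- "[[:upper:]].{2}[[:digit:]]+"

# regexpr() returns the position of the leftmost match in each string,
# or -1 when a string has no match.
m <- regexpr(pattern, x)

# regmatches() silently drops strings without a match, so fill an
# NA-initialised vector to keep the result aligned with x.
codes <- rep(NA_character_, length(x))
codes[m != -1] <- regmatches(x, m)
print(codes)
```

The POSIX classes `[[:upper:]]` and `[[:digit:]]` are used instead of `[A-Z]` and `[0-9]` because the R documentation warns that character ranges are locale-dependent.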
Well, I'm not an expert, and I played a lot with regular expressions and the sub() command without much success. I also tried substring.location in the Hmisc package (but there I don't know how to use regular expressions).
Maybe there are other, more useful functions, or maybe it is just a matter of using regular expressions better...
Can anybody help me?
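Since sub() came up: it can also do the job if the pattern captures the code and the replacement keeps only that capture. A sketch assuming the same `x` and the same literal reading of the rule; `extract_code()` is a hypothetical helper name, not a function from any package.

```r
# Assumed input: the three example strings from the question.
x <- c("1159_1; YP_177963; PPE FAMILY PROTEIN",
       "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575",
       "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE")

# Hypothetical helper: the lazy "^.*?" skips as little as possible, so the
# capture group lands on the leftmost candidate code; ".*$" consumes the
# rest of the string, and the replacement "\\1" keeps only the capture.
extract_code <- function(s) {
  pattern <- "^.*?([[:upper:]].{2}[[:digit:]]+).*$"
  hit <- grepl(pattern, s, perl = TRUE)
  out <- sub(pattern, "\\1", s, perl = TRUE)
  # sub() returns non-matching strings unchanged, so mark those as NA.
  out[!hit] <- NA_character_
  out
}

print(extract_code(x))
```

The lazy quantifier `.*?` needs `perl = TRUE`; a greedy `.*` there would instead jump to the last candidate in each string.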


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
