[R] How to get values out of a string using regular expressions?

Joris Meys Fri, 28 May 2010 05:22:49 -0700

Dear all,

I have a vector of filenames which begins like this:
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln",
       "OrthoP101_DNA_str.aln", "OrthoP102_DNA_str.aln", "OrthoP103_DNA_str.aln",
       "OrthoP104_DNA_str.aln", "OrthoP105_DNA_str.aln", "OrthoP106_DNA_str.aln",
       "OrthoP107_DNA_str.aln")


Using
grep("(\\d+)", X, perl=TRUE, value=TRUE)

I get the complete strings back. However, I want the vector:

c(1,10,100,101,102,103,104,105,106,107)

In Perl, wrapping part of the pattern in parentheses captures that part, so
only the numbers can be extracted (via $1, for those who know Perl).

I want to do the same in R, but can't find a way of doing that without
extensive string manipulation. The problem is that the length of the numbers
differs, so I can't use substr.
I tried
> strsplit(X,"\\d+")
[[1]]
[1] "OrthoP"       "_DNA_str.aln"
which gives me exactly what I want to throw away. So I tried:
> strsplit(X,"\\D+")
[[1]]
[1] ""  "1"

[[2]]
[1] ""   "10"
gives something I can use, but it still requires a lot of list manipulation
afterwards to get the right vector. Is there an option or a function I'm
missing somewhere?
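
For readers of the archive, one possible approach is sketched below. It is not from the original thread; it assumes each filename contains exactly one run of digits, and it uses a shortened copy of X. sub() with the backreference "\\1" is the closest base-R analogue to Perl's $1. The regmatches() variant additionally assumes an R version that provides that function (it is newer than this post).

```r
# Shortened copy of the vector from the question (illustration only).
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln")

# Match the whole string, capture the first run of digits, and replace
# the string with just the captured group; "\\1" plays the role of $1.
nums <- as.numeric(sub("^\\D*(\\d+).*$", "\\1", X, perl = TRUE))

# Alternative: regexpr() locates the first digit run in each element and
# regmatches() extracts only the matched text.
nums2 <- as.numeric(regmatches(X, regexpr("[0-9]+", X)))

print(nums)
print(nums2)
```

Both variants avoid the list handling that strsplit() requires, because they return an atomic vector directly.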

Cheers
Joris

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help