[R] regular expressions : extracting numbers

2007-07-30 Thread GOUACHE David
Hello all,

I have a vector of character strings, in which I have letters, numbers, and 
symbols. What I wish to do is obtain a vector of the same length with just the 
numbers.
A quick example -

An extract of the original vector:
"lema, rb 2%" "rb 2%" "rb 3%" "rb 4%" "rb 3%" "rb 2%,mineuse" "rb" "rb" "rb 12" 
"rb" "rj 30%" "rb" "rb" "rb 25%" "rb" "rb" "rb" "rj, rb"

and the type of thing I wish to end up with:
"2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" "" "" "" ""

or, instead of "", NA would be acceptable (actually it would almost be better 
for me)

Anyways, I've been battling with gsub() and things of the sort, but I'm 
drowning in the regular expressions, despite a few hours of looking at Perl 
tutorials...
So if anyone can help me out, it would be greatly appreciated!!

In advance, thanks very much.

David Gouache
Arvalis - Institut du Végétal
Station de La Minière
78280 Guyancourt
Tel: 01.30.12.96.22 / Port: 06.86.08.94.32

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
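To make the question reproducible, as the posting guide above asks, the example can be written as R vectors. This is a sketch built from the values in the message; the names `x` and `wanted` are illustrative, and `wanted` uses NA for strings without a number, the variant the question says it would prefer.

```r
# Input strings, copied from the question
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%", "rb", "rb",
       "rb", "rj, rb")

# Desired result (illustrative), with NA where a string contains no number
wanted <- c("2", "2", "3", "4", "3", "2", NA, NA, "12", NA, "30", NA, NA,
            "25", NA, NA, NA, NA)

# The question asks for a result of the same length as the input
stopifnot(length(x) == length(wanted))
```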


Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Romain Francois
Hello David,

What about one of these:

R> gsub( "[^[:digit:]]", "", x )

or using perl regular expressions:

R> gsub( "\\D", "", x, perl = TRUE )

Cheers,

Romain

-- 
Mango Solutions
data analysis that delivers

Tel:  +44(0) 1249 467 467
Fax:  +44(0) 1249 467 468
Mob:  +44(0) 7813 526 123
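Both patterns delete every character that is not a digit: `[^[:digit:]]` is a POSIX bracket expression, and `\D` (written `"\\D"` in an R string) is the PCRE shorthand for "non-digit", available with `perl = TRUE`. A small check on part of the example data that they agree, plus one caveat not raised in the message: if a string ever contained two numbers, their digits would be run together.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 12", "rb", "rj 30%")

# POSIX character class vs. PCRE shorthand \D ("any non-digit")
posix <- gsub("[^[:digit:]]", "", x)
pcre  <- gsub("\\D", "", x, perl = TRUE)
print(posix)   # "2" "2" "12" "" "30"
stopifnot(identical(posix, pcre))

# Caveat: every digit is kept, so separate numbers are concatenated
gsub("[^[:digit:]]", "", "rb 2%, rj 30%")   # "230"
```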



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread jim holtman
Is this what you want:

> x
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"         "rb 3%"         "rb 2%,mineuse"
 [7] "rb"            "rb"            "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"            "rb"            "rj, rb"
> gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""   "25" ""   ""   ""   ""
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
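Jim's pattern captures a run of digits in group 1 and replaces each whole match, surrounding non-digits included, with that group. A related sketch, not taken from the thread, anchors the pattern with `sub()` so that only the first number in each string is kept, and uses `grepl()` to return NA when a string has no digits. The helper name `first_number` is hypothetical, and the last test string is made up to show two numbers.

```r
x <- c("lema, rb 2%", "rb 2%,mineuse", "rb", "rb 12", "rb 2%, rj 30%")

# Hypothetical helper: first run of digits in each string, NA if there is none
first_number <- function(s) {
  # ^[^0-9]* skips leading non-digits, ([0-9]+) captures the first number,
  # .*$ consumes the rest so that it is dropped from the result
  out <- sub("^[^0-9]*([0-9]+).*$", "\\1", s)
  # sub() returns non-matching strings unchanged, so replace those with NA
  out[!grepl("[0-9]", s)] <- NA
  out
}

first_number(x)   # "2" "2" NA "12" "2"
```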



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Vladimir Eremeev




# run the next line, then copy the text from your message to the clipboard
# and paste it into the R console
> chv<-scan(what="character",sep=" ")
> chv
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"
 [5] "rb 3%"         "rb 2%,mineuse" "rb"            "rb"
 [9] "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"
[17] "rb"            "rj, rb"

# actual replacements :

# replace non-digits with nothing
> chv.digits<-gsub("[^0-9]","",chv)
> chv.digits
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""   "25" ""
[16] ""   ""   ""

# replace empty strings with NA
> chv.digits[chv.digits==""]<-NA
> chv.digits
 [1] "2"  "2"  "3"  "4"  "3"  "2"  NA   NA   "12" NA   "30" NA   NA   "25" NA
[16] NA   NA   NA

 
-- 
View this message in context: 
http://www.nabble.com/regular-expressions-%3A-extracting-numbers-tf4169660.html#a11862597
Sent from the R help mailing list archive at Nabble.com.
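Pasting from the clipboard is hard to repeat in a script. Current versions of R let `scan()` read from a character string through its `text` argument, so the same quoted values can be embedded directly. A sketch on a shortened copy of the data, assuming the default whitespace separator (quoted fields keep their embedded spaces); `txt` is an illustrative name.

```r
# The quoted values as they appear in the message (shortened)
txt <- '"lema, rb 2%" "rb 2%" "rb" "rb 12" "rj, rb"'

# what = "" requests character fields; quiet = TRUE suppresses the item count
chv <- scan(text = txt, what = "", quiet = TRUE)

# Same replacement as above: drop every non-digit
chv.digits <- gsub("[^0-9]", "", chv)
chv.digits   # "2" "2" "" "12" ""
```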



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Christian Ritz
Dear David,

does the following work for you?


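sVec <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
          "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%", "rb", "rb",
          "rb", "rj, rb")

reVec <- regexpr("[[:digit:]]+", sVec)
# see ?regex for details on '[:digit:]' and '+'
# reVec gives the start of the first run of digits in each string (-1 if none);
# its "match.length" attribute gives the length of that run (-1 if none)

substr(sVec, start = reVec, stop = reVec + attr(reVec, "match.length") - 1)
# see ?substr for details; strings without digits come back as ""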



Christian
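Later versions of R add `regmatches()`, which extracts the substrings located by `regexpr()` without the `substr()` arithmetic. It returns only the matched pieces, so a result of the original length has to be filled in around the non-matches; with `gregexpr()` instead of `regexpr()`, it returns every number in each string as a list. A sketch under those assumptions (the strings are shortened, and the fourth is made up to show two numbers):

```r
sVec <- c("lema, rb 2%", "rb", "rb 12", "rb 2%, rj 30%")

m <- regexpr("[[:digit:]]+", sVec)   # -1 where a string has no digits

# regmatches() drops non-matches, so place the matches into an NA vector
out <- rep(NA_character_, length(sVec))
out[m != -1] <- regmatches(sVec, m)
out   # "2" NA "12" "2"

# gregexpr() finds all matches; the result is a list, one element per string
all_nums <- regmatches(sVec, gregexpr("[[:digit:]]+", sVec))
all_nums[[4]]   # "2" "30"
```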



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Marc Schwartz
On Mon, 2007-07-30 at 13:58 +0200, GOUACHE David wrote:

Try this:

> Vec
 [1] "lema, rb 2%"   "rb 2%"         "rb 3%"         "rb 4%"
 [5] "rb 3%"         "rb 2%,mineuse" "rb"            "rb"
 [9] "rb 12"         "rb"            "rj 30%"        "rb"
[13] "rb"            "rb 25%"        "rb"            "rb"
[17] "rb"            "rj, rb"

> gsub("[^0-9]", "", Vec)
 [1] "2"  "2"  "3"  "4"  "3"  "2"  ""   ""   "12" ""   "30" ""   ""  
[14] "25" ""   ""   ""   ""  


The search pattern here is [^0-9], which matches any single character that
is not (^) in the character range 0 through 9; gsub() replaces each such
character with "", so only the digits remain.

See ?regex and/or http://www.regular-expressions.info/

HTH,

Marc Schwartz
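Since `gsub()` replaces every match, removing non-digits one character at a time (`[^0-9]`) or as whole runs (`[^0-9]+`, the form used in Gabor Grothendieck's reply) leaves the same result; the `+` only changes how many matches get replaced. A quick check on a few of the example strings:

```r
Vec <- c("lema, rb 2%", "rb 2%,mineuse", "rb", "rj 30%")

one_char <- gsub("[^0-9]", "", Vec)    # each non-digit is a separate match
runs     <- gsub("[^0-9]+", "", Vec)   # each run of non-digits is one match
stopifnot(identical(one_char, runs))
one_char   # "2" "2" "" "30"
```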



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Gabor Grothendieck
I assume if you want the "" components to be NA then you really intend
the result to be a numeric vector.  The following replaces all non-digits
with "" (thereby removing them) and then uses as.numeric to convert the
result to numeric.  Just omit the conversion if you want a character
vector result:

s <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
   "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%", "rb", "rb",
   "rb", "rj, rb")

as.numeric(gsub("[^[:digit:]]+", "", s))
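One caveat with deleting every non-digit before `as.numeric()`: a decimal point is also a non-digit, so a hypothetical value such as `"rb 2.5%"` would turn into 25. If decimals can occur, keeping `.` in the bracket expression is one option. This sketch assumes at most one number per string and no stray dots elsewhere in the text; as in the reply above, an empty string converts to NA.

```r
s <- c("rb 2%", "rb", "rb 2.5%")   # the third value is made up

as.numeric(gsub("[^[:digit:]]+", "", s))    # 2 NA 25  (decimal point lost)
as.numeric(gsub("[^[:digit:].]+", "", s))   # 2 NA 2.5 (dot kept)
```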



Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Jacques VESLOT
> gsub(" ", "", gsub("%", "", gsub("[a-z]", "", c("tr3", "jh40%qs  dqd"))))
[1] "3"  "40"


Jacques VESLOT

INRA - Biostatistique & Processus Spatiaux
Site Agroparc 84914 Avignon Cedex 9, France

Tel: +33 (0) 4 32 72 21 58
Fax: +33 (0) 4 32 72 21 84
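The nested calls remove lower-case letters, then `%`, then spaces, which handles the two test strings. Applied to strings from the question, though, the commas survive because nothing removes them, whereas a single negated character class removes every non-digit in one step. A quick comparison:

```r
x <- c("lema, rb 2%", "rb 2%,mineuse", "rj, rb")

# The nested approach: letters, then "%", then spaces
chain <- gsub(" ", "", gsub("%", "", gsub("[a-z]", "", x)))
chain   # ",2" "2," ","

# Removing everything that is not a digit instead
gsub("[^0-9]", "", x)   # "2" "2" ""
```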





Re: [R] regular expressions : extracting numbers

2007-07-30 Thread Kuhn, Max
This might work:

> numOnly <- function(x) gsub("[^0-9]", "", x)
> numOnly("lema, rb 2%")
[1] "2"
> numOnly("rb")
[1] ""

Max
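Because `gsub()` is vectorised, `numOnly()` already works on the whole vector at once, with no loop or `sapply()` needed. A sketch of a variant that can also return NA for strings without digits, which the question said it would prefer; the `na` argument is a hypothetical addition, not part of Max's function.

```r
numOnly <- function(x, na = FALSE) {
  out <- gsub("[^0-9]", "", x)
  # Optionally (hypothetical argument) turn digit-free strings into NA
  if (na) out[out == ""] <- NA
  out
}

numOnly(c("lema, rb 2%", "rb", "rb 12"))              # "2" "" "12"
numOnly(c("lema, rb 2%", "rb", "rb 12"), na = TRUE)   # "2" NA "12"
```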

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of GOUACHE David
Sent: Monday, July 30, 2007 7:59 AM
To: r-help@stat.math.ethz.ch
Subject: [R] regular expressions : extracting numbers
