Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

Here are two ways to extract 5 digits.

In the first one \\1 refers to the portion matched between the
parentheses in the regular expression.

In the second one, strapply works like apply: the first argument is the
object to be worked on (an array for apply, a string for strapply), the
second modifies it (which dimension for apply, a regular expression for
strapply), and the last is a function that acts on each value
(typically each row or column for apply, each match for strapply).
In this case we use c as our function, to just return all the results.
They are returned in a list with one component per string, but here
test is just a single string, so we get a list of length one and ask for
the contents of its first component using [[1]].

# 1 - sub
sub(".*(\\d{5}).*", "\\1", test)

# 2 - strapply - see http://gsubfn.googlecode.com
library(gsubfn)
strapply(test, "\\d{5}", c)[[1]]
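
For readers who want to run approach 1 on its own, here is a minimal sketch. The value of test is an assumption: the archive stripped the markup from the original post, so the string below is reconstructed from the fragments that remain. The bracket class [0-9] is used instead of \\d only because the thread reports it behaving the same on every platform tried.

```r
# Assumed reconstruction of the sample string from the original post.
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# The parentheses capture the five digits, and "\\1" in the replacement
# refers back to that captured group. The leading and trailing ".*"
# consume the rest of the string, so only the captured digits remain.
digits <- sub(".*([0-9]{5}).*", "\\1", test)
print(digits)
```

Note that sub returns its input unchanged when the pattern does not match, so an unexpectedly long result usually means the pattern failed rather than that the capture went wrong.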



On Wed, May 5, 2010 at 5:13 PM, steven mosher mosherste...@gmail.com wrote:
> Given a text like
>
> I want to be able to extract a matched regular expression from a piece of
> text.
>
> this apparently works, but is pretty ugly
> # some html
> test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> # a pattern to extract 5 digits
> pattern<-"[0-9]{5}"
> # regexpr returns a start point[1] and an attribute match.length
> attr(,"match.length")
> # get the substring from the start point to the stop point, where
> # stop = start + length - 1
>
> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
> answer
> [1] "88958"
>
> I tried using sub(pattern, replacement, x) with a regexp that captured the
> group. I'd found an example of this in the mails
> but it didn't seem to work.
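
The regexpr/substr approach quoted above can be tidied by calling regexpr once and reusing its result. The sketch below assumes the same reconstructed test string; the variable names m, start and len are illustrative. It also shows regmatches, a base R helper that, as far as I know, was added in a later R release than the ones discussed in this thread, so treat that line as an option for newer installations only.

```r
# Assumed reconstruction of the sample string from the original post.
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
pattern <- "[0-9]{5}"

# Call regexpr once; it returns the start of the first match (or -1)
# and carries the match length as an attribute.
m <- regexpr(pattern, test)
start <- as.integer(m)
len <- attr(m, "match.length")

# stop = start + length - 1, as in the original post.
answer <- substr(test, start, start + len - 1)
print(answer)

# In newer versions of R, regmatches performs the same extraction
# directly from the regexpr result.
answer2 <- regmatches(test, m)
print(answer2)
```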

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Thanks, I was looking at that package and reading your mails in the archive.
I think my tiny mind got twisted in the regexp.


Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

> test
[1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "</th>"
> sub(".*([0-9]{5}).*","\\1",test)
[1] "88958"



I think the / in the source throws something off, as the group capture
appears not to work with \\d, although it did with the bracket version.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

That's not what I get:

> test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "88958"
> R.version.string
[1] "R version 2.10.1 (2009-12-14)"

I got the same result in R 2.11.0 patched as well.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread David Winsemius



On May 5, 2010, at 5:35 PM, Gabor Grothendieck wrote:


> # 1 - sub
> sub(".*(\\d{5}).*", "\\1", test)

> test
[1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"


I get different results than I expected, given that \\d should be
synonymous with [0-9]:


> sub(".*([0-9]{5}).*", "\\1", test)
[1] "88958"

> sub(".*(\\d{5}).*", "\\1", test)
[1] "</th>"
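
One way to check whether the different spellings of "digit" agree on a given installation is to run the same substitution with each and compare. This is only a sketch: the test string is an assumed reconstruction of the original, and the names patterns, results and agree are illustrative. No expected output is given for the \\d case, because the thread shows it varying between platforms and R versions.

```r
# Assumed reconstruction of the sample string from the original post.
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# Three ways of writing "five digits" inside the same capture group.
patterns <- c(bracket = ".*([0-9]{5}).*",
              posix   = ".*([[:digit:]]{5}).*",
              escape  = ".*(\\d{5}).*")

# Apply each pattern to the same input and collect the results by name.
results <- vapply(patterns, function(p) sub(p, "\\1", test), character(1))
print(results)

# TRUE only if all three spellings gave the same answer here.
agree <- length(unique(results)) == 1
print(agree)
```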

--
David.



David Winsemius, MD
West Hartford, CT


Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Hmm.

I have R11 just downloaded fresh.

I'll reload a new session and revert. I will note that I've had trouble
with \\d, which is why I was using [0-9].

Mac here.


Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

I am using Vista.  Another thing to try is strapply using the tcl
engine (assuming you do have tcltk capabilities) and the R engine.  On
Vista R 2.11.0 patched I get the same result:

> capabilities()[["tcltk"]]
[1] TRUE
> strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
[1] "88958"
> strapply(test, "\\d{5}", c, engine = "R")[[1]]
[1] "88958"

On Vista with R 2.9.2 I do get bad results:

> test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
[1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> R.version.string
[1] "R version 2.9.2 Patched (2009-09-08 r49647)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"
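
A similar engine comparison can be sketched in base R without gsubfn by switching sub between its default regex engine and PCRE via perl = TRUE. This swaps in a different mechanism from the engine argument of strapply used above, and the test string is again an assumed reconstruction. The (?s) prefix is needed in the PCRE version because there "." does not match a newline by default, and test contains "\n".

```r
# Assumed reconstruction of the sample string from the original post.
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# Default regex engine; the thread shows this result varying with the
# R version and platform, so no expected value is claimed here.
default_result <- sub(".*(\\d{5}).*", "\\1", test)
print(default_result)

# PCRE engine. "(?s)" lets "." match the embedded newline so that the
# trailing ".*" can consume the rest of the string.
perl_result <- sub("(?s).*(\\d{5}).*", "\\1", test, perl = TRUE)
print(perl_result)
```

If the two results differ on a given machine, that narrows the problem to one engine's handling of \\d, which is useful detail for a bug report.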



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Thanks,

perhaps we should report it.


Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

Yes, you could bring it up on the R-sig-mac or file a bug report.

