Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

Yes, you could bring it up on the R-sig-mac list or file a bug report.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Thanks, perhaps we should report it.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

I am using Vista.  Another thing to try is strapply using the tcl
engine (assuming you do have tcltk capabilities) and the R engine.  On
Vista R 2.11.0 patched I get the same result:

> capabilities()[["tcltk"]]
[1] TRUE
> strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
[1] "88958"
> strapply(test, "\\d{5}", c, engine = "R")[[1]]
[1] "88958"

On Vista with R 2.9.2 I do get bad results:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
[1] "88958Abcdsef67.8S68.9\nW26m"
> R.version.string
[1] "R version 2.9.2 Patched (2009-09-08 r49647)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"
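
A side note on reading the R 2.9.2 output above: sub() returns its input unchanged when the pattern does not match at all, so an unchanged string is consistent with a failed match rather than a successful capture. A minimal sketch of guarding against that with grepl() follows; the vector x and the variable names are made up for illustration, and it uses the bracket class [0-9] that worked for everyone in this thread.

```r
# Hypothetical input: one string with a 5-digit run, one without
x <- c("aaa12345W", "no digits here")
pattern <- ".*([0-9]{5}).*"

# sub() leaves non-matching strings untouched, so test for a match first
matched <- grepl(pattern, x)

# Keep the captured group where the pattern matched, NA elsewhere
result <- ifelse(matched, sub(pattern, "\\1", x), NA_character_)
result
```

Without the grepl() check, the second element would come back as the full original string, which is easy to mistake for a result.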




Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

With a fresh restart:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> test
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] ""
> sub(".*([0-9]{5}).*", "\\1", test)
[1] "88958"
> test2<-"aaa12345W"
> sub(".*(\\d{5}).*", "\\1", test2)
[1] "W"
>
> sub(".*(\\d{5}).*", "\\1", test2)
[1] "W"
> sub(".*([0-9]{5}).*", "\\1", test2)
[1] "12345"


Steve.
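
For anyone comparing digit classes on their own machine, here is a small sketch using the test2 string from the transcript above. It tries three spellings: the bracket class, the POSIX class [[:digit:]], and \\d with perl = TRUE, which hands the pattern to PCRE, where \d is a documented digit class. Whether perl = TRUE avoids the Mac behaviour reported in this thread is an assumption that has not been tested here; the variable names are only illustrative.

```r
test2 <- "aaa12345W"

# Bracket class, as used in the transcript above
bracket <- sub(".*([0-9]{5}).*", "\\1", test2)

# POSIX character class, portable across regex engines
posix <- sub(".*([[:digit:]]{5}).*", "\\1", test2)

# \\d interpreted by PCRE instead of the default engine
perl <- sub(".*(\\d{5}).*", "\\1", test2, perl = TRUE)

c(bracket = bracket, posix = posix, perl = perl)
```

If the three values differ on a given installation, that difference together with R.version.string is the kind of detail a bug report would need.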





Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Mac bug?



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Hmm.

I have R 2.11, freshly downloaded.

I'll reload a new session and report back. I will note that I've had trouble
with \\d, which is why I was using [0-9].

Mac here.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread David Winsemius

On May 5, 2010, at 5:35 PM, Gabor Grothendieck wrote:

> Here are two ways to extract 5 digits.
>
> In the first one \\1 refers to the portion matched between the
> parentheses in the regular expression.
>
> In the second one strapply is like apply where the object to be worked
> on is the first argument (array for apply, string for strapply) the
> second modifies it (which dimension for apply, regular expression for
> strapply) and the last is a function which acts on each value
> (typically each row or column for apply and each match for strapply).
> In this case we use c as our function to just return all the results.
> They are returned in a list with one component per string but here
> test is just a single string so we get a list one long and we ask for
> the contents of the first component using [[1]].
>
> # 1 - sub
> sub(".*(\\d{5}).*", "\\1", test)

> test
[1] "88958Abcdsef67.8S68.9\nWth>26m"

I get different results than I expected given that "\\d" should be  
synonymous with "[0-9]":

> sub(".*([0-9]{5}).*", "\\1", test)
[1] "88958"

> sub(".*(\\d{5}).*", "\\1", test)
[1] ""

--
David.

> # 2 - strapply - see http://gsubfn.googlecode.com
> library(gsubfn)
> strapply(test, "\\d{5}", c)[[1]]
>
> On Wed, May 5, 2010 at 5:13 PM, steven mosher wrote:
>
>> Given a text like
>>
>> I want to be able to extract a matched regular expression from a piece of
>> text.
>>
>> this apparently works, but is pretty ugly
>> # some html
>> test<-"88958Abcdsef67.8Sth>68.9\nW26m"
>> # a pattern to extract 5 digits
>> pattern<-"[0-9]{5}"
>> # regexpr returns a start point[1] and an attribute "match.length"
>> # attr(,"match.length")
>> # get the substring from the start point to the stop point.. where stop =
>> # start +length-1
>> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
>> answer
>> [1] "88958"
>>
>> I tried using sub(pattern, replacement, x) with a regexp that captured the
>> group. I'd found an example of this in the mails
>> but it didn't seem to work.
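
The one-liner in the original question calls regexpr() three times. A tidier sketch of the same idea stores the match object once and reads the start and length from it. The second form uses regmatches(), which is a different technique from anything posted in this thread and is only available in R 2.14.0 and later. The names m, start, len, answer and answer2 are illustrative, and the test string is the stripped version that appears in the archive.

```r
test <- "88958Abcdsef67.8S68.9\nW26m"
pattern <- "[0-9]{5}"

# Run the match once; the value is the start position (-1 if no match)
m <- regexpr(pattern, test)
start <- as.integer(m)
len <- attr(m, "match.length")

# stop = start + length - 1, as in the original post
answer <- if (start > 0) substr(test, start, start + len - 1) else NA_character_

# Same extraction in one step (R >= 2.14.0)
answer2 <- regmatches(test, m)
```

Both forms avoid capture groups and \\d entirely, so they sidestep the platform difference discussed in the replies.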


David Winsemius, MD
West Hartford, CT


Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

That's not what I get:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "88958"
> R.version.string
[1] "R version 2.10.1 (2009-12-14)"

I also got the above in R 2.11.0 patched as well.




Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

> test
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] ""
> sub(".*([0-9]{5}).*","\\1",test)
[1] "88958"

I think the "[...] as the group capture appears not to be working, except
with the bracket version, where it did.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Thanks, I was looking at that package and reading your mails in the archive.
I think my tiny mind got twisted in the regexp.



Re: [R] extracting a matched string using regexpr

2010-05-05 Thread Gabor Grothendieck

Here are two ways to extract 5 digits.

In the first one, \\1 refers to the portion matched between the
parentheses in the regular expression.

In the second one, strapply works like apply. The first argument is the
object to be worked on (an array for apply, a string for strapply), the
second modifies it (which dimension for apply, the regular expression
for strapply), and the last is a function that acts on each value
(typically each row or column for apply, each match for strapply).
Here we use c as the function, which simply returns all the matches.
The results come back in a list with one component per input string;
test is a single string, so the list has length one and [[1]]
extracts the contents of its first component.

# 1 - sub
sub(".*(\\d{5}).*", "\\1", test)

# 2 - strapply - see http://gsubfn.googlecode.com
library(gsubfn)
strapply(test, "\\d{5}", c)[[1]]
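
For readers without gsubfn, here is a minimal base-R sketch of the same
extraction. It is not from the thread: it assumes R 2.14.0 or later, where
regmatches() is available (that release postdates this discussion). It
uses [0-9] instead of \\d, since \\d was reported to misbehave on some
platforms in this thread.

```r
test <- "88958Abcdsef67.8S68.9\nW26m"

# regexpr() locates the first run of five digits;
# regmatches() returns the matched text itself.
m <- regexpr("[0-9]{5}", test)
first5 <- regmatches(test, m)
first5
# [1] "88958"

# gregexpr() finds every non-overlapping match in the string;
# the result is a list with one component per input string.
all5 <- regmatches(test, gregexpr("[0-9]{5}", test))[[1]]
```

Unlike the sub() approach, this does not depend on .* spanning the
embedded newline, because only the five digits themselves are matched.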

On Wed, May 5, 2010 at 5:13 PM, steven mosher  wrote:
> Given a text like
>
> I want to be able to extract a matched regular expression from a piece of
> text.
>
> this apparently works, but is pretty ugly
> # some html
> test<-"88958Abcdsef67.8S68.9\nW26m"
> # a pattern to extract 5 digits
> pattern<-"[0-9]{5}"
> # regexpr returns a start point[1] and an attribute "match.length"
> # get the substring from the start point to the stop point.. where stop =
> # start +length-1
> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
> > answer
> [1] "88958"
>
> I tried using sub(pattern, replacement, x )  with a regexp that captured the
> group. I'd found an example of this in the mails
> but it didnt seem to work..


[R] extracting a matched string using regexpr

2010-05-05 Thread steven mosher

Given a text like

I want to be able to extract a matched regular expression from a piece of
text.

This apparently works, but is pretty ugly:

# some html
test <- "88958Abcdsef67.8S68.9\nW26m"
# a pattern to extract 5 digits
pattern <- "[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length"
# get the substring from the start point to the stop point, where
# stop = start + length - 1
answer <- substr(test,
                 regexpr(pattern, test)[1],
                 regexpr(pattern, test)[1] + attr(regexpr(pattern, test), "match.length") - 1)
> answer
[1] "88958"

I tried using sub(pattern, replacement, x) with a regexp that captured the
group. I'd found an example of this in the mails, but it didn't seem to work.
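
A possible tidier form of the regexpr/substr approach, as a sketch only:
extract_first is a hypothetical helper name, not something from the
thread or from base R. It calls regexpr() once, reuses the result, and
returns NA when nothing matches (regexpr() reports -1 in that case).

```r
# Hypothetical helper (illustration only): return the first match of
# `pattern` in the single string `x`, or NA if there is no match.
extract_first <- function(pattern, x) {
  m <- regexpr(pattern, x)              # start position, or -1 if no match
  if (m[1] == -1) return(NA_character_)
  len <- attr(m, "match.length")        # length of the matched text
  substr(x, m[1], m[1] + len - 1)       # stop = start + length - 1
}

test <- "88958Abcdsef67.8S68.9\nW26m"
extract_first("[0-9]{5}", test)
# [1] "88958"
extract_first("[0-9]{6}", test)
# [1] NA
```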

