Re: [R] extracting a matched string using regexpr
Yes, you could bring it up on R-sig-mac or file a bug report.

On Wed, May 5, 2010 at 10:11 PM, steven mosher wrote:
> Thanks,
> perhaps we should report it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting a matched string using regexpr
Thanks, perhaps we should report it.

On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck wrote:
> I am using Vista. Another thing to try is strapply using the tcl
> engine (assuming you do have tcltk capabilities) and the R engine.
> [...]
> On Vista with R 2.9.2 I do get bad results:
> [...]
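When reporting a platform-specific result like this one, it usually helps to send the failing call together with the version and platform details. Here is a small sketch that gathers them in one place. The list name `report` and its field names are illustrative, not something from the thread, and only base R functions are used.

```r
test <- "88958Abcdsef67.8S68.9\nW26m"

# Collect the details a bug report would need (field names are illustrative)
report <- list(
  r_version = R.version.string,           # full version string of this R
  platform  = R.version$platform,         # build platform
  os        = Sys.info()[["sysname"]],    # operating system name
  locale    = Sys.getlocale("LC_CTYPE"),  # locale used for character classes
  backslash = sub(".*(\\d{5}).*", "\\1", test),
  bracket   = sub(".*([0-9]{5}).*", "\\1", test)
)
str(report)

# The two patterns are expected to agree; FALSE reproduces the problem
identical(report$backslash, report$bracket)
```

Pasting the output of `str(report)` into the message gives readers the same information that was exchanged piecemeal in this thread.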
Re: [R] extracting a matched string using regexpr
I am using Vista. Another thing to try is strapply using the tcl
engine (assuming you do have tcltk capabilities) and the R engine. On
Vista R 2.11.0 patched I get the same result:

> capabilities()[["tcltk"]]
[1] TRUE
> strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
[1] "88958"
> strapply(test, "\\d{5}", c, engine = "R")[[1]]
[1] "88958"

On Vista with R 2.9.2 I do get bad results:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
[1] "88958Abcdsef67.8S68.9\nW26m"
> R.version.string
[1] "R version 2.9.2 Patched (2009-09-08 r49647)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"

On Wed, May 5, 2010 at 6:20 PM, steven mosher wrote:
> Hmm.
> I have R11 just downloaded fresh.
> I'll reload a new session..and revert. I will note that I've had trouble
> with \\d
> which is why I was using [0-9]
> Mac here.
Re: [R] extracting a matched string using regexpr
With a fresh restart:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> test
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] ""
> sub(".*([0-9]{5}).*", "\\1", test)
[1] "88958"
> test2<-"aaa12345W"
> sub(".*(\\d{5}).*", "\\1", test2)
[1] "W"
> sub(".*(\\d{5}).*", "\\1", test2)
[1] "W"
> sub(".*([0-9]{5}).*", "\\1", test2)
[1] "12345"

Steve.

On Wed, May 5, 2010 at 3:20 PM, David Winsemius wrote:
> I get different results than I expected given that "\\d" should be
> synonymous with "[0-9]":
> [...]
Re: [R] extracting a matched string using regexpr
Mac bug?

On Wed, May 5, 2010 at 3:20 PM, David Winsemius wrote:
> I get different results than I expected given that "\\d" should be
> synonymous with "[0-9]":
>
> > sub(".*([0-9]{5}).*", "\\1", test)
> [1] "88958"
> > sub(".*(\\d{5}).*", "\\1", test)
> [1] ""
Re: [R] extracting a matched string using regexpr
Hmm.
I have R11 just downloaded fresh.
I'll reload a new session..and revert. I will note that I've had trouble
with \\d, which is why I was using [0-9].
Mac here.

On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck wrote:
> That's not what I get:
>
> > test<-"88958Abcdsef67.8S68.9\nW26m"
> > sub(".*(\\d{5}).*", "\\1", test)
> [1] "88958"
> > R.version.string
> [1] "R version 2.10.1 (2009-12-14)"
Re: [R] extracting a matched string using regexpr
On May 5, 2010, at 5:35 PM, Gabor Grothendieck wrote:

> Here are two ways to extract 5 digits.
> [...]
> # 1 - sub
> sub(".*(\\d{5}).*", "\\1", test)

> test
[1] "88958Abcdsef67.8S68.9\nW26m"

I get different results than I expected given that "\\d" should be
synonymous with "[0-9]":

> sub(".*([0-9]{5}).*", "\\1", test)
[1] "88958"
> sub(".*(\\d{5}).*", "\\1", test)
[1] ""

--
David.

> # 2 - strapply - see http://gsubfn.googlecode.com
> library(gsubfn)
> strapply(test, "\\d{5}", c)[[1]]

David Winsemius, MD
West Hartford, CT
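Since "\\d" and "[0-9]" are expected to match the same characters here, one way to narrow the discrepancy down is to run equivalent digit patterns through both regular-expression engines and compare the results. The following is only a minimal sketch: the names `patterns` and `results` are illustrative, and `test2` is used because it contains no newline, so the default engine and `perl = TRUE` should agree on it.

```r
test2 <- "aaa12345W"

# Three spellings of "five digits" that should behave identically
patterns <- c(backslash_d = ".*(\\d{5}).*",
              bracket     = ".*([0-9]{5}).*",
              posix_class = ".*([[:digit:]]{5}).*")

# Try each pattern with the default engine and with PCRE (perl = TRUE)
results <- sapply(patterns, function(p) {
  c(default = sub(p, "\\1", test2),
    perl    = sub(p, "\\1", test2, perl = TRUE))
})
results

# Every cell should be "12345"; any other value points at the engine or
# the build rather than at the pattern itself.
all(results == "12345")
```

If only the `backslash_d` column differs, and only in the `default` row, that would suggest the problem lies in how the default engine handles the `\d` escape on that build.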
Re: [R] extracting a matched string using regexpr
That's not what I get:

> test<-"88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] "88958"
> R.version.string
[1] "R version 2.10.1 (2009-12-14)"

I got the same result in R 2.11.0 patched as well.

On Wed, May 5, 2010 at 5:55 PM, steven mosher wrote:
> > sub(".*(\\d{5}).*", "\\1", test)
> [1] ""
> > sub(".*([0-9]{5}).*","\\1",test)
> [1] "88958"
Re: [R] extracting a matched string using regexpr
> test
[1] "88958Abcdsef67.8S68.9\nW26m"
> sub(".*(\\d{5}).*", "\\1", test)
[1] ""
> sub(".*([0-9]{5}).*","\\1",test)
[1] "88958"

I think the " [...] as the group capture appears to not be working, except
with the bracket version, where it did.

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck wrote:
> Here are two ways to extract 5 digits.
> [...]
> # 1 - sub
> sub(".*(\\d{5}).*", "\\1", test)
Re: [R] extracting a matched string using regexpr
Thanks,

I was looking at that package and reading your mails in the archive.
I think my tiny mind got twisted in the regexp..

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck wrote:
> Here are two ways to extract 5 digits.
> [...]
> # 2 - strapply - see http://gsubfn.googlecode.com
> library(gsubfn)
> strapply(test, "\\d{5}", c)[[1]]
Re: [R] extracting a matched string using regexpr
Here are two ways to extract 5 digits.

In the first one, \\1 refers to the portion matched between the
parentheses in the regular expression.

In the second one, strapply is like apply: the object to be worked on
is the first argument (array for apply, string for strapply), the
second argument modifies it (which dimension for apply, regular
expression for strapply) and the last is a function which acts on each
value (typically each row or column for apply and each match for
strapply). In this case we use c as our function to just return all
the results. They are returned in a list with one component per string,
but here test is just a single string, so we get a list of length one
and we ask for the contents of the first component using [[1]].

# 1 - sub
sub(".*(\\d{5}).*", "\\1", test)

# 2 - strapply - see http://gsubfn.googlecode.com
library(gsubfn)
strapply(test, "\\d{5}", c)[[1]]

On Wed, May 5, 2010 at 5:13 PM, steven mosher wrote:
> I want to be able to extract a matched regular expression from a piece of
> text.
> [...]
> I tried using sub(pattern, replacement, x ) with a regexp that captured the
> group. I'd found an example of this in the mails
> but it didn't seem to work..
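To make the apply analogy concrete, here is a rough base-R sketch of the shape of result described above: one list component per input string, with a function applied to each match. It is not how gsubfn implements strapply. `match_apply` is a hypothetical helper written for this illustration, and it assumes an R release that provides regmatches() in base (later than the versions discussed in this thread).

```r
# Hypothetical helper, not part of gsubfn: call FUN on each match of
# `pattern` in each element of `x` and combine the results per string.
match_apply <- function(x, pattern, FUN = c) {
  matches <- regmatches(x, gregexpr(pattern, x))   # list, one element per string
  lapply(matches, function(m) do.call(c, lapply(m, FUN)))
}

test <- "88958Abcdsef67.8S68.9\nW26m"
res <- match_apply(test, "[0-9]{5}")
res[[1]]    # matches from the first (and only) input string

# With several input strings the list has one component per string;
# a string with no match gives a NULL component.
res2 <- match_apply(c("aaa12345W", "no digits here", "11111 and 22222"),
                    "[0-9]{5}")
sapply(res2, length)
```

The bracket class is used in the pattern so that the sketch does not depend on how a particular build treats "\\d".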
[R] extracting a matched string using regexpr
Given a text like

I want to be able to extract a matched regular expression from a piece of
text.

This apparently works, but is pretty ugly:

# some html
test<-"88958Abcdsef67.8S68.9\nW26m"
# a pattern to extract 5 digits
pattern<-"[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length":
# attr(,"match.length")
# get the substring from the start point to the stop point, where
# stop = start + length - 1
answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
> answer
[1] "88958"

I tried using sub(pattern, replacement, x) with a regexp that captured the
group. I'd found an example of this in the mails,
but it didn't seem to work..
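The substr() call above runs regexpr() three times on the same input. A minimal sketch of the same extraction that stores the match once might look like this. The variable names `m`, `start` and `len` are illustrative, and the last line assumes a later R release that provides regmatches() in base, which was not available in the R versions discussed in this thread.

```r
test <- "88958Abcdsef67.8S68.9\nW26m"
pattern <- "[0-9]{5}"

# Run the match once and reuse the result
m <- regexpr(pattern, test)
start <- as.integer(m)            # position of the first match, -1 if none
len <- attr(m, "match.length")    # length of the matched text

# Same arithmetic as the original: stop = start + length - 1
answer <- if (start > 0) substr(test, start, start + len - 1) else NA_character_
answer

# Equivalent extraction where regmatches() is available
answer2 <- regmatches(test, m)
answer2
```

Checking `start > 0` first avoids calling substr() with the -1 that regexpr() returns when nothing matches.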