Re: [R] extracting a matched string using regexpr
Here are two ways to extract 5 digits.

In the first one, \\1 refers to the portion matched between the parentheses in the regular expression.

In the second one, strapply is like apply: the object to be worked on is the first argument (array for apply, string for strapply), the second argument modifies it (which dimension for apply, regular expression for strapply), and the last is a function which acts on each value (typically each row or column for apply, and each match for strapply). In this case we use c as our function to just return all the results. They are returned in a list with one component per string, but here test is just a single string, so we get a list of length one and ask for the contents of its first component using [[1]].

    # 1 - sub
    sub(".*(\\d{5}).*", "\\1", test)

    # 2 - strapply - see http://gsubfn.googlecode.com
    library(gsubfn)
    strapply(test, "\\d{5}", c)[[1]]

On Wed, May 5, 2010 at 5:13 PM, steven mosher mosherste...@gmail.com wrote:

> Given a piece of text, I want to be able to extract the part that matches a regular expression.
> This apparently works, but is pretty ugly:
>
>     # some html
>     test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>     # a pattern to extract 5 digits
>     pattern <- "[0-9]{5}"
>     # regexpr returns a start point [1] and an attribute "match.length": attr(,"match.length")
>     # get the substring from the start point to the stop point, where stop = start + length - 1
>     answer <- substr(test, regexpr(pattern, test)[1], regexpr(pattern, test)[1] + attr(regexpr(pattern, test), "match.length") - 1)
>     answer
>     [1] "88958"
>
> I tried using sub(pattern, replacement, x) with a regexp that captured the group. I'd found an example of this in the mails, but it didn't seem to work.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
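For comparison, the extraction approaches discussed in this message can be sketched in base R alone. This is only an illustration under stated assumptions: the string x is a shortened, hypothetical stand-in for test, [0-9] is used instead of \\d so the sketch does not depend on \\d support, and regmatches() is only available in more recent versions of R than the ones mentioned in this thread.

```r
# Hypothetical stand-in for the HTML fragment (not the original test string).
x <- "<tr><th>88958</th><th>Abcdsef</th><th>67.8S</th></tr>"

# Approach A: sub() with a capture group; the whole string is replaced
# by the text captured between the parentheses.
a <- sub(".*([0-9]{5}).*", "\\1", x)

# Approach B: run regexpr() once, then let regmatches() pull out the match.
m <- regexpr("[0-9]{5}", x)
b <- regmatches(x, m)

# Approach C: the substr() arithmetic from the original post,
# reusing the single regexpr() result instead of calling it three times.
start <- as.integer(m)
len   <- attr(m, "match.length")
c3    <- substr(x, start, start + len - 1)

print(c(a, b, c3))
```

One difference worth keeping in mind: when nothing matches, sub() returns the input unchanged, whereas regmatches() returns a zero-length character vector, so the second form makes a failed match easier to detect.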
Re: [R] extracting a matched string using regexpr
Thanks, I was looking at that package and reading your mails in the archive. I think my tiny mind got twisted in the regexp.

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

> Here are two ways to extract 5 digits. [...]
Re: [R] extracting a matched string using regexpr
    > test
    [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    > sub(".*(\\d{5}).*", "\\1", test)
    [1] "</th>"
    > sub(".*([0-9]{5}).*", "\\1", test)
    [1] "88958"

I think the / in the source throws something off, as the group capture appears not to be working with \\d, although the bracket version did work.

On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

> Here are two ways to extract 5 digits. [...]
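One way to probe the guess that the / is responsible is to run the same pattern on a string with and without slashes. The following is a minimal sketch in base R, using a shortened, hypothetical stand-in for test. Since / is not a regular-expression metacharacter, the two results should agree on any R build where the pattern itself behaves.

```r
# Hypothetical shortened input; not the original test string.
with_slash    <- "<th>88958</th><th>26m</th>"

# Remove every literal slash (fixed = TRUE means no regex interpretation).
without_slash <- gsub("/", "", with_slash, fixed = TRUE)

pat <- ".*([0-9]{5}).*"

r1 <- sub(pat, "\\1", with_slash)
r2 <- sub(pat, "\\1", without_slash)

# If the slash mattered, these two would differ.
print(c(with_slash = r1, without_slash = r2))
print(identical(r1, r2))
```

Swapping \\d in for [0-9] in pat and repeating the comparison would show whether the odd result follows the slash or follows the digit escape.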
Re: [R] extracting a matched string using regexpr
That's not what I get:

    > test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    > sub(".*(\\d{5}).*", "\\1", test)
    [1] "88958"
    > R.version.string
    [1] "R version 2.10.1 (2009-12-14)"

I also got the above in R 2.11.0 patched.

On Wed, May 5, 2010 at 5:55 PM, steven mosher mosherste...@gmail.com wrote:

> I think the / in the source throws something off. [...]
Re: [R] extracting a matched string using regexpr
On May 5, 2010, at 5:35 PM, Gabor Grothendieck wrote:

> Here are two ways to extract 5 digits. [...]
>
>     # 1 - sub
>     sub(".*(\\d{5}).*", "\\1", test)

    > test
    [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

I get different results than I expected, given that \\d should be synonymous with [0-9]:

    > sub(".*([0-9]{5}).*", "\\1", test)
    [1] "88958"
    > sub(".*(\\d{5}).*", "\\1", test)
    [1] "</th>"

-- 
David.
David Winsemius, MD
West Hartford, CT
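Since the discrepancy reported here is between \\d and [0-9], a cautious way to write the pattern is to use a digit class that does not rely on the \\d escape, which the R regex help page describes as an extension. The sketch below is an assumption-laden illustration, not a diagnosis: x is a hypothetical shortened input, and regexpr() with regmatches() is used so that the pattern contains no `.*` and therefore does not depend on how a given engine treats the embedded newline.

```r
# Hypothetical shortened input with an embedded newline, as in the thread.
x <- "<th>88958</th><th>68.9\nW</th>"

# Two spellings of "five digits" that avoid the \\d escape.
patterns <- c(bracket = "[0-9]{5}", posix = "[[:digit:]]{5}")

# Extract the first match for each pattern with the default regex engine.
res <- vapply(
  patterns,
  function(p) regmatches(x, regexpr(p, x)),
  character(1)
)

# The \\d escape, explicitly routed through the PCRE engine.
perl_res <- regmatches(x, regexpr("\\d{5}", x, perl = TRUE))

print(res)
print(perl_res)
```

If the three spellings disagree on a given installation, that narrows the problem to the handling of the escape rather than to the input string.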
Re: [R] extracting a matched string using regexpr
Hmm. I have R 2.11 just downloaded fresh. I'll reload a new session and report back. I will note that I've had trouble with \\d, which is why I was using [0-9]. I'm on a Mac here.

On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

> That's not what I get: [...]
Re: [R] extracting a matched string using regexpr
I am using Vista. Another thing to try is strapply using the tcl engine (assuming you do have tcltk capabilities) and the R engine. On Vista with R 2.11.0 patched I get the same result from both:

    > capabilities()[["tcltk"]]
    [1] TRUE
    > strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
    [1] "88958"
    > strapply(test, "\\d{5}", c, engine = "R")[[1]]
    [1] "88958"

On Vista with R 2.9.2 I do get bad results:

    > test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    > sub(".*(\\d{5}).*", "\\1", test)
    [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    > sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
    [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
    > R.version.string
    [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
    > win.version()
    [1] "Windows Vista (build 6002) Service Pack 2"

On Wed, May 5, 2010 at 6:20 PM, steven mosher mosherste...@gmail.com wrote:

> [...] I will note that I've had trouble with \\d [...]
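Because the results in this message differ by R version and platform, it may help to collect the version details and the outputs of each pattern variant in one object before comparing notes or filing a report. The following base R sketch is only one possible shape for that; check_digits() is a hypothetical helper name, and x is a shortened stand-in for test that deliberately contains no newline.

```r
# Hypothetical helper: run the same extraction with each pattern/engine variant.
check_digits <- function(x) {
  c(
    default_d       = sub(".*(\\d{5}).*", "\\1", x),
    default_bracket = sub(".*([0-9]{5}).*", "\\1", x),
    perl_d          = sub(".*(\\d{5}).*", "\\1", x, perl = TRUE)
  )
}

# Hypothetical shortened input (no newline, to keep the variants comparable).
x <- "<th>88958</th><th>26m</th>"

# Bundle the environment details together with the results.
report <- list(
  version  = R.version.string,
  platform = R.version$platform,
  results  = check_digits(x)
)

print(report)
```

On a build where all three variants behave, every element of report$results is the same five digits; any variant that returns something else, together with the version and platform fields, is the kind of detail a bug report would need.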
Re: [R] extracting a matched string using regexpr
Thanks, perhaps we should report it.

On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

> On Vista with R 2.9.2 I do get bad results: [...]
Re: [R] extracting a matched string using regexpr
Yes, you could bring it up on the R-sig-mac list or file a bug report.

On Wed, May 5, 2010 at 10:11 PM, steven mosher mosherste...@gmail.com wrote:

> Thanks, perhaps we should report it.