Re: [R] Regular expressions: offsets of groups
Ok, we decided to have a shot at modifying gregexpr. Let's see how it works out. If anybody is interested in discussing this please contact me. R-help doesn't seem like the right place for further discussion. Is there a default place for discussing things like that?

Thanks everybody for your responses!

Titus

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regular expressions: offsets of groups
On Wed, Sep 29, 2010 at 1:58 PM, Michael Bedward wrote:
> How is your C coding ? Bill ? Anyone else ? I could have a go at
> writing some prototype code to test in the next few days, though if
> someone else with decent C skills is itching to do it please speak up.

We have a skilled C- and R-programmer who could work on it. I'll talk to him.

Titus
Re: [R] Regular expressions: offsets of groups
I'd definitely be a customer for it, Titus. And it does seem like an obvious hole in regex processing in R that cries out to be filled.

Um, ggregexpr isn't the sexiest of function names :) Perhaps we can think of something a little easier ?

How is your C coding ? Bill ? Anyone else ? I could have a go at writing some prototype code to test in the next few days, though if someone else with decent C skills is itching to do it please speak up.

Michael

On 29 September 2010 20:08, Titus von der Malsburg wrote:
> [...] An alternative is to have a separate
> function, say ggregexpr for group gregexpr, that returns a list of
> lists as in the above example. This function would always return
> positions and match lengths of the whole pattern (group 0) and all
> groups. The original gregexpr could still have the subpattern
> argument but it would only accept single numbers. This way the return
> format of gregexpr remains the same.
Re: [R] Regular expressions: offsets of groups
Bill, Michael,

good to see I'm not the only one who sees potential for improvements in the regexpr domain. Adding a subpattern argument is certainly a step in the right direction and would make my life much easier. However, in my application I need to know not only the position of one group but also the position of the overall match in the original string. The ideal solution would provide positions and match lengths for the whole pattern and for all groups if desired. Only this would solve all related issues.

One possibility is to have a subpattern argument that accepts a vector of numbers (0 refers to the whole pattern):

    > gregexpr("a+(b+)", "abcdaabbc", subpattern=c(0,1))
    [[1]]:
    [[1]][[1]]:
    [1] 1 5
    attr(, "match.length"):
    [1] 2 4
    [[1]][[2]]:
    [1] 2 7
    attr(, "match.length"):
    [1] 1 2

A weakness of this solution is that the structure of the return values changes if length(subpattern)>1. An alternative is to have a separate function, say ggregexpr for group gregexpr, that returns a list of lists as in the above example. This function would always return positions and match lengths of the whole pattern (group 0) and all groups. The original gregexpr could still have the subpattern argument but it would only accept single numbers. This way the return format of gregexpr remains the same.

Best,

Titus

On Wed, Sep 29, 2010 at 2:42 AM, Michael Bedward wrote:
> Ah, that's interesting - thanks Bill. That's certainly on the right
> track for me (Titus, you too ?) especially if the subpattern argument
> accepted a vector of multiple group indices.
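The list-of-lists return format proposed above can be sketched in plain R. This is only an illustration, not the patch discussed in the thread: group_offsets is a hypothetical name, and the sketch assumes a version of R newer than this thread, in which gregexpr(..., perl = TRUE) attaches "capture.start" and "capture.length" attributes to its result.

```r
# Hypothetical helper (not part of base R): one search, all groups.
# Assumes gregexpr(perl = TRUE) provides "capture.start"/"capture.length".
group_offsets <- function(pattern, text) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) return(list())  # no match at all
  # Column 1 is the whole pattern (group 0), later columns are the groups.
  starts <- cbind(as.vector(m), attr(m, "capture.start"))
  lens   <- cbind(attr(m, "match.length"), attr(m, "capture.length"))
  lapply(seq_len(ncol(starts)), function(j) {
    list(start = unname(starts[, j]), length = unname(lens[, j]))
  })
}

res <- group_offsets("a+(b+)", "abcdaabbc")
res[[1]]  # whole pattern: start 1 5, length 2 4
res[[2]]  # group 1:       start 2 7, length 1 2
```

Element g + 1 of the result describes group g, so the shape of the return value does not depend on how many groups the caller is interested in.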
Re: [R] Regular expressions: offsets of groups
Ah, that's interesting - thanks Bill. That's certainly on the right track for me (Titus, you too ?) especially if the subpattern argument accepted a vector of multiple group indices.

As you say, this is straightforward in C. I'd be happy to (try to) make a patch for the R sources if there was some consensus on the best way to implement it, i.e. as a new R function or by extending existing function(s).

Michael

On 29 September 2010 01:46, William Dunlap wrote:
> S+ has a subpattern=number argument to regexpr and
> related functions. It means that the text matched
> by the subpattern'th parenthesized expression in the
> pattern will be considered the matched text.
> [...]
> The usual C regexec() returns this information.
> Perhaps it would be handy to have it in R.
Re: [R] Regular expressions: offsets of groups
> From: r-help-boun...@r-project.org On Behalf Of Michael Bedward
> Sent: Tuesday, September 28, 2010 12:46 AM
> To: Titus von der Malsburg
> Cc: r-help@r-project.org
> Subject: Re: [R] Regular expressions: offsets of groups
>
> What Titus wants to do is akin to retrieving capturing groups from a
> Matcher object in Java. I also thought there must be an existing,
> elegant solution to this some time ago and searched for it, including
> looking at the sources (albeit with not much expertise) but came up
> blank.
>
> I also looked at the stringr package (which is nice) but it doesn't
> quite do it either.

S+ has a subpattern=number argument to regexpr and related functions. It means that the text matched by the subpattern'th parenthesized expression in the pattern will be considered the matched text. E.g., to find runs of b's that come immediately after a's:

    > gregexpr("a+(b+)", "abcdaabbc", subpattern=1)
    [[1]]:
    [1] 2 7
    attr(, "match.length"):
    [1] 1 2

or to find bc's that come after 2 or more ab's:

    > gregexpr("(ab){2,}bc", "abbcabababbcabcababbc", subpattern=1)

regexpr() and strsplit() have this argument in S+ 8.1 but gregexpr() is not yet in a released version of S+.

subpattern=0, the default, means to use the entire pattern. regexpr allows subpattern=-1, which means to return a list with one element for each subpattern. I don't know if the extra complexity is worth it. (gregexpr does not allow subpattern=-1.)

The usual C regexec() returns this information. Perhaps it would be handy to have it in R.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
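For comparison with the C regexec() mentioned above: newer versions of R (an assumption beyond what this thread describes) ship a function of the same name, regexec(), which reports the first match together with its parenthesized groups. A minimal sketch:

```r
# regexec() returns, per input string, the start of the whole match
# followed by the start of each group; lengths are in "match.length".
m <- regexec("a+(b+)", "abcdaabbc")[[1]]

starts <- as.vector(m)             # whole match, then group 1
lens   <- attr(m, "match.length")  # lengths in the same order

starts  # 1 2
lens    # 2 1
```

Note that this covers only the first match in each string, so by itself it is closer to regexpr with a subpattern argument than to the gregexpr variant discussed here.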
Re: [R] Regular expressions: offsets of groups
On Tue, Sep 28, 2010 at 6:52 AM, Titus von der Malsburg wrote:
> On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward wrote:
>> What Titus wants to do is akin to retrieving capturing groups from a
>> Matcher object in Java.
>
> Precisely. Here's the description:
>
> http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int)
>
> Gabor's lookbehind trick solves some special cases but it's not the

The only limitation is that in the regular expressions supported by R you cannot have repetition in the (?<=...) portion, but none of your examples (neither the one you gave nor the one below) requires that, since if the prior expression ends in X+ you can just use X. Are you sure it does not cover all your actual situations?

If you truly do have situations that require repetition, a gregexpr plus gsubfn will do it in one line. Parenthesize the portion of the regular expression you want to capture and replace every character in it with X (or some other character that does not otherwise occur). Then find the positions and lengths of strings of X.

    > gregexpr("X+", gsubfn("a(b+)", ~ gsub(".", "X", x), "abcdaabbcbbb"))
    [[1]]
    [1] 1 5
    attr(,"match.length")
    [1] 1 2

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
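Regarding the restriction on repetition inside (?<=...): a different technique, not proposed in this thread, is the PCRE \K escape, which discards everything matched so far from the reported match. The sketch below assumes the PCRE build behind perl = TRUE supports \K, and like the lookbehind trick it only handles the case where the part of interest sits at the end of the pattern.

```r
# "a+" must match, but \K resets the reported start, so only "b+" is reported.
m <- gregexpr("a+\\Kb+", "abcdaabbcbbb", perl = TRUE)[[1]]

b_start <- as.vector(m)            # 2 7 (the trailing "bbb" is not preceded by a)
b_len   <- attr(m, "match.length") # 1 2
```

Because the prefix before \K may contain repetition, this covers the a+ case directly, but it does not report the position of the overall match.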
Re: [R] Regular expressions: offsets of groups
On Tue, Sep 28, 2010 at 9:46 AM, Michael Bedward wrote:
> What Titus wants to do is akin to retrieving capturing groups from a
> Matcher object in Java.

Precisely. Here's the description:

http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#start(int)

Gabor's lookbehind trick solves some special cases but it's not the kind of general solution I'm looking for. Let me explain what I'm trying to achieve here.

I'm working on a package that provides tools for processing and analyzing eye movements (we're doing reading research). In most situations, eye movements consist of fixations, where the eyes are relatively stationary, and saccades, quick movements between fixations. A common way to represent eye movements is as strings of symbols, where each symbol corresponds to a fixation on a particular region. AABC means two fixations on A followed by a fixation on B and then one on C.

When people analyze eye movements it's often necessary to find specific events in the eye movement record, like: fixations on the word C preceded by fixations on words D-F and followed by fixations on words A-C. This event can be specified using this regexpr:

    "[D-F]+(C)[A-C]+"

The group (in parentheses) indicates the substring for which I'd like to know the position in the overall string. Another application is the extraction of subsequences from a sequence of fixations.

Note that in some situations people might have to use more groups in their regexprs and that groups can be nested. In this case the user would have to indicate for which group he/she wants to know the offset.

I'm not an expert on regexpr engines but I'm pretty sure the necessary information is available in the engine. Gabor, I see you're the author of gsubfn (fantastic package!). Do you see a relatively simple way to expose information about group offsets and their corresponding match lengths? I think this could be useful for other applications as well. At least it seems Michael could use it, too. We can cook up something for ourselves but a general solution would benefit the larger community.

Titus
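To make the fixation example above concrete, here is a minimal sketch. The scanpath string is made up for illustration, and the code assumes a version of R newer than this thread, in which gregexpr(..., perl = TRUE) returns "capture.start" and "capture.length" attributes.

```r
# Made-up fixation sequence, one letter per fixation.
scanpath <- "ADECBAFCC"

# Fixation on C, preceded by fixations on D-F and followed by fixations on A-C.
m <- gregexpr("[D-F]+(C)[A-C]+", scanpath, perl = TRUE)[[1]]

event_start <- as.vector(m)                         # 2 7: where each event begins
event_len   <- attr(m, "match.length")              # 5 3: length of each event
target_pos  <- as.vector(attr(m, "capture.start"))  # 4 8: position of the C fixation
```

Both the overall event and the group are located in a single search, which is the property asked for above. With several groups, "capture.start" has one column per group.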
Re: [R] Regular expressions: offsets of groups
What Titus wants to do is akin to retrieving capturing groups from a Matcher object in Java. I also thought there must be an existing, elegant solution to this some time ago and searched for it, including looking at the sources (albeit with not much expertise) but came up blank.

I also looked at the stringr package (which is nice) but it doesn't quite do it either.

Michael

On 28 September 2010 01:48, Titus von der Malsburg wrote:
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches. Is there a way in R
> to get that?
Re: [R] Regular expressions: offsets of groups
On Mon, Sep 27, 2010 at 1:34 PM, Titus von der Malsburg wrote:
> On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck wrote:
>> Try this zero width negative look behind expression:
>>
>>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
>> [[1]]
>> [1] 2 7
>> attr(,"match.length")
>> [1] 1 2
>
> Thanks Gabor, but this gives me the same result as
>
> gregexpr("b+", "abcdaabbc", perl = TRUE)
>
> which is wrong if the string is "abcdaabbcbbb".

Sorry, try this:

    > gregexpr("(?<=a)b+", "abcdaabbcbbb", perl = TRUE)
    [[1]]
    [1] 2 7
    attr(,"match.length")
    [1] 1 2

Note that it does not give the same answer as:

    > gregexpr("b+", "abcdaabbcbbb", perl = TRUE)
    [[1]]
    [1] 2 7 10
    attr(,"match.length")
    [1] 1 2 3

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Regular expressions: offsets of groups
You've tried:

    gregexpr("b+", "abcdaabbc")

On Mon, Sep 27, 2010 at 12:48 PM, Titus von der Malsburg wrote:
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches. Is there a way in R
> to get that?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Regular expressions: offsets of groups
You could do this:

    gregexpr("ab+", "abcdaabbcbb")[[1]] + 1

On Mon, Sep 27, 2010 at 2:25 PM, Titus von der Malsburg wrote:
> On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna wrote:
> > You've tried:
> >
> > gregexpr("b+", "abcdaabbc")
>
> But this would match the third occurrence of b+ in "abcdaabbcbb". But
> in this example I'm only interested in b+ if it's preceded by a+.
>
> Titus

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Regular expressions: offsets of groups
On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck wrote:
> Try this zero width negative look behind expression:
>
>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
> [[1]]
> [1] 2 7
> attr(,"match.length")
> [1] 1 2

Thanks Gabor, but this gives me the same result as

    gregexpr("b+", "abcdaabbc", perl = TRUE)

which is wrong if the string is "abcdaabbcbbb".

Titus
Re: [R] Regular expressions: offsets of groups
On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg wrote:
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches. Is there a way in R
> to get that?

Try this zero width negative look behind expression:

    > gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
    [[1]]
    [1] 2 7
    attr(,"match.length")
    [1] 1 2

See ?regexp for more info.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Regular expressions: offsets of groups
On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna wrote:
> You've tried:
>
> gregexpr("b+", "abcdaabbc")

But this would match the third occurrence of b+ in "abcdaabbcbb". But in this example I'm only interested in b+ if it's preceded by a+.

Titus
Re: [R] Regular expressions: offsets of groups
Thank you Jim, but just as the solution that I discussed, your proposal involves deconstructing the pattern and searching several times. I'm looking for a general and efficient solution. Internally, the regexpr engine has all necessary information after one pass through the string. What I need is an interface that exposes this information.

Titus

On Mon, Sep 27, 2010 at 6:43 PM, jim holtman wrote:
> try this:
>
>> x <- gregexpr("a+(b+)", "abcdaabbcaaacaaab")
>> justA <- gregexpr("a+", "abcdaabbcaaacaaab")
>> # find matches in 'x' for 'justA'
>> indx <- which(justA[[1]] %in% x[[1]])
>> # now determine where 'b' starts
>> justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
> [1] 2 7 17
Re: [R] Regular expressions: offsets of groups
try this:

    > x <- gregexpr("a+(b+)", "abcdaabbcaaacaaab")
    > justA <- gregexpr("a+", "abcdaabbcaaacaaab")
    > # find matches in 'x' for 'justA'
    > indx <- which(justA[[1]] %in% x[[1]])
    > # now determine where 'b' starts
    > justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
    [1] 2 7 17

On Mon, Sep 27, 2010 at 11:48 AM, Titus von der Malsburg wrote:
> What I want is the offsets of the matches for the group (b+), i.e. 2
> and 7, not the offsets of the complete matches. Is there a way in R
> to get that?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] Regular expressions: offsets of groups
Dear list!

    > gregexpr("a+(b+)", "abcdaabbc")
    [[1]]
    [1] 1 5
    attr(,"match.length")
    [1] 2 4

What I want is the offsets of the matches for the group (b+), i.e. 2 and 7, not the offsets of the complete matches. Is there a way in R to get that?

I know about gsubfn and strapply, but they only give me the strings matched by groups, not their offsets.

I could write something myself that first takes the above matches ("ab" and "aabb") and then searches again using only the group (b+). For this to work, I'd have to parse the regular expression and search several times (> 2, for nested groups) instead of just once. But I'm sure there is a better way to do this.

Thanks for any suggestion!

Titus
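For readers arriving at this question later: in versions of R newer than the one used in this thread (an assumption worth checking against your installation), gregexpr(..., perl = TRUE) reports group offsets as extra attributes on its result. A minimal sketch for the example above:

```r
# Requires perl = TRUE; the group attributes are not set for the default engine.
m <- gregexpr("a+(b+)", "abcdaabbc", perl = TRUE)[[1]]

whole_start <- as.vector(m)                          # 1 5: complete matches
group_start <- as.vector(attr(m, "capture.start"))   # 2 7: offsets of (b+)
group_len   <- as.vector(attr(m, "capture.length"))  # 1 2: lengths of (b+)
```

The "capture.start" and "capture.length" attributes are matrices with one row per match and one column per group, so as.vector() is only appropriate here because the pattern has a single group.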