Bill, Michael,

good to see I'm not the only one who sees room for improvement in the regexpr family of functions. Adding a subpattern argument is certainly a step in the right direction and would make my life much easier. However, in my application I need to know not only the position of one group but also the position of the overall match in the original string. The ideal solution would provide positions and match lengths for the whole pattern and, if desired, for all groups. Only this would cover all the related use cases.

One possibility is to have a subpattern argument that accepts a vector of group numbers (0 refers to the whole pattern):

> gregexpr("a+(b+)", "abcdaabbc", subpattern=c(0,1))
[[1]]:
[[1]][[1]]:
[1] 1 5
attr(, "match.length"):
[1] 2 4

[[1]][[2]]:
[1] 2 7
attr(, "match.length"):
[1] 1 2

A weakness of this solution is that the structure of the return value changes if length(subpattern) > 1.

An alternative is to have a separate function, say ggregexpr (for "group gregexpr"), that returns a list of lists as in the above example. This function would always return positions and match lengths of the whole pattern (group 0) and of all groups. The original gregexpr could still have the subpattern argument, but it would only accept a single number. This way the return format of gregexpr remains the same.

Best,
Titus

On Wed, Sep 29, 2010 at 2:42 AM, Michael Bedward <michael.bedw...@gmail.com> wrote:
> Ah, that's interesting - thanks Bill. That's certainly on the right
> track for me (Titus, you too ?) especially if the subpattern argument
> accepted a vector of multiple group indices.
>
> As you say, this is straightforward in C. I'd be happy to (try to)
> make a patch for the R sources if there was some consensus on the best
> way to implement it, ie. as a new R function or by extending existing
> function(s).

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
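
To illustrate the list-of-lists return format proposed above, here is a minimal sketch of what a separate "group gregexpr" function might look like in plain R. The name ggregexpr is hypothetical and not part of base R. The sketch assumes an R version in which gregexpr(..., perl = TRUE) reports the "capture.start" and "capture.length" attributes, and it handles a single input string only.

```r
# Hypothetical helper, not part of base R: returns a list with one element
# per group, where element 1 is the whole pattern (group 0) and element
# g + 1 is capture group g. Each element holds the start positions and
# carries a "match.length" attribute, as in the proposal above.
# Assumption: gregexpr(perl = TRUE) provides "capture.start" and
# "capture.length" attributes for patterns with capture groups.
ggregexpr <- function(pattern, text) {
  stopifnot(is.character(text), length(text) == 1L)

  m <- gregexpr(pattern, text, perl = TRUE)[[1]]

  # Group 0: positions and lengths of the overall match.
  whole <- as.integer(m)
  attr(whole, "match.length") <- as.integer(attr(m, "match.length"))
  result <- list(whole)

  # No match at all: gregexpr reports -1, so return group 0 only.
  if (whole[1] == -1L) return(result)

  starts  <- attr(m, "capture.start")
  lengths <- attr(m, "capture.length")

  # One column per capture group, one row per match.
  if (!is.null(starts)) {
    for (g in seq_len(ncol(starts))) {
      pos <- as.integer(starts[, g])
      attr(pos, "match.length") <- as.integer(lengths[, g])
      result[[g + 1L]] <- pos
    }
  }
  result
}

res <- ggregexpr("a+(b+)", "abcdaabbc")
print(res)
```

For the example string, res[[1]] holds the whole-pattern positions and res[[2]] the positions of the first group, so the return structure stays the same regardless of how many groups the pattern contains.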