Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:

>
> Use a zero-width lookaround expression. It will not consume its match. See ?regexp
>
> > gregexpr("a(?=a)", "aaa", perl = TRUE)
> [[1]]
> [1] 1 2
> attr(,"match.length")
> [1] 1 1

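The pattern a(?=a) consumes one 'a' and only peeks at the next one, so it is tailored to 'aa'. A possible generalization, offered as a sketch rather than as Gabor's method, is to put the entire pattern inside the lookahead. Every match is then zero-width, and gregexpr() moves forward one character after each empty match, so every starting position in the string is tried. count_overlapping() is a hypothetical helper name, not a base R function. Note that it counts starting positions, so at most one match is reported per position.

```r
# Count possibly overlapping occurrences of the regex 'pattern' in the
# single string 'text'. count_overlapping() is a hypothetical helper.
count_overlapping <- function(pattern, text) {
  stopifnot(length(pattern) == 1, length(text) == 1)
  # Wrap the whole pattern in a zero-width lookahead: nothing is consumed,
  # so the next search starts one character later and overlaps are found.
  m <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  # gregexpr() reports -1 when there is no match, so count only positive
  # positions (length(m) alone would wrongly give 1 in that case).
  sum(m > 0)
}

count_overlapping("aa", "aaa")           # 2
count_overlapping("cus", "hocus pocus")  # 2
count_overlapping("zz", "aaa")           # 0
```

The sum(m > 0) step matters: the length(gregexpr(...)[[1]]) idiom used in the original question returns 1, not 0, when the substring does not occur at all.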
I wonder how you would count the number of occurrences of, for example, 'aba' or 'a.a' (*) in the string "ababacababab" using simple lookahead. In Perl, a lookahead combined with the '/g' modifier does that, and in Python one could apply the function 'findall' to such a pattern (on their own, both return only non-overlapping matches). When I had this task, I wrote a small function findall(), see below, but I would be glad to see a solution with lookahead only.

Regards
Hans Werner

(*) or anything more complex

----
findall <- function(apat, atxt) {
    # Return the start positions of all (possibly overlapping) matches
    # of the regular expression 'apat' in the single string 'atxt'.
    stopifnot(length(apat) == 1, length(atxt) == 1)
    pos <- c()    # positions of matches
    i <- 1; n <- nchar(atxt)
    found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
    while (found > 0) {
        pos <- c(pos, i + found - 1)   # convert to a position in 'atxt'
        i <- i + found                 # restart one character after the match start
        found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
    }
    return(pos)
}
----

> On Sun, Dec 20, 2009 at 1:43 AM, Jonathan <jonsleepy <at> gmail.com> wrote:
> > Last one for you guys:
> >
> > The command:
> >
> > length(gregexpr('cus','hocus pocus')[[1]])
> > [1] 2
> >
> > returns the number of times the substring 'cus' appears in 'hocus pocus'
> > (which is two).
> >
> > It's returning the number of **disjoint** matches. So:
> >
> > length(gregexpr('aa','aaa')[[1]])
> > [1] 1
> >
> > returns 1.
> >
> > **What I want to do:**
> > I'm looking for a way to count all occurrences of the substring, including
> > overlapping matches (so 'aa' would be found in 'aaa' two times, because the
> > middle 'a' gets counted twice).
> >
> > Any ideas would be much appreciated!!
> >
> > Signing off and thanks for all the great assistance,
> > Jonathan
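As a possible lookahead-only answer to Hans's question, the sketch below wraps the pattern as "(?=(...))": the lookahead makes every starting position eligible, and the inner capturing group records what would have matched there. It assumes a version of R whose gregexpr() attaches the 'capture.start' and 'capture.length' attributes for perl = TRUE patterns (current versions do; the 2009 release may not have). findall_lookahead() is a hypothetical name, not Hans's function or a base R one.

```r
# Lookahead-only variant of findall(): returns a data frame with the start
# position and the matched text of every (possibly overlapping) match.
# findall_lookahead() is a hypothetical name used only for this sketch.
# Caveat: the wrapper adds a capturing group, so backreferences such as
# \\1 inside 'apat' would have to be renumbered.
findall_lookahead <- function(apat, atxt) {
  stopifnot(length(apat) == 1, length(atxt) == 1)
  m <- gregexpr(paste0("(?=(", apat, "))"), atxt, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # no match anywhere in the string
    return(data.frame(pos = integer(0), match = character(0),
                      stringsAsFactors = FALSE))
  }
  # Column 1 of the capture matrices is the group added by the wrapper.
  starts <- attr(m, "capture.start")[, 1]
  lens   <- attr(m, "capture.length")[, 1]
  data.frame(pos = starts,
             match = substring(atxt, starts, starts + lens - 1),
             stringsAsFactors = FALSE)
}

x <- "ababacababab"
findall_lookahead("aba", x)$pos     # 1 3 7 9
findall_lookahead("a.a", x)$pos     # 1 3 5 7 9
findall_lookahead("a.a", x)$match   # "aba" "aba" "aca" "aba" "aba"
```

For these inputs the positions agree with those returned by the loop-based findall() above; the difference is that a single gregexpr() call replaces the repeated substr()/regexpr() steps.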