Try this:

> findall("aba", "ababacababab")
[1] 1 3 7 9
> gregexpr("a(?=ba)", "ababacababab", perl = TRUE)
[[1]]
[1] 1 3 7 9
attr(,"match.length")
[1] 1 1 1 1

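If only the count is needed, the same lookahead idea can be wrapped in a small helper. This is a rough sketch, assuming the whole pattern can be placed inside a zero-width lookahead `(?=...)`. The name `count_overlapping` is made up for illustration and is not part of R or of this thread:

```r
# Hypothetical helper (not from the thread): count overlapping matches
# of a Perl-style regular expression in a single string.
count_overlapping <- function(pattern, text) {
  stopifnot(length(pattern) == 1, length(text) == 1)
  # Wrapping the pattern in (?=...) makes every match zero-width,
  # so no characters are consumed and overlapping hits are all reported.
  m <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  # gregexpr() reports -1 when nothing matches, so count positive positions only.
  sum(m > 0)
}

count_overlapping("aba", "ababacababab")
count_overlapping("a.a", "ababacababab")
count_overlapping("aa", "aaa")
```

The three calls should agree with the lengths of the position vectors shown in this message (4 and 5) and with the count of 2 asked for in the original question.
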
> findall("a.a", "ababacababab")
[1] 1 3 5 7 9
> gregexpr("a(?=.a)", "ababacababab", perl = TRUE)
[[1]]
[1] 1 3 5 7 9
attr(,"match.length")
[1] 1 1 1 1 1

On Sun, Dec 20, 2009 at 7:22 AM, Hans W Borchers <hwborch...@googlemail.com> wrote:
> Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
>>
>> Use a zero lookaround expression. It will not consume its match. See
>> ?regexp
>>
>> > gregexpr("a(?=a)", "aaa", perl = TRUE)
>> [[1]]
>> [1] 1 2
>> attr(,"match.length")
>> [1] 1 1
>
> I wonder how you would count the number of occurrences of, for example,
> 'aba' or 'a.a' (*) in the string "ababacababab" using simple lookahead?
>
> In Perl, there is a modifier '/g' to do that, and in Python one could
> apply the function 'findall'.
>
> When I had this task, I wrote a small function findall(), see below, but
> I would be glad to see a solution with lookahead only.
>
> Regards
> Hans Werner
>
> (*) or anything more complex
>
> ----
> findall <- function(apat, atxt) {
>   stopifnot(length(apat) == 1, length(atxt) == 1)
>   pos <- c()  # positions of matches
>   i <- 1; n <- nchar(atxt)
>   found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
>   while (found > 0) {
>     pos <- c(pos, i + found - 1)
>     i <- i + found
>     found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
>   }
>   return(pos)
> }
> ----
>
>> On Sun, Dec 20, 2009 at 1:43 AM, Jonathan <jonsleepy <at> gmail.com> wrote:
>> > Last one for you guys:
>> >
>> > The command:
>> >
>> > length(gregexpr('cus','hocus pocus')[[1]])
>> > [1] 2
>> >
>> > returns the number of times the substring 'cus' appears in 'hocus pocus'
>> > (which is two)
>> >
>> > It's returning the number of **disjoint** matches. So:
>> >
>> > length(gregexpr('aa','aaa')[[1]])
>> > [1] 1
>> >
>> > returns 1.
>> >
>> > **What I want to do:**
>> > I'm looking for a way to count all occurrences of the substring, including
>> > overlapping sets (so 'aa' would be found in 'aaa' two times, because the
>> > middle 'a' gets counted twice).
>> >
>> > Any ideas would be much appreciated!!
>> >
>> > Signing off and thanks for all the great assistance,
>> > Jonathan
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.