Re: [R] Regex: workaround for variable length negative lookbehind

Gabor Grothendieck Sun, 30 Nov 2008 12:27:46 -0800

Try this:

> vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
> grep("^(?!(.)\\1{1,}$).*(.)\\2{1,}$", vec, perl = TRUE)
[1] 2 3 5

The (?!...) negative lookahead succeeds only if the string is not
all the same character. Since a lookahead consumes no characters,
matching then restarts at the beginning of the string and matches
anything followed by a repeated character at the end.
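
To make the two parts easier to see, here is a small sketch that builds the same pattern from two pieces and checks it against the setdiff approach in the quoted message. The variable names (not_all_same, rep_at_end, and so on) are only illustrative and not from the original post.

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# Part 1: anchored negative lookahead. It fails for strings that consist
# of a single character repeated to the end, and consumes nothing.
not_all_same <- "^(?!(.)\\1{1,}$)"

# Part 2: anything, then a character (group 2) repeated at least once up
# to the end. It refers to \\2 because the group inside the lookahead
# still counts as group 1, so this piece cannot be used on its own.
rep_at_end <- ".*(.)\\2{1,}$"

pattern <- paste0(not_all_same, rep_at_end)

# The lookahead alone only filters out the all-same strings
grep(not_all_same, vec, perl = TRUE)   # 2 3 4 5

idx <- grep(pattern, vec, perl = TRUE)
idx        # 2 3 5
vec[idx]   # "baaa" "bbaa" "baamm"

# Cross-check against the two-step setdiff approach from the question
whole_word_rep <- grep("^(.)\\1{1,}$", vec, perl = TRUE)
rep_end <- grep("(.)\\1{1,}$", vec, perl = TRUE)
stopifnot(identical(idx, setdiff(rep_end, whole_word_rep)))
```

Element 4 ("bbba") passes the lookahead but is dropped by the second part, since it does not end in a repeated character.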

On Sun, Nov 30, 2008 at 2:33 PM, Stefan Th. Gries <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I have the following regular expression problem: I want to find
> complete elements of a vector that end in a repeated character but
> where the repetition doesn't make up the whole word. That is, for the
> vector vec:
>
> vec<-c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")
>
> I would like to get
> "baaa"
> "bbaa"
> "baamm"
>
> From tools where negative lookbehind can involve variable lengths, one
> would think this would work:
>
> grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)
>
> But then R doesn't like it that much ... I also know I can get it like this:
>
> whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl=T) # 1 6
> rep.at.end <- grep("(.)\\1{1,}$", vec, perl=T) # 1 2 3 5 6
> setdiff(rep.at.end, whole.word.rep) # 2 3 5
>
> But is there a one-line grep thingy to do this?
>
> Thx for any pointers,
> STG
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

