Sorry I took so long getting back to this, but the paying job needs to take priority.
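A small sketch for following along with the explanation below. It runs the patterns from this thread on the tmp vector from the quoted message. The col_names vector and the variable names are made up for illustration (I have not seen the real KRASyn column names), so treat the last part as a guess at what might be happening rather than a diagnosis.

```r
# Example vector from the quoted message.
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other")

# Negative look behinds: "muta" NOT immediately preceded by "un" or "non".
not_prefixed <- grep("(?<!un)(?<!non)muta", tmp, perl = TRUE, value = TRUE)

# The look ahead is a no-op and "|" has the lowest precedence,
# so the pattern behaves like plain "non|un".
with_lookahead    <- grep("(?!muta)non|un", tmp, perl = TRUE)
plain_alternation <- grep("non|un", tmp, perl = TRUE)
same_result <- identical(with_lookahead, plain_alternation)

# Simple patterns for "muta" preceded by "non" or "un".
prefixed     <- grep("(non|un)muta", tmp, value = TRUE)
prefixed_alt <- grep("(no|u)nmuta", tmp, value = TRUE)

# Positive look behind version (still needs an "or" inside it).
prefixed_lb <- grep("(?<=non|un)muta", tmp, perl = TRUE, value = TRUE)

# Where look behind earns its keep: the prefix must be present for the
# match, but it is not part of what gets replaced.
replaced <- sub("(?<=un)muta", "MUTA", "unmutated", perl = TRUE)

# HYPOTHETICAL column names, not taken from the real data.
col_names <- c("Country", "nonmutated_n", "unmutated_n", "mutated_n")
loose  <- grep("(?!muta)non|un", col_names, perl = TRUE, value = TRUE)
strict <- grep("(non|un)muta", col_names, value = TRUE)

print(loose)   # includes "Country", because it contains "un"
print(strict)  # only the names with "nonmuta" or "unmuta"
```

If the real names(KRASyn) includes a non-numeric column whose name contains "un" or "non" anywhere, the original pattern would select it as well, and rowSums would then complain that 'x' must be numeric. The pattern "(non|un)muta" avoids that by requiring "muta" right after the prefix.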

The regular expression "(?<!un)(?<!non)muta" looks for a string that matches "muta", then looks at the characters immediately before it to see if they match either "un" or "non", in which case it is not a match. More specifically, the regular expression engine steps through the string and tries the match at each position. At a given position it first checks whether "un" comes just before that position; if it does, this position cannot match and the engine moves on to the next position. If it is not "un", the engine moves to the next negative look behind and checks whether "non" comes just before the position. If neither "un" nor "non" is just before the position, it starts matching the characters after the position to see if they match "muta".

The next pattern is "(?!muta)non|un". The (?!muta) is a negative look ahead, which starts at the position and checks forward to see that the next characters are not "muta" (but does not include them in the match). In this case it is a no-op: you are asking for a position where the next characters are not "muta" but are "non", and since the next characters cannot be both, this is the same as just matching "non". You also need to be aware of operator precedence: in that pattern the (?!muta) part applies only to the "non", not to the "un". So the whole pattern behaves like "non|un" and matches any string that contains "non" or "un" anywhere.

To match "nonmuta" or "unmuta", a simple pattern would be "(non|un)muta" or "(no|u)nmuta". You could use a positive look behind (you would still need an "or"), but it would be overkill for a grep command. Positive look ahead/behind matters more for replacing, where the look ahead/behind is needed for the match to happen but is not captured as part of the match to be replaced.

On Tue, Apr 24, 2012 at 7:40 AM, Paul Miller <pjmiller...@yahoo.com> wrote:
> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in general or
> Perl-like regular expressions.
> Found the help page though, and think I was
> able to determine how the code works as well as how I would select only
> instances where "muta" is preceded by either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation" "nonmutated" "unmutated" "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it to my
> data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
> 'x' must be numeric
>
> Thanks,
>
> Paul
>

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.