[R] Negating two identical characters with regular expressions

Michael Young Sun, 05 Jun 2011 13:13:25 -0700

Hello all,

Let's say I have a character string
"Race-ethnicity-----coding information"


I want to extract all text before the multiple dashes, including the word
"ethnicity", i.e. the result should be "Race-ethnicity".

I wrote a handy function to extract the first matched text:

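grepcut <- function(pattern, x) {
  # regexpr() gives the start of the first match; its "match.length"
  # attribute gives the length of that match
  start.and.length <- regexpr(pattern, x)
  substring(x, start.and.length,
            start.and.length + attr(start.and.length, "match.length") - 1)
}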

grepcut("^[^-]+","Race-ethnicity-----coding information")

The above grepcut call, of course, returns only the string "Race". What I
really want is to create a class consisting of two dashes in a row and then
negate it. Is it possible to create a class of repeated characters? If so, it
might be further complicated by the fact that "-" is a special character
inside brackets and can only go first or last.
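
For illustration, here is a rough sketch of two ways this might be approached
without a "negated class of two dashes" (as far as I know, a bracket class can
only describe single characters, not sequences). The variable names x,
before.sub and before.perl are made up for this example, and this is only one
possible reading of the problem, not a definitive answer.

```r
x <- "Race-ethnicity-----coding information"

# Option 1: delete everything from the first run of two or more dashes
# to the end of the string. A single "-" does not match "-{2,}", so
# "Race-ethnicity" is left intact.
before.sub <- sub("-{2,}.*$", "", x)

# Option 2: keep the regexpr()/substring() style. Match lazily from the
# start and stop just before the first "--". The lookahead (?=--) needs
# perl = TRUE.
m <- regexpr("^.*?(?=--)", x, perl = TRUE)
before.perl <- substring(x, m, m + attr(m, "match.length") - 1)

print(before.sub)
print(before.perl)
```

Note that the two options differ when the string contains no double dash:
Option 1 returns the string unchanged, while Option 2 finds no match and
returns an empty string.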

Can anyone help me out?

Thanks,
Michael Young

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
