Here is one way using a single pattern (so it can be used in a substitution); it uses Perl's positive lookahead:

> test <- c("SHRT","5HRT","M1TCH","M1TCH5","LONG3RS","NONUMBER","TOOLOOOONGG","ooops.3")
>
> sub( '(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}', 'xxx', test, perl=TRUE)
[1] "SHRT"        "5HRT"        "xxx"         "xxx"         "xxx"
[6] "NONUMBER"    "TOOLOOOONGG" "ooops.3"
>

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Marc Schwartz
> Sent: Monday, June 08, 2009 6:33 PM
> To: Barry Rowlingson
> Cc: r-help@r-project.org; Tan, Richard
> Subject: Re: [R] Regex question to find a string that contains 5-9
> alpha-numeric characters, at least one of which is a number
>
>
> On Jun 8, 2009, at 5:27 PM, Barry Rowlingson wrote:
>
> > On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard<r...@panagora.com>
> > wrote:
> >> Hi,
> >>
> >> This is not exactly an R question but I am trying to use gsub to
> >> replace
> >> a string that contains 5-9 alpha-numeric characters, at least one of
> >> which is a number. Is there a good way to write it in a one line
> >> regex?
> >
> > The only way I can think of is to spell out all the possible
> > expressions, something like:
> >
> > [0-9][a-z0-9]{4} | [a-z0-9][0-9][a-z0-9]{3} |
> > [a-z0-9]{2}[0-9][a-z0-9]{2} .... and so on. That is, have a regex
> > component for every possible 5, 6, 7, 8, and 9 character expression
> > with [0-9] in each place. I'm not sure this qualifies as 'good',
> > though..
> >
> > Better to do it in two stages, one to check for 5-9 alphanumerics,
> > and then another to check for a number.
> >
> > Here's something on a test vector 's':
> >
> >> cbind(s,grepl("^[A-Z0-9]{5,9}$",s),grepl("[0-9]",s))
> >      s
> > [1,] "SHRT"        "FALSE" "FALSE"
> > [2,] "5HRT"        "FALSE" "TRUE"
> > [3,] "M1TCH"       "TRUE"  "TRUE"
> > [4,] "M1TCH5"      "TRUE"  "TRUE"
> > [5,] "LONG3RS"     "TRUE"  "TRUE"
> > [6,] "NONUMBER"    "TRUE"  "FALSE"
> > [7,] "TOOLOOOONGG" "FALSE" "FALSE"
> >
> > The ones you want give two TRUE values. Extending to lower-case is
> > left as an exercise...
> >
> > Barry
>
>
> I was trying to think of a way to do this with only a single grep(),
> but it has been too long of a day.
>
> So here is a bit of a simplification on the two stage approach:
>
> > vec
> [1] "SHRT" "5HRT" "M1TCH" "M1TCH5" "LONG3RS" "NONUMBER" "TOOLOOOONGG"
>
>
> > grep("[0-9]", vec[grep("^[[:alnum:]]{5,9}$", vec)], value = TRUE)
> [1] "M1TCH"   "M1TCH5"  "LONG3RS"
>
>
> HTH,
>
> Marc Schwartz
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
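
One caveat on the single-pattern approach at the top of this message: the pattern is not anchored, so on a longer token it can match a qualifying 5-9 character run inside the token. The sketch below shows this and an anchored variant, assuming the intent is to replace only when the whole string qualifies. That intent is an assumption, and the extra test value "TOOLOOOONGG1" and the variable names are made up for illustration.

```r
# Illustrative input; "TOOLOOOONGG1" is a hypothetical extra case (12 characters, one digit).
test <- c("M1TCH", "NONUMBER", "TOOLOOOONGG1", "ooops.3")

# Pattern from the message above: the lookahead requires a digit after at most
# 8 letters, then 5-9 alphanumerics are consumed.
pat <- "(?=[a-zA-Z]{0,8}[0-9])[a-zA-Z0-9]{5,9}"

# Unanchored: the match may start partway into a longer token.
unanchored <- sub(pat, "xxx", test, perl = TRUE)
print(unanchored)
# [1] "xxx"      "NONUMBER" "TOOxxx"   "ooops.3"

# Anchored variant (assumes the entire string must be 5-9 alphanumerics).
anchored_pat <- paste0("^", pat, "$")
anchored <- sub(anchored_pat, "xxx", test, perl = TRUE)
print(anchored)
# [1] "xxx"          "NONUMBER"     "TOOLOOOONGG1" "ooops.3"

# Cross-check against the two-stage test from the quoted messages.
two_stage <- grepl("^[[:alnum:]]{5,9}$", test) & grepl("[0-9]", test)
one_stage <- grepl(anchored_pat, test, perl = TRUE)
print(identical(two_stage, one_stage))
```

With the anchors in place, the single pattern selects the same elements of this vector as the two-stage grepl() check, so it can be used either for filtering or directly in sub().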