But I do get the incorrect result on R 2.14.0 on Linux:

> sub('[[:digit:]]{1,2}', '', '9ewww')
[1] "www"
And also:

> sub('[[:digit:]]{1,2}', '', '9ewww')
[1] "www"
> sub('[[:digit:]]{1,2}', '', 'ewww9')
[1] "ww9"
> sub('\\d{1,2}', '', 'ewww9')
[1] "ww9"

But:

> sub('\\d', '', 'ewww9')
[1] "ewww"
> sub('\\d*', '', '9ewww')
[1] "ewww"

So it seems to be something about the way the {1,2} interval (the curly braces) is handled, but only when it follows a character class such as [[:digit:]] or \\d. Applied to a single literal character, it works as expected:

> sub('e{1,2}', '', '9ewww')
[1] "9www"
> sub('9{1,2}', '', '9ewww')
[1] "ewww"

But, as Prof. Ripley's email suggests, perl=TRUE solves the problem. (I was trying out various combinations when it appeared in my inbox.)

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

On Fri, Dec 9, 2011 at 9:25 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 09/12/2011 9:20 AM, Jannis wrote:
>>
>> Dear R users,
>>
>> the way I understand the documentation of sub() and regexp the following
>> code:
>>
>> sub('[[:digit:]]{1,2}', '', '9ewww')
>>
>> ... should yield:
>>
>> 'ewww'
>>
>> It returns, however:
>>
>> 'www'
>>
>> Why is this the case? My code should just substitute 1 (minimum) or up to
>> 2 (maximum) digits, i.e. numbers and not the 'e' in the string. Do I
>> misinterpret something here?
>
> I get your expected output of "ewww" running 2.14.0 or 2.14.0-patched on
> Windows. So it's not a universal problem...
> > Duncan Murdoch > >> >> Thanks for any ideas >> Jannis >> >> >> > sessionInfo() >> R version 2.14.0 (2011-10-31) >> Platform: i686-pc-linux-gnu (32-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] >> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] >> LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C >> LC_NAME=C [9] LC_ADDRESS=C >> LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 >> LC_IDENTIFICATION=C >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> -- Sarah Goslee http://www.functionaldiversity.org ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.