I'm not very good at character encodings, so this might be user error. The following code is meant to replace extended ASCII characters (bytes 0x7f-0xff), in particular a non-breaking space, with "", and it works in R-4-1-branch:

> R.version.string
[1] "R version 4.1.2 Patched (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
[1] "foo"

but fails in R-devel

> R.version.string
[1] "R Under development (unstable) (2022-01-04 r81445)"
> gsub("[\x7f-\xff]", "", "fo\xa0o")
Error in gsub("[\177-\xff]", "", "fo\xa0o") :
  invalid regular expression '[-�]', reason 'Invalid character range'
In addition: Warning message:
In gsub("[\177-\xff]", "", "fo\xa0o") :
  TRE pattern compilation error 'Invalid character range'

There are other oddities, too, like

> gsub("[[:alnum:]]", "", "fo\xa0o") # R-4-1-branch
[1] "\xfc\xbe\x8c\x86\x84\xbc"
> gsub("[[:alnum:]]", "", "fo\xa0o") # R-devel
[1] "<>"

The R-devel sessionInfo is

> sessionInfo()
R Under development (unstable) (2022-01-04 r81445)
Platform: x86_64-apple-darwin19.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.0

(I have built my own R on macOS; similar behavior is observed on a Linux machine)

Any hints welcome,

Martin Morgan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
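
For reference, the stated goal (dropping bytes in the range 0x7f-0xff, such as the 0xa0 non-breaking space) can also be sketched without a regular expression, which avoids how the regex engine treats invalid UTF-8 in a UTF-8 locale. This is only a minimal sketch and not a diagnosis of the R-devel behavior reported above. The helper name strip_high_bytes is hypothetical, and the iconv() variant assumes the input really is Latin-1.

```r
# Hypothetical helper (not part of the original post): drop every byte in
# 0x7f-0xff by filtering the raw bytes directly, so neither the regex
# engine nor the session locale is involved.
# Assumes x is a character vector without NA values.
strip_high_bytes <- function(x) {
  vapply(x, function(s) {
    bytes <- charToRaw(s)                       # e.g. 66 6f a0 6f
    rawToChar(bytes[as.integer(bytes) < 0x7f])  # keep 0x01-0x7e only
  }, character(1), USE.NAMES = FALSE)
}

strip_high_bytes("fo\xa0o")
# [1] "foo"

# Alternative, ASSUMING the input is Latin-1: ask iconv() to drop anything
# that has no ASCII equivalent. Unlike the helper above, this keeps 0x7f
# (DEL), which is a valid ASCII character.
iconv("fo\xa0o", from = "latin1", to = "ASCII", sub = "")
```

The byte-level version mirrors the intent of the original pattern "[\x7f-\xff]" most directly. The iconv() version is shorter but depends on the assumed source encoding.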