Re: [R-pkg-devel] UTF-8 and raw strings in package code

Mark Bravington Sat, 29 Nov 2025 15:58:40 -0800

> Wouldn't the obvious thing be to not use an r string here?

"Obvious" in terms of keeping R CMD check happy, certainly, but it'd be 
antithetical to clear code: the string I included in the post would become 
incomprehensible to the maintainer (me).


IME raw strings in R are under-appreciated and little-known. They have lots of 
uses besides regexes, whatever the original intention may have been! E.g. I 
use raw strings for formatted multi-line comments, documentation, and 
templated bits of text. Nicer code results.
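
To make the readability point concrete, here is a minimal sketch comparing a 
raw string with the regular string that encodes the same text. The variable 
names are made up for illustration, and the snippet is deliberately ASCII-only 
so it sidesteps the check Warning itself:

```r
# Illustrative names only; requires R >= 4.0.0 for the r"(...)" syntax.

# Raw string: backslashes and quotes are taken literally, so the LaTeX
# accent commands read exactly as they would in a .tex file.
raw_version <- r"--(\"A  \'a  \=A  {\ss})--"

# The same text as a regular string: every backslash must be doubled and
# the double quote escaped.
escaped_version <- "\\\"A  \\'a  \\=A  {\\ss}"

# Both spellings produce the same character vector.
identical(raw_version, escaped_version)

# writeLines() prints the text itself, without R's escaping.
writeLines(raw_version)
```

The escaped spelling is already harder to proofread with four entries; a 
table of dozens of accent commands would be worse.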

Anyway, I'd be perfectly happy with Duncan Murdoch's suggestion of making UTF-8 
legit in R code & NAMESPACE generally. I suggested the minor incremental change 
of "only raw strings" (i) because that's the only thing that affects me ATM, 
and (ii) just in case allowing UTF-8 had unwelcome implications for strings in 
general, or for legal variable names etc.
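
For anyone unsure why raw strings are the sticking point, here is a small 
sketch (names are illustrative again, and `is_ascii` is a hypothetical helper, 
not part of R or the check machinery). A \uxxxx escape is expanded in a regular 
string but stays as six literal characters in a raw string, so the escape 
route recommended by the Warning is not available there:

```r
# Regular string: the parser expands the escape to one character (U+00C4).
regular <- "\u00c4"

# Raw string: no escape processing, so this is backslash, u, 0, 0, c, 4.
raw <- r"(\u00c4)"

nchar(regular, type = "chars")
nchar(raw, type = "chars")

# Hypothetical helper: TRUE if a single string holds only ASCII code points.
is_ascii <- function(x) all(utf8ToInt(x) < 128L)

is_ascii(regular)  # FALSE: the parsed value is a non-ASCII character
is_ascii(raw)      # TRUE: the escape was never expanded
```

Both lines are pure ASCII as source text; the difference is only in what the 
parser makes of them.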

cheers
Mark


On Sun, Nov 30, 2025, at 03:44, Jeff Newmiller via R-package-devel wrote:
> Wouldn't the obvious thing be to not use an r string here? Using r strings 
> does not imply the use of non-ascii characters (AFAIK they are intended for 
> regex patterns), and using regular strings does not imply you cannot use 
> Unicode (with \uxxxx).
> 
> At some point I would think that Unicode in package source code will 
> become acceptable... but supporting Unicode in data objects does not 
> imply that Unicode in source code has to be supported, so your arguments 
> don't IMO really bring any weight to the discussion.
> 
> On November 29, 2025 2:55:52 AM PST, Mark Bravington wrote:
> >Hi. My package 'lyxport' has R code with several raw strings (see ?Quotes) 
> >which contain UTF-8 characters (FWIW: in order to deal with wacky legacy 
> >LaTeX characters). For example, one of the strings is:
> >
> >  converto <- r"--{
> >      Ä   \"A ä   \"a Á   \'A á   \'a Ȧ   \.A ȧ   \.a Ā   \=A
> >      ā   \=a Â   \^A â   \^a À   \`A à   \`a Ą   \k{A} ą   \k{a}
> ><snipped>
> >      Ŋ   {\NG} Ø   {\O}  ø   {\o}  œ   {\oe} Œ   {\OE} ß   {\ss} þ   {\th}
> >      Þ   {\TH}
> >    }--"
> >
> >R CMD check is not happy, and gives a Warning:
> >
> >"Portable packages must use only ASCII characters in their R code and 
> >NAMESPACE directives, except perhaps in comments. Use \uxxxx escapes for 
> >other characters."
> >
> >and indeed that is as stated in "Writing R Extensions", section 1.1.5 
> >("Package subdirectories") and section 1.6.3, "Encoding issues".
> >
> >But I wonder if this is still sensible now that
> >
> >(i) R has raw strings (since R 4.0.0);
> >(ii) the DESCRIPTION file explicitly says "Encoding: UTF-8"; and 
> >(iii) R >= 4.2 pretty much now enforces UTF-8 in Windows (and UTF-8 could 
> >even be a "requirement" of this package, if that helped).
> >
> >With "normal" strings then maybe the \uxxxx thing is reasonable; but 
> >shouldn't the contents of raw strings be exempt? You can't put \uxxxx into a 
> >raw string, for obvious reasons...
> >
> >cheers
> >Mark
> >
> >
> >PS Of course, there are ways around the Warning (e.g. storing the strings as 
> >files elsewhere in the package, and reading those files at run time) but 
> >they are tedious, harder to maintain, and reduce clarity (imagine using 
> >\uxxxx in the above!). Since I don't particularly care whether the package 
> >goes on CRAN or not (it's living quite happily in R-universe), I've no plans 
> >to change my code, but I would prefer to avoid Warnings that then have to be 
> >explained to would-be users. And I am probably not the only person affected.
> >
> >PPS The package has been working fine on Windows, Macs, and Linux.
> >
> >______________________________________________
> >[email protected] mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-package-devel
