> Wouldn't the obvious thing be to not use an r string here?

"Obvious" in terms of keeping R CMD check happy, certainly, but it would be antithetical to clear code: the string I included in my post would become incomprehensible to the maintainer (me).

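As a rough illustration of the trade-off under discussion, here is a minimal sketch. It is not code from 'lyxport'. The vectors `unicode_chars` and `latex_macros` are hypothetical names, and they hold only a few entries of a table like `converto`. The source stays ASCII-only by writing the Unicode letters as `\uxxxx` escapes in ordinary strings, while the LaTeX macros stay in raw strings so their backslashes need no doubling. The sketch also checks that `\u00c4` inside a raw string is kept as literal text rather than treated as an escape, which is why `\uxxxx` cannot be used there.

```r
# Hypothetical ASCII-only variant of a few entries from a 'converto'-style table.
# Unicode letters use \uxxxx escapes in ordinary strings (so the source is ASCII);
# LaTeX macros use raw strings, so their backslashes need no doubling.
unicode_chars <- c("\u00c4", "\u00e4", "\u00c1", "\u00e1", "\u0100")  # A-umlaut, a-umlaut, A-acute, a-acute, A-macron
latex_macros  <- c(r"(\"A)", r"(\"a)", r"(\'A)", r"(\'a)", r"(\=A)")
names(latex_macros) <- unicode_chars

# Inside a raw string, \u00c4 is NOT an escape: it stays six literal characters,
# whereas in an ordinary string it becomes a single character.
stopifnot(nchar(r"(\u00c4)") == 6L, nchar("\u00c4") == 1L)

# A raw-string macro is the same string as its doubly-escaped ordinary form.
stopifnot(identical(r"(\"A)", "\\\"A"))

print(latex_macros)
```

Even with only five entries, the `\u` column is much harder to check by eye than the literal characters would be. The raw strings on the LaTeX side still read naturally.
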
In my experience, raw strings in R are under-appreciated and little known. They have lots of uses besides regexes, whatever the original intention may have been! E.g., I use raw strings for formatted multi-line comments, documentation, and templated bits of text, and the code is nicer as a result.

Anyway, I'd be perfectly happy with Duncan Murdoch's suggestion of making UTF-8 legitimate in R code and NAMESPACE files generally. I suggested the minor incremental change of "only raw strings" (i) because that's the only thing that affects me at the moment, and (ii) in case allowing UTF-8 had unwelcome implications for strings in general, or for legal variable names, etc.

cheers
Mark

On Sun, Nov 30, 2025, at 03:44, Jeff Newmiller via R-package-devel wrote:
> Wouldn't the obvious thing be to not use an r string here? Using r strings
> does not imply the use of non-ASCII characters (AFAIK they are intended for
> regex patterns), and using regular strings does not imply you cannot use
> Unicode (with \uxxxx).
>
> At some point I would think that accepting Unicode in package source code
> would become acceptable... but supporting Unicode in data objects does not
> implicitly suggest that allowing Unicode in source code has to be supported,
> so your arguments don't IMO really bring any weight to the discussion.
>
> On November 29, 2025 2:55:52 AM PST, Mark Bravington wrote:
> > Hi, my package 'lyxport' has R code with several raw strings (see ?Quotes)
> > which contain UTF-8 characters (FWIW: in order to deal with wacky legacy
> > LaTeX characters). For example, one of the strings is:
> >
> >   converto <- r"--{
> >     Ä \"A ä \"a Á \'A á \'a Ȧ \.A ȧ \.a Ā \=A
> >     ā \=a Â \^A â \^a À \`A à \`a Ą \k{A} ą \k{a}
> > <snipped>
> >     Ŋ {\NG} Ø {\O} ø {\o} œ {\oe} Œ {\OE} ß {\ss} þ {\th}
> >     Þ {\TH}
> >   }--"
> >
> > R CMD check is not happy, and gives a Warning:
> >
> > "Portable packages must use only ASCII characters in their R code and
> > NAMESPACE directives, except perhaps in comments. Use \uxxxx escapes for
> > other characters."
> >
> > and indeed that is as stated in "Writing R Extensions", section 1.1.5
> > ("Package subdirectories") and section 1.6.3 ("Encoding issues").
> >
> > But I wonder if this is still sensible now that
> >
> > (i) R has raw strings (since R 4.0.0);
> > (ii) the DESCRIPTION file explicitly says "Encoding: UTF-8"; and
> > (iii) R >= 4.2 pretty much enforces UTF-8 on Windows (and UTF-8 could
> > even be a "requirement" of this package, if that helped).
> >
> > With "normal" strings, maybe the \uxxxx requirement is reasonable; but
> > shouldn't the contents of raw strings be exempt? You can't put \uxxxx into a
> > raw string, for obvious reasons...
> >
> > cheers
> > Mark
> >
> >
> > PS Of course, there are ways around the Warning (e.g. storing the strings as
> > files elsewhere in the package, and reading those files at run time), but
> > they are tedious, harder to maintain, and reduce clarity (imagine using
> > \uxxxx in the above!). Since I don't particularly care whether the package
> > goes on CRAN or not (it's living quite happily in R-universe), I've no plans
> > to change my code, but I would prefer to avoid Warnings that then have to be
> > explained to would-be users. And I am probably not the only person affected.
> >
> > PPS The package has been working fine on Windows, Macs, and Linux.
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> Sent from my phone. Please excuse my brevity.
