Duncan Murdoch <murdoch.dun...@gmail.com> writes:

> On 29/01/2016 10:35 AM, Daniel Bastos wrote:
>> Here's how I plot a graph.
>>
>>   plot(c(1,2,3), main = "graph ç")
>>
>> The main-string has a UTF-8 character "ç". I believe I'm using the
>> windows device. It opens up on my screen. (The window says ``R
>> Graphics: Device 2 (ACTIVE)''.) How can I tell it to use my encoding of
>> choice?
>
> As far as I know that's impossible. R uses the system encoding, and I
> don't think any Windows versions use UTF-8 code pages. They use
> UTF-16 for wide characters, and some 8 bit encoding for byte-sized
> characters. R will use whatever 8 bit code page Windows chooses.

You seem to be correct. Here's what Microsoft has to say: ``[...] UTF-16
[...] is the most common encoding of Unicode and the one used for native
Unicode encoding on Windows operating systems.''[1] They also claim that
``[w]hile Unicode-enabled functions in Windows use UTF-16, it is also
possible to work with data encoded in UTF-8 or UTF-7, which are
supported in Windows as multibyte character set code pages.''[1]

But I couldn't verify that claim. The documentation of setlocale[2] says
the ``set of available locale names, languages, country/region codes,
and code pages includes all those supported by the Windows NLS API
except code pages that require more than two bytes per character, such
as UTF-7 and UTF-8. If you provide a code page value of UTF-7 or UTF-8,
setlocale will fail, returning NULL.''[2]

That seems to be correct as per the following C code.

  /* setlocale returns NULL on failure, so guard before printing */
  const char *loc = setlocale(LC_ALL, "UTF-8");
  printf("locale: %s\n", loc ? loc : "(null)");

And [3] makes me think that _wsetlocale behaves the same way:
``_wsetlocale [...] is a wide-character version of setlocale; the
arguments and return values of _wsetlocale are wide-character
strings.'' The following program seems to confirm it.

  #include <locale.h>
  #include <stdio.h>
  #include <wchar.h>

  int main(void)
  {
    /* pass a wide literal; the result is a wide string or NULL */
    const wchar_t *loc = _wsetlocale(LC_ALL, L"UTF-8");
    printf("locale: %ls\n", loc ? loc : L"(null)");
    return 0;
  }

[...]

(*) A workaround

Since R comes with iconv(), the following might be a safe way to
translate UTF-8 into the encoding of the current system locale, so that
plot titles display correctly on Windows systems.

  iconv("utf8-string", from = "UTF-8",
        to = localeToCharset(Sys.getlocale("LC_CTYPE")))

(*) References

[1] MSDN Unicode
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374081(v=vs.85).aspx

[2] MSDN setlocale
https://msdn.microsoft.com/en-us/library/x99tb11d.aspx

[3] MSDN Locales and Code Pages
https://msdn.microsoft.com/en-us/library/8w60z792.aspx

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.