On Fri, Sep 13, 2019 at 1:46 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> On 9/13/19 1:33 PM, Ray Donnelly wrote:
>
> On Fri, Sep 13, 2019 at 11:53 AM Tomas Kalibera <tomas.kalib...@gmail.com>
> wrote:
>
>> On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
>> > But if I type
>> > > "會"
>> > the output is
>> > [1] "會"
>> > so seemingly it can be represented. Or am I wrong?
>>
>> In RGui you can print the string, because RGui is a Windows Unicode
>> application (uses UTF-16LE and bypasses the C runtime for strings). But
>> that is just the GUI; R itself (and hence also packages) uses the current
>> native encoding as defined by the C runtime. RGui will make sure R gets
>> the string in UTF-8, but as soon as you do anything even slightly
>> non-trivial, which includes formatting, the string will be converted to
>> the current native encoding. Some R functions allow you to do certain
>> things in UTF-8 without conversion to native encoding, but you'd have to
>> read the documentation for each function very carefully. For
>> practical use, you either need to live with the misinterpretation of
>> some characters, or use Windows in a locale where your characters can
>> be represented (e.g. a Chinese locale when working with Chinese strings),
>> or use Linux/macOS. On Linux/macOS the current native encoding can be
>> UTF-8, so there is no problem. On Windows, with the current toolchain
>> based on mingw, this is not possible.
>>
>
> mingw-w64 is capable of processing utf-8 (it can process bytes after all).
> Can you explain what you mean here? Would any other compiler on Windows not
> suffer from this problem?
>
> The problem is using UTF-8 as the current locale as understood by the C
> runtime/C library. By default mingw uses msvcrt, which does not allow UTF-8
> as the current locale (via setlocale()). mingw now also allows building with
> UCRT (recently), and I hope one day we will be able to use it, but it is
> not yet the default, msys2 does not use it yet for its mingw_ packages, and
> we also need the external packages. Note that R (CRAN, and also BIOC)
> provide binary versions of all packages for Windows; they need to build
> them and they need all library dependencies. All of those would have to be
> rebuilt with UCRT, which will be a huge task. Fixing R on its own to
> support UTF-8 natively on Windows when the C runtime allows it won't be
> hard, because R already can do it on Unix, but the problem is all the
> dependencies.
>

Thanks. We build R for the Anaconda Distribution and are considering our
options around our Windows compilers, including the UCRT (and clang,
possibly from MSYS2, possibly from conda-forge, or a hybrid of some sort if
necessary).

> Tomas
>
>>
>> Best
>> Tomas
>>
>> >
>> > Best
>> > Iago
>> > ------------------------------------------------------------------------
>> > *From:* Tomas Kalibera <tomas.kalib...@gmail.com>
>> > *Sent:* Friday, 13 September 2019 11:24
>> > *To:* IAGO GINÉ VÁZQUEZ <i.g...@pssjd.org>; r-devel@r-project.org
>> > <r-devel@r-project.org>
>> > *Subject:* Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2
>> > -windows 10
>> > On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
>> > > I have a Chinese character in a data frame, but the output of
>> > > printing it is its Unicode code point. Concretely, the character is
>> > > 會 and the code point is U+6703. Following the code, I arrive at the
>> > > instruction
>> > >
>> > > > base::format.default("會")
>> > > which prints
>> > >
>> > > [1] "<U+6703>"
>> > >
>> > > I do not know the extent of this behaviour, or whether it persists
>> > > in more recent versions of R.
>> > >
>> > > Is it expected?
>> >
>> > If you are running this on Windows in an encoding where the character
>> > cannot be represented (e.g. non-Chinese locale), then yes, this is
>> > expected behavior.
>> >
>> > On Unix systems where R can run in UTF-8 encoding (Linux, macOS), the
>> > character will be formatted/displayed properly.
>> >
>> > Best
>> > Tomas
>> >
>> > >
>> > > Thank you!
>> > >
>> > > Iago
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-devel@r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
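
To make the "can the character be represented in the native encoding"
point from this thread concrete, here is a minimal sketch. It is written in
Python rather than R, and it is not R's actual implementation: the helper
name to_native is hypothetical, and the code pages cp1252 and cp950 are
assumptions standing in for a Western European and a Traditional Chinese
Windows locale. Only the <U+6703> escape form is taken from the output
quoted in the thread.

```python
def to_native(text: str, encoding: str) -> str:
    """Return `text` as it would survive a round trip through `encoding`.

    Hypothetical helper: characters that the target encoding cannot
    represent are replaced by an escape of the form <U+XXXX>, mirroring
    the output quoted in the thread.
    """
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)          # raises if `ch` is not representable
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append(f"<U+{ord(ch):04X}>")
    return "".join(pieces)


hui = "\u6703"  # the character 會 discussed in the thread

# Assumed stand-ins for native encodings: Western European code page,
# Traditional Chinese code page, and UTF-8 (as on Linux/macOS).
for enc in ("cp1252", "cp950", "utf-8"):
    print(enc, to_native(hui, enc))
```

Under these assumptions the character survives in cp950 and UTF-8 but is
escaped in cp1252, which is the same pattern the thread describes for a
non-Chinese Windows locale versus a Chinese locale or a UTF-8 system.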