On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil <bor...@gmail.com> wrote:
>
> Minimalistic example:
> Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console:
> > "ř"
> [1] "r"
>
> Although the script is in UTF-8, the characters are replaced by
> "simplified" substitutes uncontrollably (depending on OS locale). The
> same goes with simply entering the code statements in R Console.
>
> The problem does not occur on OS with UTF-8 locale (Mac OS, Linux...)

I think this is a "feature" of win_iconv, which is bundled with base R
on Windows (./src/extra/win_iconv). The character from your example is
not part of the latin1 (ISO-8859-1) set; however, win_iconv seems to
convert it anyway, silently substituting a plain "r":

> x <- "\U00159"
> print(x)
[1] "ř"
> iconv(x, 'UTF-8', 'iso-8859-1')
[1] "r"

On macOS, iconv tells us this character cannot be represented as latin1:

> x <- "\U00159"
> print(x)
[1] "ř"
> iconv(x, 'UTF-8', 'iso-8859-1')
[1] NA
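
One way to detect the silent substitution regardless of platform might be
a round-trip check: convert to latin1, convert back, and compare with the
input. This is only a minimal sketch. is_lossless_latin1 is a hypothetical
helper name of mine, not something in base R, and it assumes the input is
valid UTF-8:

```r
# Hypothetical helper (not part of base R): TRUE where a UTF-8 string
# survives a UTF-8 -> ISO-8859-1 -> UTF-8 round trip unchanged.
is_lossless_latin1 <- function(x) {
  # Forward conversion: yields NA on iconv builds that refuse the
  # character, or a substitute such as "r" on builds that transliterate.
  y <- iconv(x, from = "UTF-8", to = "ISO-8859-1")
  # Convert back so both sides of the comparison are UTF-8.
  back <- iconv(y, from = "ISO-8859-1", to = "UTF-8")
  # NA (refused) and changed text (substituted) both count as lossy.
  !is.na(back) & back == x
}

is_lossless_latin1("\u00e9")  # e with acute, which is part of latin1
is_lossless_latin1("\u0159")  # r with caron, which is not part of latin1
```

The first call should give TRUE and the second FALSE, whether the
underlying iconv returns NA or a substitute, since either outcome fails
the comparison.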

I'm actually not sure why base R needs win_iconv (but I'm not an
encoding expert at all). Perhaps we could try to unbundle it and use
the standard libiconv provided by the Rtools toolchain bundle to get
more consistent results across platforms.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
