Re: [R] Encoding issue

Ivan Krylov Mon, 05 Nov 2018 11:35:01 -0800

On Mon, 5 Nov 2018 08:36:13 -0500 (EST)
Sebastien Bihorel <[email protected]> wrote:


> [1] "râs"

Interesting. This is what I get if I decode the bytes 72 e2 80 99 73 0a
as latin-1 instead of UTF-8. It looks like there are only three
characters, but there are actually more:

$ perl -CSD -Mcharnames=:full -MEncode=decode \
 -E'for (split //, decode latin1 => pack "H*", "72e28099730a")
 { say ord, " ", $_, " ", charnames::viacode(ord) }'
114 r LATIN SMALL LETTER R
226 â LATIN SMALL LETTER A WITH CIRCUMFLEX
128  PADDING CHARACTER
153  SINGLE GRAPHIC CHARACTER INTRODUCER
115 s LATIN SMALL LETTER S
10 
 LINE FEED
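
For comparison, here is a minimal Python sketch of the same experiment.
Python is only an illustration choice here, and the variable and helper
names (raw, as_utf8, as_latin1, describe) are made up for the example.
It decodes the same six bytes both ways, which should make the mismatch
easier to see: as UTF-8, the middle three bytes form a single character.

```python
import unicodedata

# The same six bytes as in the Perl one-liner above.
raw = bytes.fromhex("72e28099730a")

# Decode them once as UTF-8 and once as latin-1.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")


def describe(text):
    """Return (code point, Unicode name) pairs for each character.

    Control characters have no name in Python's unicodedata module,
    so a placeholder string is used for them.
    """
    return [(ord(ch), unicodedata.name(ch, "<control character>")) for ch in text]


for label, text in (("UTF-8", as_utf8), ("latin-1", as_latin1)):
    print(label, "gives", len(text), "characters:")
    for code, name in describe(text):
        print("  ", code, name)
```

Decoded as UTF-8, the bytes e2 80 99 become U+2019 (RIGHT SINGLE
QUOTATION MARK), so the string is "r", a curly apostrophe, "s" and a
line feed. Decoded as latin-1, each of those three bytes becomes its own
character, two of which are invisible control characters.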

Does it help if you explicitly specify the file encoding by passing
the fileEncoding="UTF-8" argument to scan()?
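
As a rough analogy (in Python, not R, and not a statement about how
scan() works internally), the sketch below writes the same bytes to a
temporary file and reads them back with two different declared
encodings. The file and the variable names (path, wrong, right) are
hypothetical and exist only for this example.

```python
import os
import tempfile

# The same six bytes as above, written to a throwaway file.
raw = bytes.fromhex("72e28099730a")

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(raw)
    path = handle.name

try:
    # Reading with the wrong declared encoding reproduces the garbling.
    with open(path, encoding="latin-1") as f:
        wrong = f.read()

    # Declaring the encoding the file was actually written in fixes it.
    with open(path, encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print(len(wrong), [hex(ord(ch)) for ch in wrong])
print(len(right), [hex(ord(ch)) for ch in right])
```

The point is only that the bytes on disk are the same in both cases;
what changes is the encoding the reader is told to assume.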

-- 
Best regards,
Ivan

______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
