On 15/07/2011 1:42 PM, Sverre Stausland wrote:
>>>
>>>  >    funny.g <- "\u1E21"
>>>  >    funny.g
>>
>>  [1] "ḡ"
>>
>>>  >    data.frame(funny.g) -> funny.g
>>>  >    funny.g$funny.g
>>
>>  [1] ḡ
>>  Levels: <U+1E21>
>
>  I think the problem is in the data.frame code, not in writing. Data.frames
>  try to display things in a readable way, and since you're on Windows where
>  UTF-8 is not really supported, the code helpfully changes that character to
>  the "<U+1E21>" string for display.

I thought the data.frame function didn't alter the unicode coding,
since funny.g$funny.g above still displays the right unicode character
(although it does list the levels as <U+1E21>).

>  You should be able to write the Unicode character to file if you use lower
>  level methods such as cat(), on a connection opened using the file()
>  function with the encoding set explicitly.

I'm sorry, but I don't understand what it means "to use cat() on a
connection opened using the file() function". Could you please clarify
that?


I just checked on how R does it. We use UTF-8 encodings in the help pages, regardless of what kind of system you're running on.

It converts the strings to UTF-8 internally first (your funny.g is already encoded that way; see Encoding(funny.g)) then uses

writeLines(..., useBytes = TRUE)

to write it. The useBytes argument says not to try to make the file readable on the local system, just write out the bytes.
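
As a minimal sketch of that approach (the scratch file from tempfile() and the name `out` are only illustrative, not anything R itself uses for the help pages), you can check the declared encoding, write the string with useBytes = TRUE, and then look at the first three bytes of the file. They should be the UTF-8 encoding of U+1E21:

```r
funny.g <- "\u1E21"
Encoding(funny.g)                  # "UTF-8" for a string built from a \u escape

out <- tempfile(fileext = ".txt")  # illustrative scratch file
# enc2utf8() changes nothing here, but guarantees UTF-8 bytes before writing
writeLines(enc2utf8(funny.g), out, useBytes = TRUE)

# Inspect the raw bytes instead of trusting how the console displays text
first.bytes <- readBin(out, what = "raw", n = 3)
first.bytes                        # e1 b8 a1
```

Only the first three bytes are read because the line ending that follows them differs between platforms.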

Another way to do it is to get your strings in the UTF-8 encoding, convert them to raw vectors, and use writeBin() to write those out. For example,

funny.g <- "\u1E21"
rawstuff <- charToRaw(funny.g)
writeBin(rawstuff, "funny.g.txt")
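
To check the result without relying on the console display, the file can be read back as bytes and re-marked as UTF-8. This is only a sketch, and it uses a temporary file (my assumption) rather than "funny.g.txt":

```r
funny.g <- "\u1E21"
out <- tempfile(fileext = ".txt")   # illustrative scratch file
writeBin(charToRaw(funny.g), out)   # writes the three UTF-8 bytes, no newline

# Read every byte back, then declare how those bytes should be interpreted
bytes <- readBin(out, what = "raw", n = file.info(out)$size)
back <- rawToChar(bytes)
Encoding(back) <- "UTF-8"
identical(back, funny.g)            # TRUE
```

Note that rawToChar() returns a string with no declared encoding, so the Encoding<- step is what tells R to treat the bytes as UTF-8 again.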


All of this seems hard because you're thinking of UTF-8 as text, whereas on Windows R treats it as a binary encoding. Modern Windows systems can handle UTF-8, but not all programs on them can.

Duncan Murdoch

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
