Re: [R] Writing Unicode Text into Text File from R (in Windows)

2014-02-18 Thread Majid Einian
On Tue, Feb 4, 2014 at 4:18 PM, Duncan Murdoch  wrote:
>
> On 14-02-04 5:49 AM, Majid Einian wrote:
>>
>> Dear R Helpers,
>>
>> See the Code:
>>
>> a <- intToUtf8(1777)
>> show(a)
>> zz <- file(description="test.txt",open="w",encoding="UTF-8")
>> cat(a, file = zz)
>> close(zz)
>>
>> in a Unicode aware environment (such as RGui console or RStudio Console)
>> you will see this as output:
>>
>> [1] "۱"
>>
>>
>> but the character is not written correctly in the file test.txt (which is
>> encoded in UTF-8 without BOM) :
>>
>> 
>>
>> The problem seems to be this: R converts the text to the system locale (for
>> me this is Arabic Windows, code page 1256, which has no code for U+06F1),
>> then converts it back to UTF-8 and writes it to the file.
>> What am I missing here?
>> How can I write a Unicode string to a text file correctly?
>
>
> There are a lot of places in R where it converts strings to the local 
> encoding, perhaps too many. On the other hand, maybe Windows should be 
> offering UTF-8 locales by now.

I would like to see that happen too! I have no such problem on Linux.

>
> I haven't tested in your locale, but I believe writeLines() to a connection 
> declared to be in a UTF-8 encoding will maintain the encoding.

writeLines() also converts the text to the system encoding and then back
to Unicode, just like cat().
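
A workaround that is often suggested for this situation is to skip re-encoding altogether and write the UTF-8 bytes directly. The following is only a minimal sketch: it assumes the string is already UTF-8 (as intToUtf8() returns it), it uses temporary files as stand-ins for test.txt, and it has not been checked on a code page 1256 Windows system.

```r
# Sketch of a workaround: write the UTF-8 bytes as-is, so that no
# conversion through the native encoding can take place.
a <- intToUtf8(1777)                  # U+06F1, stored as UTF-8

# Option 1: binary connection plus raw bytes
path <- tempfile(fileext = ".txt")    # stand-in for "test.txt"
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8(a)), con)
close(con)

# Option 2: writeLines() with useBytes = TRUE on a connection that
# has no encoding declared (a newline is appended after the string)
path2 <- tempfile(fileext = ".txt")
con2 <- file(path2, open = "wb")
writeLines(enc2utf8(a), con2, useBytes = TRUE)
close(con2)

# Read the bytes back to see what actually ended up on disk
bytes  <- readBin(path,  what = "raw", n = 16)
bytes2 <- readBin(path2, what = "raw", n = 16)
print(bytes)    # db b1 is the UTF-8 encoding of U+06F1
print(bytes2)   # the same two bytes followed by 0a (newline)
```

Opening the connection in binary mode ("wb") also keeps Windows from translating line endings, so the file contains exactly the bytes that were written.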

>  You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a 
> BOM on input; I forget whether it will write one on output.  If it doesn't, 
> you can always write one explicitly.
>

I don't mind whether a BOM is present or not.

> I was hoping to make some progress on this before R 3.1.0 so that more cases 
> of writing strings to UTF-8 files would work, but time is running out.

I hope we see this happen soon :)

Majid Einian

>
> Duncan Murdoch
>
>>
>>
>> Majid Einian,
>> Economics Researcher, Monetary and Banking Research Institute, Central Bank
>> of Islamic Republic of Iran, Tehran, IRAN
>> and
>> PhD Candidate in "Economics", Graduate School of Management and
>> Economics, Sharif University of Technology, Tehran, IRAN
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



Re: [R] Writing Unicode Text into Text File from R (in Windows)

2014-02-04 Thread Duncan Murdoch

On 14-02-04 5:49 AM, Majid Einian wrote:

> Dear R Helpers,
>
> See the Code:
>
> a <- intToUtf8(1777)
> show(a)
> zz <- file(description="test.txt",open="w",encoding="UTF-8")
> cat(a, file = zz)
> close(zz)
>
> in a Unicode-aware environment (such as RGui console or RStudio Console)
> you will see this as output:
>
> [1] "۱"
>
>
> but the character is not written correctly in the file test.txt (which is
> encoded in UTF-8 without BOM) :
>
>
>
> The problem seems to be this: R converts the text to the system locale (for
> me this is Arabic Windows, code page 1256, which has no code for U+06F1),
> then converts it back to UTF-8 and writes it to the file.
> What am I missing here?
> How can I write a Unicode string to a text file correctly?


There are a lot of places in R where it converts strings to the local 
encoding, perhaps too many. On the other hand, maybe Windows should be 
offering UTF-8 locales by now.


I haven't tested in your locale, but I believe writeLines() to a 
connection declared to be in a UTF-8 encoding will maintain the 
encoding.  You can declare a file to be in encoding "UTF-8-BOM" if you 
want to ignore a BOM on input; I forget whether it will write one on 
output.  If it doesn't, you can always write one explicitly.
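
Writing a BOM explicitly could look roughly like the sketch below. It assumes a binary-mode connection and a temporary file as a hypothetical output path; the three bytes EF BB BF are the UTF-8 byte order mark.

```r
# Sketch: write a UTF-8 byte order mark by hand, then the text bytes.
path <- tempfile(fileext = ".txt")           # hypothetical output file
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)   # UTF-8 BOM
writeBin(charToRaw(enc2utf8(intToUtf8(1777))), con)
close(con)

# Inspect the result: BOM (3 bytes) followed by the character (2 bytes)
bom_bytes <- readBin(path, what = "raw", n = 16)
print(bom_bytes)
```

A file written this way can then be opened for reading with encoding = "UTF-8-BOM" so that the BOM is skipped on input.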


I was hoping to make some progress on this before R 3.1.0 so that more 
cases of writing strings to UTF-8 files would work, but time is running out.


Duncan Murdoch









[R] Writing Unicode Text into Text File from R (in Windows)

2014-02-04 Thread Majid Einian
Dear R Helpers,

See the Code:

a <- intToUtf8(1777)
show(a)
zz <- file(description="test.txt",open="w",encoding="UTF-8")
cat(a, file = zz)
close(zz)

in a Unicode aware environment (such as RGui console or RStudio Console)
you will see this as output:

[1] "۱"


but the character is not written correctly in the file test.txt (which is
encoded in UTF-8 without BOM) :



The problem seems to be this: R converts the text to the system locale (for
me this is Arabic Windows, code page 1256, which has no code for U+06F1),
then converts it back to UTF-8 and writes it to the file.
What am I missing here?
How can I write a Unicode string to a text file correctly?
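
One way to check where the character gets lost is to compare the bytes R holds in memory with the bytes that end up in the file. The sketch below is only illustrative: file_bytes() is a hypothetical helper, and a temporary file written in binary mode stands in for test.txt.

```r
# Sketch: inspect the string in memory, then the bytes on disk.
a <- intToUtf8(1777)

enc   <- Encoding(a)     # declared encoding of the string
cp    <- utf8ToInt(a)    # code point: 1777, i.e. U+06F1
inmem <- charToRaw(a)    # bytes as stored in memory: db b1

# Hypothetical helper: return the whole file as a raw vector
file_bytes <- function(path) {
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Demonstration on a temporary file written byte for byte
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(inmem, con)
close(con)

ondisk <- file_bytes(path)
identical(ondisk, inmem)  # TRUE when the file holds the same bytes
```

If file_bytes("test.txt") returns something other than db b1 after running the code above, the conversion happened somewhere between cat() and the file, not in the string itself.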


Majid Einian,
Economics Researcher, Monetary and Banking Research Institute, Central Bank
of Islamic Republic of Iran, Tehran, IRAN
and
PhD Candidate in "Economics", Graduate School of Management and
Economics, Sharif University of Technology, Tehran, IRAN
