Hi Duncan,
Thanks, this solves my problem.
Regards, Hilmar

Duncan Murdoch wrote:
On 18/04/2009 1:18 PM, Hilmar Berger wrote:
Hi all,

I have problems reading Unicode (UTF-16) encoded tables in R 2.8.1 under Windows Vista.

Imagine the following table:

a    b    c    d
X    1,2    1,3    1,4
Y    2,2    2,3    2,4
Z    3,2    3,3    3,4

Usually I would use the following code to read the table:

t <- read.table("test.txt", header = TRUE, sep = "\t", dec = ",")

This works well if I create the table in Notepad (the text is then in UTF-8 or ASCII).

I haven't tried 2.8.1 (which has been obsolete since yesterday :-), but in 2.9.0 it works fine if I use the fileEncoding argument to read.table.

Duncan Murdoch
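A minimal sketch of the fileEncoding approach, assuming R 2.9.0 or later for read.table(fileEncoding = ...). To keep it self-contained, it first writes the table from this thread to a temporary file as UTF-16LE with a byte order mark and CRLF line endings, which should match the bytes in the hexdump of the OpenOffice export further down. That setup step uses paste0() and iconv(..., toRaw = TRUE), which assume a more recent R than 2.9.0; `path`, `txt` and `tab` are illustrative names.

```r
# Sketch, assuming R >= 2.9.0 for read.table(fileEncoding = ...); the setup
# step (paste0, iconv with toRaw = TRUE) assumes a more recent R.

# Recreate the OpenOffice export: tab-separated, comma as decimal mark,
# CRLF line endings, UTF-16LE preceded by the byte order mark FF FE.
path <- tempfile(fileext = ".csv")
txt <- paste0(c("a\tb\tc\td",
                "X\t1,2\t1,3\t1,4",
                "Y\t2,2\t2,3\t2,4",
                "Z\t3,2\t3,3\t3,4"),
              "\r\n", collapse = "")
bom <- as.raw(c(0xff, 0xfe))
writeBin(c(bom, iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]),
         path)

# fileEncoding makes read.table() open the file through a re-encoding
# connection; with "UTF-16", iconv normally takes the byte order from the
# BOM and drops it, so the first column name comes out as plain "a".
tab <- read.table(path, header = TRUE, sep = "\t", dec = ",",
                  fileEncoding = "UTF-16")
print(tab)
```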


However, if I use, e.g., OpenOffice Calc (scalc) to create a spreadsheet holding the same data and save it as text (tabs as separators, no quotes, Unicode encoding), the command above gives this:

 > t = read.table("test.csv", header=T, sep="\t",dec=",")
 > t
  ÿþa
1  NA
2  NA
3  NA

I tried varying the "encoding" parameter, but that did not change anything.

The file from OpenOffice is in UTF-16, as shown by hexdump:
$ hexdump test.csv
0000000 feff 0061 0009 0062 0009 0063 0009 0064
0000010 000d 000a 0058 0009 0031 002c 0032 0009
0000020 0031 002c 0033 0009 0031 002c 0034 000d
0000030 000a 0059 0009 0032 002c 0032 0009 0032
0000040 002c 0033 0009 0032 002c 0034 000d 000a
0000050 005a 0009 0033 002c 0032 0009 0033 002c
0000060 0033 0009 0033 002c 0034 000d 000a
000006e
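The leading feff word is the byte order mark. hexdump prints 16-bit words in the machine's little-endian order, so the bytes on disk are FF FE, which is why read.table() shows them as ÿþ in the Windows-1252 locale. A small sketch of checking for a BOM from R with readBin(); detect_bom() is a hypothetical helper, not part of R, and it only distinguishes the UTF-8 and UTF-16 marks.

```r
# Hypothetical helper (not part of R): report which Unicode byte order mark,
# if any, a file starts with. Only the UTF-8 and UTF-16 marks are checked;
# a UTF-32LE file (FF FE 00 00) would also be reported as "UTF-16LE".
detect_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)  # first (up to) three bytes
  if (length(b) >= 3L && identical(b, as.raw(c(0xef, 0xbb, 0xbf)))) return("UTF-8")
  if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xff, 0xfe)))) return("UTF-16LE")
  if (length(b) >= 2L && identical(b[1:2], as.raw(c(0xfe, 0xff)))) return("UTF-16BE")
  "none"
}

# The first eight bytes of the OpenOffice export, i.e. the hexdump words
# "feff 0061 0009 0062" written out in little-endian byte order.
path <- tempfile()
writeBin(as.raw(c(0xff, 0xfe, 0x61, 0x00, 0x09, 0x00, 0x62, 0x00)), path)
print(detect_bom(path))
```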

I tried reading the file with file() and readLines(), which seems to work once the encoding is specified:

 > a = file("test.csv",open="r", encoding="UTF-16")
 > b = readLines(a)
 > b
[1] "a\tb\tc\td" "X\t1,2\t1,3\t1,4" "Y\t2,2\t2,3\t2,4" "Z\t3,2\t3,3\t3,4"
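Since readLines() already decodes the file correctly, one possible workaround on versions without fileEncoding is to hand the decoded lines to read.table() through textConnection(), so read.table() never opens the UTF-16 file itself. A minimal sketch, assuming `b` is the character vector printed above and that textConnection() and read.table() behave in 2.8.1 as they do in current R; `tc` and `tab` are illustrative names.

```r
# Workaround sketch for versions without read.table(fileEncoding = ...):
# parse lines that an encoding-aware connection has already decoded.
# `b` is the character vector printed by readLines() in the transcript.
b <- c("a\tb\tc\td", "X\t1,2\t1,3\t1,4", "Y\t2,2\t2,3\t2,4", "Z\t3,2\t3,3\t3,4")

# textConnection() wraps the decoded strings in a connection, so the
# usual sep/dec settings apply without any re-encoding step.
tc <- textConnection(b)
tab <- read.table(tc, header = TRUE, sep = "\t", dec = ",")
close(tc)
print(tab)
```

Newer versions of R also accept read.table(text = b, ...), which does the same without an explicit connection.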

Looking at readtable.R in R 2.8.1 and R 2.9.0, it seems that the encoding is not passed through to the second call to scan() in that code.

I'm not sure if this is a bug or if I'm doing something wrong here.

Regards,
Hilmar

------------------
My system  and R settings are:

 > sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.8.1

 > Sys.info()
    sysname     release                      version  nodename
  "Windows"     "Vista" "build 6001, Service Pack 1"      "PC"
    machine       login                         user
      "x86"
 > options("encoding")
$encoding
[1] "native.enc"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

