Hi Duncan,
Thanks, this solves my problem.
Regards, Hilmar
Duncan Murdoch wrote:
On 18/04/2009 1:18 PM, Hilmar Berger wrote:
Hi all,
I have problems reading Unicode (UTF-16) encoded tables in R 2.8.1
under Windows Vista.
Imagine the following table:
a b c d
X 1,2 1,3 1,4
Y 2,2 2,3 2,4
Z 3,2 3,3 3,4
Usually I would use the following code to read the table:
t = read.table("test.txt", header=T, sep="\t",dec=",")
This works well if I create the table using Notepad (the text will
then be in UTF-8 or ASCII).
I haven't tried 2.8.1 (which is obsolete, since yesterday :-), but in
2.9.0 it works fine if I use the fileEncoding argument to read.table.
Duncan Murdoch
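As a minimal sketch of the suggestion above (not taken from the original reply): the snippet below rebuilds the sample table as a UTF-16 file and reads it back with the fileEncoding argument. The temporary file path and the variable names are assumptions made for illustration; the bytes written mimic the little-endian, BOM-prefixed layout shown in the hexdump further down.

```r
# Recreate the sample table as UTF-16LE bytes with a byte order mark,
# mirroring the hexdump in the original post (path is a temporary file).
rows <- c("a\tb\tc\td", "X\t1,2\t1,3\t1,4", "Y\t2,2\t2,3\t2,4", "Z\t3,2\t3,3\t3,4")
txt <- paste(paste(rows, collapse = "\r\n"), "\r\n", sep = "")
payload <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xff, 0xfe)), payload), path)

# fileEncoding tells read.table how to decode the file while reading it.
tab <- read.table(path, header = TRUE, sep = "\t", dec = ",",
                  fileEncoding = "UTF-16")
print(tab)
```

Note that fileEncoding is distinct from the "encoding" argument: the former re-encodes the file on input, while the latter only marks how already-read strings should be interpreted. On some platforms "UTF-16LE" or "UCS-2LE" may be needed in place of "UTF-16".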
However, if I use e.g. OpenOffice scalc to create a spreadsheet
holding the same data and save it as text (using tabs as
separators, no quotes, and Unicode encoding), the command above
gives this:
> t = read.table("test.csv", header=T, sep="\t",dec=",")
> t
ÿþa
1 NA
2 NA
3 NA
I tried changing the "encoding" parameter, but that made no
difference.
The file from OpenOffice is in UTF-16, as shown by hexdump:
$ hexdump test.csv
0000000 feff 0061 0009 0062 0009 0063 0009 0064
0000010 000d 000a 0058 0009 0031 002c 0032 0009
0000020 0031 002c 0033 0009 0031 002c 0034 000d
0000030 000a 0059 0009 0032 002c 0032 0009 0032
0000040 002c 0033 0009 0032 002c 0034 000d 000a
0000050 005a 0009 0033 002c 0032 0009 0033 002c
0000060 0033 0009 0033 002c 0034 000d 000a
000006e
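A note on reading the dump: hexdump prints 16-bit words in host byte order by default, so on a little-endian machine the leading "feff" corresponds to the bytes ff fe on disk. That is the UTF-16 little-endian byte order mark, and it is also what shows up as "ÿþ" when the file is misread as a single-byte encoding. The same check can be sketched in R; detect_utf16_bom is a hypothetical helper name, and the sample file is built here only so the snippet runs on its own.

```r
# Hypothetical helper: report which UTF-16 byte order mark (if any)
# a file starts with, based on its first two bytes.
detect_utf16_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 2L)
  if (length(first) < 2L) return(NA_character_)
  if (identical(first, as.raw(c(0xff, 0xfe)))) return("UTF-16LE")
  if (identical(first, as.raw(c(0xfe, 0xff)))) return("UTF-16BE")
  NA_character_
}

# Build a small BOM-prefixed UTF-16LE file to try the helper on.
bom_path <- tempfile(fileext = ".csv")
body <- iconv("a\tb\r\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), bom_path)

bom <- detect_utf16_bom(bom_path)
print(bom)
```

This is only a two-byte heuristic: a UTF-32 little-endian file also begins with ff fe, and UTF-16 files written without a byte order mark will not be detected.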
I tried to read the file using file/readLines, which seemed to work
after specifying the encoding:
> a = file("test.csv",open="r", encoding="UTF-16")
> b = readLines(a)
> b
[1] "a\tb\tc\td" "X\t1,2\t1,3\t1,4" "Y\t2,2\t2,3\t2,4"
"Z\t3,2\t3,3\t3,4"
Looking at the code of readtable.R in R-2.8.1 and R-2.9.0, it seems
that the encoding does not get passed through in the second call to
scan() appearing in the code.
I'm not sure if this is a bug or if I'm doing something wrong here.
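Since readLines on a connection opened with encoding="UTF-16" does decode the file, one possible workaround is to read the lines first and then parse them with read.table through a textConnection. This is a sketch under assumptions, not something proposed in the thread: the file is recreated in a temporary location so the example is self-contained, and the variable names are illustrative.

```r
# Recreate the UTF-16LE sample file (with byte order mark) in a temp location.
rows <- c("a\tb\tc\td", "X\t1,2\t1,3\t1,4", "Y\t2,2\t2,3\t2,4", "Z\t3,2\t3,3\t3,4")
txt <- paste(paste(rows, collapse = "\r\n"), "\r\n", sep = "")
payload <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xff, 0xfe)), payload), path)

# Step 1: decode the file into character lines via an encoded connection.
con <- file(path, open = "r", encoding = "UTF-16")
lines <- readLines(con)
close(con)

# Step 2: parse the decoded lines as a table.
tc <- textConnection(lines)
tab2 <- read.table(tc, header = TRUE, sep = "\t", dec = ",")
close(tc)
print(tab2)
```

The whole file is held in memory as a character vector, so this suits small tables better than large ones.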
Regards,
Hilmar
------------------
My system and R settings are:
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32
locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.8.1
> Sys.info()
sysname release version nodename
"Windows" "Vista" "build 6001, Service Pack 1" "PC"
machine login user
"x86"
> options("encoding")
$encoding
[1] "native.enc"
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.