Seems to work fine on my machine:

> data1 <- read.table("",
+       header = TRUE, sep = "\t", encoding="UTF-8", check.names=FALSE)
> data1
  אחת שתיים שלוש
1  12    97    6
2 123   354   44
3   6     1    3
> colnames(data1)
[1] "אחת"   "שתיים" "שלוש"
> colnames(data1)[1]
[1] "אחת"
> strsplit(colnames(data1)[1], "")[[1]][1]
[1] "א"
> data1[,"שתיים"]
[1]  97 354   1
> lm(`שתיים` ~ `שלוש`, data=data1)

Call:
lm(formula = שתיים ~ שלוש, data = data1)

Coefficients:
(Intercept)         שלוש
     12.406        7.826
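
If typing the backtick-quoted names fails to parse on your platform (as in the error Bill reports below), one possible workaround is to build the formula from the column names as symbols, so the parser never sees the non-ASCII text. This is only a sketch, assuming a UTF-8 locale; the names hdr, f and fit are mine, and the data frame is rebuilt by hand from the values shown above rather than read from the file.

```r
# Hebrew column names as \u escapes, so the script itself is plain ASCII.
hdr <- c("\u05d0\u05d7\u05ea",
         "\u05e9\u05ea\u05d9\u05d9\u05dd",
         "\u05e9\u05dc\u05d5\u05e9")

# Same values as the data1 printed above, rebuilt by hand for illustration.
data1 <- data.frame(c(12, 123, 6), c(97, 354, 1), c(6, 44, 3))
names(data1) <- hdr

# Build `second column ~ third column` from symbols instead of typed
# backticks; evaluating the `~` call yields an ordinary formula object.
f <- eval(bquote(.(as.name(hdr[2])) ~ .(as.name(hdr[3]))))

fit <- lm(f, data = data1)
print(coef(fit))
```

The coefficients should agree with the lm() output above. Whether this sidesteps the problem in a non-UTF-8 Windows locale is something I have not checked.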

> sessionInfo()
R version 2.10.1 (2009-12-14)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
                           sysname                            release
                           "Linux"            ""
                           version                           nodename
"#1 SMP 2010-01-27 08:20:11 +0100"                       "linux-46fj"
                           machine                              login
                            "i686"                          "unknown"

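For anyone who wants to try this without the example file, here is a minimal self-contained sketch of the same steps. It writes a small tab-separated UTF-8 file with the three Hebrew headers to a temporary file and reads it back with the read.table() arguments used above. The temporary file and the names hdr, path and lines are assumptions for illustration, not the original upload, and I have only reasoned about this for a UTF-8 locale.

```r
# Hebrew column names as \u escapes, so the script itself is plain ASCII.
hdr <- c("\u05d0\u05d7\u05ea",
         "\u05e9\u05ea\u05d9\u05d9\u05dd",
         "\u05e9\u05dc\u05d5\u05e9")

# Hypothetical stand-in for the uploaded example: a tab-separated temp file.
path <- tempfile(fileext = ".txt")
lines <- c(paste(hdr, collapse = "\t"),
           "12\t97\t6", "123\t354\t44", "6\t1\t3")

# useBytes = TRUE writes the UTF-8 bytes as they are instead of
# re-encoding them to the native locale.
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings that are read as UTF-8;
# check.names = FALSE keeps make.names() from mangling the headers.
data1 <- read.table(path, header = TRUE, sep = "\t",
                    encoding = "UTF-8", check.names = FALSE)

print(colnames(data1) == hdr)
print(data1[[hdr[2]]])
```

Note that encoding= only marks the strings as UTF-8, whereas fileEncoding= asks R to re-encode the input to the native encoding, which cannot represent Hebrew in a Windows-1252 locale like the one in the original post.
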

On Thu, Mar 18, 2010 at 6:42 PM, William Dunlap wrote:
> I tried this on R 2.11.0 unstable (2010-03-07 r51225) using
> encoding="UTF-8" and check.names=FALSE in read.table().
> It seemed to basically work, except that the data.frame/matrix printing
> routine wants to print the Unicode codes for the characters
> in the names:
>   > data1 <- read.table("",
>       header = TRUE, sep = "\t", encoding="UTF-8", check.names=FALSE)
>   > data1 # I see Unicode codes, presumably the correct ones
>     <U+05D0><U+05D7><U+05EA> <U+05E9><U+05EA><U+05D9><U+05D9><U+05DD>
>   1                       12                                       97
>   2                      123                                      354
>   3                        6                                        1
>     <U+05E9><U+05DC><U+05D5><U+05E9>
>   1                                6
>   2                               44
>   3                                3
>   > colnames(data1) # I see Hebrew strings (in R the first starts with aleph)
>   [1] "אחת"   "שתיים" "שלוש"
>   > colnames(data1)[1]
>   [1] "אחת"
>   > strsplit(colnames(data1)[1], "")[[1]][1]
>   [1] "א"
>   > data1[,"שתיים"]
>   [1]  97 354   1
> I'm writing this in Outlook in the English (American) locale
> and the copy-n-paste from the R gui window to the Outlook window
> of the Hebrew letters reversed the whole line of them (reversing
> the characters in each name and the names in the line), which is
> why I showed a subset of the names and a substring of the first name.
> However, when I try to use lm() with this data.frame then I run into
> trouble, which is probably the same problem as I see in the
> data.frame printing:
>   > lm(`שתיים` ~ `שלוש`)
>   Error: \uxxxx sequences not supported inside backticks (line 1)
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap
>> -----Original Message-----
>> From: Tal Galili
>> Sent: Thursday, March 18, 2010 2:41 PM
>> Subject: [R] How to read.table with “Hebrew” column names (in R)?
>> (I am reposting this question after a few months without a
>> solution...)
>> Hi all,
>> I am trying to read a .txt file, with Hebrew column names, but without
>> success.
>> I uploaded an example file to:
>> And tried the command:
>> read.table("", header = T, sep = "\t")
>> This returns:
>>   X.....ª X...ª...... X...Å“....
>> 1      12          97         6
>> 2     123         354        44
>> 3       6           1         3
>> Instead of:
>> אחת שתיים שלוש
>> 12  97  6
>> 123 354 44
>> 6   1   3
>> Trying to use something like:
>> read.table("", fileEncoding = "iso8859-8")
>> Has resulted in:
>>  V1
>> 1  ?
>> Warning messages:
>> 1: In read.table("", fileEncoding = "iso8859-8") :
>>   invalid input found on input connection ''
>> 2: In read.table("", fileEncoding = "iso8859-8") :
>>   incomplete final line found by readTableHeader on ''
>> While also trying this:
>> Sys.setlocale("LC_ALL", "en_US.UTF-8")
>> Or this:
>> Sys.setlocale("LC_ALL",
>> "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")
>> Gets me this:
>> [1] ""
>> Warning message:
>> In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
>>   OS reports request to set locale to "en_US.UTF-8" cannot be honored
>> My output for:
>> l10n_info()
>> Is:
>> $MBCS
>> [1] FALSE
>> $`UTF-8`
>> [1] FALSE
>> $`Latin-1`
>> [1] TRUE
>> $codepage
>> [1] 1252
>> And for:
>> Sys.getlocale()
>> Is:
>> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
>> Finally, here is the output of sessionInfo():
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>> locale:
>> [1] LC_COLLATE=English_United States.1255  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.1
>> Any suggestion or clarification will be appreciated.
>> Best,
>> Tal
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
