My understanding (which could be wrong) is that when you source a file, it first gets translated to your native locale and then parsed. When you parse a character vector, it does not get translated.
In your locale, every "я" character (U+044F) is replaced by the byte "\xFF":

    > iconv("\u044f", "UTF-8", "Windows-1251")
    [1] "\xff"

I suspect that this particular value causes trouble for the R parser, which keeps a buffer of previously seen characters (include/Defn.h):

    LibExtern char R_ParseContext[PARSE_CONTEXT_SIZE] INI_as("");

and at various places checks whether the context character is EOF. That character is defined as

    #define R_EOF -1

which, when cast to a char, is 0xFF. So where char is signed, a 0xFF byte stored in the context cannot be told apart from EOF.

I suspect that your example reveals two bugs:

1) The R parser seems to have trouble with native characters encoded as 0xFF. Since R strings cannot contain 0x00, it may be possible to fix this by changing the definition of R_EOF to

    #define R_EOF 0

2) As I understand the situation, "source" will also fail if the file contains a character that cannot be represented in your native locale. This is a harder bug to tackle because of the way file() and the other connection methods are designed: they translate the input to your native locale. I don't know whether it's possible to override this behavior and have them translate input to UTF-8 instead.

Patrick

---

On Mon Aug 28 11:27:07 CEST 2017 Владимир Панфилов <vladimirpanfi...@gmail.com> wrote:

Hello,

I do not have an account on R Bugzilla, so I will post my bug report here. I want to report a very old bug in the base R source() function. It relates to sourcing R scripts saved in UTF-8 encoding on Windows machines. For some reason, if a UTF-8 script contains the Cyrillic letter "я", execution of the script is interrupted right at this letter. (By the way, the same scripts source fine when they are encoded in the system's CP1251 encoding.)
Consider the following script, which prints some random Russian words:

    print("Осень")
    print("Ёжик")
    print("трясина")
    print("тест")

When this script is sourced, we get an INCOMPLETE_STRING error:

    > source('D:/R code/test_cyr_letter.R', encoding = 'UTF-8', echo=TRUE)
    Error in source("D:/R code/test_cyr_letter.R", encoding = "UTF-8", echo = TRUE) :
      D:/R code/test_cyr_letter.R:3:7: unexpected INCOMPLETE_STRING
    2: print("Ёжик")
    3: print("тр
             ^

Note that this bug is not triggered when the same file is executed using eval(parse(...)):

    > eval(parse('D:/R code/test_cyr_letter.R', encoding="UTF-8"))
    [1] "Осень"
    [1] "Ёжик"
    [1] "трясина"
    [1] "тест"

I did some research and noticed that the source() and parse() functions contain similar code for reading files. After analyzing the code of source(), I found that commenting out one line fixes this bug, and the overridden function works fine. See this part of the source() code:

    ...
    filename <- file
    file <- file(filename, "r")
    # on.exit(close(file))  #### COMMENT THIS LINE ####
    if (isTRUE(keep.source)) {
        lines <- scan(file, what="character", encoding = encoding, sep = "\n")
        on.exit()
        close(file)
        srcfile <- srcfilecopy(filename, lines, file.mtime(filename)[1],
                               isFile = TRUE)
    }
    ...

I do not fully understand this strange behaviour, so I am asking the R Core developers for help in fixing this annoying bug, which prevents using Unicode scripts containing Cyrillic on Windows. Perhaps that part of source() should read files the way parse() does?
Session and encoding info:

    > sessionInfo()
    R version 3.4.1 (2017-06-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    Matrix products: default

    locale:
    [1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251  LC_MONETARY=Russian_Russia.1251
    [4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] compiler_3.4.1 tools_3.4.1

    > l10n_info()
    $MBCS
    [1] FALSE

    $`UTF-8`
    [1] FALSE

    $`Latin-1`
    [1] FALSE

    $codepage
    [1] 1251

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel