Thanks for your reply, Ista, and your advice. I will re-post to r-help. Best,
Scott -- Scott Kostyshak Assistant Professor of Economics University of Florida https://people.clas.ufl.edu/skostyshak/ On Tue, May 01, 2018 at 07:15:30PM +0000, Ista Zahn wrote: > Hi Scott, > > This question is appropriate for the r-help mailing list, but probably > off-topic here on r-devel. > > Best, > Ista > > On Tue, May 1, 2018 at 2:57 PM, Scott Kostyshak <skostys...@ufl.edu> wrote: > > I have very little knowledge about file encodings and would like to > > learn more. > > > > I've read the following pages to learn more: > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__stat.ethz.ch_R-2Dmanual_R-2Ddevel_library_base_html_Encoding.html&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=HegPJMcZ_5R6vYtdQLgIsh-M6ElOlewHPBZxe8IPSlI&e= > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__stackoverflow.com_questions_4806823_how-2Dto-2Ddetect-2Dthe-2Dright-2Dencoding-2Dfor-2Dread-2Dcsv&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=KGDvHJrfkvqbwyKnIiY0V45HtN-W4Rpq4ZBXfIFaFMk&e= > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.r-2Dproject.org_Encodings-5Fand-5FR.html&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=Ka1kGiCw3w22tOLfA50AyrKsMT-La14TQdutJJkdE04&e= > > > > The last one, in particular, has been very helpful. I would be > > interested in any further references that you suggest. > > > > I attach a file that reproduces the issue I would like to learn more > > about. I do not know if the file encoding will be correctly preserved > > through email, so I also provide the file (temporarily) on Dropbox here: > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.dropbox.com_s_3lbgebk7b5uaia7_encoding-5Fexport-5Fissue.R-3Fdl-3D0&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=58a7qB9IHt3s2ZLDglGEHwWARuo8xvSlH_z8G5jDaUY&e= > > > > The file gives an error when using "source()" with the > > argument echo = TRUE: > > > > > source("encoding_export_issue.R", echo = TRUE) > > Error in nchar(dep, "c") : invalid multibyte string, element 1 > > In addition: Warning message: > > In grepl("^[[:blank:]]*$", dep[1L]) : > > input string 1 is invalid in this locale > > > > The problem comes from the "รก" character in the .R file. The file > > appears to be encoded as "iso-8859-1": > > > > $ file --mime-encoding encoding_export_issue.R > > encoding_export_issue.R: iso-8859-1 > > > > Note that for me: > > > > > getOption("encoding") > > [1] "native.enc" > > > > so "native.enc" is used for the "encoding" argument of source(). > > > > The following two calls succeed: > > > > > source("encoding_export_issue.R", echo = TRUE, encoding = "unknown") > > > source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1") > > > > Is this file a valid "iso-8859-1" encoded file? Why does source() fail > > in the case of encoding set to "native.enc"? Is it because of the > > settings to UTF-8 in my locale (see info on my system at the bottom of > > this email). > > > > I'm guessing it would be a bad idea to put > > > > options(encoding = "unknown") > > > > in my .Rprofile, because it is difficult to always correctly guess the > > encoding of files? Is there a reason why setting it to "unknown" would > > lead to more problems than leaving it set to "native.enc"? > > > > I've reproduced the above behavior on R-devel (r74677) and 3.4.3. Below > > is my session info and locale info for my system with the 3.4.3 version: > > > >> sessionInfo() > > R version 3.4.3 (2017-11-30) > > Platform: x86_64-pc-linux-gnu (64-bit) > > Running under: Ubuntu 16.04.3 LTS > > > > Matrix products: default > > BLAS: /usr/lib/libblas/libblas.so.3.6.0 > > LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > loaded via a namespace (and not attached): > > [1] compiler_3.4.3 > > > >> Sys.getlocale() > > [1] > > "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C" > > > > Thanks for your time, > > > > Scott > > > > > > -- > > Scott Kostyshak > > Assistant Professor of Economics > > University of Florida > > https://people.clas.ufl.edu/skostyshak/ > > > > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIFaQ&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=neJ42wVqpDzuvOKMBML6-HnbH0l0aXpb0ZUFWoGb-Bo&m=ACEONMqZnQSCcvrFpo-yWr9piCPKCvJhsdRJKdTd6b0&s=clTC2qDlDodnBISRH9uJKb0wKUZHErA7oJ98a6QZS8g&e= > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel