Perhaps this is a good advert for Linux :) I'm afraid I don't use Windows but I know we've had some hassles with students' machines and encodings before.
Stack Overflow isn't very encouraging (see, for example, <https://stackoverflow.com/questions/46728047/r-rstudio-console-encoding-windows>), but perhaps you could try switching to a Windows-native encoding that has the relevant characters?

--M

On Mon, 2 Jul 2018 at 12:29 Holger Mitterer <holger.mitte...@um.edu.mt> wrote:
> Dear Martin and Jan,
>
> Thanks for the quick replies. What systems are you working on?
>
> I work on Windows, and I am not able to get it to work.
> Sys.getlocale() gives "English_United States.1252" for all categories.
>
> The problem is that Sys.setlocale() does not accept anything containing UTF-8.
> I can do
>
> Sys.setlocale("LC_ALL", "German")
>
> but this only changes it to German_Germany.1252, or I can change it back to my original with
>
> Sys.setlocale("LC_CTYPE", "English_US.1252")
>
> Anything containing UTF-8 is rejected. Both
>
> Sys.setlocale("LC_ALL", "en_GB.UTF-8")
>
> and the more restrictive
>
> Sys.setlocale("LC_CTYPE", "en_GB.UTF-8")
>
> return the message:
>
> OS reports request to set locale to "en_GB.UTF-8" cannot be honored
>
> I have not found a way to change the character-type locale (LC_CTYPE) to anything containing UTF-8.
> Any ideas?
>
> Holger
>
> From: Martin Corley <martin.cor...@ed.ac.uk>
> Sent: Monday, July 2, 2018 12:25 PM
> To: Holger Mitterer <holger.mitte...@um.edu.mt>
> Cc: ling-r-lang-l@mailman.ucsd.edu
> Subject: Re: [R-lang] getting non-Ascii characters in and out of R unchanged
>
> What does Sys.getlocale() return?
>
> I can read your Malti examples fine in a UTF-8 environment...
>
> > library(readxl)
> > df <- read_excel('~/tmp/test_holg.xlsx')
> > df
> # A tibble: 2 x 1
>   Sentence
>   <chr>
> 1 Mario iħobb jimmaġina?
> 2 Anita tisfen il-ballet?
> > Sys.getlocale()
> [1] "LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C"
>
> Best
>
> --M
>
> On Mon, 2 Jul 2018 at 11:15 Holger Mitterer <holger.mitte...@um.edu.mt> wrote:
> > Hello List,
> >
> > I have a very simple programming task: adding ERP codes to input files for an EEG experiment.
> > That is not the problem.
> >
> > However, the data contain non-ASCII characters (see below), and I cannot get them in and out of R without changes. The files are originally .xlsx, but both the readxlsx and readxl packages seem to 'normalize' the input (so that "ħ" becomes "h"), and the original character is lost.
> >
> > If I save the file as CSV in Excel and use read.csv, one of two things happens:
> >
> > 1. I use fileEncoding = "UTF-8", and again the special characters are converted to their nearest ASCII neighbor.
> > 2. I do not use fileEncoding, and the non-ASCII characters get garbled.
> >
> > Any ideas how to get non-American characters in and out of R without such changes?
> >
> > Best,
> > Holger
> >
> > PS: An example of the input file:
> >
> > Run order  Filename                   Condition  OnsetADJ  OnsetChange  Question                 Answer
> > 1          MALTESEINCOR0032Idealista  MALINCONG  2.263     2.728        Mario iħobb jimmaġina?   Iva
> > 2          MALTESEINCOR0032Grazzjuż   MALINCONG  1.615     2.048        Anita tisfen il-ballet?  Iva
> > 3          SEMANOM20036gallettinaCor  SEMCONG    1.901
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - -
> > Prof. Holger Mitterer PhD (Maastricht)
> > Department of Cognitive Science
> > Faculty of Media and Knowledge Sciences
> > University of Malta
> > +356 2340 3127
>
> --
> Martin Corley
> University of Edinburgh

--
Martin Corley
University of Edinburgh
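A minimal sketch of one thing worth checking, not tested on Windows: according to the read.table() documentation, fileEncoding re-encodes the file into the session's native encoding, while encoding = "UTF-8" only marks the strings as UTF-8 without re-encoding them. In a cp1252 session, which has no "ħ", that difference may matter. The CSV contents, column name and temporary file below are hypothetical stand-ins for the real data, and the check inspects code points with utf8ToInt() instead of trusting what the console prints.

```r
# Minimal sketch: read a UTF-8 CSV and check what R actually stored.
# The file contents, column name and temp path are hypothetical stand-ins.
path <- tempfile(fileext = ".csv")

# \u escapes keep this script ASCII-only; charToRaw() returns the UTF-8 bytes
# as stored, so the file is written as UTF-8 whatever the session locale.
csv_text <- "Sentence\nMario i\u0127obb jimma\u0121ina?\nAnita tisfen il-ballet?\n"
writeBin(charToRaw(csv_text), path)

# encoding = "UTF-8" only marks the strings as UTF-8; unlike fileEncoding,
# it does not re-encode them into the native (possibly cp1252) encoding.
back <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect code points rather than the console display:
# U+0127 (h with stroke) is 295, U+0121 (g with dot above) is 289.
cp <- utf8ToInt(back$Sentence[1])
print(cp)
print(c(295L, 289L) %in% cp)
```

If both code points are present, the characters survived the read even if the console shows "h" or "g". Whether they also survive printing, writing back out, or readxl's import on a cp1252 Windows session is a separate question that this sketch does not answer.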