Just an update on encoding (which may or may not be of interest). I changed the read.csv commands for the three .csv files I was reading to specify the encoding as encoding="CP1252", and all three files were read in without problems on Linux. Last night I moved the analysis back to my Windows machine, and one of the reads stopped part way through with a message about illegal characters. I checked around where the read stopped but couldn't see what the problem was. Dropping the encoding argument to "file" worked around the problem. I now have an if-then-else that tests which system I am on. Painful, but at least it is system independent.

Thanks again

David

On Tue, 22 Jan 2008, Prof Brian Ripley wrote:

> On Wed, 23 Jan 2008, David Scott wrote:
>
>> On Tue, 22 Jan 2008, Prof Brian Ripley wrote:
>>
>>> On Wed, 23 Jan 2008, David Scott wrote:
>>>
>>>>
>>>> I have encountered a problem with reading a .csv file on a linux box. I
>>>> can read the file on my windows machine (under XP) but on the linux box
>>>> it
>>>> gives :
>>>>
>>>>> patients <- read.csv("../Patients.csv", header = FALSE,
>>>> + col.names = patientsNames)
>>>> Error in type.convert(data[[i]], as.is = as.is[i], dec = dec,
>>>> na.strings = character(0)) :
>>>>   invalid multibyte string
>>>> Calls: read.csv -> read.table -> type.convert
>>>> Execution halted
>>>>
>>>> I am running R 2.6.1 on both machines. I tried on another linux box
>>>> running 2.5.1 and got the same problem
>>>>
>>>> I am guessing it is something to do with the character encoding. On the
>>>> linux box I have
>>>>
>>>> LANG=en_US.UTF-8
>>>
>>> So what encoding is the .csv file in? Consider the example at the end of
>>> ?file
>>>
>>> ## examples of use of encodings
>>> cat(x, file = file("foo", "w", encoding="UTF-8"))
>>> # read a 'Windows Unicode' file including names
>>> A <- read.table(file("students", encoding="UCS-2LE"))
>>>
>>> and adapt accordingly (encoding = "CP1252" is the most likely value if
>>> this works in English-language Windows).
>>>
>>
>>
>> Thanks Brian for the super-quick, super-helpful reply. The encoding you
>> suggested worked.
>>
>> I found a workaround myself too---I guessed that some plus/minus signs
>> might be the problem and replaced them and could read in the file.
>> That is just a kludge so I am using the encoding specification.
>>
>> I am a total dunce when it comes to encodings though. How do you find the
>> encoding of a file?
>
> You ask the person who gave it to you. You can't in general tell, and e.g.
> ISO-8859-1 and ISO-8859-2 are only distinguishable by someone who can read
> the contents (if it is a human language). If you have just the odd symbol
> (e.g. degree sign or plus/minus) you can be completely stuck.
>
> 'file' on Linux can usually guess if a file is UTF-8 or ISO-8859-?, but not
> of course what ? is. But guesses are based on statistical patterns and are
> good for text but not so good for data.
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

_________________________________________________________________
David Scott     Department of Statistics, Tamaki Campus
                The University of Auckland, PB 92019
                Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830         Fax: +64 9 373 7000
Email:  [EMAIL PROTECTED]

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
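David's workaround, reading through a CP1252 connection on Linux but dropping the encoding argument on Windows, might look something like the minimal sketch below. It assumes the files really are Windows-1252 encoded (Ripley's most likely guess for files that read fine on English-language Windows) and that `.Platform$OS.type` is a good enough test of which system R is running on. The helper name `read_cp1252_csv` and the small test file are hypothetical; the thread does not show David's actual code.

```r
## Hypothetical helper. Assumption: the CSV file is Windows-1252 (CP1252).
read_cp1252_csv <- function(path, ...) {
  if (.Platform$OS.type == "windows") {
    # English-language Windows already uses CP1252 natively, and David
    # found that passing the encoding to file() broke one read there,
    # so read the file as-is
    read.csv(path, ...)
  } else {
    # On Linux (typically a UTF-8 locale), let the connection re-encode
    # from CP1252 while reading; read.table() opens and closes the
    # unopened connection itself
    read.csv(file(path, encoding = "CP1252"), ...)
  }
}

## Usage: a tiny file whose second field holds a CP1252 plus/minus sign
## (byte 0xB1), which is not valid UTF-8 on its own
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("x,y\n1,"), as.raw(0xb1), charToRaw("2\n")), tmp)
df <- read_cp1252_csv(tmp, stringsAsFactors = FALSE)
print(df)
```

Passing an unopened `file()` connection follows the `?file` example Ripley quotes: `read.table` (and so `read.csv`) opens it, re-encodes as it reads, and closes it again. For the guessing Ripley mentions, GNU `file -i` reports a charset guess, subject to the same caveats about data files.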