Re: [R] read.spss, locale and encodings
Hans Ekbrand wrote:
> I must be missing something obvious here: According to the help page
> for read.spss, the reencode option is only active when R is run under
> a UTF-8 locale.

Not in my version:

reencode: logical: should character strings be re-encoded to the
          current locale. The default, 'NA', means to do so in a
          UTF-8 locale, only. Alternatively character, specifying
          an encoding to assume.

> read.spss can only import the SPSS file when run under an iso88591(5)
> locale; under a UTF-8 locale I get:
>
> Error in read.spss("wo.sav") : error reading system-file header
> In addition: Warning message:
> In read.spss("wo.sav") :
>   wo.sav: position 143: Variable name begins with invalid character

So, does it help with reencode="Latin1"? Presumably this comes from
assuming UTF-8 when it isn't.

> This is under Debian GNU/Linux, the stable release. foreign is
> version 8.27

8.34 is used in the current prerelease. AFAIR, some issues with
encodings were fixed recently.

-- 
   O__  Peter Dalgaard                  Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)                      FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
>> I must be missing something obvious here: According to the help page
>> for read.spss, the reencode option is only active when R is run
>> under a UTF-8 locale.
>
> Not in my version:
>
> reencode: logical: should character strings be re-encoded to the
>           current locale. The default, 'NA', means to do so in a
>           UTF-8 locale, only. Alternatively character, specifying
>           an encoding to assume.

OK, thanks for that correction, but the problem isn't solved, since
read.spss fails, see below. When read.spss succeeds, the option is not
useful, since then the current locale is iso88591(5).

> So, does it help with reencode="Latin1"? Presumably this comes from
> assuming UTF-8 when it isn't.

  > Sys.getlocale()
  [1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"
  > test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
  Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
    error reading system-file header
  In addition: Warning message:
  In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
    wo.sav: position 143: Variable name begins with invalid character

Using another version of the dataset, where I have successfully
encoded the names to UTF-8, here is the problematic variable name:

  > names(Workorientation.2005.Swe)[143]
  [1] "KÖN1"

> 8.34 is used in the current prerelease. AFAIR, some issues with
> encodings were fixed recently.

Someone running foreign 8.34 that is willing to test my SPSS-file?

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Q. What is that strange attachment in this mail?
A. My digital signature, see www.gnupg.org for info on how you could
   use it to ensure that this mail is from me and has not been
   altered on the way to you.
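
A small base-R sketch of the hypothesis raised above, namely that Latin-1 bytes are being treated as if they were UTF-8. It only illustrates why a name such as KÖN1 can look invalid in a UTF-8 locale; it does not confirm what read.spss does internally. The byte values are an assumption, built by hand for illustration rather than read from wo.sav. validUTF8() (R 3.3.0 or later) and iconv() are base R functions.

```r
# Assumed for illustration: the name "KÖN1" as it would be stored in a
# Latin-1 encoded file, where Ö is the single byte 0xD6.
raw_name <- rawToChar(as.raw(c(0x4B, 0xD6, 0x4E, 0x31)))

# 0xD6 would start a two-byte UTF-8 sequence, but the next byte ("N")
# is not a continuation byte, so the string is not valid UTF-8.
validUTF8(raw_name)   # FALSE

# Declaring the source encoding lets iconv() convert the bytes to UTF-8.
utf8_name <- iconv(raw_name, from = "latin1", to = "UTF-8")
validUTF8(utf8_name)  # TRUE

# Code points of the converted name; the second one is U+00D6 (Ö).
utf8ToInt(utf8_name)
```

If the same kind of mismatch occurs while the file header is parsed, a check for a valid first character in a variable name could plausibly fail in the way the warning describes.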
Re: [R] read.spss, locale and encodings
Hans Ekbrand wrote:
> Someone running foreign 8.34 that is willing to test my SPSS-file?

Someone with an SPSS file problem willing to help test the
prereleases? :-)

You could start by placing it somewhere accessible...

-- 
Peter Dalgaard
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 04:17:51PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
>> Someone running foreign 8.34 that is willing to test my SPSS-file?
>
> Someone with an SPSS file problem willing to help test the
> prereleases? :-)

http://sociologi.cjb.net/temp/test.sav

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
GPG Fingerprint: 1408 C8D5 1E7D 4C9C C27E 014F 7C2C 872A 7050 614E
Re: [R] read.spss, locale and encodings
Hans Ekbrand wrote:
> http://sociologi.cjb.net/temp/test.sav

No joy.

  > read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
  Error in read.spss("~/Desktop/downloads/test.sav", reencode = "latin1") :
    error reading system-file header
  In addition: Warning message:
  In read.spss("~/Desktop/downloads/test.sav", reencode = "latin1") :
    ~/Desktop/downloads/test.sav: position 143: Variable name begins with invalid character

(I suppose the actual culprit could be number 144, which does indeed
start with an A-ring (ÅLDKAT).)

Apparently, you can work around it like this:

  > lc <- Sys.setlocale("LC_CTYPE")
  > Sys.setlocale("LC_CTYPE", "da_DK")
  > x <- read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
  > Sys.setlocale("LC_CTYPE", lc)

which doesn't strike me as particularly logical, but whatever works.

-- 
Peter Dalgaard
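
The workaround above can be wrapped so that the original locale is restored even if read.spss stops with an error. This is only a sketch: with_ctype is a hypothetical helper name, not part of foreign or base R, and it reads the current setting with Sys.getlocale("LC_CTYPE") instead of the Sys.setlocale("LC_CTYPE") call used in the message. The demonstration switches to the "C" locale because that one should exist on any system; whether "da_DK" is available depends on the locales installed.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, and restore the previous setting afterwards.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # on.exit() runs on normal return and on error alike.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the request fails.
  if (identical(Sys.setlocale("LC_CTYPE", locale), "")) {
    stop("locale '", locale, "' is not available on this system")
  }
  # `expr` is a lazy argument, so it is only evaluated here,
  # after the locale has been switched.
  force(expr)
}

# Demonstration without any SPSS file: query LC_CTYPE inside the call.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")

# With the file from this thread, the call would look like this
# (needs the foreign package and an installed da_DK locale):
# x <- with_ctype("da_DK",
#                 foreign::read.spss("~/Desktop/downloads/test.sav",
#                                    reencode = "latin1"))
```

Relying on lazy evaluation keeps the call site close to the original one-liner; the cost is that the expression must not be evaluated before it is passed in.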
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 07:12:23PM +0200, Peter Dalgaard wrote:
> Apparently, you can work around it like this
>
>   lc <- Sys.setlocale("LC_CTYPE")
>   Sys.setlocale("LC_CTYPE", "da_DK")
>   x <- read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
>   Sys.setlocale("LC_CTYPE", lc)
>
> which doesn't strike me as particularly logical, but whatever works

THANKS a lot Peter! This works perfectly! I had been struggling with
this problem way too long...

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
GnuPG key: 1024D/7050614E
Fingerprint: 1408 C8D5 1E7D 4C9C C27E 014F 7C2C 872A 7050 614E
Learn about secure email at http://www.gnupg.org