Re: [R] read.spss, locale and encodings

2009-04-08 Thread Peter Dalgaard

Hans Ekbrand wrote:
> I must be missing something obvious here:
>
> According to the help page for read.spss, the reencode option is only
> active when R is run under a UTF-8 locale.

Not in my version:

reencode: logical: should character strings be re-encoded to the
  current locale.  The default, 'NA', means to do so in a UTF-8
  locale, only.  Alternatively character, specifying an
  encoding to assume.

> read.spss can only import the SPSS file when run under an ISO-8859-1(5)
> locale; under a UTF-8 locale I get:
>
> Error in read.spss("wo.sav") : error reading system-file header
> In addition: Warning message:
> In read.spss("wo.sav") :
>   wo.sav: position 143: Variable name begins with invalid character

So, does it help with reencode="Latin1"? Presumably this comes from
assuming UTF-8 when it isn't.

> This is under Debian GNU/Linux, the stable release.
>
> foreign is version 8.27

8.34 is used in the current prerelease. AFAIR, some issues with
encodings were fixed recently.
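
For anyone following along, the three facts this exchange turns on (the LC_CTYPE locale, whether it is UTF-8, and the installed version of foreign) can be checked from within R. The snippet below is only a diagnostic sketch and is not taken from the thread; the variable names are made up for illustration. According to the help text quoted above, the default reencode = NA re-encodes only when the session is in a UTF-8 locale.

```r
# Diagnostic sketch (not from the thread): report the facts that matter here.
info <- l10n_info()                   # base R: properties of the current locale
is_utf8 <- isTRUE(info[["UTF-8"]])    # TRUE when the session locale is UTF-8
ctype <- Sys.getlocale("LC_CTYPE")    # the character-type locale in use

# 'foreign' is a recommended package, but guard in case it is not installed.
foreign_version <- if (requireNamespace("foreign", quietly = TRUE)) {
  as.character(utils::packageVersion("foreign"))
} else {
  NA_character_
}

cat("LC_CTYPE:       ", ctype, "\n")
cat("UTF-8 locale:   ", is_utf8, "\n")
cat("foreign version:", foreign_version, "\n")
```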


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.spss, locale and encodings

2009-04-08 Thread Hans Ekbrand
On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
> > I must be missing something obvious here:
> >
> > According to the help page for read.spss, the reencode option is only
> > active when R is run under a UTF-8 locale.
>
> Not in my version:
>
> reencode: logical: should character strings be re-encoded to the
>   current locale.  The default, 'NA', means to do so in a UTF-8
>   locale, only.  Alternatively character, specifying an
>   encoding to assume.

OK, thanks for that correction, but the problem isn't solved, since
read.spss fails, see below. When read.spss succeeds, the option is
not useful, since then the current locale is ISO-8859-1(5).

> So, does it help with reencode="Latin1"? Presumably this comes from
> assuming UTF-8 when it isn't.

> Sys.getlocale()
[1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"
> test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
  error reading system-file header
In addition: Warning message:
In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
  wo.sav: position 143: Variable name begins with invalid character

Using another version of the dataset, where I have successfully
encoded the names to UTF-8, here is the problematic variable name:

> names(Workorientation.2005.Swe)[143]
[1] "KÖN1"

> 8.34 is used in the current prerelease. AFAIR, some issues with
> encodings were fixed recently.

Someone running foreign 8.34 that is willing to test my SPSS-file?
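
To illustrate why a Latin-1 variable name can be rejected when the bytes are treated as UTF-8, here is a small sketch that is not part of the original exchange. It assumes the name KÖN1 is stored in the file as Latin-1 bytes (the thread suggests this but does not confirm it) and uses validUTF8(), which needs R 3.3.0 or later. It says nothing about how read.spss itself validates names.

```r
# Assumption: the name "KÖN1" is stored as Latin-1, i.e. the bytes 4b d6 4e 31.
latin1_name <- rawToChar(as.raw(c(0x4b, 0xd6, 0x4e, 0x31)))

# 0xd6 opens a two-byte UTF-8 sequence, but 0x4e is not a continuation byte,
# so the byte string is not valid UTF-8.
validUTF8(latin1_name)   # FALSE

# Re-encoding from Latin-1 turns the single byte d6 into the pair c3 96.
utf8_name <- iconv(latin1_name, from = "latin1", to = "UTF-8")
validUTF8(utf8_name)     # TRUE
charToRaw(utf8_name)     # 4b c3 96 4e 31
```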

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Q. What is that strange attachment in this mail?
A. My digital signature, see www.gnupg.org for info on how you could
 use it to ensure that this mail is from me and has not been
 altered on the way to you.




Re: [R] read.spss, locale and encodings

2009-04-08 Thread Peter Dalgaard

Hans Ekbrand wrote:

> Someone running foreign 8.34 that is willing to test my SPSS-file?


Someone with an SPSS file problem willing to help test the prereleases? :-)

You could start by placing it somewhere accessible...




Re: [R] read.spss, locale and encodings

2009-04-08 Thread Hans Ekbrand
On Wed, Apr 08, 2009 at 04:17:51PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
> > Someone running foreign 8.34 that is willing to test my SPSS-file?
>
> Someone with an SPSS file problem willing to help test the prereleases? :-)

http://sociologi.cjb.net/temp/test.sav





Re: [R] read.spss, locale and encodings

2009-04-08 Thread Peter Dalgaard

Hans Ekbrand wrote:
> On Wed, Apr 08, 2009 at 04:17:51PM +0200, Peter Dalgaard wrote:
> > Hans Ekbrand wrote:
> > > Someone running foreign 8.34 that is willing to test my SPSS-file?
> >
> > Someone with an SPSS file problem willing to help test the prereleases? :-)
>
> http://sociologi.cjb.net/temp/test.sav


No joy.

> read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
Error in read.spss("~/Desktop/downloads/test.sav", reencode = "latin1") :
  error reading system-file header
In addition: Warning message:
In read.spss("~/Desktop/downloads/test.sav", reencode = "latin1") :
  ~/Desktop/downloads/test.sav: position 143: Variable name begins with invalid character


(I suppose the actual culprit could be number 144 which does indeed 
start with an A-ring (ÅLDKAT))


Apparently, you can work around it like this

lc <- Sys.setlocale("LC_CTYPE")
Sys.setlocale("LC_CTYPE", "da_DK")
x <- read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
Sys.setlocale("LC_CTYPE", lc)

-- which doesn't strike me as particularly logical, but whatever works
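
The same workaround can be wrapped so that the original locale is restored even if the read fails. What follows is a minimal sketch and not code from the thread: with_ctype is a hypothetical helper name, and it records the current locale with Sys.getlocale() rather than the Sys.setlocale("LC_CTYPE") call used above. The demonstration uses the "C" locale only because it is available everywhere; the read.spss call is shown as a comment because it depends on the test file and on a da_DK locale being installed.

```r
# Hypothetical helper (not part of base R or foreign): evaluate 'expr' with
# LC_CTYPE temporarily switched to 'locale', then restore the previous value.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore on exit, whether 'expr' succeeds or throws an error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    stop("locale '", locale, "' is not available on this system")
  }
  force(expr)  # 'expr' is lazy, so it is evaluated only now, under the new locale
}

# Runnable demonstration with the always-available "C" locale.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after <- Sys.getlocale("LC_CTYPE")

# Applied to the case in this thread it would look like this (not run here):
# x <- with_ctype("da_DK",
#                 foreign::read.spss("~/Desktop/downloads/test.sav",
#                                    reencode = "latin1"))
```

Using on.exit() means an error inside read.spss does not leave the session stuck in the temporary locale.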




Re: [R] read.spss, locale and encodings

2009-04-08 Thread Hans Ekbrand
On Wed, Apr 08, 2009 at 07:12:23PM +0200, Peter Dalgaard wrote:
> Apparently, you can work around it like this
>
> lc <- Sys.setlocale("LC_CTYPE")
> Sys.setlocale("LC_CTYPE", "da_DK")
> x <- read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
> Sys.setlocale("LC_CTYPE", lc)
>
> -- which doesn't strike me as particularly logical, but whatever works

THANKS a lot Peter! This works perfectly! I had been struggling with
this problem way too long...


