Re: [R] Encoding() and strsplit()
See the 'R Internals' manual.

ASCII characters are not marked as Latin-1 nor UTF-8.

On Fri, 7 Nov 2008, Heinz Tuechler wrote:
> [...]
> Also it seems that an Encoding() cannot be declared for simple letters. They remain in any case unknown.
> [...]
> Where can I find some explanation regarding encoding?

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
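The point about ASCII can be checked directly. The following is a minimal sketch, assuming a current R session rather than the 2.8.0 build in the original post. The Latin-1 text is written with `\x` byte escapes ("\xe4\xf6\xfc" is äöü in Latin-1) so the example does not depend on the encoding of the file it is typed into. Depending on the locale, R may keep the non-ASCII pieces returned by strsplit() marked as "latin1" or re-encode them to "UTF-8", so only the ASCII pieces are inspected here.

```r
# Byte escapes: "\xe4\xf6\xfc" is "äöü" in Latin-1 (ISO 8859-1).
x <- c("abc", "\xe4\xf6\xfc")
Encoding(x) <- "latin1"

# Only the non-ASCII element carries the declared encoding;
# "abc" stays "unknown" because its bytes are identical in every
# encoding R supports, so there is nothing to declare.
enc_x <- Encoding(x)
print(enc_x)

# The same applies after splitting a marked string into characters:
# the pieces "a", "b", "c" are plain ASCII and come back unmarked.
u <- "abc\xe4\xf6\xfc"
Encoding(u) <- "latin1"
us <- strsplit(u, "")[[1]]
ascii_enc <- Encoding(us[1:3])
print(ascii_enc)
```

This is why setting Encoding(us) <- rep('latin1', length(us)) in the original example changes nothing for the first three elements.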
Re: [R] Encoding() and strsplit()
At 09:15 07.11.2008, Prof Brian Ripley wrote:
>See the 'R Internals' manual.

Thank you, now I understand a little more.
My real problem, however, is a data frame produced by spss.get(). Is there a simple way to mark all character strings in that data.frame (except ASCII ones), including the levels of factors, as latin1?

Heinz Tüchler

>ASCII characters are not marked as Latin-1 nor UTF-8.
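One possible approach is sketched below, under stated assumptions. It assumes the bytes in the data frame really are Latin-1, so only the declared encoding needs to change and no conversion with iconv() is required. The helper name mark_latin1() is hypothetical and not part of any package. The helper walks over the columns, marks character columns directly, and marks factor levels by rewriting levels(). Column names and other attributes (for example any variable labels) are left untouched. ASCII strings stay "unknown", as Prof Ripley explains.

```r
# Hypothetical helper: declare Latin-1 for every character column and
# every factor's levels. It only sets the encoding mark; it does not
# convert any bytes, so use it only when the data really are Latin-1.
mark_latin1 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col)) {
      Encoding(col) <- "latin1"
    } else if (is.factor(col)) {
      lev <- levels(col)
      Encoding(lev) <- "latin1"   # ASCII levels remain "unknown"
      levels(col) <- lev          # same strings, now marked
    }
    df[[nm]] <- col
  }
  df
}

# Usage with a small made-up data frame ("\xfc" is "ü" in Latin-1).
# Explicit factor levels avoid sorting strings of unknown encoding.
df <- data.frame(
  id   = 1:2,
  name = c("abc", "\xe4\xf6\xfc"),
  grp  = factor(c("x", "\xfc"), levels = c("x", "\xfc")),
  stringsAsFactors = FALSE
)
out <- mark_latin1(df)
print(Encoding(out$name))
print(Encoding(levels(out$grp)))
```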
[R] Encoding() and strsplit()
Dear All,

Encoding() goes beyond my understanding. See the example.
From reading the help page for Encoding(), I would expect strsplit() to preserve the encoding of each resulting element, but for simple letters it gets lost. It also seems that no encoding can be declared for simple letters; they remain "unknown" in any case. In paste(), "latin1" seems to take precedence over "unknown".
What kind of property of an object is the encoding? It does not show up as an attribute, and str() does not give me any hint either.
Where can I find some explanation of encodings?

Thanks
Heinz

### Encoding() and strsplit
u <- 'abcäöü'
Encoding(u)
[1] "latin1"
Encoding(u) <- 'latin1' # to be sure about encoding
us <- strsplit(u, '')[[1]] # split into single characters
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
Encoding(us) <- rep('latin1', length(us))
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
pus <- paste(us[1], us[5], sep='')
Encoding(pus)
[1] "latin1"

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
.GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
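On the question of what kind of property the encoding is: it is a mark stored on each individual string (each element of a character vector), not an attribute of the vector. This is why attributes() reports nothing and why elements of one vector can carry different marks. The small sketch below assumes a current R session and again uses byte escapes for the Latin-1 text.

```r
x <- "\xe4\xf6\xfc"          # "äöü" as Latin-1 bytes
Encoding(x) <- "latin1"

# No attribute is involved: the mark lives on the string itself.
print(attributes(x))          # NULL

# Combining vectors keeps each element's own mark.
y <- c(x, "abc")
enc_y <- Encoding(y)
print(enc_y)
```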