Dear All,

Encoding() is beyond my understanding; see the example below. From reading the help for Encoding(), I would expect strsplit() to preserve the encoding of each resulting element, but for plain ASCII letters it is lost. It also seems that an encoding cannot be declared for plain ASCII letters: they remain "unknown" in any case. In paste(), "latin1" seems to dominate "unknown". What kind of property of an object is the encoding? It does not show up as an attribute, and str() gives me no hint either.
Where can I find some explanation regarding encoding?

Thanks

Heinz

###   Encoding() and strsplit
u <- 'abcäöü'
Encoding(u)
# [1] "latin1"
Encoding(u) <- 'latin1'        # to be sure about the encoding
us <- strsplit(u, '')[[1]]     # split into single-character strings
Encoding(us)
# [1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
Encoding(us) <- rep('latin1', length(us))   # try to declare latin1 for every element
Encoding(us)
# [1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
pus <- paste(us[1], us[5], sep='')          # ASCII 'a' pasted with latin1 'ö'
Encoding(pus)
# [1] "latin1"
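
Two of the observations above can be checked in a way that does not depend on the session locale. The following is a minimal sketch and not part of the original session. It assumes that the non-ASCII character is written as the Latin-1 byte escape "\xe4" (a-umlaut) rather than typed literally, and the variable names ascii, accent and utf8 are made up for illustration. As far as I can tell from the help for Encoding(), pure ASCII strings are never marked, and the mark seems to be kept internally with each string rather than as an R-level attribute.

```r
# Assumption: "\xe4" is the single Latin-1 byte for a-umlaut, written as an
# escape so the result does not depend on the source file or locale encoding.
ascii  <- "a"
accent <- "\xe4"

Encoding(ascii)  <- "latin1"   # has no effect on a pure ASCII string
Encoding(accent) <- "latin1"   # marks the non-ASCII string as latin1

enc <- Encoding(c(ascii, accent))
print(enc)                     # "unknown" "latin1"

# The mark is not an attribute, so attributes() has nothing to show.
print(attributes(accent))      # NULL

# Converting to UTF-8 changes both the bytes and the mark.
utf8 <- enc2utf8(accent)
print(Encoding(utf8))          # "UTF-8"
print(charToRaw(accent))       # e4
print(charToRaw(utf8))         # c3 a4
```

The results of strsplit() and paste() are deliberately left out of this sketch, because the mark they return may differ between a Latin-1 locale such as the one reported below and a UTF-8 locale.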

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.0
 year = 2008
 month = 11
 day = 04
 svn rev = 46830
 language = R
 version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
.GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
