Hi there. I'm working with some UTF-8 encoded CSV files, which give me data frames with UTF-8 encoded headers. This means that when I write things like
dat$proporción
in an R script and then source it, I have to make sure the R script is encoded in UTF-8 (and not latin1), and I also have to tell R explicitly that the encoding is UTF-8 every time I source the file. That is, I need to type
source("sr.R", encoding="utf-8").
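
To be concrete, my current workaround looks roughly like the sketch below. The temporary files and the small write_utf8 helper are stand-ins I made up for the real CSV and for sr.R, and the sketch assumes the session locale can represent the accented character.

```r
# Temporary files stand in for the real CSV and for sr.R (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
script_path <- tempfile(fileext = ".R")

# Illustrative helper: write lines to disk as raw UTF-8 bytes.
write_utf8 <- function(lines, path) {
  con <- file(path, open = "w", encoding = "native.enc")
  on.exit(close(con))
  writeLines(enc2utf8(lines), con, useBytes = TRUE)
}

write_utf8(c("proporci\u00f3n,n", "0.25,4", "0.75,12"), csv_path)
write_utf8("total <- sum(dat$proporci\u00f3n)", script_path)

# Reading: the file encoding has to be given explicitly.
dat <- read.csv(csv_path, fileEncoding = "UTF-8", check.names = FALSE)

# Sourcing: the encoding has to be given again, every single time.
source(script_path, encoding = "UTF-8", local = TRUE)
print(total)
```

Both calls need their own encoding argument, which is the repetition I would like to avoid.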

Sure, I could eliminate accents and so forth from the headers by renaming the data frame columns, and I already do this, but I shouldn't be required to do it just to avoid encoding issues. We're living in the 21st century and IMHO Unicode-based encodings should be the de facto standard these days. I'm aware that R is pretty clever and stores the encoding along with the string value in character objects and then converts on the fly as necessary. However, almost everything I work with is in UTF-8 or ASCII (which is compatible with UTF-8 anyway), so I'd like R to behave as though it does everything natively in UTF-8 so I don't have to worry about it. Is there something I can set in Rprofile.site, in the user Rprofile, or in an environment variable, or some other way to instruct R to always assume that input stream encodings are UTF-8 unless otherwise specified? That way, I would only ever have to supply an encoding or fileEncoding argument to specify "latin1" if I happen to encounter it.
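
To make the question concrete, here is a sketch of the kind of setting I have in mind. It assumes that the "encoding" option, which source() and file() use as their default, is the relevant knob. I am not certain that it covers every input function or that changing it globally has no side effects, so please treat it as a guess. The temporary files and the write_utf8 helper are made up for illustration, and the sketch assumes a locale that can represent the accented character.

```r
# Candidate line for a user .Rprofile or Rprofile.site (an assumption, not a
# confirmed recommendation). The old value is kept so it can be restored.
old_opts <- options(encoding = "UTF-8")

# Illustrative helper: write lines to disk as raw UTF-8 bytes.
write_utf8 <- function(lines, path) {
  con <- file(path, open = "w", encoding = "native.enc")
  on.exit(close(con))
  writeLines(enc2utf8(lines), con, useBytes = TRUE)
}

csv_path2 <- tempfile(fileext = ".csv")
script_path2 <- tempfile(fileext = ".R")
write_utf8(c("proporci\u00f3n,n", "0.25,4", "0.75,12"), csv_path2)
write_utf8("total2 <- sum(dat2$proporci\u00f3n)", script_path2)

# No fileEncoding or encoding arguments: both calls fall back on the option.
dat2 <- read.csv(csv_path2, check.names = FALSE)
source(script_path2, local = TRUE)
print(total2)

# Restore the previous setting so the rest of the session is unaffected.
options(old_opts)
```

If this is the right mechanism, a latin1 file would then be the only case needing an explicit fileEncoding or encoding argument.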

Many thanks,
Andrew.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
