Hi! I'm wondering what the 'encoding' argument to readLines(x) is for, as opposed to readLines(file(x, encoding=)). The same question applies to read.table()'s 'encoding' vs 'fileEncoding' arguments. AFAIK only the latter (the connection's encoding, or 'fileEncoding') is able to re-encode the text being read into the internal representation used by R (say, when reading files in encodings other than latin1 and UTF-8). But then what is the purpose of the former?
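To make the distinction concrete, here is a minimal sketch of what I mean. It assumes a small Latin-1 file created just for illustration; the temporary file and the variable names (tmp, marked, recoded) are made up for this example. I have left out the expected result for the second case, since the exact marking may depend on the locale and R version.

```r
# Hypothetical test file: "café" in Latin-1, i.e. é stored as the single byte 0xE9.
tmp <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("caf"), as.raw(0xE9), as.raw(0x0A)), tmp)

# (1) 'encoding' argument of readLines(): the bytes are read unchanged
#     and the resulting string is only *marked* as Latin-1.
marked <- readLines(tmp, encoding = "latin1")
print(Encoding(marked))    # "latin1"
print(charToRaw(marked))   # 63 61 66 e9 (same bytes as in the file)

# (2) 'encoding' argument of file(): the input is *re-encoded* while it is
#     read, so the bytes held by R differ from those stored in the file.
con <- file(tmp, encoding = "latin1")
recoded <- readLines(con)
close(con)
print(Encoding(recoded))   # mark may vary; not asserted here
print(charToRaw(recoded))

# Both strings should denote the same text once converted to UTF-8.
print(identical(charToRaw(enc2utf8(marked)), charToRaw(enc2utf8(recoded))))

unlink(tmp)
```

For an encoding other than latin1 or UTF-8, approach (1) has no suitable mark available, so as far as I can tell only approach (2) can work there.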
?readLines says:

    encoding: encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8: it is not used to re-encode the input. To do the latter, specify the encoding as part of the connection ‘con’ or via ‘options(encoding=)’: see the example under ‘file’.

But if I have a UTF-8 text file to read, couldn't I use readLines(file(x, encoding="UTF-8")) instead of readLines(x, encoding="UTF-8")? In my experience, the resulting character strings are marked as UTF-8 where needed in that case as well.

The reason I'm asking is that I need to decide whether users of a tm source plug-in should be able to pass both (à la 'encoding' vs 'fileEncoding'), or whether I could safely skip the first one.

Thanks for your help

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.