Hi!

I'm wondering what the 'encoding' argument to readLines(x) is for,
as opposed to readLines(file(x, encoding=)). The same question applies
to read.table()'s 'encoding' vs 'fileEncoding' arguments. AFAIK only the
latter of each pair is able to re-encode the text it reads into the
internal representation used by R (say, when reading files in encodings
other than latin1 and UTF-8). But then what's the purpose of the former?

?readLines says:
encoding: encoding to be assumed for input strings.  It is used to mark
          character strings as known to be in Latin-1 or UTF-8: it is
          not used to re-encode the input.  To do the latter, specify
          the encoding as part of the connection ‘con’ or via
          ‘options(encoding=)’: see the example under ‘file’.

But if I have a UTF-8 text file to read, couldn't I use
readLines(file(x, encoding="UTF-8"))
instead of
readLines(x, encoding="UTF-8")?

In my experience, the first form also yields character strings that are
marked as UTF-8 where needed.
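
For reference, here is a minimal sketch of the comparison I have in mind.
The temporary file and its content ("café" as UTF-8 bytes) are made up for
illustration, and the mark reported for the second form may depend on the
platform and locale, so I don't state it here:

```r
# Hypothetical test file: the UTF-8 bytes of "café" plus a newline,
# written as raw bytes so that nothing is re-encoded on output.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a)), tmp)

# Form 1: only declare (mark) the encoding of the strings that are read.
a <- readLines(tmp, encoding = "UTF-8")

# Form 2: let the connection re-encode the input while reading.
con <- file(tmp, encoding = "UTF-8")
b <- readLines(con)
close(con)

Encoding(a)   # "UTF-8"
Encoding(b)   # may differ between platforms and locales

# Compare the two results after converting both to UTF-8.
identical(enc2utf8(a), enc2utf8(b))
```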

I'm asking because I need to decide whether to allow users of a tm
source plug-in to pass both (à la 'encoding' vs 'fileEncoding'), or
whether I can safely skip the first one.


Thanks for your help.

