On 11/10/2010 3:36 PM, Dennis Fisher wrote:
Colleagues,

[R 2.11; OS X]

I am processing a file on the fly that contains the following text:
        XXXáá
[email clients may display this differently -- the string is three X's followed 
by two instances of the letter a with an acute accent]
I read the file with:
        X       <- readLines(FILENAME)
In this instance, the text of interest is on line 213.  When I examine line 
213, it reads:
        XXX\xe1\xe1
This makes sense because the Unicode mapping for á [a-acute] is U+00E1.

That's not what it's saying: it's saying you have three X's followed by two unrecognized bytes, each with hex code E1. I imagine the original file is encoded in Latin-1, because that's how á is encoded there (as the single byte E1).
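
To illustrate the distinction, here is a minimal sketch comparing the bytes for á in the two encodings. The variable names are only for illustration, and the string is converted to UTF-8 explicitly so the result should not depend on the session's locale.

```r
# U+00E1 is the Unicode code point for a-acute; the bytes on disk
# depend on which encoding is used to store it.
a_acute <- enc2utf8("\u00e1")

# In UTF-8 the character takes two bytes.
utf8_bytes <- charToRaw(a_acute)

# In Latin-1 the same character is a single byte.
latin1_bytes <- charToRaw(iconv(a_acute, from = "UTF-8", to = "latin1"))

print(utf8_bytes)    # c3 a1
print(latin1_bytes)  # e1
```

A lone E1 byte is not a complete character in UTF-8, which would explain why a session running in a UTF-8 locale displays it as the escape \xe1.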

The problem arises when I attempt to manipulate the text in the file.  For 
example:
        >  grep("XXX", X[213])
        integer(0)
        Warning message:
        In grep("XXX", X[213]) : input string 1 is invalid in this locale
Worse yet:
        >  tolower(X[213])
        Error in tolower(X[213]) : invalid multibyte string 1

I am focussing on resolving the first problem, i.e., identifying a line 
containing XXX.  If I can do so, I can remove the offending lines before I 
execute the tolower command.
However, I am stumped as to how to resolve either problem.

Any help would be appreciated.

You need to declare the encoding of the file when you read it if it's not in the default encoding for your locale, or re-encode it. See ?readLines.
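
As a rough sketch of both approaches, assuming the file really is Latin-1 (which has not been confirmed for the original file): the temporary file below is a stand-in for FILENAME, built so that the example is self-contained.

```r
# Build a small Latin-1 test file: "XXX", two 0xE1 bytes, and a newline.
tmp <- tempfile()
writeBin(c(charToRaw("XXX"), as.raw(c(0xe1, 0xe1, 0x0a))), tmp)

# Option 1: declare the encoding on the connection, so the text is
# re-encoded to the session's native encoding as it is read.
con <- file(tmp, encoding = "latin1")
x1 <- readLines(con)
close(con)

# Option 2: read the bytes unchanged, then re-encode explicitly.
x2 <- iconv(readLines(tmp), from = "latin1", to = "UTF-8")

# The re-encoded string is valid, so pattern matching and case
# conversion no longer complain about it in a UTF-8 locale.
grep("XXX", x2)
tolower(x2)

unlink(tmp)
```

Option 1 relies on the native encoding being able to represent á; Option 2 always produces UTF-8, which may be the safer choice when the locale is uncertain.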

Duncan Murdoch


Thanks.

Dennis

Dennis Fisher MD
P<  (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
