Colleagues, [R 2.11; OS X]
I am processing a file on the fly that contains the following text:

    XXXáá

[email clients may display this differently -- the string is three X's followed by two instances of the letter a with an acute accent]

I read the file with:

    X <- readLines(FILENAME)

In this instance, the text of interest is on line 213. When I examine line 213, it reads:

    XXX\xe1\xe1

This makes sense because the Unicode code point for á [a-acute] is U+00E1.

The problem arises when I attempt to manipulate the text in the file. For example:

    > grep("XXX", X[213])
    integer(0)
    Warning message:
    In grep("XXX", X[213]) : input string 1 is invalid in this locale

Worse yet:

    > tolower(X[213])
    Error in tolower(X[213]) : invalid multibyte string 1

I am focusing on resolving the first problem, i.e., identifying a line containing XXX. If I can do so, I can remove the offending lines before I execute the tolower command. However, I am stumped as to how to resolve either problem.

Any help would be appreciated. Thanks.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
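
A minimal sketch that may reproduce the situation described above. It assumes the file is Latin-1 encoded (which would be consistent with the \xe1 bytes) and that R is running in a UTF-8 locale; neither is stated in the post. The temporary file and the names tmp, x, hit, and y are hypothetical. The sketch also tries two possible workarounds, byte-wise matching with useBytes = TRUE and conversion with iconv(); these are suggestions to test, not a confirmed fix for R 2.11.

```r
# Assumption: the file holds Latin-1 bytes, so "XXXáá" is stored as
# 58 58 58 e1 e1 followed by a newline.
tmp <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("XXX"), as.raw(c(0xe1, 0xe1, 0x0a))), tmp)

# readLines() reads the bytes as-is; no encoding is declared for the result.
x <- readLines(tmp)

# Possible workaround 1: match byte-wise, so the string is not validated
# against the current locale before matching.
hit <- grepl("XXX", x, fixed = TRUE, useBytes = TRUE)

# Possible workaround 2: declare the assumed source encoding and convert
# to UTF-8, after which functions such as tolower() should accept the string.
y <- iconv(x, from = "latin1", to = "UTF-8")
y_lower <- tolower(y)
```

If the Latin-1 assumption holds, hit could be used to find (or drop) the affected lines, and the iconv() step might make it unnecessary to remove them at all.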