"manstey" <[EMAIL PROTECTED]> writes: > 1. Here is my input data file, line 2: > gn1:1,1.2 R")$I73YT R")[EMAIL PROTECTED]
Your program is reading this using the 'utf-8' encoding. When it does
so, all the characters you show above will be read in happily as you
see them (so long as you view them with the 'utf-8' encoding), and
converted to Unicode characters representing the same thing.

Do you have any other information that might indicate this is *not*
utf-8 encoded data?

> 2. Here is my output data file, line 2:
> u'gn', u'1', u'1', u'1', u'2', u'-', u'R")$I73YT', u'R")$IYT',
> u'R")$IYT', u'@', u'ncfsa', u'nc', '', '', '', u'f', u's', u'a', '',
> '', '', '', '', '', '', '', u'B.:R")$I^YT', u'b.:cv)cv^yc', '\xc9\x94'

As you can see, reading the file with 'utf-8' encoding and writing it
out again as 'utf-8' encoding, the characters (as you posted them in
the message) have been faithfully preserved by Unicode processing and
encoding.

Bear in mind that when you present the "input data file, line 2" to
us, your message is itself encoded using a particular character
encoding. (In the case of the message where you wrote the above, it's
'utf-8'.) This means we may or may not be seeing the exact same bytes
you see in the input file; we're seeing characters in the encoding you
used to post the message.

You need to know what encoding was used when the data in that file was
written. You can then read the file using that encoding, and convert
the characters to unicode for processing inside your program. When you
write them out again, you can choose the 'utf-8' encoding as you have
done.

Have you read this excellent article on understanding the programming
implications of character sets and Unicode?

    "The Absolute Minimum Every Software Developer Absolutely,
    Positively Must Know About Unicode and Character Sets (No
    Excuses!)"
    <URL:http://www.joelonsoftware.com/articles/Unicode.html>

-- 
Ben Finney
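
The advice above (find out the file's real encoding, decode with it,
work with Unicode inside the program, then write out as 'utf-8') can
be sketched roughly as follows. This is only an illustration under
stated assumptions: it uses Python 3 syntax rather than the Python 2
of the original thread, the helper name transcode() and the file names
are hypothetical, and 'latin-1' merely stands in for whatever encoding
the input file turns out to use. The first few lines also show why the
encoding matters: the two bytes '\xc9\x94' from the output above are a
single character under 'utf-8' but two characters under 'latin-1'.

```python
import os
import tempfile

# The same two bytes mean different things under different encodings.
raw = b"\xc9\x94"
as_utf8 = raw.decode("utf-8")      # one character: U+0254
as_latin1 = raw.decode("latin-1")  # two characters: U+00C9, U+0094


def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Read text in src_encoding and write it back out in dst_encoding.

    Hypothetical helper, not part of the original poster's program.
    """
    # newline="" keeps line endings exactly as they are in the file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()  # a Unicode string from here on
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


# Demo with made-up data: a file written as Latin-1 is converted to UTF-8.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")
dst_path = os.path.join(workdir, "output.txt")
with open(src_path, "wb") as f:
    f.write("caf\u00e9\n".encode("latin-1"))

text = transcode(src_path, dst_path, src_encoding="latin-1")
with open(dst_path, "rb") as f:
    utf8_bytes = f.read()
```

If the wrong src_encoding is passed, the read either raises
UnicodeDecodeError or silently produces the wrong characters, which is
why knowing how the file was written comes first.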