Greg:

> The only issue I'm having relates to Unicode. MoinMoin and python are
> pretty unforgiving about files that contain Unicode characters that
> aren't included in the coding properly. I've spent hours reading about
> Unicode, and playing with different encoding/decoding commands, but at
> this point, I just want a hacky solution that will ignore the
> improperly coded characters or replace them with placeholders.

Call the codec with the errors argument set to "ignore" or "replace".

>>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte
>>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For references see blahblah.\n\n\n-----\n\n'

BTW, it's probably in Windows-1252, where it would be a dash. Depending on your context, it may pay to handle the exception instead of using "replace" and attempt to interpret the data as Windows-1252.

Neil
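
The fallback suggested above (catch the exception, then retry as Windows-1252) might look roughly like the sketch below. It is only an illustration: decode_lenient is a made-up helper name, not anything from MoinMoin or the standard library, and it is written for current Python, where bytes.decode() plays the role that unicode() plays in the session above. It assumes that anything that is not valid UTF-8 is most likely Windows-1252, which may not hold for your files.

```python
def decode_lenient(raw, primary="utf-8", fallback="cp1252"):
    """Decode raw bytes as `primary`; if that fails, retry as `fallback`.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined (0x81, for example),
        # so "replace" is used here to guarantee this step never raises.
        return raw.decode(fallback, "replace")


sample = b"Reading Course Readings... G. A. \x96 For references see blahblah."

# Byte 0x96 is not valid UTF-8, so the helper falls back to cp1252,
# where 0x96 maps to U+2013 (en dash).
text = decode_lenient(sample)

# The purely "hacky" options, for comparison:
replaced = sample.decode("utf-8", "replace")  # bad byte becomes U+FFFD
ignored = sample.decode("utf-8", "ignore")    # bad byte is dropped
```

The trade-off is that "replace" and "ignore" never guess wrong but lose the character, while the fallback keeps it at the risk of misreading data that was in some other encoding entirely.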