Diez B. Roggisch wrote:
>> I would think it more likely that he wants to end up with u'Bob\u2019s
>> Breakfast' rather than u'Bob\x92s Breakfast' although u'Dog\u2019s dinner'
>> seems a probable consequence.
>
> If that's the case, he should read the file as string, de- and encode it
> (probably into a StringIO) and then feed it to the parser.

Some alternatives:
- clean up the offending strings, by mapping the stray cp1252 characters
  back to the Unicode characters they were meant to be:

    http://effbot.org/zone/unicode-gremlins.htm

- encode the offending strings back to iso-8859-1 bytes, and decode those
  bytes again as cp1252:

    u = u'Bob\x92s Breakfast'
    u = u.encode("iso-8859-1").decode("cp1252")  # -> u'Bob\u2019s Breakfast'
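
- upgrade to ET 1.3 (available in alpha) and use the parser's encoding
  option to override the file's encoding:

    import xml.etree.ElementTree as ET  # ET 1.3 also ships with Python 2.7+

    # source is a filename or an open file object
    parser = ET.XMLParser(encoding="cp1252")
    tree = ET.parse(source, parser)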
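
For the first option, here's a minimal sketch of the same idea (not the
code from that page), written for Python 3. It builds the table by asking
the cp1252 codec what each byte in 0x80-0x9F stands for, instead of
hardcoding the mapping; make_gremlin_table and kill_gremlins are names
made up for this example.

```python
def make_gremlin_table():
    """Map U+0080..U+009F (cp1252 bytes wrongly decoded as iso-8859-1)
    to the characters cp1252 actually assigns to those bytes."""
    table = {}
    for code in range(0x80, 0xA0):
        try:
            fixed = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252,
            # so those code points are left untouched
            continue
        table[code] = fixed
    return table

GREMLINS = make_gremlin_table()

def kill_gremlins(text):
    """Replace cp1252 'gremlins' left behind by an iso-8859-1 decode."""
    return text.translate(GREMLINS)

fixed = kill_gremlins(u"Bob\x92s Breakfast")  # u"Bob\u2019s Breakfast"
```

Unlike the encode/decode round trip, translate() leaves everything outside
U+0080-U+009F alone, so it also works on strings that already contain
characters iso-8859-1 can't encode.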
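
And a self-contained check of the parser option, assuming Python 3, where
xml.etree.ElementTree includes the ET 1.3 XMLParser. The sample document
and the parse_as helper are hypothetical; the document deliberately
declares iso-8859-1 while containing a cp1252 byte.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: declared as iso-8859-1, but byte 0x92 is really
# the cp1252 RIGHT SINGLE QUOTATION MARK.
DATA = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
        b'<title>Bob\x92s Breakfast</title>')

def parse_as(data, encoding=None):
    """Parse data and return the root element's text.

    If encoding is given, it overrides the encoding named in the
    XML declaration; otherwise the declaration is used."""
    parser = ET.XMLParser(encoding=encoding)
    tree = ET.parse(io.BytesIO(data), parser)
    return tree.getroot().text

as_declared = parse_as(DATA)             # contains the control char U+0092
overridden = parse_as(DATA, "cp1252")    # contains U+2019
```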
</F>
--
http://mail.python.org/mailman/listinfo/python-list