Hello, I'm using ElementTree to parse an XML file which includes some data encoded as cp1252, for example:

<name>Bob\x92s Breakfast</name>

If this was a regular bytestring, I would convert it to utf8 using the following:

>>> print 'Bob\x92s Breakfast'.decode('cp1252').encode('utf8')
Bob's Breakfast

But ElementTree gives me back a unicode string, so I get the following error:

>>> print u'Bob\x92s Breakfast'.decode('cp1252').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 3: ordinal not in range(128)

How can I tell Python "I know this says it's a unicode string, but I need you to treat it like a bytestring"?

Thanks,

Simon Willison
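
For reference, a minimal sketch of one way this kind of reinterpretation is commonly done, assuming every character in the string is below U+0100 (i.e. the parser mapped each raw byte straight to the code point of the same value). The latin-1 codec maps code points 0-255 one-to-one onto bytes 0-255, so encoding with it recovers the original bytes, which can then be decoded as cp1252. The helper name reinterpret_as_cp1252 is made up for illustration and is not part of ElementTree. The error above arises because Python 2 implicitly encodes a unicode string to ASCII before calling .decode() on it.

```python
def reinterpret_as_cp1252(text):
    """Hypothetical helper: treat a unicode string whose code points are
    really cp1252 byte values as bytes, then decode them properly."""
    # latin-1 maps code points 0-255 directly to bytes 0-255, so this
    # recovers the raw bytes. It raises UnicodeEncodeError if the string
    # contains any character above U+00FF.
    raw = text.encode('latin-1')
    # Decode the recovered bytes with the encoding they were really in.
    return raw.decode('cp1252')


fixed = reinterpret_as_cp1252(u'Bob\x92s Breakfast')
utf8_bytes = fixed.encode('utf8')
```

Here fixed holds the name with a proper right single quotation mark (U+2019), and utf8_bytes is its UTF-8 encoding. This only papers over the symptom; if the XML declaration states the wrong encoding, correcting it at the source is likely the cleaner fix.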