Simon Willison wrote:

> But ElementTree gives me back a unicode string, so I get the following
> error:
>
> >>> print u'Bob\x92s Breakfast'.decode('cp1252').encode('utf8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
>     return codecs.charmap_decode(input,errors,decoding_table)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 3: ordinal not in range(128)
>
> How can I tell Python "I know this says it's a unicode string, but I
> need you to treat it like a bytestring"?

ET has already decoded the CP1252 data for you.  If you want UTF-8, all
you need to do is to encode it:

>>> u'Bob\x92s Breakfast'.encode('utf8')
'Bob\xc2\x92s Breakfast'

</F>
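
On the literal question ("treat it like a bytestring"): in Python 2, calling .decode() on a unicode string first encodes it implicitly with the ascii codec, which is where the UnicodeEncodeError in the traceback comes from. If the u'\x92' is really a CP1252 byte that was decoded as Latin-1 somewhere upstream (that is an assumption, nothing in the thread confirms it), a minimal sketch of the round trip could look like this. The variable names are made up for illustration.

```python
# Assumption: the u'\x92' came from CP1252 bytes that were decoded as
# Latin-1 by mistake. If the data really is U+0092, do not do this.
text = u'Bob\x92s Breakfast'

# Latin-1 maps each code point below 256 to the byte with the same value,
# so this recovers the original bytes without an implicit ascii step.
raw = text.encode('latin-1')

# Byte 0x92 is RIGHT SINGLE QUOTATION MARK (U+2019) in CP1252.
fixed = raw.decode('cp1252')

# Encode the repaired text as UTF-8 for output.
utf8_bytes = fixed.encode('utf8')
```

If the string contains any character above U+00FF, the encode('latin-1') step raises UnicodeEncodeError, which is a reasonable sign that the assumption does not hold for that input.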