Bryan Olson wrote:
... I think that's good behavior, except that the error message is likely
to send beginners to look up the obscure buffer interface before they find
they just need mystring.decode('utf8') or bytes(mystring, 'utf8').
Oops, careful here (I made this mistake once in this thread as well).
You _encode_ from unicode to bytes, and _decode_ from bytes to unicode.
The code you quoted doesn't run. This does:
>>> import hashlib
>>> a = 'Andr\xe9'
>>> b = unicode(a, 'cp1252')
>>> b.encode('utf-8')
'Andr\xc3\xa9'
>>> b.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    b.decode('utf-8')
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)
>>> hashlib.md5(b.encode('utf-8')).hexdigest()
'45f1deffb45a5f6c2380a4cee9b3e452'
>>> hashlib.md5(b.decode('utf-8')).hexdigest()
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    hashlib.md5(b.decode('utf-8')).hexdigest()
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)
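For comparison, here is a rough sketch of the same round trip as I would expect it to look on Python 3, where str is the text type and bytes is the raw type. The session above is Python 2.6; the variable names and the helper md5_of_text below are my own, purely for illustration.

```python
import hashlib

# Assumption: Python 3, where str is text and bytes is raw data.
raw = b'Andr\xe9'              # cp1252-encoded bytes, as in the session above
text = raw.decode('cp1252')    # bytes -> str   (decode)
utf8 = text.encode('utf-8')    # str   -> bytes (encode)

# hashlib works on bytes, so the text has to be encoded first.
digest = hashlib.md5(utf8).hexdigest()


def md5_of_text(s, encoding='utf-8'):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    return hashlib.md5(s.encode(encoding)).hexdigest()


print(repr(utf8))
print(md5_of_text(text) == digest)
```

Passing the str itself to hashlib.md5() raises a TypeError on Python 3, which is the cue to call .encode() rather than .decode().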
Incidentally, MD5 has fallen and SHA-1 is falling. Python's hashlib also
includes the stronger SHA-2 family.
Well, the choice of hash always depends on the app.
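If the app does call for something stronger, switching is cheap, since the SHA-2 constructors in hashlib take the same bytes input as md5. A minimal sketch, assuming Python 3 and a made-up sample string:

```python
import hashlib

# Illustrative input only; any bytes object works the same way.
data = 'Andr\u00e9'.encode('utf-8')

sha256_hex = hashlib.sha256(data).hexdigest()   # SHA-2 family, 256-bit digest
sha512_hex = hashlib.sha512(data).hexdigest()   # SHA-2 family, 512-bit digest

# hashlib.new() selects an algorithm by name, handy when it is configurable.
same_sha256 = hashlib.new('sha256', data).hexdigest()

print(len(sha256_hex), len(sha512_hex))
print(same_sha256 == sha256_hex)
```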
-Scott