On Nov 28, 2:03 pm, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Jeff H wrote:
> > hashlib.md5 does not appear to like unicode,
> > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in
> > position 1650: ordinal not in range(128)
>
> It is the (default) ascii encoder that does not like non-ascii chars.
> I suspect that if you encode to bytes first with an encoder that does
> work (latin-???), md5 will be happy.
>
> Reports like this should include Python version.
>
> > After googling, I've found BDFL and others on Py3K talking about the
> > problems of hashing non-bytes (i.e. buffers)
> > http://www.mail-archive.com/[EMAIL PROTECTED]/msg09824.html
>
> > So what is the canonical way to hash unicode?
> > * convert unicode to locale
> > * hash in current locale
> > ???
> > but what if locale has ordinals outside of 128?
>
> > Is this just a problem for md5 hashes that I would not encounter using
> > a different method? i.e. Should I just use the built-in hash function?

Python 2.5.2. However, this is not really a bug report, because your analysis is correct. I am converting cp1252 strings to unicode before I persist them in a database. I am looking for advice/direction/wisdom on how to sling these strings <g>

-Jeff
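
For reference, here is a minimal sketch of the encode-first approach suggested in the quoted reply: decode the cp1252 bytes to unicode, then encode to one fixed encoding before handing the result to hashlib.md5. The helper name md5_hex, the sample bytes, and the choice of UTF-8 are assumptions for illustration, not something settled in this thread. It is written for Python 2.6+ and 3.x; on 2.5 the b prefix on the literal would have to be dropped.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the MD5 hex digest of a unicode string.

    md5 works on bytes, so the text is encoded explicitly instead of
    relying on the implicit ascii codec. UTF-8 is an assumed choice:
    it can represent every code point, so the encode step cannot fail
    on characters such as u'\\xa6'.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


# Hypothetical sample: cp1252 bytes as they might arrive from the source.
raw = b"caf\xe9 \xa6"

# Decode once at the boundary; this unicode value is what gets persisted.
text = raw.decode("cp1252")

# Hash the unicode value through one fixed, explicit encoding.
digest = md5_hex(text)
print(digest)
```

The digest depends on the encoding chosen, so whichever one is picked has to be used everywhere the hashes are compared. The built-in hash function is not a substitute here, since its values are not guaranteed to be stable across Python versions or platforms.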