Rares Vernica wrote:
> Hi,
>
> Does anyone know of any Unicode encode/decode error handler that does a
> better replace job than the default replace error handler?
>
> For example I have an iso-8859-1 string that has an 'e' with an accent
> (you know, the French 'e's). When I use s.encode('ascii', 'replace') the
> 'e' will be replaced with '?'. I would prefer to be replaced with an 'e'
> even if I know it is not 100% correct.
>
> If only this letter would be the problem I would do it manually, but
> there is an entire set of letters that need to be replaced with their
> closest ascii letter.
>
> Is there an encode/decode error handler that can replace all the
> not-ascii letters from iso-8859-1 with their closest ascii letter?

You might try the following:

    # -*- coding: iso-8859-1 -*-

    import unicodedata, codecs

    def transliterate(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise TypeError("don't know how to handle %r" % exc)
        # NFD splits an accented letter into base letter + combining mark;
        # keep only the base letter and resume after the offending character.
        return (unicodedata.normalize("NFD", exc.object[exc.start])[:1], exc.start+1)

    codecs.register_error("transliterate", transliterate)

    print u"Frédéric Chopin".encode("ascii", "transliterate")

Running this script gives you:

    $ python transliterate.py
    Frederic Chopin

Hope that helps.

Servus,
   Walter
-- 
http://mail.python.org/mailman/listinfo/python-list
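
P.S. For anyone on Python 3, here is a rough sketch of the same idea, not a definitive implementation. It assumes Python 3 string semantics, handles the whole unencodable run (exc.start to exc.end) in one call, and drops every non-ASCII code point left over after the NFD decomposition. The '?' fallback for letters that have no decomposition (such as 'ø' or 'ß') is my own assumption; substitute whatever suits your data.

```python
import codecs
import unicodedata


def transliterate(exc):
    """Encode error handler: swap unencodable characters for their ASCII base letters."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % (exc,))
    pieces = []
    for char in exc.object[exc.start:exc.end]:
        # NFD splits an accented letter into base letter + combining marks.
        decomposed = unicodedata.normalize("NFD", char)
        # Keep only the ASCII part so the replacement itself can be encoded.
        base = "".join(c for c in decomposed if ord(c) < 128)
        # Assumption: letters without an ASCII base fall back to '?'.
        pieces.append(base or "?")
    # Resume encoding right after the run that was just handled.
    return ("".join(pieces), exc.end)


codecs.register_error("transliterate", transliterate)

# \u00e9 is 'e' with an acute accent, written as an escape so the
# example does not depend on the source file's encoding.
print("Fr\u00e9d\u00e9ric Chopin".encode("ascii", "transliterate").decode("ascii"))
# prints: Frederic Chopin
```

Once registered, the handler name can be passed to any str.encode() call, for example s.encode("ascii", "transliterate").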