On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald <wal...@livinglogic.de>
wrote:
On 01.10.09 16:09, Hyuga wrote:
On Sep 30, 3:34 am, gentlestone <tibor.b...@hotmail.com> wrote:
_MAP = {
    # LATIN
    u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
    u'Æ': 'AE', u'Ç': 'C', [...long table...]
}

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

import unicodedata

def downcode(name):
    return unicodedata.normalize("NFD", name)\
        .encode("ascii", "ignore")\
        .decode("ascii")
This article [1] shows a mixed technique, decomposing characters when such
info is available in the Unicode tables, and also allowing for a custom
mapping when not.
[1] http://effbot.org/zone/unicode-convert.htm
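
A rough sketch of how such a mixed approach might look, in case it helps.
This is not the code from the article, only an illustration of the idea,
written for Python 3 (plain str literals). The _FALLBACK table and its
entries are my own hypothetical examples; fill it with whatever characters
matter for your data.

```python
import unicodedata

# Hypothetical fallback table (not from the article): characters that have
# no canonical decomposition in the Unicode tables, so NFD cannot help.
_FALLBACK = {
    "Æ": "AE", "æ": "ae",
    "Ø": "O", "ø": "o",
    "Đ": "D", "đ": "d",
    "Ł": "L", "ł": "l",
    "ß": "ss",
}

def downcode(name):
    """Return an ASCII approximation of name."""
    out = []
    for ch in name:
        if ord(ch) < 128:
            # Already ASCII, keep unchanged.
            out.append(ch)
        elif ch in _FALLBACK:
            # The custom mapping takes priority over decomposition.
            out.append(_FALLBACK[ch])
        else:
            # Decompose into base character plus combining marks, then
            # keep only the ASCII part. Characters with no ASCII base
            # are silently dropped.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)
```

With this, downcode("Žabovitá zmiešaná kaša") gives 'Zabovita zmiesana kasa',
and downcode("Æsop") gives 'AEsop' instead of losing the first letter, which
is what the pure NFD version would do.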
--
Gabriel Genellina