On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald <wal...@livinglogic.de> wrote:
On 01.10.09 16:09, Hyuga wrote:
On Sep 30, 3:34 am, gentlestone <tibor.b...@hotmail.com> wrote:

_MAP = {
    # LATIN
    u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
    u'Æ': 'AE', u'Ç': 'C', [...long table...]
}

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

import unicodedata

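def downcode(name):
    # Decompose accented characters (NFD), then drop every non-ASCII
    # code point, which removes the combining marks. Characters with
    # no decomposition (e.g. u'Æ') are dropped entirely.
    return unicodedata.normalize("NFD", name)\
              .encode("ascii", "ignore")\
              .decode("ascii")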

This article [1] shows a mixed technique: it decomposes characters when the Unicode tables provide that information, and falls back to a custom mapping when they do not.

[1] http://effbot.org/zone/unicode-convert.htm
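
A rough sketch of how the two approaches in this thread could be combined along those lines. This is not the code from the article; the names downcode_mixed and _FALLBACK and the small table are made up for illustration, and it assumes Python 3 (where the u prefix is accepted but optional):

```python
import unicodedata

# Hypothetical fallback table: characters that have no Unicode
# decomposition and would otherwise be dropped.
_FALLBACK = {
    u'Æ': u'AE', u'æ': u'ae',
    u'Ø': u'O', u'ø': u'o',
    u'Đ': u'D', u'đ': u'd',
    u'Ł': u'L', u'ł': u'l',
    u'ß': u'ss',
}

def downcode_mixed(name):
    out = []
    for ch in name:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif ch in _FALLBACK:
            # The custom mapping handles characters without a decomposition.
            out.append(_FALLBACK[ch])
        else:
            # Decompose and keep only the ASCII parts, which discards
            # the combining marks (and anything else non-ASCII).
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(u"".join(c for c in decomposed if ord(c) < 128))
    return u"".join(out)
```

With this, downcode_mixed(u"Žabovitá zmiešaná kaša") should give the same result as the doctest above, while something like u"Æther" keeps its first letter as "AE" instead of losing it. The fallback table only needs entries for characters the Unicode data cannot decompose, so it can stay much shorter than a full hand-written map.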

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
