Re: unicode issue

Gabriel Genellina Tue, 06 Oct 2009 02:08:46 -0700

On Thu, 01 Oct 2009 12:10:58 -0300, Walter Dörwald <[email protected]> wrote:

On 01.10.09 16:09, Hyuga wrote:

On Sep 30, 3:34 am, gentlestone <[email protected]> wrote:

_MAP = {
    # LATIN
    u'À': 'A', u'Á': 'A', u'Â': 'A', u'Ã': 'A', u'Ä': 'A', u'Å': 'A',
    u'Æ': 'AE', u'Ç': 'C', [...long table...]
}

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name


import unicodedata

def downcode(name):
   return unicodedata.normalize("NFD", name)\
          .encode("ascii", "ignore")\
          .decode("ascii")

This article [1] shows a mixed technique: it decomposes characters when that information is available in the Unicode tables, and falls back to a custom mapping when it is not.
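As a rough illustration of that mixed idea (not the code from the article), here is a minimal sketch. It assumes Python 3 string semantics, and the _CUSTOM_MAP table is a hypothetical example that would need extending for real data:

```python
import unicodedata

# Hypothetical fallback table (an assumption, not from the thread or the
# article): characters that have no Unicode decomposition to an ASCII base.
_CUSTOM_MAP = {
    "Æ": "AE", "æ": "ae",
    "Ø": "O", "ø": "o",
    "Đ": "D", "đ": "d",
    "Ł": "L", "ł": "l",
    "ß": "ss",
}

def downcode(name):
    """Return an ASCII approximation of name."""
    result = []
    for ch in name:
        if ord(ch) < 128:
            # Already ASCII: keep unchanged.
            result.append(ch)
        elif ch in _CUSTOM_MAP:
            # No useful decomposition: use the custom mapping.
            result.append(_CUSTOM_MAP[ch])
        else:
            # Decompose (NFD) and keep only the ASCII parts, which drops
            # combining accents; characters with nothing left are removed.
            decomposed = unicodedata.normalize("NFD", ch)
            result.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(result)
```

With this sketch, accented letters such as "Ž" or "á" are handled by decomposition alone, while letters like "Æ" or "Ł" rely on the table. Anything that is in neither place is silently dropped, which may or may not be what you want.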


[1] http://effbot.org/zone/unicode-convert.htm

--
Gabriel Genellina

