[issue11322] encoding package's normalize_encoding() function is too slow

Marc-Andre Lemburg Fri, 25 Feb 2011 07:58:18 -0800

New submission from Marc-Andre Lemburg <[email protected]>:

I don't know who changed the encodings package's normalize_encoding() function 
(wasn't me), but the current version is a really slow implementation.


The original version used the .translate() method, which is a lot faster and 
can be adapted to work with the Unicode variant of the .translate() method just 
as well.

_norm_encoding_map = ('                                              . '
                      '0123456789       ABCDEFGHIJKLMNOPQRSTUVWXYZ     '
                      ' abcdefghijklmnopqrstuvwxyz                     '
                      '                                                '
                      '                                                '
                      '                ')

import __builtin__  # Python 2 module; needed for the hasattr() check below

def normalize_encoding(encoding):

    """ Normalize an encoding name.

        Normalization works as follows: all non-alphanumeric
        characters except the dot used for Python package names are
        collapsed and replaced with a single underscore, e.g. '  -;#'
        becomes '_'. Leading and trailing underscores are removed.

        Note that encoding names should be ASCII only; if they do use
        non-ASCII characters, these must be Latin-1 compatible.

    """
    # Make sure we have an 8-bit string, because .translate() works
    # differently for Unicode strings.
    if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode):
        # Note that .encode('latin-1') does *not* use the codec
        # registry, so this call doesn't recurse. (See unicodeobject.c
        # PyUnicode_AsEncodedString() for details)
        encoding = encoding.encode('latin-1')
    return '_'.join(encoding.translate(_norm_encoding_map).split())
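
For illustration, here is a minimal sketch of how the same idea might look with
the Unicode variant of .translate() in Python 3. The names _norm_encoding_table
and normalize_encoding_py3 are hypothetical. Leaving code points above U+00FF
untouched is an assumption of this sketch, not behaviour of the version above:

```python
# Hypothetical Python 3 adaptation of the translate-based approach.
# Characters that survive normalization: ASCII letters, digits and the dot.
_KEEP = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789."
)

# str.translate() accepts a dict keyed by code point. Every Latin-1 code
# point outside _KEEP maps to a space; code points missing from the dict
# (anything above U+00FF) are left unchanged (assumption of this sketch).
_norm_encoding_table = {cp: " " for cp in range(256) if chr(cp) not in _KEEP}


def normalize_encoding_py3(encoding):
    """Normalize an encoding name using str.translate()."""
    if isinstance(encoding, bytes):
        # Decoding as Latin-1 maps each byte to the same code point.
        encoding = encoding.decode("latin-1")
    # split() drops leading/trailing runs of spaces and collapses inner runs,
    # so joining with '_' yields single underscores between the parts.
    return "_".join(encoding.translate(_norm_encoding_table).split())
```

Called as normalize_encoding_py3("ISO-8859-1"), this would return "ISO_8859_1".
The table is built once at import time, so each call costs one translate(), one
split() and one join().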

----------
components: Unicode
messages: 129386
nosy: lemburg
priority: normal
severity: normal
status: open
title: encoding package's normalize_encoding() function is too slow
type: performance
versions: Python 3.3

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue11322>
_______________________________________