New submission from Marc-Andre Lemburg <m...@egenix.com>: I don't know who changed the encoding's package normalize_encoding() function (wasn't me), but it's a really slow implementation.
The original version used the .translate() method which is a lot faster and can be adapted to work with the Unicode variant of the .translate() method just as well. _norm_encoding_map = (' . ' '0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ ' ' abcdefghijklmnopqrstuvwxyz ' ' ' ' ' ' ') def normalize_encoding(encoding): """ Normalize an encoding name. Normalization works as follows: all non-alphanumeric characters except the dot used for Python package names are collapsed and replaced with a single underscore, e.g. ' -;#' becomes '_'. Leading and trailing underscores are removed. Note that encoding names should be ASCII only; if they do use non-ASCII characters, these must be Latin-1 compatible. """ # Make sure we have an 8-bit string, because .translate() works # differently for Unicode strings. if hasattr(__builtin__, "unicode") and isinstance(encoding, unicode): # Note that .encode('latin-1') does *not* use the codec # registry, so this call doesn't recurse. (See unicodeobject.c # PyUnicode_AsEncodedString() for details) encoding = encoding.encode('latin-1') return '_'.join(encoding.translate(_norm_encoding_map).split()) ---------- components: Unicode messages: 129386 nosy: lemburg priority: normal severity: normal status: open title: encoding package's normalize_encoding() function is too slow type: performance versions: Python 3.3 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue11322> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com