Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
> We should first implement the same algorithm for the 3 normalization
> functions and add tests for them (at least for the function in
> normalization):
>
> - normalize_encoding() in encodings: it doesn't convert to lowercase and
>   keeps non-ASCII letters
> - normalize_encoding() in unicodeobject.c
> - normalizestring() in codecs.c
>
> normalize_encoding() in encodings is more lax than the two other
> functions: it normalizes " utf 8 " to 'utf_8'. But it doesn't convert to
> lowercase and keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
>
> I don't know if the normalization functions have to be more or less
> strict, but I think that they should all give the same result.
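The quoted behavior can be checked directly against the Python-level function (a quick sketch using CPython's encodings.normalize_encoding; exact behavior of the three functions is version-dependent, which is part of what the quoted message is pointing out):

```python
# Sketch: illustrate the encodings.normalize_encoding() behavior quoted
# above. Checked against CPython 3.x; the C-level functions in
# unicodeobject.c and codecs.c normalize differently.
from encodings import normalize_encoding

# Runs of whitespace/punctuation collapse to a single underscore,
# and leading/trailing separators are dropped:
print(normalize_encoding(" utf 8 "))   # -> utf_8

# Case is preserved -- this function does not lowercase:
print(normalize_encoding("UTF-8"))     # -> UTF_8
```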
Please see this message for an explanation of why we have those three
functions, why they are different and what their application space is:

http://bugs.python.org/issue5902#msg129257

This ticket is just about the encoding package's codec search function,
not the other two, and I don't want to change its semantics, just its
performance.

----------
title: encoding package's normalize_encoding() function is too slow -> encoding package's normalize_encoding() function is too slow

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11322>
_______________________________________