Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
> We should first implement the same algorithm for the 3 normalization
> functions and add tests for them (at least for the function in
> normalization):
>
> - normalize_encoding() in encodings: it doesn't convert to lowercase and
>   keeps non-ASCII letters
> - normalize_encoding() in unicodeobject.c
> - normalizestring() in codecs.c
>
> normalize_encoding() in encodings is more lax than the two other
> functions: it normalizes " utf 8 " to 'utf_8'. But it doesn't convert to
> lowercase and keeps non-ASCII letters: "UTF-8é" is normalized to "UTF_8é".
>
> I don't know if the normalization functions have to be more or less
> strict, but I think that they should all give the same result.
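The quoted behavior can be checked directly against the Python-level function (a quick sketch using CPython's encodings.normalize_encoding; exact behavior of the three functions is version-dependent, which is part of what the quoted message is pointing out):

```python
# Sketch: illustrate the encodings.normalize_encoding() behavior quoted
# above. Checked against CPython 3.x; the C-level functions in
# unicodeobject.c and codecs.c normalize differently.
from encodings import normalize_encoding

# Runs of whitespace/punctuation collapse to a single underscore,
# and leading/trailing separators are dropped:
print(normalize_encoding(" utf 8 "))   # -> utf_8

# Case is preserved -- this function does not lowercase:
print(normalize_encoding("UTF-8"))     # -> UTF_8
```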
Please see this message for an explanation of why we have those three
functions, why they are different and what their application space is:

http://bugs.python.org/issue5902#msg129257

This ticket is just about the encoding package's codec search function,
not the other two, and I don't want to change its semantics, just its
performance.

----------
title: encoding package's normalize_encoding() function is too slow -> encoding package's normalize_encoding() function is too slow

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11322>
_______________________________________