[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

hai shi Sat, 14 Mar 2020 08:03:41 -0700


hai shi <[email protected]> added the comment:


> How about calling `encodings.normalize_encoding()` in
> `codecs.normalizestring()` to keep the same behavior? (I created PR18845.)

I tried this idea, but it makes test cases in test_io.py fail: some objects
call `codecs.lookup()` in `__del__()`, and the extension module has already
been cleaned up by the time `__del__()` runs.

> I would prefer that codecs.lookup() and encodings.normalize_encoding() behave 
> the same. Either always ignore or always copy.
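The divergence described in the quote can be observed with a small script.
This is only a sketch: `probe()` and `compare()` are hypothetical helper names,
and the result for the non-ASCII name depends on the Python version, so no
output is shown for it.

```python
import codecs
import encodings


def probe(func, name):
    """Call func(name); return the result, or the exception class name on failure."""
    try:
        return func(name)
    except (LookupError, ValueError) as exc:
        # Unknown or rejected names are reported instead of aborting the script.
        return type(exc).__name__


def compare(name):
    """Return (normalize_encoding() result, resolved codec name) for one name."""
    normalized = probe(encodings.normalize_encoding, name)
    resolved = probe(lambda n: codecs.lookup(n).name, name)
    return normalized, resolved


# The last candidate contains a non-ASCII character (U+00E9).
for candidate in ("latin-1", "utf-8", "latin\xe91"):
    print(repr(candidate), "->", compare(candidate))
```

If the two functions agree, a non-ASCII name is either ignored by both or
copied by both; a mismatch between the two columns is the inconsistency under
discussion.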

I tried adding a `_Py_normalize_unicode_encoding()` function in unicodeobject.c
to support normalizing non-ASCII encoding names (PR18987), but this PR caused
many test cases to fail.

For example:

In master:
python3.9 -c "print('a\xac\u1234\u20ac\u8000\U0010ffff'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\xa4\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff'

after PR18987:
./python -c "print('a\xac\u1234\u20ac\u8000\U0010ffff'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\\N{EURO SIGN}\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010ffff'
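
The only difference between the two results is the euro sign (U+20AC):
ISO-8859-15 maps it to the byte 0xA4, while the post-PR output falls back to
`\N{EURO SIGN}`. The sketch below encodes the same string with both
ISO-8859-15 and ISO-8859-1 on an unpatched interpreter. The ISO-8859-1 result
equals the post-PR output, which suggests the name was resolved to Latin-1
after the PR. That is an inference from the output, not something confirmed
here.

```python
text = 'a\xac\u1234\u20ac\u8000\U0010ffff'

# ISO-8859-15 has a slot for the euro sign (0xA4), so it is encoded directly.
iso15 = text.encode('iso-8859-15', 'namereplace')

# ISO-8859-1 has no euro sign, so the 'namereplace' handler substitutes \N{...}.
latin1 = text.encode('iso-8859-1', 'namereplace')

print(iso15)
print(latin1)
print(iso15 == latin1)
```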

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39337>
_______________________________________
_______________________________________________
Python-bugs-list mailing list