New submission from STINNER Victor:

The "us-ascii" encoding is an alias of the Python ASCII encoding. The
PyUnicode_AsEncodedString() and PyUnicode_Decode() functions have a fast path
for the "ascii" encoding name, but not for "us-ascii".

The attached patch also uses the fast path for "us-ascii". It is a more
generic change than issue #27915. The "us-ascii" name is common in the email
and xml.etree modules.
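To illustrate, here is a minimal C sketch of a fast-path table. It assumes the encoding name has already been normalized (lowercased, with punctuation collapsed to '_'). The fast_codec enum, fast_path_for() and the exact alias spellings are hypothetical and are not code from the patch. The branch order, the "NULL means UTF-8" shortcut and the absence of UTF-32 follow the patch notes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the codecs that get a fast path. */
typedef enum {
    FAST_NONE,     /* no fast path: fall back to the generic codec lookup */
    FAST_UTF8,
    FAST_ASCII,
    FAST_MBCS,     /* only matched on Windows */
    FAST_LATIN1,
    FAST_UTF16
} fast_codec;

/* Map an already-normalized encoding name to a fast-path codec.
   A NULL name means the default encoding (UTF-8), so no string needs
   to be copied or compared at all. */
fast_codec fast_path_for(const char *normalized)
{
    if (normalized == NULL)
        return FAST_UTF8;

    /* Order assumed from the patch notes: UTF-8, ASCII, MBCS, Latin1, UTF-16. */
    if (strcmp(normalized, "utf_8") == 0 || strcmp(normalized, "utf8") == 0)
        return FAST_UTF8;
    /* "us_ascii" is the normalized form of "us-ascii", "US-ASCII", ... */
    if (strcmp(normalized, "ascii") == 0 || strcmp(normalized, "us_ascii") == 0)
        return FAST_ASCII;
#ifdef _WIN32
    if (strcmp(normalized, "mbcs") == 0)
        return FAST_MBCS;
#endif
    if (strcmp(normalized, "latin1") == 0 || strcmp(normalized, "latin_1") == 0
        || strcmp(normalized, "iso_8859_1") == 0)
        return FAST_LATIN1;
    if (strcmp(normalized, "utf_16") == 0 || strcmp(normalized, "utf16") == 0)
        return FAST_UTF16;

    /* UTF-32, Latin9 and every other name use the generic lookup. */
    return FAST_NONE;
}
```

With such a table, a name like "US-ASCII" that normalizes to "us_ascii" takes the ASCII branch instead of going through the codec registry.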

Other changes in the patch:

* Rewrite _Py_normalize_encoding() as a C implementation of
  encodings.normalize_encoding(). For example, " utf-8 " is now normalized to
  "utf_8", so the fast path is now used for more spelling variants of the
  same encoding.
* Avoid strcpy() when encoding is NULL: call the UTF-8 codec directly
* Reorder the fast-path checks: UTF-8, ASCII, MBCS, Latin1, UTF-16
* Remove the fast path for UTF-32: seriously, nobody uses this codec. Latin9
  is much faster but has no fast path.
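The normalization rule can be sketched in C. The function name normalize_encoding_sketch() is hypothetical, and so is the ASCII-only lowercasing step. encodings.normalize_encoding() itself does not lowercase, so that step is an assumption added here for case-insensitive matching. The sketch keeps alphanumerics and '.', turns each run of other characters into a single '_', and drops leading or trailing punctuation, so " utf-8 " becomes "utf_8".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only helpers, so the result does not depend on the C locale. */
static int is_ascii_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9');
}

static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Hypothetical C port of encodings.normalize_encoding() that also
   lowercases. Writes a NUL-terminated result into out and returns 1,
   or returns 0 if out is too small (the caller then skips the fast path). */
int normalize_encoding_sketch(const char *encoding, char *out, size_t out_len)
{
    char *o = out;
    char *o_end;
    int punct = 0;   /* set while inside a run of non-alphanumeric characters */

    assert(encoding != NULL);
    assert(out_len > 0);
    o_end = out + out_len - 1;   /* reserve room for the trailing NUL */

    for (const char *e = encoding; *e != '\0'; e++) {
        char c = *e;
        if (is_ascii_alnum(c) || c == '.') {
            /* Emit one '_' for a pending punctuation run, never at the start. */
            if (punct && o != out) {
                if (o == o_end)
                    return 0;
                *o++ = '_';
            }
            punct = 0;
            if (o == o_end)
                return 0;
            *o++ = ascii_lower(c);
        }
        else {
            /* Trailing punctuation is never flushed, so it is dropped. */
            punct = 1;
        }
    }
    *o = '\0';
    return 1;
}
```

Because the result fits in a small fixed-size buffer for every common codec name, normalizing on the stack and then comparing against a short list of names stays cheap compared to a full codec registry lookup.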

----------
components: Interpreter Core, Unicode
files: normalize_encoding.patch
keywords: patch
messages: 274222
nosy: ezio.melotti, haypo, scop, serhiy.storchaka
priority: normal
severity: normal
status: open
title: PyUnicode_AsEncodedString, PyUnicode_Decode: add fast-path for 
"us-ascii" encoding
type: performance
versions: Python 3.6
Added file: http://bugs.python.org/file44345/normalize_encoding.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27938>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
